Exo v1.0.61 Docs: Benchmarking, macOS, and More

by Editorial Team

Hey there, fellow developers! Get ready for a deep dive into the latest documentation updates for Exo v1.0.61. We've been busy adding new features and refining the documentation to make your experience smoother. Let's break down the key updates, including the exo-bench benchmarking tool, the custom namespace UI in the macOS app, and more. Buckle up, and let's explore these changes together!

Exo-Bench: Supercharge Your Model Optimization 🚀

Exo-bench is your new best friend for benchmarking model prefill and token generation speeds. This tool, introduced in PR #1099, lets you measure how fast your models perform across different configurations. It's a game-changer for optimizing your models and validating any performance improvements you make. The best part? There is now documentation!

What Exo-Bench Does: Exo-bench benchmarks model prefill (pp) and token generation (tg) speed. It gives you hard data on how quickly your models can handle prompts and generate responses, helping you fine-tune performance.

Command-Line Usage and Key Parameters:

To use exo-bench, you'll need to know a few command-line parameters:

  • --model: Specify the model you want to benchmark.
  • --pp: Prompt size hints (comma-separated integers). This lets you test different prompt sizes.
  • --tg: Generation lengths (comma-separated integers). Test how long it takes to generate different numbers of tokens.
  • --max-nodes: Limit placements to N nodes. Control the number of nodes used in your benchmark.
  • --instance-meta: Filter by ring/jaccl/both. This lets you select specific instance types.
  • --sharding: Filter by pipeline/tensor/both. Select your preferred sharding strategy.
  • --repeat: Number of repetitions per configuration. Get more reliable results by running each test multiple times.
  • --warmup: Warmup runs per placement. Prime the system for accurate measurements.
  • --json-out: Output file for results. Save your benchmark results to a JSON file for analysis.

Example Command:

Want to see exo-bench in action? Here's a simple example:

exo-bench --model my-model --pp 32,64,128 --tg 64,128 --repeat 3 --json-out results.json
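
To get a feel for how many runs a command like this implies, here is a minimal Python sketch. It assumes every --pp value is paired with every --tg value and that each pair is repeated --repeat times; that pairing is an assumption for illustration, not confirmed behaviour of exo-bench, and the helper names parse_int_list and expand_runs are hypothetical.

```python
from itertools import product


def parse_int_list(value: str) -> list:
    """Parse a comma-separated string such as "32,64,128" into integers."""
    return [int(part) for part in value.split(",") if part.strip()]


def expand_runs(pp: str, tg: str, repeat: int) -> list:
    """Return (prompt_size, generation_length, repetition) for every run.

    Assumption: each pp value is combined with each tg value. The real
    tool may combine them differently.
    """
    pairs = product(parse_int_list(pp), parse_int_list(tg))
    return [
        (prompt_size, gen_length, rep)
        for prompt_size, gen_length in pairs
        for rep in range(1, repeat + 1)
    ]


# Values taken from the example command above.
runs = expand_runs("32,64,128", "64,128", repeat=3)
print(len(runs))  # 3 prompt sizes x 2 generation lengths x 3 repetitions = 18
print(runs[0])    # (32, 64, 1)
```

Under that assumption the example command performs 18 measured runs per placement, which is worth keeping in mind before you add more values to --pp or --tg.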

Important Note: Start your nodes with uv run exo before running your benchmarks, so the cluster is up and ready when exo-bench runs.

Relevant files/PRs:

  • PR #1099: exo-bench (Benchmark model pp & tg speed)
  • bench/exo_bench.py: the implementation
  • src/exo/master/api.py: the /bench/chat/completions endpoint

macOS App: Custom Namespace UI for Cluster Isolation 💻

Commit 4f6fcd9e brought a new feature to the macOS app: a custom namespace UI for cluster isolation. It's a lifesaver for developers who want to run multiple Exo clusters or keep their cluster separate from others, and the new documentation explains how it works.

What's the Deal? The EXO_LIBP2P_NAMESPACE setting places your cluster in its own libp2p namespace. Clusters in different namespaces stay isolated, which prevents them from interfering with each other.

How to Find It: You'll find this setting in the macOS app UI. The exact location isn't confirmed here; look in the Advanced settings or a similar section.

Use Cases:

  • Running multiple separate Exo clusters on the same network.
  • Isolating development/testing clusters from production clusters.
  • Preventing accidental cluster joining.

Good to Know: You can also set the EXO_LIBP2P_NAMESPACE environment variable when running from source.

Debugging: This setting is logged on startup, which is handy for debugging.
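
If you launch Exo from a script, one way to apply the namespace is to set the variable in the child process environment. The sketch below is a minimal illustration: the namespace value "dev-cluster" and the helper names namespaced_env and start_exo are hypothetical, and only EXO_LIBP2P_NAMESPACE and uv run exo come from the notes above.

```python
import os
import subprocess
from typing import Dict, Optional


def namespaced_env(namespace: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with EXO_LIBP2P_NAMESPACE set."""
    if not namespace:
        raise ValueError("namespace must be a non-empty string")
    env = dict(os.environ if base is None else base)
    env["EXO_LIBP2P_NAMESPACE"] = namespace
    return env


def start_exo(namespace: str) -> subprocess.Popen:
    """Launch exo from source inside the given namespace (not called here)."""
    return subprocess.Popen(["uv", "run", "exo"], env=namespaced_env(namespace))


# "dev-cluster" is a made-up namespace name used only for illustration.
env = namespaced_env("dev-cluster", base={"PATH": "/usr/bin"})
print(env["EXO_LIBP2P_NAMESPACE"])  # dev-cluster
```

Every node that should belong to the same cluster needs the same namespace value, so it helps to keep that value in one place in your launch scripts.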

Relevant files/PRs:

  • Commit 4f6fcd9e: feat(macos-app): add custom namespace UI for cluster isolation
  • app/EXO/EXO/ExoProcessController.swift: implementation of namespace computation
  • src/exo/main.py: logging of EXO_LIBP2P_NAMESPACE on start
  • rust/networking/src/swarm.rs: OVERRIDE_VERSION_ENV_VAR constant

--no-worker CLI Flag: Run Exo Your Way ⚙️

PR #1091 introduced the --no-worker CLI flag, giving you more control over your Exo setup. This is super useful for developers who want to run Exo in different configurations, like a coordinator-only mode. A machine started this way can still take part in cluster networking without running any GPU jobs.

What Does It Do? The --no-worker flag disables the worker component.

Use Cases:

  • Running a coordinator-only node for networking/orchestration.
  • Setting up nodes that participate in cluster communication but don't execute inference tasks.
  • Perfect for machines without GPUs but with good network connectivity.

Example Command:

uv run exo --no-worker

Quick Tip: Add this CLI flag when you start Exo from source.
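
To picture what a flag like this does, here is a small argparse sketch. It is not Exo's actual argument parser: the program name exo-sketch and the component names "networking", "coordinator", and "worker" are hypothetical stand-ins for illustration. Only the --no-worker flag itself comes from the release notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a single boolean --no-worker flag."""
    parser = argparse.ArgumentParser(prog="exo-sketch")
    parser.add_argument(
        "--no-worker",
        action="store_true",
        help="do not start the worker component on this node",
    )
    return parser


def components_to_start(argv: list) -> list:
    """Return the (hypothetical) components this node would start."""
    args = build_parser().parse_args(argv)
    components = ["networking", "coordinator"]
    if not args.no_worker:
        # The worker is the part that would execute inference tasks.
        components.append("worker")
    return components


print(components_to_start([]))               # ['networking', 'coordinator', 'worker']
print(components_to_start(["--no-worker"]))  # ['networking', 'coordinator']
```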

Relevant files/PRs:

  • PR #1091: [feat] Add an option to disable the worker
  • Commit 839b67f3: implementation
  • src/exo/main.py: CLI flag implementation

Linux XDG Base Directory Specification: Keeping Things Organized 🐧

PR #988 brought XDG Base Directory Specification compliance to Exo on Linux. This means Exo now follows standard Linux conventions for storing configuration files, data, and cache files. This makes Exo a better fit for Linux users.

Where Things Go:

  • Configuration files: ~/.config/exo/ (or $XDG_CONFIG_HOME/exo/)
  • Data files: ~/.local/share/exo/ (or $XDG_DATA_HOME/exo/)
  • Cache files: ~/.cache/exo/ (or $XDG_CACHE_HOME/exo/)

Overriding Locations: You can override these locations by setting the XDG_CONFIG_HOME, XDG_DATA_HOME, and XDG_CACHE_HOME environment variables.

Why This Matters: This helps you understand where to find logs, configuration, and downloaded models.
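
The lookup rules above can be sketched in a few lines of Python. This follows the general XDG Base Directory rules (an unset, empty, or relative value falls back to the default) and is not Exo's own implementation; the function name xdg_dir and the example home directory /home/alice are hypothetical.

```python
import os
from pathlib import Path

# Default locations relative to the home directory, per the XDG specification.
_DEFAULTS = {
    "XDG_CONFIG_HOME": ".config",
    "XDG_DATA_HOME": ".local/share",
    "XDG_CACHE_HOME": ".cache",
}


def xdg_dir(variable: str, app: str = "exo", env=None, home=None) -> Path:
    """Resolve the per-application directory for one XDG variable."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    value = env.get(variable, "")
    # Unset, empty, or relative values are ignored in favour of the default.
    if value and Path(value).is_absolute():
        base = Path(value)
    else:
        base = home / _DEFAULTS[variable]
    return base / app


# Explicit env and home arguments keep the example deterministic.
print(xdg_dir("XDG_CONFIG_HOME", env={}, home="/home/alice"))
print(xdg_dir("XDG_CACHE_HOME", env={"XDG_CACHE_HOME": "/tmp/cache"}, home="/home/alice"))
```

On a Linux machine the first call resolves to /home/alice/.config/exo and the second to /tmp/cache/exo, matching the defaults and override behaviour listed above.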

Relevant files/PRs:

  • PR #988: feat: conform to XDG Base Directory Specification on Linux
  • Commit 70c423f5: implementation

Contributing.md Updates: Formatting for All Code ✍️

Commits 383309e2 and 55463a98 added formatting support for TypeScript and Swift code. If you're contributing to the web dashboard (TypeScript) or macOS app (Swift), you'll want to take note. The CONTRIBUTING.md file has been updated to reflect these changes.

What's New? The 'Code Style' section now explicitly mentions TypeScript and Swift formatting support.

How to Format: Run nix fmt before submitting your PRs. This will handle Python, Rust, TypeScript, and Swift code.

Examples of What Gets Formatted:

  • Python: src/ directory
  • Rust: rust/ directory
  • TypeScript: dashboard/ directory
  • Swift: app/ directory

Relevant files/PRs:

  • Commit 383309e2: fmt: add typescript formatting
  • Commit 55463a98: fmt: add swift formatting

Building the macOS App: Your Guide to the Mac App 🛠️

Commit 379744fe open-sourced the macOS application and its build process. Now, you can build the Exo macOS app yourself. This is great for contributors and anyone wanting to understand how the app works.

New Documentation: Check out docs/building-macos-app.md for detailed instructions.

Prerequisites:

  • Xcode and Xcode Command Line Tools
  • Swift development environment
  • Node.js (for building the dashboard)
  • Rust toolchain (for building Rust bindings)
  • Other macOS-specific build tools

Build Steps:

  • Open the project in Xcode.
  • Build dependencies.
  • Build the app bundle.
  • Run the app in development mode.

App Structure:

  • Swift source code: app/EXO/
  • Integration with the Python backend.
  • Embedded dashboard.

Important: The documentation includes troubleshooting tips for common build issues. It also covers code signing and notarization requirements.

Relevant files/PRs:

  • Commit 379744fe: exo: open source mac app and build process
  • app/EXO/: macOS app source code
  • .github/workflows/build-app.yml: CI workflow for building the app

Architecture Updates: Runner's Two-Phase Initialization 🏗️

Commit 8e9332d6 refactored the Runner component to separate its behavior into two distinct phases: a 'connect' phase and a 'load' phase. This gives better control over initialization, allows more granular error handling, and makes the startup sequence easier to understand and debug. It matters most to developers working on the core system or troubleshooting initialization issues. The current architecture.md describes the Runner but doesn't document this two-phase initialization pattern.

Two-Phase Initialization: The Runner now has two phases:

  • Connect phase: Establishes connections and prepares the runner for work.
  • Load phase: Loads models and prepares for inference execution.
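
As a rough mental model of the two phases, consider the sketch below. It is not the code in runner.py: the class SketchRunner, the Phase enum, and the method bodies are hypothetical and only illustrate the general pattern of refusing to load before a successful connect.

```python
from enum import Enum, auto


class Phase(Enum):
    CREATED = auto()
    CONNECTED = auto()
    LOADED = auto()


class SketchRunner:
    """Toy runner that enforces connect-then-load ordering."""

    def __init__(self) -> None:
        self.phase = Phase.CREATED
        self.model = None

    def connect(self) -> None:
        if self.phase is not Phase.CREATED:
            raise RuntimeError(f"cannot connect from phase {self.phase.name}")
        # A real runner would establish its connections here.
        self.phase = Phase.CONNECTED

    def load(self, model_name: str) -> None:
        if self.phase is not Phase.CONNECTED:
            raise RuntimeError("load() requires a successful connect() first")
        # A real runner would load model weights here.
        self.model = model_name
        self.phase = Phase.LOADED


runner = SketchRunner()
runner.connect()
runner.load("my-model")
print(runner.phase.name)  # LOADED
```

Because each phase fails on its own, an error message can say whether startup broke while connecting or while loading the model.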

Benefits:

  • Better error handling at each phase.
  • Clearer initialization flow.
  • Easier debugging of startup issues.
  • More granular control over the initialization process.

Relevant files/PRs:

  • Commit 8e9332d6: Separate out the Runner's behaviour into a "connect" phase and a "load" phase (#1006)
  • src/exo/worker/runner/runner.py: implementation of the two-phase pattern

Task Deduplication: Smarter Task Execution 🧠

PR #1062 introduced task deduplication, which prevents the same task from being executed multiple times across the cluster. The feature automatically detects and eliminates duplicate tasks, reducing resource usage and improving overall cluster efficiency. This is particularly valuable in distributed inference scenarios. The current architecture.md doesn't document this optimization.

What's Task Deduplication? It stops the same task from running multiple times.

How It Works:

  • Tasks are identified and tracked.
  • Duplicate tasks are detected before execution.
  • Results are shared when duplicates are found.
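
The three steps above can be illustrated with a single-process sketch. Exo's real deduplication operates across a cluster and its details are not described here, so treat the class TaskDeduplicator, its in-memory dictionary, and the task id "prompt-123" as hypothetical.

```python
from typing import Any, Callable, Dict, Hashable


class TaskDeduplicator:
    """Run each task id at most once and share the stored result."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, Any] = {}
        self.executions = 0

    def run(self, task_id: Hashable, work: Callable[[], Any]) -> Any:
        # Duplicate detected before execution: reuse the earlier result.
        if task_id in self._results:
            return self._results[task_id]
        result = work()
        self.executions += 1
        self._results[task_id] = result
        return result


dedup = TaskDeduplicator()
first = dedup.run("prompt-123", lambda: "generated text")
second = dedup.run("prompt-123", lambda: "never computed")
print(first == second)   # True
print(dedup.executions)  # 1
```

The second call never invokes its work function, which is the saving deduplication is after.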

Benefits:

  • Reduced computational overhead.
  • Better resource utilization.
  • Improved cluster efficiency in distributed inference scenarios.

Relevant files/PRs:

  • PR #1062: task deduplication implementation
  • Commit 17f9b583: implementation
  • MISSED_THINGS.md mentions: 'Deduplication of tasks in plan_step'

Placement Filters with Tensor Parallel Support: Model Placement Magic ✨

Commit 283c0e39 introduced placement filters for tensor parallel operations. The filters take tensor dimensions and pipeline parallel configurations into account, and include specific support for DeepSeek v3.1. This is significant for developers working with large models, as it enables better control over how model layers are distributed across nodes in the cluster. The current architecture.md doesn't document the placement filter system or how it handles different parallelism strategies.

What are Placement Filters? They control how models are distributed across cluster nodes.

Key Features:

  • Support for different parallelism strategies (tensor parallel, pipeline parallel).
  • Awareness of tensor dimensions when making placement decisions.
  • Optimized placement for models like DeepSeek v3.1.
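
To make the idea concrete, here is a minimal sketch of a placement filter. It assumes one common tensor parallel constraint, namely that the dimension being split must divide evenly across the nodes; whether Exo applies exactly this rule is an assumption. The Placement class, the function names, and the hidden size of 4096 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Placement:
    node_count: int
    sharding: str  # "pipeline" or "tensor"


def supports_tensor_parallel(hidden_size: int, node_count: int) -> bool:
    """Assumed rule: the split dimension must divide evenly across nodes."""
    return node_count > 0 and hidden_size % node_count == 0


def filter_placements(
    placements: List[Placement],
    hidden_size: int,
    sharding: str = "both",
    max_nodes: Optional[int] = None,
) -> List[Placement]:
    """Keep only the placements that pass every filter."""
    kept = []
    for placement in placements:
        if max_nodes is not None and placement.node_count > max_nodes:
            continue
        if sharding != "both" and placement.sharding != sharding:
            continue
        if placement.sharding == "tensor" and not supports_tensor_parallel(
            hidden_size, placement.node_count
        ):
            continue
        kept.append(placement)
    return kept


candidates = [
    Placement(2, "tensor"),
    Placement(3, "tensor"),    # 4096 is not divisible by 3, so this is dropped
    Placement(3, "pipeline"),
    Placement(4, "tensor"),
]
print(filter_placements(candidates, hidden_size=4096))
print(filter_placements(candidates, hidden_size=4096, sharding="tensor", max_nodes=2))
```

The sharding and max_nodes arguments mirror the --sharding and --max-nodes options of exo-bench in spirit only; the sketch does not claim to reproduce how those options are implemented.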

Benefits:

  • Better performance for large models.
  • Optimized resource utilization.
  • Support for cutting-edge models.

Relevant files/PRs:

  • Commit 283c0e39: Placement filters for tensor parallel supports_tensor, tensor dimension and pipeline parallel deepseek v3.1 (#1058)

That's a wrap, folks! These updates make Exo more powerful and easier to use. Keep an eye on the documentation for all the latest info! Happy coding!