Deploying the Distributed Scheduler
Deploy the standalone Rust scheduler for improved scalability
This guide shows you how to deploy the Distributed Scheduler, a standalone Rust module that offloads the frame-dispatching workload from Cuebot, enabling OpenCue to scale to larger render farms.
What is the Distributed Scheduler?
The Distributed Scheduler is a high-performance Rust service that handles frame-to-host matching and dispatching. Cuebot's traditional booking system is reactive: it runs a booking pass in response to each host report. The scheduler instead works proactively: an internal loop continuously searches for pending jobs and matches them against an in-memory cache of available hosts.
Key Benefits
- Reduced Database Load: Host information is cached in memory, so far fewer complex booking queries reach the database
- Improved Dispatch Latency: Proactive matching reduces time-to-first-frame for new jobs
- Horizontal Scalability: Multiple scheduler instances can share the load by processing different clusters
- Better Resource Utilization: Host selection runs as in-memory matching over the cached host state instead of per-report database queries
For more technical details, see the Scheduler Technical Reference.
System Requirements
To plan your installation of the Distributed Scheduler, consider the following:
- Memory: Minimum 2GB RAM per scheduler instance (scales with number of hosts cached)
- CPU: 2-4 cores recommended per instance
- Network: Low-latency connection to the OpenCue database (same requirements as Cuebot)
- Database: PostgreSQL with the same schema as Cuebot (no additional tables required)
Architecture Overview
The scheduler is organized around clusters, which represent unique combinations of:
- Allocation Clusters: One per facility + show + allocation tag
- Manual Tag Clusters: Groups of manual tags (chunk size configurable)
- Hostname Tag Clusters: Groups of hostname tags (chunk size configurable)
Each scheduler instance processes one or more clusters in a round-robin fashion. The cluster set is built automatically from every show with b_scheduler_managed = true, optionally scoped to one facility via --facility. In distributed deployments, different instances are scoped to different facilities to share the workload.
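To make the cluster model concrete, here is a minimal Rust sketch of how a cluster queue could be assembled and walked round-robin. The `Cluster` enum and the `build_clusters`, `chunk_tags`, and `next_cluster` functions are hypothetical names, and the sketch assumes manual and hostname tags arrive as flat lists; the scheduler's actual data model and cluster-loading code may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical cluster type for illustration; the scheduler's real types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Cluster {
    /// One cluster per facility + show + allocation tag.
    Allocation { facility: String, show: String, tag: String },
    /// A chunk of at most `manual_tags_chunk_size` manual tags.
    ManualTags(Vec<String>),
    /// A chunk of at most `hostname_tags_chunk_size` hostname tags.
    HostnameTags(Vec<String>),
}

/// Split a tag list into groups of at most `chunk_size` tags.
pub fn chunk_tags(tags: &[String], chunk_size: usize) -> Vec<Vec<String>> {
    assert!(chunk_size > 0, "chunk size must be positive");
    tags.chunks(chunk_size).map(|chunk| chunk.to_vec()).collect()
}

/// Build the cluster queue for one facility from the scheduler-managed shows.
pub fn build_clusters(
    facility: &str,
    managed_shows: &[(&str, Vec<String>)], // (show name, allocation tags)
    manual_tags: &[String],
    hostname_tags: &[String],
    manual_chunk: usize,
    hostname_chunk: usize,
) -> VecDeque<Cluster> {
    let mut queue = VecDeque::new();
    for (show, alloc_tags) in managed_shows {
        for tag in alloc_tags {
            queue.push_back(Cluster::Allocation {
                facility: facility.to_string(),
                show: show.to_string(),
                tag: tag.clone(),
            });
        }
    }
    for chunk in chunk_tags(manual_tags, manual_chunk) {
        queue.push_back(Cluster::ManualTags(chunk));
    }
    for chunk in chunk_tags(hostname_tags, hostname_chunk) {
        queue.push_back(Cluster::HostnameTags(chunk));
    }
    queue
}

/// Round-robin: hand out the front cluster and rotate it to the back.
pub fn next_cluster(queue: &mut VecDeque<Cluster>) -> Option<Cluster> {
    let cluster = queue.pop_front()?;
    queue.push_back(cluster.clone());
    Some(cluster)
}
```

Rotating each cluster to the back of the queue after it is handed out keeps any single cluster from starving the others.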
Before You Begin
Before deploying the scheduler, ensure you have:
- Running OpenCue infrastructure:
  - PostgreSQL database (same as used by Cuebot)
  - Cuebot instance (version 0.23.0 or later for exclusion list support)
  - RQD agents on render hosts
- Network access:
  - Scheduler needs database access (same credentials as Cuebot)
  - Scheduler needs gRPC access to RQD hosts (default port 8444)
- Installation method:
  - Docker (recommended for production; requires Docker to be installed)
  - Pre-built binary (for testing/development)
  - Build from source (for customization)
Installation Options
Option 1: Run with Docker (Recommended)
The easiest way to deploy the scheduler in production is to use the pre-built Docker image.
1. Download the Docker Image
docker pull opencue/scheduler
2. Create a Configuration File
Create /etc/cue-scheduler/scheduler.yaml with your environment-specific settings:
logging:
  level: info,sqlx=warn
database:
  pool_size: 20
  db_host: your-postgres-host
  db_name: cuebot
  db_user: cuebot
  db_pass: your_password
  db_port: 5432
rqd:
  grpc_port: 8444
  dry_run_mode: false # Set to true for testing without actual dispatch
queue:
  monitor_interval: 5s
  worker_threads: 4
  dispatch_frames_per_layer_limit: 20
  manual_tags_chunk_size: 100
  hostname_tags_chunk_size: 300
  cluster_reload_interval: 120s # How often to reload the managed-show cluster set from the DB
scheduler:
  # Optional: Filter to a specific facility
  # facility: spi
  # Optional: Tags to exclude from all loaded clusters
  # ignore_tags:
  #   - deprecated_tag
The scheduler automatically loads every cluster (allocation, manual, hostname, and hardware host-tags) for all shows where b_scheduler_managed = true. There is no per-show or per-tag selection in the config; ownership is driven solely by the show.b_scheduler_managed DB column (see Configuring Cuebot Exclusion List).
3. Run the Scheduler Container
docker run -d \
--name opencue-scheduler \
--restart unless-stopped \
-v /etc/cue-scheduler/scheduler.yaml:/etc/cue-scheduler/scheduler.yaml:ro \
-p 9090:9090 \
opencue/scheduler
The scheduler will:
- Read configuration from the mounted YAML file
- Expose Prometheus metrics on port 9090
- Automatically restart if it crashes
Option 2: Build and Run with Docker from Source
If you need to customize the scheduler or are developing locally:
1. Check Out the Source Code
Check out the OpenCue source code and make the root of the repository your current directory.
2. Build the Docker Image
docker build -t opencue/scheduler -f rust/Dockerfile.scheduler .
This multi-stage build:
- Compiles the Rust scheduler in release mode
- Creates a minimal runtime image with just the binary
- Includes necessary runtime dependencies
3. Run the Container
Follow the same steps as Option 1, step 3 above.
Option 3: Run Pre-built Binary
For testing or development environments:
1. Download the Binary
Download the appropriate pre-built binary from the OpenCue releases page:
- Linux (GNU): cue-scheduler-VERSION-x86_64-unknown-linux-gnu
- Linux (MUSL): cue-scheduler-VERSION-x86_64-unknown-linux-musl (static, no dependencies)
- macOS (Intel): cue-scheduler-VERSION-x86_64-apple-darwin
- macOS (Apple Silicon): cue-scheduler-VERSION-aarch64-apple-darwin
2. Make Executable and Install
chmod +x cue-scheduler-VERSION-PLATFORM
sudo mv cue-scheduler-VERSION-PLATFORM /usr/local/bin/cue-scheduler
3. Create Configuration
Create ~/.config/cue-scheduler/scheduler.yaml (or any path you prefer).
See the sample configuration in Option 1, step 2 above.
4. Run the Scheduler
# With default config location
cue-scheduler
# Or specify config path
OPENCUE_SCHEDULER_CONFIG=/path/to/scheduler.yaml cue-scheduler
# Or use command-line overrides
cue-scheduler \
--facility spi \
--ignore_tags=deprecated,old
Option 4: Build from Source
For development or customization:
1. Install Prerequisites
Rust toolchain:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Protobuf compiler:
# macOS
brew install protobuf
# Ubuntu/Debian
sudo apt-get install protobuf-compiler
# RHEL/CentOS/Rocky
sudo yum install protobuf-compiler
2. Build the Scheduler
cd OpenCue/rust
cargo build --release -p scheduler
The binary will be at target/release/cue-scheduler.
3. Run
target/release/cue-scheduler --facility spi
Configuring Cuebot Exclusion List
To prevent Cuebot and the Scheduler from competing for the same work, you must configure Cuebot to exclude the clusters handled by the scheduler.
Understanding Exclusion Configuration
Cuebot supports two exclusion mechanisms:
- Global Booking Disable (in opencue.properties): Turn off all booking in Cuebot:
  dispatcher.turn_off_booking=true
- Per-show Scheduler Ownership (show.b_scheduler_managed column, toggled by operators): Mark a show as managed by the standalone scheduler. Cuebot then skips that show in dispatch, and the standalone scheduler dispatches its frames. Toggle with cueadmin:
  cueadmin -scheduler-managed myshow on
  cueadmin -scheduler-managed myshow off
  This is the sole onboarding mechanism. Granularity is whole-show: when a show is scheduler-managed, the scheduler automatically loads all of its clusters (every allocation, manual, hostname, and hardware host-tag), so there is nothing else to configure per show or per allocation. The set of managed shows is reloaded from the DB periodically (queue.cluster_reload_interval, default 120s), so flipping the flag takes effect without restarting the scheduler.
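The interaction between the two mechanisms can be summarized as a small decision function. This is a simplified, hypothetical model (`Dispatcher` and `dispatcher_for` are illustrative names, not part of Cuebot or the scheduler), assuming, as described above, that the scheduler only ever loads scheduler-managed shows.

```rust
/// Which component dispatches a show's frames, in a simplified model of the
/// two exclusion mechanisms (hypothetical helper for illustration only).
#[derive(Debug, PartialEq)]
pub enum Dispatcher {
    Cuebot,
    Scheduler,
    /// Neither component picks the show up.
    Nobody,
}

pub fn dispatcher_for(scheduler_managed: bool, cuebot_booking_off: bool) -> Dispatcher {
    match (scheduler_managed, cuebot_booking_off) {
        // b_scheduler_managed = true: Cuebot skips the show and the
        // scheduler loads all of its clusters.
        (true, _) => Dispatcher::Scheduler,
        // Not managed and Cuebot booking still on: Cuebot keeps dispatching.
        (false, false) => Dispatcher::Cuebot,
        // Not managed, but dispatcher.turn_off_booking=true: the scheduler
        // ignores unmanaged shows and Cuebot no longer books, so the show stalls.
        (false, true) => Dispatcher::Nobody,
    }
}
```

In this model, the `Nobody` case is why turning off Cuebot booking globally only makes sense once every active show has been marked scheduler-managed.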
Migration Strategy
We recommend a gradual migration approach:
Phase 1: Test with One Show
- Deploy the scheduler (optionally scoped to a facility):
  cue-scheduler --facility spi
  You do not list the show's allocations or tags anywhere; the scheduler covers all of a managed show's clusters automatically.
- Hand the show to the scheduler:
  cueadmin -scheduler-managed testshow on
  The scheduler picks up all of testshow's clusters (allocation, manual, hostname, and hardware host-tags) within queue.cluster_reload_interval (default 120s), with no restart required.
- Monitor both systems:
  - Watch scheduler metrics at http://scheduler-host:9090/metrics
  - Verify Cuebot no longer dispatches new frames for testshow
  - Confirm frames dispatch successfully from the scheduler
Phase 2: Expand Coverage
Mark each additional show as scheduler-managed:
cueadmin -scheduler-managed mainshow on
No scheduler-side configuration change is needed: the scheduler reloads the managed-show set from the DB every queue.cluster_reload_interval (default 120s) and automatically picks up all of the newly managed show's clusters.
Phase 3: Full Migration (Optional)
Once confident, disable Cuebot booking entirely by setting this property in opencue.properties:
dispatcher.turn_off_booking=true
At this point, all dispatching is handled by the scheduler.
Notes on Granularity
- Marking a show as scheduler-managed applies to the whole show across all of its clusters (allocation, manual, hostname, and hardware host-tags). Ownership cannot be split per allocation or per tag: a show is wholly scheduler-managed or wholly Cuebot-managed.
- The scheduler reloads the managed-show set from the DB every queue.cluster_reload_interval (default 120s), so toggling a show, or changing its host-tags or subscriptions, is picked up without a restart.
- To return a show to Cuebot dispatch, run cueadmin -scheduler-managed myshow off.
Configuration Reference
Database Settings
database:
  pool_size: 20 # Connection pool size (default: 20)
  db_host: localhost # PostgreSQL host
  db_name: cuebot # Database name
  db_user: cuebot # Database user
  db_pass: password # Database password
  db_port: 5432 # PostgreSQL port
Scheduler Cluster Selection
The scheduler automatically loads all clusters for every show where
b_scheduler_managed = true (toggled with cueadmin -scheduler-managed <show> on|off).
The only scheduler-side knobs are an optional facility scope and a tag-exclusion list:
scheduler:
  # Optional: Filter clusters to a specific facility
  facility: spi
  # Optional: Tags to ignore (exclude from all loaded clusters)
  ignore_tags:
    - deprecated_tag
    - old_allocation
Queue and Performance Tuning
queue:
  monitor_interval: 5s # How often to check for work
  worker_threads: 4 # Concurrent workers
  dispatch_frames_per_layer_limit: 20 # Max frames per layer per cycle
  manual_tags_chunk_size: 100 # Manual tags per cluster
  hostname_tags_chunk_size: 300 # Hostname tags per cluster
  cluster_reload_interval: 120s # How often to reload the managed-show cluster set from the DB
  host_candidate_attemps_per_layer: 10 # Host matching retries
  job_back_off_duration: 300s # Backoff after failures
  # Optional: Exit after N idle cycles (useful for testing)
  # empty_job_cycles_before_quiting: 10
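The effect of `job_back_off_duration` can be pictured as a per-job cool-down. The sketch below assumes back-off is keyed by job ID and starts at the moment a dispatch attempt fails; `JobBackoff`, `record_failure`, and `is_eligible` are hypothetical names, not the scheduler's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-job cool-down modelled on `job_back_off_duration`.
pub struct JobBackoff {
    duration: Duration,
    /// Job ID -> instant until which the job is skipped.
    skip_until: HashMap<String, Instant>,
}

impl JobBackoff {
    pub fn new(duration: Duration) -> Self {
        Self { duration, skip_until: HashMap::new() }
    }

    /// Start (or restart) the back-off window for a job after a failed dispatch.
    pub fn record_failure(&mut self, job_id: &str, now: Instant) {
        self.skip_until.insert(job_id.to_string(), now + self.duration);
    }

    /// True if the job may be considered again; expired entries are removed.
    pub fn is_eligible(&mut self, job_id: &str, now: Instant) -> bool {
        let deadline = self.skip_until.get(job_id).copied();
        match deadline {
            Some(d) if now < d => false,
            Some(_) => {
                self.skip_until.remove(job_id);
                true
            }
            None => true,
        }
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside, keeps the logic deterministic and easy to test.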
Host Cache Configuration
host_cache:
  concurrent_groups: 3 # Parallel cache groups
  memory_key_divisor: 2GiB # Memory bucketing granularity
  checkout_timeout: 12s # Host checkout timeout
  monitoring_interval: 1s # Cache monitoring frequency
  group_idle_timeout: 10800s # Evict idle cache after 3 hours
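One way to read `memory_key_divisor` is as the width of the memory ranges used to group cached hosts. The sketch below assumes hosts are bucketed by integer division of their idle memory by the divisor (2GiB in the sample config); `memory_bucket`, `first_bucket_to_search`, and `GIB` are hypothetical helpers, not the host cache's actual implementation.

```rust
/// 1 GiB in bytes.
pub const GIB: u64 = 1024 * 1024 * 1024;

/// Hypothetical bucketing: hosts whose idle memory falls in the same
/// `memory_key_divisor`-wide range share a cache key.
pub fn memory_bucket(idle_memory_bytes: u64, divisor_bytes: u64) -> u64 {
    assert!(divisor_bytes > 0, "memory_key_divisor must be positive");
    idle_memory_bytes / divisor_bytes
}

/// For a layer needing `required_bytes`, every host in a bucket above the
/// returned one has enough memory; hosts in the returned bucket itself
/// still need an exact per-host check.
pub fn first_bucket_to_search(required_bytes: u64, divisor_bytes: u64) -> u64 {
    memory_bucket(required_bytes, divisor_bytes)
}
```

In this model, a smaller divisor means more cache keys but fewer hosts in the boundary bucket that need an exact check.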
Command-Line Overrides
CLI arguments override YAML configuration:
cue-scheduler \
--facility spi \
--ignore_tags=deprecated,old
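The override rule can be sketched as a merge of the YAML `scheduler` section with the two CLI flags. This assumes a CLI value replaces (rather than extends) the YAML value and that `--ignore_tags` takes a comma-separated list; `SchedulerSection`, `apply_cli_overrides`, and `drop_ignored` are hypothetical names.

```rust
/// Hypothetical mirror of the YAML `scheduler:` section.
#[derive(Debug, Default, PartialEq)]
pub struct SchedulerSection {
    pub facility: Option<String>,
    pub ignore_tags: Vec<String>,
}

/// CLI values win over YAML values when present.
pub fn apply_cli_overrides(
    yaml: SchedulerSection,
    cli_facility: Option<&str>,
    cli_ignore_tags: Option<&str>, // e.g. "deprecated,old"
) -> SchedulerSection {
    SchedulerSection {
        facility: cli_facility.map(str::to_string).or(yaml.facility),
        ignore_tags: match cli_ignore_tags {
            Some(list) => list
                .split(',')
                .map(str::trim)
                .filter(|tag| !tag.is_empty())
                .map(str::to_string)
                .collect(),
            None => yaml.ignore_tags,
        },
    }
}

/// Remove ignored tags from a cluster's tag list.
pub fn drop_ignored(tags: &mut Vec<String>, ignore: &[String]) {
    tags.retain(|tag| !ignore.contains(tag));
}
```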
Monitoring the Scheduler
Prometheus Metrics
The scheduler exposes metrics on port 9090 at /metrics:
curl http://localhost:9090/metrics
Key Metrics:
- scheduler_jobs_queried_total - Total jobs fetched from database
- scheduler_jobs_processed_total - Total jobs successfully processed
- scheduler_frames_dispatched_total - Total frames dispatched to hosts
- scheduler_candidates_per_layer - Distribution of hosts needed per layer
- scheduler_time_to_book_seconds - Latency from frame creation to dispatch
- scheduler_no_candidate_iterations_total - Failed matching attempts
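For scripted checks (for example, in place of a `grep` over the curl output), a counter can be read straight from the Prometheus text output. This hypothetical `metric_total` helper sums every sample whose name matches exactly, with or without labels. It is meant for counters such as `scheduler_frames_dispatched_total`; it does not handle label values containing `}`, and it does not interpret the `_bucket`/`_sum`/`_count` series that distribution metrics like `scheduler_time_to_book_seconds` are likely exposed as.

```rust
/// Sum all samples of `metric` in Prometheus text exposition output.
/// Handles `name value` and `name{labels} value` lines and skips `#` comments.
/// Returns None if no sample with exactly that name is found.
pub fn metric_total(body: &str, metric: &str) -> Option<f64> {
    let mut total = None;
    for line in body.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let rest = match line.strip_prefix(metric) {
            Some(rest) => rest,
            None => continue,
        };
        // Exact-name match: the name must be followed by labels or whitespace,
        // so `foo_total_extra` is not counted as `foo_total`.
        let rest = if let Some(after) = rest.strip_prefix('{') {
            match after.find('}') {
                Some(i) => &after[i + 1..],
                None => continue,
            }
        } else if rest.starts_with(char::is_whitespace) {
            rest
        } else {
            continue;
        };
        if let Some(v) = rest.split_whitespace().next().and_then(|s| s.parse::<f64>().ok()) {
            *total.get_or_insert(0.0) += v;
        }
    }
    total
}
```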
Log Output
The scheduler uses structured logging. Configure verbosity:
logging:
  level: info # Options: trace, debug, info, warn, error
  # Or filter specific modules, e.g. to reduce sqlx noise:
  # level: info,sqlx=warn
Example output:
2025-12-12T10:00:00.123Z INFO scheduler: Starting scheduler feed
2025-12-12T10:00:01.456Z DEBUG scheduler: Found job: Job(id=abc123, name=render_shot_010)
2025-12-12T10:00:02.789Z DEBUG scheduler: Layer layer_id fully consumed.
2025-12-12T10:00:03.012Z INFO scheduler: Processed 5 layers, dispatched 120 frames
Verifying Your Installation
1. Check Scheduler Startup
If running in Docker:
docker logs opencue-scheduler
You should see:
Starting scheduler feed
2. Verify Database Connectivity
The scheduler will log errors if it can’t connect to PostgreSQL:
ERROR Failed to connect to database: connection refused
3. Monitor Frame Dispatch
Watch the metrics endpoint:
watch -n 1 'curl -s http://localhost:9090/metrics | grep scheduler_frames_dispatched_total'
4. Check Cuebot Exclusion
In Cuebot logs, verify exclusions are working:
docker logs cuebot | grep exclusion
Production Deployment Recommendations
Running as a System Service
For non-Docker deployments, create a systemd service:
/etc/systemd/system/opencue-scheduler.service:
[Unit]
Description=OpenCue Distributed Scheduler
After=network.target postgresql.service
[Service]
Type=simple
User=opencue
Group=opencue
Environment=OPENCUE_SCHEDULER_CONFIG=/etc/cue-scheduler/scheduler.yaml
ExecStart=/usr/local/bin/cue-scheduler
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable opencue-scheduler
sudo systemctl start opencue-scheduler
sudo systemctl status opencue-scheduler
Resource Allocation
For production deployments:
- Single instance: 2-4 CPU cores, 4-8GB RAM (handles thousands of hosts)
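- Multi-instance: Split the work across instances by facility (--facility), based on each facility's workload
- Database: Ensure that database.pool_size × number of scheduler instances, plus Cuebot's own connections, stays below PostgreSQL max_connections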
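The database budget is simple arithmetic, sketched below. `other_connections` should cover everything else that connects to the same PostgreSQL server, such as Cuebot's own pool, admin sessions, and the slots PostgreSQL reserves for superusers (`superuser_reserved_connections`, 3 by default). The helper name and the example numbers are hypothetical.

```rust
/// Hypothetical sizing check: PostgreSQL connection slots left over after all
/// scheduler pools and every other client are counted. A result <= 0 means
/// the rule "pool size × instances < max_connections" is not met.
pub fn connection_headroom(
    pool_size: u32,
    scheduler_instances: u32,
    other_connections: u32,
    max_connections: u32,
) -> i64 {
    let scheduler_total = i64::from(pool_size) * i64::from(scheduler_instances);
    i64::from(max_connections) - scheduler_total - i64::from(other_connections)
}
```

For example, three instances with `pool_size: 20` plus 40 other connections exactly fill PostgreSQL's default `max_connections` of 100, so the rule above is not met; two instances leave 20 slots free.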
High Availability
Currently (v1.0), the scheduler doesn’t have built-in HA. For resilience:
- Run with systemd/Docker restart policies
- Monitor with health checks on the metrics endpoint
- Use process supervisors (systemd, supervisord, Kubernetes)
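A health check against the metrics endpoint can be as small as the probe below: it connects, requests `/metrics`, and reports whether the response begins with an HTTP 200 status line. This is a std-only Rust sketch (`metrics_healthy` is a hypothetical name); an HTTP probe from your supervisor, such as a Kubernetes `httpGet` liveness probe on port 9090, does the same job without custom code.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Minimal liveness probe (hypothetical helper): true if `GET /metrics`
/// answers with an HTTP 200 status line within `timeout`.
pub fn metrics_healthy(addr: SocketAddr, timeout: Duration) -> bool {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        Err(_) => return false, // refused, unreachable, or timed out
    };
    if stream.set_read_timeout(Some(timeout)).is_err()
        || stream.set_write_timeout(Some(timeout)).is_err()
    {
        return false;
    }
    let request = format!("GET /metrics HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // Only the status line matters: read at most 64 bytes (stops early on EOF).
    let mut head = Vec::new();
    if stream.take(64).read_to_end(&mut head).is_err() {
        return false;
    }
    let head = String::from_utf8_lossy(&head);
    head.starts_with("HTTP/1.1 200") || head.starts_with("HTTP/1.0 200")
}
```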
Future releases will include automatic cluster distribution for true multi-instance HA.
Troubleshooting
Scheduler Not Dispatching Frames
Check:
- Database connectivity:
  psql -h dbhost -U cuebot -d cuebot -c "SELECT 1"
- Show ownership: Verify the show has b_scheduler_managed = true (cueadmin -scheduler-managed <show> on) and that its hosts/allocations carry the expected tags in the database
- Cuebot exclusion: Verify Cuebot isn't still booking the same clusters
- RQD connectivity: Ensure the scheduler can reach the RQD gRPC port (8444)
High Database Load
Solutions:
- Reduce database.pool_size
- Increase queue.monitor_interval (check less frequently)
- Reduce queue.worker_threads
Memory Growth
Solutions:
- Lower host_cache.group_idle_timeout to evict idle cache groups sooner
- Reduce host_cache.concurrent_groups
- Monitor with docker stats or system tools
Frames Failing to Dispatch
Check logs for:
- AllocationOverBurst: Allocation has exceeded its burst limit
- HostLock: Failed to acquire lock (another scheduler instance has a lock on the host)
- GrpcFailure: RQD communication failure
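When many frames fail, counting which of these errors dominates tells you where to look first: allocation limits, lock contention between instances, or RQD connectivity. The sketch assumes each error name appears verbatim in the scheduler's log lines; `count_dispatch_errors` is a hypothetical helper that you would feed with captured log text (for example, the output of `docker logs opencue-scheduler`).

```rust
use std::collections::BTreeMap;

/// Count log lines mentioning each dispatch error kind (hypothetical helper).
/// A line mentioning several kinds is counted once for each of them.
pub fn count_dispatch_errors(log: &str) -> BTreeMap<&'static str, usize> {
    const KINDS: [&str; 3] = ["AllocationOverBurst", "HostLock", "GrpcFailure"];
    let mut counts = BTreeMap::new();
    for line in log.lines() {
        for kind in KINDS {
            if line.contains(kind) {
                *counts.entry(kind).or_insert(0) += 1;
            }
        }
    }
    counts
}
```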
Current Limitations
Version 1.0 Constraints
- Facility-scoped distribution: Multi-instance deployments are split by facility (--facility); there is no finer-grained per-cluster assignment
- No automatic distribution: Cluster workload isn't automatically balanced across instances within a facility
- Single instance recommended: While multi-instance is supported, it requires careful facility scoping to avoid overlap
Future Enhancements
Planned for future releases:
- Automatic cluster distribution: Central control module to coordinate multiple schedulers
- Dynamic scaling: Automatically add/remove instances based on workload
- Self-healing: Redistribute clusters when instances fail
- Load balancing: Evenly distribute work across available schedulers
What’s Next?
- Scheduler Technical Reference - Deep dive into architecture
- Deploying RQD - Set up render hosts
- Monitoring Jobs - Track job progress
Getting Help
- Slack: Join #opencue on ASWF Slack
- GitHub Issues: Report bugs or request features