SkillOrchestra Documentation¶
SkillOrchestra is a skill-aware agent orchestration system based on the paper "SkillOrchestra: Learning to Route Agents via Skill Transfer". Instead of end-to-end RL routing, it maintains a Skill Handbook that profiles each agent on fine-grained skills, infers which skills a task requires via LLM, and matches agents to tasks via explicit competence-cost scoring.
Installation¶
How It Works¶
SkillOrchestra routes tasks through a 5-step pipeline:
- Skill Inference: An LLM analyzes the incoming task and identifies which fine-grained skills are required (e.g., `python_coding`, `data_analysis`, `technical_writing`), each with an importance weight.
- Agent Scoring: Each agent is scored against the required skills using a weighted competence-cost formula. This step is pure math, with no LLM calls.
- Agent Selection: The top-k agents with the highest scores are selected.
- Execution: Selected agents execute the task. Multiple agents run concurrently via `ThreadPoolExecutor`.
- Learning (optional): An LLM evaluates the output quality, and agent skill profiles are updated via exponential moving average (EMA).
Scoring Formula¶
For each agent, the score is computed as:
score = Σ (competence_weight × competence_i × importance_i + cost_weight × normalized_cost_i × importance_i) / total_importance
Where:
- competence_i is the agent's estimated probability of success on skill i
- normalized_cost_i is 1 - (cost - min_cost) / (max_cost - min_cost) (lower cost = higher score)
- importance_i is how important the skill is for the task
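A minimal Python sketch of the formula above may help make the arithmetic concrete. The function names `normalize_cost` and `score_agent` are hypothetical and not taken from the library, and two behaviors are assumptions: `total_importance` is treated as the sum of the importance weights, and a skill missing from an agent's profile contributes zero.

```python
from typing import Dict


def normalize_cost(cost: float, min_cost: float, max_cost: float) -> float:
    """Map a raw cost to [0, 1] so that the cheapest option scores 1."""
    if max_cost == min_cost:
        # Assumption: when every cost is identical, nobody is penalized.
        return 1.0
    return 1.0 - (cost - min_cost) / (max_cost - min_cost)


def score_agent(
    importance: Dict[str, float],
    competence: Dict[str, float],
    normalized_cost: Dict[str, float],
    competence_weight: float = 0.7,
    cost_weight: float = 0.3,
) -> float:
    """Importance-weighted average of the competence and cost terms."""
    total_importance = sum(importance.values())
    if total_importance == 0:
        return 0.0
    weighted_sum = sum(
        (
            competence_weight * competence.get(skill, 0.0)
            + cost_weight * normalized_cost.get(skill, 0.0)
        )
        * weight
        for skill, weight in importance.items()
    )
    return weighted_sum / total_importance


# Illustrative inputs only; the normalized costs are made up.
importance = {"python_coding": 0.9, "api_design": 0.5}
competence = {"python_coding": 0.95, "api_design": 0.90}
normalized_cost = {"python_coding": 0.5, "api_design": 0.5}
example_score = score_agent(importance, competence, normalized_cost)
print(round(example_score, 4))
```

Because the result is divided by the total importance, an agent's score stays in the 0 to 1 range whenever the two weights sum to 1.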
Key Components¶
Data Models¶
| Model | Description |
|---|---|
| `SkillDefinition` | A fine-grained skill with name, description, and optional category |
| `AgentSkillProfile` | An agent's competence (0-1) and cost on a specific skill, with execution statistics |
| `AgentProfile` | Complete skill profile for a single agent |
| `SkillHandbook` | Central data structure mapping all skills to all agent profiles |
| `TaskSkillInference` | LLM output: skills required by a given task with importance weights |
| `AgentSelectionResult` | Result of agent scoring with name, score, and reasoning |
| `ExecutionFeedback` | Post-execution quality assessment for updating skill profiles |
Arguments Table¶
| Argument | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `"SkillOrchestra"` | Name identifier for the orchestrator |
| `description` | `str` | `"Skill-aware agent orchestration..."` | Description of the orchestrator's purpose |
| `agents` | `List[Union[Agent, Callable]]` | `None` | List of agents to orchestrate (required, at least 1) |
| `max_loops` | `int` | `1` | Maximum execution-feedback loops per task |
| `output_type` | `OutputType` | `"dict"` | Output format: `"dict"`, `"str"`, `"json"`, `"final"`, etc. |
| `model` | `str` | `"gpt-5.4"` | LLM model for skill inference and evaluation |
| `temperature` | `float` | `0.1` | LLM temperature for inference calls |
| `skill_handbook` | `Optional[SkillHandbook]` | `None` | Pre-built skill handbook. If `None`, auto-generated from agent descriptions |
| `auto_generate_skills` | `bool` | `True` | Whether to auto-generate the handbook when none is provided |
| `cost_weight` | `float` | `0.3` | Weight for the cost component in scoring (0-1) |
| `competence_weight` | `float` | `0.7` | Weight for the competence component in scoring (0-1) |
| `top_k_agents` | `int` | `1` | Number of agents to select per task |
| `learning_enabled` | `bool` | `True` | Whether to update skill profiles after execution via EMA |
| `learning_rate` | `float` | `0.1` | EMA learning rate for profile updates |
| `autosave` | `bool` | `True` | Whether to save conversation history and handbook to disk |
| `verbose` | `bool` | `False` | Whether to log detailed information |
| `print_on` | `bool` | `True` | Whether to print panels to console |
Methods Table¶
| Method | Arguments | Returns | Description |
|---|---|---|---|
| `run` | `task: str, img: Optional[str], imgs: Optional[List[str]]` | `Any` | Run the full pipeline on a single task |
| `__call__` | `task: str, *args, **kwargs` | `Any` | Callable interface; delegates to `run()` |
| `batch_run` | `tasks: List[str]` | `List[Any]` | Run multiple tasks sequentially |
| `concurrent_batch_run` | `tasks: List[str]` | `List[Any]` | Run multiple tasks concurrently |
| `get_handbook` | (none) | `dict` | Return the current skill handbook as a dictionary |
| `update_handbook` | `handbook: SkillHandbook` | `None` | Replace the skill handbook |
Architecture¶
Pipeline Flow¶
flowchart TD
A["Incoming Task"] --> B["1. Skill Inference (LLM)"]
B --> C["2. Agent Scoring (Math)"]
C --> D["3. Select Top-K Agents"]
D --> E["4. Execute Agents"]
E --> F{Learning Enabled?}
F -- Yes --> G["5. Evaluate & Learn (LLM + EMA)"]
G --> H{More Loops?}
H -- Yes --> B
H -- No --> I["Return Output"]
F -- No --> I
subgraph Skill Handbook
S1["Skills: python_coding, api_design, technical_writing, ..."]
S2["Agent Profiles: competence + cost per skill"]
end
B -. reads .-> S1
C -. reads .-> S2
G -. updates .-> S2
Scoring & Selection¶
flowchart LR
subgraph Task Skills
TS1["python_coding (importance: 0.9)"]
TS2["api_design (importance: 0.5)"]
end
subgraph Agent Profiles
AP1["CodeExpert\npython: 0.95, api: 0.90"]
AP2["TechWriter\npython: 0.30, api: 0.50"]
AP3["Researcher\npython: 0.60, api: 0.30"]
end
Task Skills --> SCORE["Scoring Formula\nscore = sum(w_c * competence * importance\n+ w_cost * norm_cost * importance)\n/ total_importance"]
Agent Profiles --> SCORE
SCORE --> R1["CodeExpert: 0.68"]
SCORE --> R2["TechWriter: 0.31"]
SCORE --> R3["Researcher: 0.42"]
R1 --> SEL["Select Top-K"]
R2 --> SEL
R3 --> SEL
SEL --> WIN["CodeExpert selected"]
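The selection step in the diagram above can be sketched in a few lines of Python. The helper name `select_top_k` is hypothetical, and breaking ties alphabetically by agent name is an assumption made only to keep the result deterministic; the scores are the ones shown in the diagram.

```python
from typing import Dict, List, Tuple


def select_top_k(scores: Dict[str, float], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k highest-scoring agents, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort by descending score; ties fall back to the agent name (assumption).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Scores copied from the diagram.
scores = {"CodeExpert": 0.68, "TechWriter": 0.31, "Researcher": 0.42}

print(select_top_k(scores, k=1))
print(select_top_k(scores, k=2))
```

With `k=1` only CodeExpert is chosen, matching the diagram; with `k=2` Researcher joins as the runner-up.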
Execution Modes¶
flowchart TD
SEL["Selected Agents"] --> CHECK{top_k_agents}
CHECK -- "k = 1" --> SINGLE["Direct Execution\nagent.run(task)"]
CHECK -- "k > 1" --> MULTI["Concurrent Execution\nThreadPoolExecutor"]
MULTI --> A1["Agent 1"]
MULTI --> A2["Agent 2"]
MULTI --> A3["Agent N"]
A1 --> COLLECT["Collect Results"]
A2 --> COLLECT
A3 --> COLLECT
SINGLE --> OUTPUT["Output"]
COLLECT --> OUTPUT
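The two branches of the diagram above can be approximated with the standard library. This is a sketch under stated assumptions, not the library's code: plain callables stand in for agents, and `execute_agents` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def execute_agents(agents: Dict[str, Callable[[str], str]], task: str) -> Dict[str, str]:
    """Run one agent directly, or several concurrently, and collect the outputs."""
    if not agents:
        raise ValueError("at least one agent is required")

    if len(agents) == 1:
        # k = 1: call the single agent directly, with no thread pool.
        name, agent = next(iter(agents.items()))
        return {name: agent(task)}

    # k > 1: submit every agent to a thread pool, then gather the results.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent, task) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in "agents" for illustration only.
demo_agents = {
    "Shouter": lambda task: task.upper(),
    "Reverser": lambda task: task[::-1],
}
print(execute_agents(demo_agents, "abc"))
```

Calling `future.result()` re-raises any exception thrown inside an agent, so a failure in one agent surfaces to the caller instead of being silently dropped.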
Best Practices¶
Agent Design¶
- Write descriptive agent descriptions — The auto-generated skill handbook is only as good as your agent descriptions. Be specific about what each agent can do.
- Use distinct specializations — Agents with overlapping skills reduce the effectiveness of skill-based routing. Make each agent clearly specialized.
- Keep system prompts focused — System prompts should reinforce the agent's specialization, not try to make the agent a generalist.
Tuning Weights¶
- Default (0.7 competence / 0.3 cost) — Good for most use cases where quality matters more than cost.
- High competence weight (0.9 / 0.1) — Use when quality is critical and cost is not a concern.
- Balanced (0.5 / 0.5) — Use when you want a balance between quality and cost efficiency.
- High cost weight (0.3 / 0.7) — Use for high-volume, cost-sensitive workloads where "good enough" is acceptable.
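To see how the weight presets above can change a routing decision, here is a small illustrative sketch. The two agents, their competence values, and their normalized costs are invented for the example, and the task is assumed to need a single skill, so the score reduces to `competence_weight * competence + cost_weight * normalized_cost`.

```python
# Hypothetical single-skill profiles: (competence, normalized_cost).
AGENTS = {
    "Expert": (0.95, 0.0),  # strongest, but the most expensive option
    "Budget": (0.50, 1.0),  # weaker, but the cheapest option
}

# (competence_weight, cost_weight) pairs from the list above.
WEIGHT_PRESETS = [(0.9, 0.1), (0.7, 0.3), (0.5, 0.5), (0.3, 0.7)]


def winner(competence_weight: float, cost_weight: float) -> str:
    """Return the name of the agent with the highest single-skill score."""
    scores = {
        name: competence_weight * competence + cost_weight * norm_cost
        for name, (competence, norm_cost) in AGENTS.items()
    }
    return max(scores, key=scores.get)


winners = {preset: winner(*preset) for preset in WEIGHT_PRESETS}
for (competence_weight, cost_weight), name in winners.items():
    print(f"{competence_weight} / {cost_weight} -> {name}")
```

For these made-up profiles, the expensive expert wins under the two competence-heavy presets and the cheap agent wins once cost carries half the weight or more.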
Learning Configuration¶
- `learning_rate=0.1` (default): Slow adaptation, stable profiles. Good for production.
- `learning_rate=0.3`: Faster adaptation. Good for initial calibration of a new team.
- `max_loops=1`: Single pass, no refinement. Best for simple tasks.
- `max_loops=2` or `3`: Execute, evaluate, refine. Good for complex tasks that benefit from iterative improvement.
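The effect of the learning rate is easier to judge with numbers. The sketch below uses the textbook exponential moving average; whether the library applies exactly this form is an assumption, and `ema_update` is a hypothetical helper name.

```python
def ema_update(old: float, observed: float, learning_rate: float = 0.1) -> float:
    """Blend the previous competence estimate with a newly observed quality score."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must be between 0 and 1")
    return (1.0 - learning_rate) * old + learning_rate * observed


# One successful run (observed quality 1.0) from a starting estimate of 0.5.
slow = ema_update(0.5, 1.0, learning_rate=0.1)
fast = ema_update(0.5, 1.0, learning_rate=0.3)
print(round(slow, 4), round(fast, 4))

# Three consecutive successes at the default rate.
estimate = 0.5
for _ in range(3):
    estimate = ema_update(estimate, 1.0)
print(round(estimate, 4))
```

At a rate of 0.1 a single result moves the estimate by only a tenth of the gap to the observed score, which is why profiles stay stable; at 0.3 the same result moves it three times as far.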
Error Handling¶
try:
    result = orchestra.run(task)
except ValueError as e:
    # Configuration errors (no agents, invalid weights)
    print(f"Configuration error: {e}")
except Exception as e:
    # Execution errors (LLM failures, agent errors)
    print(f"Execution error: {e}")
Inspecting Routing Decisions¶
Enable verbose=True and print_on=True to see detailed routing information:
orchestra = SkillOrchestra(
    agents=agents,
    verbose=True,   # Logs skill inference and scoring details
    print_on=True,  # Prints formatted panels to console
)
Saving and Loading Handbooks¶
import json

from swarms.structs.skill_orchestra import SkillHandbook

# Save a tuned handbook
handbook_dict = orchestra.get_handbook()
with open("my_handbook.json", "w") as f:
    json.dump(handbook_dict, f, indent=2)

# Load and reuse later
with open("my_handbook.json") as f:
    data = json.load(f)

handbook = SkillHandbook.model_validate(data)
orchestra = SkillOrchestra(
    agents=agents,
    skill_handbook=handbook,
    auto_generate_skills=False,
)