Overview
Routing uses an LLM (or a lightweight classifier) to categorize an incoming request and direct it to a specialized handler — a dedicated prompt, a different model, or an entirely separate processing pipeline. Instead of building a single prompt that tries to handle every possible input type, you create focused handlers that each excel at one category. The router acts as a dispatcher, reading the input and deciding which path to take. This separation of concerns produces better results because each handler can be independently optimized for its specific domain.
How It Works
- Define the categories. Identify the distinct types of input your system needs to handle. Each category should map to a meaningfully different processing strategy (e.g., “billing question,” “technical support,” “general inquiry”).
- Build the router. Write a prompt (or train a classifier) that takes the raw input and outputs a category label. Keep the router’s job narrow: classification only, no answer generation.
- Create specialized handlers. For each category, design a dedicated prompt or sub-workflow optimized for that type of input. These handlers can use different system messages, few-shot examples, tools, or even different models.
- Dispatch the input. Based on the router’s classification, forward the original input to the appropriate handler.
- Return the handler’s output. The result from the selected handler is returned as the final response. Optionally, include metadata about which route was selected for observability.
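The five steps above can be sketched end-to-end. `call_model` is a hypothetical stand-in for whatever LLM client you use; the essential points are that the router emits only a label, the label is validated before dispatch, and the route is returned alongside the answer for observability:

```python
# Step 1: categories, each mapped to a specialized system prompt.
ROUTES = {
    "billing": "You are a billing specialist.",
    "technical": "You are a senior technical support engineer.",
    "general": "You are a friendly customer service representative.",
}

def route(message: str, call_model) -> dict:
    # Step 2: the router classifies only; it never generates the answer.
    label = call_model(
        system="Classify the message into exactly one of: "
               + ", ".join(ROUTES) + ". Reply with the label only.",
        prompt=message,
    ).strip().lower()
    # Guard against malformed router output with a default route.
    if label not in ROUTES:
        label = "general"
    # Steps 3-4: dispatch the original message to the specialized handler.
    answer = call_model(system=ROUTES[label], prompt=message)
    # Step 5: return the answer plus routing metadata for observability.
    return {"route": label, "answer": answer}
```

The `dict` return value is one way to surface the selected route; logging the label instead works just as well.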
When to Use
- Your inputs fall into distinct categories that benefit from different handling strategies.
- A single general-purpose prompt struggles to maintain quality across all input types.
- You want to optimize cost by routing simple queries to smaller, cheaper models and complex queries to more capable ones.
- You need clear separation of concerns for maintainability — each handler can be tested and updated independently.
- Classification accuracy is high enough that routing errors are rare.
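The cost-optimization point can be made concrete: once the router has produced a label, selecting a model tier is a lookup. The model names below are placeholders, not real model identifiers:

```python
# Hypothetical model tiers: route cheap, high-volume queries to a small model
# and reserve the expensive model for categories that need deeper reasoning.
MODEL_BY_ROUTE = {
    "general": "small-model",    # high volume, low difficulty
    "billing": "medium-model",
    "technical": "large-model",  # multi-step troubleshooting
}

def pick_model(label: str) -> str:
    # Default to the cheapest tier for unknown labels.
    return MODEL_BY_ROUTE.get(label, "small-model")
```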
When Not to Use
- All inputs are similar enough that a single prompt handles them well.
- You have only two or three categories and the handling differences are minor — a single prompt with conditional instructions may suffice.
- The categories are ambiguous or overlapping, making reliable classification difficult.
- Latency constraints do not allow the extra round-trip for classification.
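For the two-or-three-category case, the conditional-instructions alternative mentioned above is a single prompt in which the model branches internally, avoiding the extra classification round-trip. The wording here is illustrative, and `call_model` is a hypothetical client:

```python
# One prompt with conditional instructions instead of a separate router.
SINGLE_PROMPT = """You are a customer support assistant.
If the message is about billing, verify the stated amounts and dates.
If the message describes a technical problem, give step-by-step troubleshooting.
Otherwise, answer as a friendly general representative."""

def handle(message: str, call_model) -> str:
    # No routing round-trip: the model applies the right branch itself.
    return call_model(system=SINGLE_PROMPT, prompt=message)
```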
Example
```python
# Routing: customer support system that routes tickets to specialized handlers.
# `llm` is assumed to be a pre-configured client whose `call` method returns a
# response object with a `.text` attribute.
from enum import Enum


class TicketCategory(Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    GENERAL = "general"


def classify_ticket(message: str) -> TicketCategory:
    """Router: classify the customer message into exactly one category."""
    response = llm.call(
        system=(
            "Classify the customer message into exactly one category: "
            "billing, technical, or general. Reply with the label only."
        ),
        prompt=message,
    )
    label = response.text.strip().lower()
    try:
        return TicketCategory(label)
    except ValueError:
        # Fall back to the general handler if the router emits an unknown label.
        return TicketCategory.GENERAL


def handle_billing(message: str) -> str:
    return llm.call(
        system="You are a billing specialist. You have access to account and invoice data.",
        prompt=message,
    ).text


def handle_technical(message: str) -> str:
    return llm.call(
        system="You are a senior technical support engineer. Provide step-by-step troubleshooting.",
        prompt=message,
    ).text


def handle_general(message: str) -> str:
    return llm.call(
        system="You are a friendly customer service representative.",
        prompt=message,
    ).text


HANDLERS = {
    TicketCategory.BILLING: handle_billing,
    TicketCategory.TECHNICAL: handle_technical,
    TicketCategory.GENERAL: handle_general,
}

# Run the router, then dispatch the original message to the selected handler.
ticket = "I was charged twice for my subscription last month."
category = classify_ticket(ticket)
response = HANDLERS[category](ticket)
```
Related Patterns
- Prompt Chaining — Routing is often the first step in a chain, selecting which downstream chain to execute.
- Parallelization — When inputs could belong to multiple categories, you can run multiple handlers in parallel and merge results.
- Orchestrator-Worker — For tasks where the set of categories is not fixed, an orchestrator can dynamically decide which workers to invoke.