LLM Routing
When an agent sends an LLM request to the AIOS kernel, it can specify either a single LLM backend or multiple LLM backends. If only one backend is specified, the request is sent to that LLM. If multiple backends are specified, the agent allows the request to be processed by any of them; in this case, AIOS provides two routing strategies, Sequential Routing and Smart Routing, to decide which of the specified LLMs will process the request.
Overview of Routing Strategies
AIOS provides two routing strategies to distribute requests across multiple LLM backends:
Sequential Routing
Cycles through the available models in order and selects the next available one from the set specified by the request
Smart Routing
A cost-quality optimized strategy that chooses the lowest-cost LLM while maintaining the quality of request processing
Sequential Routing
The SequentialRouting class implements a basic model selection approach for load-balancing LLM requests. It cycles through the available models in order and selects the next available one from the set specified by the request.
Core Functions
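The selection loop can be sketched as a small round-robin router. This is a hypothetical reconstruction for illustration; the actual SequentialRouting class in AIOS may use a different interface and availability checks:

```python
class SequentialRouter:
    """Round-robin selection over a fixed pool of model names.

    Illustrative sketch only; not the real AIOS SequentialRouting class.
    """

    def __init__(self, models):
        self.models = list(models)
        self._next = 0  # index of the next model in the rotation

    def select(self, allowed):
        """Cycle through the pool and return the first model in rotation
        that is within the request's specified (allowed) models."""
        allowed = set(allowed)
        for _ in range(len(self.models)):
            candidate = self.models[self._next]
            self._next = (self._next + 1) % len(self.models)
            if candidate in allowed:
                return candidate
        raise ValueError("no allowed model in the pool")
```

Because the rotation index persists across calls, successive requests that allow the whole pool are spread evenly over the backends, which is the load-balancing behavior described above.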
Smart Routing
The SmartRouting class implements a cost-quality optimized selection strategy for LLM requests, using historical performance data to predict which models will perform best for a given query while minimizing cost. It leverages a two-stage constrained optimization method. The figure below shows the overall pipeline of this smart routing strategy.
Optimization Methods
- Uses Lagrangian dual optimization to globally optimize model selection
- Balances overall performance against total cost
- Ensures each query is assigned to exactly one model
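The dual method above can be sketched as follows. The function name `route_with_budget`, the subgradient update, and all parameters are illustrative assumptions, not the actual AIOS implementation: the key idea is that for a fixed multiplier, the relaxed problem decomposes so that each query independently picks exactly one model.

```python
def route_with_budget(costs, quality, min_avg_quality, steps=200, lr=1.0):
    """Assign each query to exactly one model via Lagrangian relaxation.

    Hypothetical sketch of the dual optimization described above.
      costs:   dict model -> per-call cost
      quality: dict (query, model) -> predicted quality in [0, 1]
    Minimizes total cost subject to mean quality >= min_avg_quality.
    """
    queries = sorted({q for q, _ in quality})
    models = sorted(costs)
    lam, best, best_cost = 0.0, None, float("inf")
    for _ in range(steps):
        # For a fixed multiplier lam, each query independently picks
        # argmin_m cost(m) - lam * quality(q, m): exactly one model per query.
        assign = {
            q: min(models, key=lambda m, q=q: costs[m] - lam * quality[(q, m)])
            for q in queries
        }
        avg_q = sum(quality[(q, m)] for q, m in assign.items()) / len(queries)
        if avg_q >= min_avg_quality:  # feasible: keep the cheapest seen so far
            total = sum(costs[m] for m in assign.values())
            if total < best_cost:
                best, best_cost = assign, total
        # Subgradient step on the dual: raise lam when quality falls short,
        # which makes higher-quality (costlier) models more attractive.
        lam = max(0.0, lam + lr * (min_avg_quality - avg_q))
    return best
```

The multiplier plays the role of a price on quality: when the quality constraint is violated it rises, steering queries toward better models; when quality is ample it falls back toward zero and the cheapest models win.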
Usage Example
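As a hedged illustration of the request flow described at the top of this page, the sketch below shows how a request naming a single backend goes straight to that LLM, while a request naming several defers to a routing strategy. The helper names and interface here are assumptions for illustration, not the real AIOS API:

```python
from itertools import cycle

def make_sequential_strategy(pool):
    """Return a selector that round-robins over `pool`, restricted to the
    models a request allows. Hypothetical helper, not the AIOS interface."""
    it = cycle(pool)
    def select(allowed):
        allowed = set(allowed)
        for _ in range(len(pool)):
            m = next(it)
            if m in allowed:
                return m
        raise ValueError("no allowed backend in the pool")
    return select

def route(request_models, strategy=None):
    """Pick one backend for a request."""
    # Single backend specified: send the request straight to it.
    if len(request_models) == 1:
        return request_models[0]
    # Multiple backends specified: let the routing strategy decide.
    return strategy(request_models)
```

A Smart Routing strategy would slot into the same `route` call, replacing the round-robin selector with the cost-quality optimized one.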
For implementation details and experimental results, see our official repository and research paper.
Reference