For the complete documentation index, see llms.txt. This page is also available as Markdown.

vLLM Backend

Source code

vLLM backends are handled using the OpenAI client class due to compatibility issues with LiteLLM.

These backends are initialized as OpenAI client instances with a custom base URL:

case "vllm":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

case "sglang":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

Standard Text Input

For standard text input, the OpenAI client is used directly:

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
return completed_response.choices[0].message.content, True

Tool Calls

When processing tool calls with OpenAI client-based backends:

The response is processed using the same decode_litellm_tool_calls function as for LiteLLM backends.

JSON-Formatted Responses

JSON formatting uses the same approach as standard OpenAI clients:

Last updated