vLLM Backend


vLLM backends are handled using the OpenAI client class directly, due to compatibility issues with LiteLLM; the SGLang backend is treated the same way.

These backends are initialized as OpenAI client instances with a custom base URL:

case "vllm":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

case "sglang":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))
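
For context, a standalone equivalent looks roughly like this; the URL assumes a vLLM (or SGLang) OpenAI-compatible server running locally on the default port, and the key is an arbitrary placeholder:

from openai import OpenAI

# Point the OpenAI client at a local OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="sk-1234",  # dummy key; not validated unless the server enforces auth
)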

Standard Text Input

For standard text input, the OpenAI client is used directly:

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
# Return the generated text together with a flag indicating success.
return completed_response.choices[0].message.content, True
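
For illustration, the completion_kwargs passed above might be assembled like this (the message content and sampling settings are placeholders):

# Hypothetical example of the kwargs expanded into the call above.
completion_kwargs = {
    "messages": [{"role": "user", "content": "Summarize vLLM in one sentence."}],
    "temperature": 0.0,  # placeholder sampling setting
    "max_tokens": 256,   # placeholder response-length cap
}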

Tool Calls

When processing tool calls with OpenAI client-based backends, the response is decoded using the same decode_litellm_tool_calls function as for the LiteLLM backends.
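
The function itself is shared with the LiteLLM path; as a rough sketch, decoding an OpenAI-style tool-call response amounts to something like the hypothetical helper below (not the actual implementation):

import json

def decode_tool_calls(response):
    # Hypothetical helper, not the real decode_litellm_tool_calls:
    # extract (name, arguments) pairs from an OpenAI-style completion.
    calls = []
    for tool_call in response.choices[0].message.tool_calls or []:
        calls.append({
            "name": tool_call.function.name,
            "arguments": json.loads(tool_call.function.arguments),  # JSON string -> dict
        })
    return calls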

JSON-Formatted Responses

JSON formatting uses the same approach as with a standard OpenAI client.
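
With the OpenAI client, that approach is typically the response_format parameter; a minimal sketch, assuming that is what is meant here (model name and prompt are placeholders):

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="my-served-model",  # placeholder model name
    messages=[{"role": "user", "content": "List three colors as a JSON object."}],
    response_format={"type": "json_object"},  # request JSON output
)
data = json.loads(response.choices[0].message.content)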
