vLLM Backend


vLLM and SGLang backends are handled through the OpenAI client class rather than LiteLLM, due to compatibility issues with LiteLLM.

These backends are initialized as OpenAI client instances with a custom base URL:

case "vllm":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

case "sglang":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))
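
Both servers expose an OpenAI-compatible HTTP API, so config.hostname is expected to be the server's base URL. As an illustration, a minimal sketch of connecting to locally running servers (the URLs below are the projects' defaults, not values taken from this codebase):

from openai import OpenAI

# vLLM: `vllm serve <model>` listens on port 8000 by default
vllm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")

# SGLang: `python -m sglang.launch_server ...` listens on port 30000 by default
sglang_client = OpenAI(base_url="http://localhost:30000/v1", api_key="sk-1234")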

Standard Text Input

For standard text input, the OpenAI client is used directly:

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
# Return the generated text along with a success flag
return completed_response.choices[0].message.content, True
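
Here completion_kwargs holds the usual chat-completion parameters (messages, temperature, and so on). A self-contained sketch, assuming a vLLM server on localhost and an illustrative model name:

from openai import OpenAI

model = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")
completion_kwargs = {
    "messages": [{"role": "user", "content": "Summarize vLLM in one sentence."}],
    "temperature": 0.0,
}
completed_response = model.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the server is serving
    **completion_kwargs,
)
print(completed_response.choices[0].message.content)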

Tool Calls

When processing tool calls with OpenAI client-based backends:

# Add tools to completion parameters
if tools:
    completion_kwargs["tools"] = tools

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)

if tools:
    # Normalize tool calls into the same structure used for LiteLLM backends
    completed_response = decode_litellm_tool_calls(completed_response)
    return completed_response, True

The response is processed using the same decode_litellm_tool_calls function as for LiteLLM backends.
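
The tools argument follows the OpenAI function-calling schema. A representative entry (the get_weather function is a hypothetical example, not part of this codebase):

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]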

JSON-Formatted Responses

Requests for JSON output follow the same approach as for standard OpenAI clients:

if message_return_type == "json":
    completion_kwargs["format"] = "json"
    if response_format:
        # Pass an explicit schema/format object through to the server
        completion_kwargs["response_format"] = response_format
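
On OpenAI-compatible servers, structured output is controlled by response_format; a common value (an assumption here, not taken from this codebase) is {"type": "json_object"}, which vLLM supports via guided decoding:

# Ask the server to constrain generation to valid JSON
completion_kwargs["response_format"] = {"type": "json_object"}

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)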
