vLLM Backend
The vLLM (and SGLang) backends are handled with the OpenAI client class rather than LiteLLM, due to compatibility issues with LiteLLM. Each backend is initialized as an OpenAI client instance pointed at a custom base URL:
case "vllm":
self.llms.append(OpenAI(
base_url=config.hostname,
api_key="sk-1234" # Dummy API key
))
case "sglang":
self.llms.append(OpenAI(
base_url=config.hostname,
api_key="sk-1234" # Dummy API key
))
Note that the OpenAI client requires an API key at construction time, so a dummy key ("sk-1234") is supplied; these backends typically don't enforce authentication when run locally.
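As a point of reference, here is a minimal sketch of constructing such a client against a locally served vLLM instance. The URL and port are assumptions based on vLLM's defaults, not values from this codebase:

from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default;
# in the real code the endpoint comes from config.hostname
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local vLLM endpoint
    api_key="sk-1234"  # dummy key; local servers typically ignore it
)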
Standard Text Input
For standard text input, the OpenAI client is used directly:
completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
return completed_response.choices[0].message.content, True
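For illustration, a hedged end-to-end sketch of such a call; the model name and the contents of completion_kwargs are placeholders, not values taken from the project:

# Illustrative kwargs; the real code assembles these from the request
completion_kwargs = {
    "messages": [{"role": "user", "content": "Summarize vLLM in one sentence."}],
    "temperature": 0.0,
}
completed_response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model served by vLLM
    **completion_kwargs
)
print(completed_response.choices[0].message.content)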
Tool Calls
When processing tool calls with OpenAI client-based backends:
# Add tools to completion parameters
if tools:
    completion_kwargs["tools"] = tools

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)

if tools:
    completed_response = decode_litellm_tool_calls(completed_response)

return completed_response, True
The response is processed using the same decode_litellm_tool_calls function as for LiteLLM backends.
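For context, tools is expected to follow the OpenAI function-calling schema. A sketch with a hypothetical function definition (the tool name and parameters are illustrative):

# Hypothetical tool definition in the OpenAI function-calling schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
}]
completion_kwargs["tools"] = tools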
JSON-Formatted Responses
JSON-formatted responses use the same approach as with the standard OpenAI client:
if message_return_type == "json":
    completion_kwargs["format"] = "json"
    if response_format:
        completion_kwargs["response_format"] = response_format