# vLLM Backend

[Source code](https://github.com/agiresearch/AIOS/blob/main/aios/llm_core/adapter.py)

vLLM backends are handled using the OpenAI client class due to compatibility issues with LiteLLM.

These backends are initialized as OpenAI client instances with a custom base URL:

```python
case "vllm":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

case "sglang":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))
```

{% hint style="warning" %}
It is important to note that a dummy API key ("sk-1234") is required to set up the OpenAI hosted client since these backends typically don't require authentication when run locally.
{% endhint %}

**Standard Text Input**

For standard text input, the OpenAI client is used directly:

```python
completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
return completed_response.choices[0].message.content, True
```

**Tool Calls**

When processing tool calls with OpenAI client-based backends:

```python
# Add tools to completion parameters
if tools:
    completion_kwargs["tools"] = tools

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)

if tools:
    completed_response = decode_litellm_tool_calls(completed_response)
    return completed_response, True
```

The response is processed using the same [`decode_litellm_tool_calls`](https://github.com/agiresearch/AIOS/blob/main/aios/llm_core/utils.py) function as for LiteLLM backends.

**JSON-Formatted Responses**

JSON formatting uses the same approach as standard OpenAI clients:

```python
if message_return_type == "json":
    completion_kwargs["format"] = "json"
    if response_format:
        completion_kwargs["response_format"] = response_format
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.aios.foundation/aios-docs/aios-kernel/llm-cores/vllm-backend.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
