vLLM Backend

The vLLM (and SGLang) backends are handled using the OpenAI client class rather than LiteLLM, due to compatibility issues with LiteLLM.

These backends are initialized as OpenAI client instances with a custom base URL:

case "vllm":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

case "sglang":
    self.llms.append(OpenAI(
        base_url=config.hostname,
        api_key="sk-1234"  # Dummy API key
    ))

Note that a dummy API key ("sk-1234") must still be supplied, because the OpenAI client requires an API key even though these backends typically don't enforce authentication when run locally.
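
For illustration only, the config object referenced above might look like the following sketch; the field names here are inferred from the snippet rather than taken from the AIOS source:

from dataclasses import dataclass

@dataclass
class BackendConfig:
    # Hypothetical stand-in for the AIOS LLM backend config.
    backend: str   # e.g. "vllm" or "sglang"
    hostname: str  # base URL of the OpenAI-compatible server

config = BackendConfig(backend="vllm", hostname="http://localhost:8000/v1")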

Standard Text Input

For standard text input, the OpenAI client is used directly:

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)
# Return the generated text along with a completion flag
return completed_response.choices[0].message.content, True
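
For orientation, here is a minimal standalone sketch of the same call against a locally hosted vLLM server; the server URL, model name, and completion_kwargs contents are assumptions rather than AIOS code:

from openai import OpenAI

# Assumed local vLLM server exposing the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")

completion_kwargs = {
    "messages": [{"role": "user", "content": "Summarize what an OS kernel does."}],
    "temperature": 0.0,
}
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    **completion_kwargs,
)
print(response.choices[0].message.content)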

Tool Calls

When processing tool calls with OpenAI client-based backends:

# Add tools to completion parameters
if tools:
    completion_kwargs["tools"] = tools

completed_response = model.chat.completions.create(
    model=model_name,
    **completion_kwargs
)

if tools:
    # Normalize tool-call output into the format shared with LiteLLM backends
    completed_response = decode_litellm_tool_calls(completed_response)
    return completed_response, True
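
For reference, tools follow the OpenAI function-calling schema. The sketch below defines an invented get_weather tool and reads the raw tool calls back; the server URL and model name are again assumptions:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")  # assumed local server

# Hypothetical tool definition in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Raw OpenAI-style tool calls, before AIOS normalizes them.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)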

JSON-Formatted Responses

JSON formatting follows the same approach as the standard OpenAI client:

if message_return_type == "json":
    completion_kwargs["format"] = "json"
    # Forward an explicit response schema when one is provided
    if response_format:
        completion_kwargs["response_format"] = response_format

The response is processed using the same function as for LiteLLM backends.
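
As a point of comparison, a JSON-mode request sent directly to an OpenAI-compatible server typically uses the response_format parameter; the sketch below assumes a local vLLM server and model name:

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-1234")  # assumed local server

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": "Return the capital of France as a JSON object."}],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))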

Source code

decode_litellm_tool_calls