Hugging Face Backend

The Hugging Face Local Backend allows running models locally using the Hugging Face Transformers library.

The backend is initialized as an HfLocalBackend instance when the huggingface backend is configured:

case "huggingface":
    self.llms.append(HfLocalBackend(
        model_name=config.name,
        max_gpu_memory=config.max_gpu_memory,
        eval_device=config.eval_device
    ))

It handles loading and running Hugging Face models locally, with options for capping GPU memory (max_gpu_memory) and selecting the evaluation device (eval_device).
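
For illustration, here is a minimal sketch of how such a backend could load a model with the Transformers library, using the same constructor arguments as the snippet above. The class name and the loading details are assumptions for this sketch, not the actual AIOS implementation:

# Illustrative sketch only, not the actual AIOS HfLocalBackend.
from transformers import AutoModelForCausalLM, AutoTokenizer

class HfLocalBackendSketch:
    def __init__(self, model_name, max_gpu_memory=None, eval_device=None):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # max_gpu_memory can cap memory per device, e.g. {0: "24GiB"};
        # eval_device can pin the model to a device such as "cuda:0".
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map=eval_device or "auto",
            max_memory=max_gpu_memory,
        )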

Standard Text Input

For standard text requests, the backend uses the generate() method:

completed_response = model.generate(**completion_kwargs)
return completed_response, True
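
A typical Transformers generation path behind such a generate() call applies the model's chat template, generates, and decodes only the newly produced tokens. The sketch below assumes the illustrative backend class shown earlier; generate_sketch is hypothetical and not part of the AIOS SDK:

def generate_sketch(backend, messages, max_new_tokens=512):
    # Render the chat messages with the model's chat template.
    input_ids = backend.tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(backend.model.device)
    output_ids = backend.model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the tokens produced after the prompt.
    return backend.tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )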

Tool Calls

Because Hugging Face models do not natively support tool calls, the adapter merges the tool information into the messages before generation (merge_messages_with_tools()) and decodes tool calls from the text response after generation (decode_hf_tool_calls()).

if tools:
    new_messages = merge_messages_with_tools(messages, tools)
    completion_kwargs["messages"] = new_messages

completed_response = model.generate(**completion_kwargs)

# During processing
if tools:
    if isinstance(model, HfLocalBackend):
        if finished:
            tool_calls = decode_hf_tool_calls(completed_response)
            tool_calls = double_underscore_to_slash(tool_calls)
            return LLMResponse(
                response_message=None,
                tool_calls=tool_calls,
                finished=finished
            )
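
To make the merge/decode pair concrete, here is a rough sketch of how such helpers could work. The prompt wording and the JSON conventions are assumptions for illustration; the actual merge_messages_with_tools() and decode_hf_tool_calls() in AIOS may differ:

import json

def merge_messages_with_tools_sketch(messages, tools):
    # Prepend a system message that describes the available tools and the
    # JSON format the model should use to call them.
    tool_prompt = (
        "You may call the following tools. To call them, reply with a JSON "
        'list of {"name": ..., "parameters": ...} objects:\n'
        + json.dumps(tools, indent=2)
    )
    return [{"role": "system", "content": tool_prompt}] + messages

def decode_hf_tool_calls_sketch(response_text):
    # Pull the JSON list of tool calls back out of the raw text response.
    start, end = response_text.find("["), response_text.rfind("]") + 1
    return json.loads(response_text[start:end]) if start != -1 else []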

JSON-Formatted Responses

JSON formatting is handled by merging the response format into the messages; merge_messages_with_response_format() likely adds instructions for the model to respond in JSON format:

elif message_return_type == "json":
    new_messages = merge_messages_with_response_format(messages, response_format)
    completion_kwargs["messages"] = new_messages
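
As with tool calls, a minimal sketch of what merging the response format into the messages could look like; the exact instruction text and message placement here are assumptions:

import json

def merge_messages_with_response_format_sketch(messages, response_format):
    # Append an instruction asking the model to answer only with JSON that
    # matches the requested format.
    instruction = (
        "Respond only with JSON that matches this format:\n"
        + json.dumps(response_format, indent=2)
    )
    return messages + [{"role": "system", "content": instruction}]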
