Hugging Face Backend
The Hugging Face Local Backend allows running models locally using the Hugging Face Transformers library.
The HF Local Backend is instantiated and registered during LLM initialization:
case "huggingface":
self.llms.append(HfLocalBackend(
model_name=config.name,
max_gpu_memory=config.max_gpu_memory,
eval_device=config.eval_device
))
It handles loading and running Hugging Face models locally, with options for GPU memory allocation.
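For orientation, here is a minimal sketch of what such a wrapper class might look like. The constructor signature mirrors the call above, but the loading logic (AutoModelForCausalLM with device_map/max_memory) is an assumption about a typical implementation, not the actual source:

from transformers import AutoModelForCausalLM, AutoTokenizer

class HfLocalBackend:
    # Sketch only: the parameters mirror the call site above, but the
    # loading logic below is an assumed, typical implementation.
    def __init__(self, model_name, max_gpu_memory=None, eval_device="cuda:0"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            # max_gpu_memory caps per-device usage, e.g. {0: "24GiB"}
            device_map="auto" if max_gpu_memory else None,
            max_memory=max_gpu_memory,
        )
        if max_gpu_memory is None:
            # Without a memory map, place the model on the eval device.
            self.model.to(eval_device)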
Standard Text Input
For standard text requests, the backend uses the generate() method:
completed_response = model.generate(**completion_kwargs)
return completed_response, True  # True flags the response as finished
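A plausible sketch of this generate() path, assuming it applies the tokenizer's chat template to the messages in completion_kwargs and decodes only the newly generated tokens (the parameter names are illustrative, not confirmed by the source):

def generate(self, messages, max_new_tokens=512, **kwargs):
    # Render the chat history into the model's expected prompt format.
    input_ids = self.tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(self.model.device)
    output_ids = self.model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the completion text is returned.
    return self.tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )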
Tool Calls
Because Hugging Face models do not natively support tool calls, the adapter merges tool information into the messages before generation and decodes tool calls from the raw text after generation.
if tools:
    new_messages = merge_messages_with_tools(messages, tools)
    completion_kwargs["messages"] = new_messages

completed_response = model.generate(**completion_kwargs)
# During response processing
if tools:
    if isinstance(model, HfLocalBackend):
        if finished:
            tool_calls = decode_hf_tool_calls(completed_response)
            tool_calls = double_underscore_to_slash(tool_calls)
            return LLMResponse(
                response_message=None,
                tool_calls=tool_calls,
                finished=finished
            )
The merge_messages_with_tools() function formats the tool information into the prompt, and decode_hf_tool_calls() extracts tool calls from the text response.
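Neither helper's implementation is shown here; the following is a minimal sketch under the assumption that tools are advertised to the model as JSON in a system message and that the model emits calls as a JSON array. The prompt wording, the regex-based parsing, and the double_underscore_to_slash body are all assumptions:

import json
import re

def merge_messages_with_tools(messages, tools):
    # Prepend a system message listing the tools and the expected
    # call syntax; the exact wording is an assumption.
    tool_prompt = (
        "You have access to the following tools:\n"
        + json.dumps(tools, indent=2)
        + '\nTo call a tool, respond with a JSON list like '
        + '[{"name": "...", "parameters": {...}}].'
    )
    return [{"role": "system", "content": tool_prompt}] + list(messages)

def decode_hf_tool_calls(text):
    # Extract the first JSON array found in the free-form completion.
    match = re.search(r"\[.*\]", text, re.DOTALL)
    return json.loads(match.group(0)) if match else []

def double_underscore_to_slash(tool_calls):
    # Restore "/" in tool names that were flattened to "__" (an
    # assumption based on the helper's name).
    for call in tool_calls:
        call["name"] = call["name"].replace("__", "/")
    return tool_calls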
JSON-Formatted Responses
JSON formatting is handled by merging the response format into the messages:
elif message_return_type == "json":
    new_messages = merge_messages_with_response_format(messages, response_format)
    completion_kwargs["messages"] = new_messages
The merge_messages_with_response_format() function likely adds instructions for the model to respond in JSON format.
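A hedged sketch consistent with that description (the instruction text and message placement are assumptions):

import json

def merge_messages_with_response_format(messages, response_format):
    # Append a system message telling the model to answer with JSON
    # matching the requested format; the wording is an assumption.
    format_prompt = (
        "Respond only with JSON conforming to this response format:\n"
        + json.dumps(response_format, indent=2)
    )
    return list(messages) + [{"role": "system", "content": format_prompt}]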