Instead of placing separate queues within each processing module (e.g., LLM core(s) or memory manager), we centralize all the queues within the scheduler module. This aims to isolate the responsibility for request management from the individual modules, allowing each processing module to focus on its execution and simplifies the coordination of tasks across modules.
The main purpose of the scheduler is to serve different agents queries in a fine-grained granularity (i.e., syscall level granuity) as below.
Below is the implementation for the basic class of the scheduler. Different schedulers will inherit this class and implement their own abstract methods.