GitHub - AdrianVispalia/xecai: Minimalistic, provider-agnostic, AI python library

Write code against different AI services and providers, and switch between them by changing only a few lines. The behaviour is easy to customise to fit your requirements.

Examples

Chat interface

from xecai.chat.implementations.openai.openai_chat import OpenAIChat
from xecai.models import Message, MessageType


messages = [Message(content="what model are you?", message_type=MessageType.USER)]
system_prompt = "you are a helpful bot"
model = "gpt-4o"
chat = OpenAIChat()
chat.check_model(model)

response = chat.invoke(model, system_prompt, messages)
print(response)

for chat_response in chat.stream(model, system_prompt, messages):
    if chat_response.text:
        print(chat_response.text, end="", flush=True)
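
The provider-agnostic idea can be illustrated without any network call. The sketch below is not part of xecai: `ChatLike` and `EchoChat` are hypothetical names, and the `invoke` signature is simply copied from the example above. It assumes that application code only needs an object exposing `invoke`, so switching providers comes down to changing the line that constructs the chat object. Plain strings stand in for `Message` objects to keep the sketch self-contained.

```python
from typing import Protocol


class ChatLike(Protocol):
    """Hypothetical structural type: any object with a matching invoke()."""

    def invoke(self, model: str, system_prompt: str, messages: list) -> object: ...


class EchoChat:
    """Stand-in provider used only for this sketch (not an xecai class)."""

    def invoke(self, model: str, system_prompt: str, messages: list) -> object:
        return f"[{model}] received {len(messages)} message(s)"


def ask(chat: ChatLike, model: str, messages: list) -> object:
    # Application code depends on the interface, not on a concrete provider.
    return chat.invoke(model, "you are a helpful bot", messages)


# Swapping providers would only change the object passed in here.
reply = ask(EchoChat(), "demo-model", ["what model are you?"])
print(reply)
```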

Agent interface

import asyncio
from typing import Any
from xecai.agents.implementations.openai.openai_agent import OpenAIAgent
from xecai.agents.agent_interface import WebSearchTool, tool
from xecai.models import Message, MessageType


@tool
def divide(args: dict[str, Any]) -> Any:
    """Divide number 'a' by number 'b'. Requires 'a' and 'b'."""
    a = float(args.get("a", 0))
    b = float(args.get("b", 1))
    if b == 0:
        result = "Error: Division by zero"
    else:
        result = a / b
    print(f"\n  [Tool used: divide(a={a}, b={b}) -> {result}]")
    return result


async def test():
    system_prompt = (
        "You are a helpful search assistant. "
        "Always use the tools provided to get accurate information, "
        "otherwise say you can't do it. Don't say approximately, use the exact tool result value."
    )
    model = "gpt-4o"
    agent = OpenAIAgent()
    scenarios = [
        ("what global news happened today?", [WebSearchTool]),  # Web search example
        ("divide 234324.23423 by 342.124324234", [divide])      # Custom function example
    ]
    for user_query, tools in scenarios:
        messages = [Message(content=user_query, message_type=MessageType.USER)]
        response = await agent.async_run(
            model_name=model,
            system_prompt=system_prompt,
            messages=messages,
            tools=tools,
        )
        print(response)

asyncio.run(test())

VectorDB interface

from xecai.vector_db.implementations.postgresql.postgresql_vector_db import PostgreSQLVectorDB
from xecai.embeddings.implementations.openai.openai_embedding import OpenAIEmbedding
from xecai.models import SearchType

vector_db = PostgreSQLVectorDB(
    embedding_interface=OpenAIEmbedding(), embedding_model="text-embedding-3-small"
)

chunks = vector_db.sync_retrieve(
    query="this is an example query",
    k=3,
    search_type=SearchType.HYBRID,
)
print(chunks)

Memory interface

from xecai.memory.implementations.postgresql.postgresql_memory import PostgreSQLMemory
from xecai.models import Conversation, Message, MessageType

memory = PostgreSQLMemory()

conversation = memory.sync_get_conversation("example_conversation_id") or Conversation()
print(conversation)

conversation.messages.append(Message(message_type=MessageType.USER, content="example query"))
memory.sync_save_conversation(conversation)

Embedding interface

from xecai.embeddings.implementations.openai.openai_embedding import OpenAIEmbedding

embedding = OpenAIEmbedding()

vector = embedding.sync_get_embeddings("This is a test document.", "text-embedding-3-small")
print(len(vector))
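
Embedding vectors are commonly compared with cosine similarity. The helper below is a minimal sketch of that comparison, assuming the returned embedding behaves like a list of floats (the example above only shows that it has a length). `cosine_similarity` is a hypothetical helper, not an xecai function, and the toy vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```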

Reranker interface

from xecai.reranker.implementations.aws.aws_reranker import AWSReranker
from xecai.models import Chunk

reranker = AWSReranker()

chunks = [
    Chunk(content="A document about cats.", document="doc1", origin="web", fragment=0),
    Chunk(content="A document about dogs.", document="doc2", origin="web", fragment=0)
]

reranked_chunks = reranker.sync_rerank("tell me about cats", chunks, k=1)
print(reranked_chunks)

Typical RAG workflow

Diagram

graph TD
    %% Refined Palette with Intense Cherry and Ink Black
    %% Outlines perfectly hidden by matching stroke to fill color
    
    classDef input fill:#0074D9,stroke:#0074D9,color:#FFFFFF,font-weight:bold,rx:8,ry:8;
    classDef process fill:#003366,stroke:#003366,color:#FFFFFF,rx:8,ry:8;
    classDef condition fill:#FF851B,stroke:#FF851B,color:#FFFFFF,font-weight:bold,rx:8,ry:8;
    
    %% Lighter Grey for the output node
    classDef output fill:#F8FAFC,stroke:#F8FAFC,color:#0B0F19,font-weight:bold,rx:8,ry:8;
    
    %% Intense Cherry
    classDef condense fill:#BA0C2F,stroke:#BA0C2F,color:#FFFFFF,rx:8,ry:8;
    
    %% Ink Black
    classDef retrieve fill:#1B1F29,stroke:#0B0F19,color:#FFFFFF,rx:8,ry:8;

    %% Clean, subtle slate-gray connecting lines
    linkStyle default stroke:#A0AAB2,stroke-width:2px,fill:none;

    %% Nodes
    Query([Receive User Query]):::input
    FetchHistory[Fetch Conversation History]:::process
    CheckHistory{History Exists?}:::condition
    CondenseQuery[Condense Query with Context]:::condense
    RetrieveChunks[(Retrieve Context Chunks)]:::retrieve
    LLMResponse([Generate LLM Response]):::output

    %% Flow
    Query --> FetchHistory
    FetchHistory --> CheckHistory
    CheckHistory -- "Yes" --> CondenseQuery
    CheckHistory -- "No" --> RetrieveChunks
    CondenseQuery --> RetrieveChunks
    RetrieveChunks --> LLMResponse

A FastAPI implementation of this typical RAG workflow is available in examples/simple_rag.py.
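
The control flow in the diagram can be sketched in a few lines of plain Python. This is an illustration only, not the code in examples/simple_rag.py: `answer_query` and the four callables are hypothetical placeholders for the memory, chat, and vector DB calls, and passing the history to the final generation step is an assumption the diagram does not spell out.

```python
from typing import Callable


def answer_query(
    query: str,
    conversation_id: str,
    fetch_history: Callable[[str], list[str]],
    condense: Callable[[str, list[str]], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str], list[str]], str],
) -> str:
    history = fetch_history(conversation_id)
    # Only rewrite the query when there is earlier context to fold in.
    search_query = condense(query, history) if history else query
    chunks = retrieve(search_query)
    return generate(query, history, chunks)


# Stub implementations so the flow can be exercised without any service.
calls: list[str] = []


def fake_fetch(conversation_id: str) -> list[str]:
    return {"c1": ["user: tell me about cats"]}.get(conversation_id, [])


def fake_condense(query: str, history: list[str]) -> str:
    calls.append("condense")
    return f"{query} (given {len(history)} earlier message(s))"


def fake_retrieve(search_query: str) -> list[str]:
    calls.append(f"retrieve:{search_query}")
    return ["chunk about cats"]


def fake_generate(query: str, history: list[str], chunks: list[str]) -> str:
    return f"answer to '{query}' using {len(chunks)} chunk(s)"


with_history = answer_query(
    "and dogs?", "c1", fake_fetch, fake_condense, fake_retrieve, fake_generate
)
without_history = answer_query(
    "tell me about cats", "new", fake_fetch, fake_condense, fake_retrieve, fake_generate
)
```

Passing the steps in as callables keeps the branching logic testable on its own, whichever provider sits behind each step.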

Differences with projects that have a similar objective

| Library | Notes |
| --- | --- |
| JustLLMs | Requests are made directly with HTTP. More LLMs supported and many more features (agents, tools, etc.). I personally prefer using the official SDKs. |
| LiteLLM & Agents SDK + LiteLLM | A local proxy router that receives your requests and translates them. It adds complexity to the project and another element to the infrastructure. |
| LangChain | The most popular one. Some features are bloated, and its complex implementations make it difficult to change its behaviour. |
| OpenRouter | All requests are sent to a third party, with additional charges. |

Notes

  • I will add an Azure OpenAI chat implementation if this project gets enough traction (I don't want to lose my free Azure credits unless it is worth it).
  • When using Chat objects, retries are left to the user to add and customise (see examples/retry_example.py). Providing default retry logic that users can also modify would add a lot of complexity.
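
As a rough idea of what user-side retry logic can look like, here is a minimal sketch with exponential backoff. It is not the content of examples/retry_example.py: `with_retries` and its parameters are hypothetical, and the exception types worth retrying depend on the provider SDK in use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


# Demo with a function that fails twice, then succeeds.
attempts_seen: list[int] = []


def flaky() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: list[float] = []
result = with_retries(flaky, sleep=delays.append)  # Record delays instead of sleeping.
```

With a chat object, the wrapped call could be something like `with_retries(lambda: chat.invoke(model, system_prompt, messages))`, narrowing `retry_on` to the transient errors the provider raises.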