GitHub - WikiTeq/mAItion: mAItion is an all-in-one ready-to-use AI-powered tool that combines your existing knowledge with LLMs, allowing you to chat, search and interact with your data through a slick chat interface.

mAItion is an all-in-one ready-to-use AI-powered tool that combines your existing knowledge with LLMs, allowing you to chat, search and interact with your data through a slick chat interface. With mAItion you can aggregate all your knowledge from many sources using Connectors into a central place and interact with your knowledge with ease!

📚 Documentation: docs.maition.com

✨ Features

Support for both local and remote LLMs for embedding and inference
Asynchronous data ingestion with deduplication and per-source configurable schedules
Data ingestion from S3 buckets with Everything-to-Markdown conversion via MarkItDown
Data ingestion from MediaWiki with Wiki-to-Markdown conversion via html2text
SerpAPI search query results ingestion from Google Search results with customizable queries
Flexible configuration supporting an arbitrary number of connectors
Built with extensibility in mind, allowing for custom connectors addition with ease
MCP servers support (stdio, streamable http)
Web-search support (through external services and via on-premise services)
In-place chat with uploaded documents and images (for multi-modal LLMs)
Code execution and Code interpreter
Text-to-Speech and Speech-to-Text capabilities
Image generator (requires model supporting image generation)
Flexible automation capabilities through Functions and Pipelines
Multi-user setup with fine-grained groups and permissions
Support for multiple customized configurations of LLM models

✨ Use-cases

A single place to chat with your company knowledge that's scattered across many external systems
A central space for looking up and refining your existing knowledge across many knowledge bases
A tool to find secret knowledge that can not be found in the other was across your scattered data
An entry-point into your on-premise hosted LLM models supporting evaluations and per-model settings

🌐 Connectors included

S3 (any AWS compatible Object Storage including AWS, Contabo, B2, Cloudflare R2, OVH, etc)
MediaWiki (all versions supported, both private and public wiki)
SerpAPI

🌐 Extra connectors

Over 100 extra connectors are available at request, including the most popular ones:

Gmail
Google Drive
Jira
Slack
GitHub
Gitlab
Notion
Microsoft Teams
Microsoft Office 365
Dropbox
Trello
Web scraper
YouTube
FTP
SCP
SSH
and many more..

Quick start

Requirements

Docker and Docker Compose
OpenRouter or OpenAI API key (a free OpenRouter account works with the default configuration)
S3 bucket

Setup

Create .env.rag file by copying .env.rag.example (see https://github.com/wikiteq/rag-of-all-trades for details)
- Set OPENROUTER_API_KEY
- Set S3_ACCOUNT1_* values to match your source S3 bucket with files
Create config.yaml out of config.yaml.example
- The default config works OK and is configured to:
  - Use a single S3 bucket as data source
  - Use openai/gpt-oss-20b:free model for rephrase
  - Use local sentence-transformers/all-mpnet-base-v2 model for embeddings
  - You can change the values if necessary, refer to https://github.com/wikiteq/rag-of-all-trades for details
Create .env file by copying .env.openwebui.example
- Set OPENAI_API_KEY
- Optionally set OPENAI_DEFAULT_MODEL

Start the stack by running docker compose up -d. Wait until all the services become healthy. You can check health status by running docker compose ps and checking the STATUS column of the services:

docker compose ps
NAME                                      IMAGE                                      COMMAND                  SERVICE     CREATED         STATUS                   PORTS
rag-of-all-trades-openwebui-api-1         ghcr.io/wikiteq/rag-of-all-trades:latest   "sh -c 'alembic upgr…"   api         4 minutes ago   Up 4 minutes (healthy)   8000/tcp
rag-of-all-trades-openwebui-openwebui-1   ghcr.io/open-webui/open-webui:0.6.5        "/custom-entrypoint.…"   openwebui   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:3000->8080/tcp, [::]:3000->8080/tcp
rag-of-all-trades-openwebui-postgres-1    ankane/pgvector:v0.5.1                     "docker-entrypoint.s…"   postgres    4 minutes ago   Up 4 minutes (healthy)   5432/tcp
rag-of-all-trades-openwebui-redis-1       redis:7                                    "docker-entrypoint.s…"   redis       4 minutes ago   Up 4 minutes (healthy)   6379/tcp

It takes up to a minute for the OpenWebUI to fully boot on cold start.

Once all the services are booted and report healthy status visit http://localhost:3000 and login using Admin credentials. The credentials are defined in X_WEBUI_ADMIN_EMAIL and X_WEBUI_ADMIN_PASS of the .env file. The default ones are:

username: admin@example123.com
password: q1w2e3r4!

If you did not change the ENABLE_OPENAI_API you will also have LLM provider pre-configured with the values you have in the .env including the default chat model

Two components handle RAG service communication:

Filter function (functions/function.py) — intercepts every user message and injects ROAT context automatically. Enabled globally via Admin Panel → Functions.
Knowledge Base Search tool (tools/roat_retrieval.py) — a Workspace Tool that lets the LLM decide when to query ROAT. Requires a model with native function calling support. Both are automatically provisioned on first boot.

Connectors configuration

The service supports multiple data sources, including multiple data sources of the same type, each with its own ingestion schedule. The connectors to enable are defined via config.yaml, and their secrets are defined in the .env.rag file.

S3 Connector

The S3 connector ingests documents from S3 buckets and converts them to Markdown format. The connector has the following configuration options:

# config.yaml

sources:
  - type: "s3" # must be s3
    name: "account1" # arbitrary name for the connector, will be stored in metadata
    config:
      endpoint: "${S3_ACCOUNT1_ENDPOINT}" # s3 endpoint
      access_key: "${S3_ACCOUNT1_ACCESS_KEY}" # s3 access key
      secret_key: "${S3_ACCOUNT1_SECRET_KEY}" # s3 secret key
      region: "${S3_ACCOUNT1_REGION}" # s3 region
      use_ssl: "${S3_ACCOUNT1_USE_SSL}" # use ssl for s3 connection, can be True or False
      buckets: "${S3_ACCOUNT1_BUCKETS}" # single entry or comma-separated list i.e. bucket1,bucket2
      schedules: "${S3_ACCOUNT1_SCHEDULES}" # single entry or comma-separated list i.e. 3600,60

  - type: "s3"
    name: "account2"
    config:
      ...

  - type: "s3"
    name: "account3"
    config:
      ...

# .env.rag

S3_ACCOUNT1_ENDPOINT=https://s3.amazonaws.com
S3_ACCOUNT1_ACCESS_KEY=xxx
S3_ACCOUNT1_SECRET_KEY=xxx
S3_ACCOUNT1_REGION=us-east-1
S3_ACCOUNT1_USE_SSL=True
S3_ACCOUNT1_BUCKETS=bucket1,bucket2
S3_ACCOUNT1_SCHEDULES=3600,60

MediaWiki Connector

The MediaWiki connector ingests documents from MediaWiki sites and converts them to Markdown format. The connector has the following configuration options:

# config.yaml

sources:
  - type: "mediawiki"
    name: "wiki1"
    config:
      host: "${MEDIAWIKI1_HOST}"
      path: "/w/"          # optional, default /w/
      scheme: "https"      # optional, default https
      page_limit: 500      # optional, max pages per namespace (default: unlimited)
      namespaces: "0,1"    # optional, comma-separated namespace IDs (default: content namespaces)
      filter_redirects: true  # optional, exclude redirect pages (default: true)
      username: "${MEDIAWIKI1_USERNAME}"  # optional, for private wikis
      password: "${MEDIAWIKI1_PASSWORD}"  # optional, for private wikis
      schedules: "${MEDIAWIKI1_SCHEDULES}"

  - type: "mediawiki"
    name: "wiki2"
    config:
      host: "${MEDIAWIKI2_HOST}"
      schedules: "${MEDIAWIKI2_SCHEDULES}"

# .env.rag

MEDIAWIKI1_HOST=wiki.example.org
MEDIAWIKI1_SCHEDULES=3600
# Only needed for private wikis requiring login:
#MEDIAWIKI1_USERNAME=your-bot-username
#MEDIAWIKI1_PASSWORD=your-bot-password

SerpAPI Connector

The SerpAPI connector ingests documents from Google Search results and converts them to Markdown format. The connector has the following configuration options:

# config.yaml

sources:
  - type: "serpapi"
    name: "serp_ingestion1"
    config:
      api_key: "${SERPAPI1_KEY}"
      queries: "${SERPAPI1_QUERIES}"
      schedules: "${SERPAPI1_SCHEDULES}"

  - type: "serpapi"
    name: "serp_ingestion2"
    config:

  - type: "serpapi"
    name: "serp_ingestion3"
    config:

# .env.rag

SERPAPI1_KEY=xxxx
SERPAPI1_QUERIES=aaa
SERPAPI1_SCHEDULES=3600

Web Connector

The Web connector ingests content from web pages. It supports two mutually exclusive modes:

URLs mode: scrapes a fixed list of URLs
Sitemap mode: discovers URLs from a sitemap.xml with optional include_prefix filter

# config.yaml

sources:
  # URLs mode
  - type: "web"
    name: "web1"
    config:
      urls: "${WEB1_URLS}"
      html_to_text: true
      schedules: "${WEB1_SCHEDULES}"

  # Sitemap mode
  - type: "web"
    name: "web2"
    config:
      sitemap_url: "${WEB2_SITEMAP_URL}"
      include_prefix: "${WEB2_INCLUDE_PREFIX}"
      html_to_text: true
      schedules: "${WEB2_SCHEDULES}"

# .env.rag

WEB1_URLS=https://example.com/page1,https://example.com/page2
WEB1_SCHEDULES=60
WEB2_SITEMAP_URL=https://example.com/sitemap.xml
WEB2_INCLUDE_PREFIX=/blog/
WEB2_SCHEDULES=60

Jira Connector

The Jira connector ingests issues from Jira Cloud or Jira Server/Data Center. It supports two authentication modes: basic (email + API token) and token (Personal Access Token / PAT).

# config.yaml

sources:
  - type: "jira"
    name: "jira1"
    config:
      server_url: "${JIRA1_SERVER_URL}"
      auth_type: "basic"        # "basic" (email + API token) or "token" (PAT)
      email: "${JIRA1_EMAIL}"   # required when auth_type is "basic"
      api_token: "${JIRA1_API_TOKEN}"
      jql: "${JIRA1_JQL}"       # JQL query to select issues
      max_results: 50
      load_comments: false
      max_comments: 10
      schedules: "${JIRA1_SCHEDULES}"

# .env.rag

JIRA1_SERVER_URL=https://your-org.atlassian.net
JIRA1_EMAIL=your-email@example.com
JIRA1_API_TOKEN=your-api-token
JIRA1_JQL=project = MYPROJECT ORDER BY updated DESC
JIRA1_SCHEDULES=3600

Pipedrive Connector

The Pipedrive connector ingests CRM records from Pipedrive via the REST API v1. Supports activities, deals, notes, organizations, persons, products, projects, leads, tasks, and mails.

# config.yaml

sources:
  - type: "pipedrive"
    name: "pipedrive1"
    config:
      api_token: "${PIPEDRIVE1_API_TOKEN}"
      load_types:           # optional, default: all entity types
        - deals
        - notes
        - persons
        - mails
      max_items: 500        # optional, per-entity limit (default: unlimited)
      max_retries: 3        # optional, retry attempts on failure (default: 3)
      filter_mail_folders:  # optional, default: [inbox]
        - inbox
        - sent
      schedules: "${PIPEDRIVE1_SCHEDULES}"

  - type: "pipedrive"
    name: "pipedrive2"
    config:
      api_token: "${PIPEDRIVE2_API_TOKEN}"
      schedules: "${PIPEDRIVE2_SCHEDULES}"

# .env.rag

PIPEDRIVE1_API_TOKEN=your-pipedrive-api-token
PIPEDRIVE1_SCHEDULES=3600
PIPEDRIVE2_API_TOKEN=your-second-pipedrive-api-token
PIPEDRIVE2_SCHEDULES=3600

Embeddings and Inference

Embeddings support

Both local and remote OpenAI-compatible models are supported for embeddings:

Local (running arbitrary embedding models from HuggingFace)
OpenRouter
OpenAI or OpenAI-compatible

Inference support

Both local and remote OpenAI-compatible models are supported for inference:

OpenRouter
OpenAI or OpenAI-compatible

Embeddings-only HuggingFace local model

You can configure the service to use local embeddings only, in this mode you can use any embedding model supported by HuggingFace. Inference is disabled in this mode, so you won't be able to use the rephrase endpoint.

# config.yaml

embedding:
  provider: local
  # you can use any embedding model supported by HuggingFace
  model_config: sentence-transformers/all-MiniLM-L6-v2
  embedding_dim: 384

inference:
  provider: None
  model_config: None

Embeddings-only OpenRouter/OpenAI model

You can configure the service to use remote embeddings, in this mode you can use any embedding model supported by OpenRouter/OpenAI. Inference is disabled in this mode, so you won't be able to use the rephrase endpoint.

# config.yaml

embedding:
  provider: openrouter
  model_config: text-embedding-3-small
  embedding_dim: 1536

inference:
  provider: None
  model_config: None

You must set OPENROUTER_API_KEY and OPENROUTER_API_BASE in the .env.rag file.

Embeddings and inference OpenRouter/OpenAI model

You can configure the service to use remote embeddings and remote inference, in this mode you can use any embedding and inference models supported by OpenRouter/OpenAI.

# config.yaml

embedding:
  provider: openrouter
  model_config: text-embedding-3-small
  embedding_dim: 1536

inference:
  provider: openrouter
  model_config: gpt-4o

You must set OPENROUTER_API_KEY and OPENROUTER_API_BASE in the .env.rag file.

Reference of the `config.yaml`

The config.yaml file contains the main configuration of the service.

Environment variables (${...}) in the config file are evaluated at runtime.

sources: # holds the list of sources to ingest from (Connectors)

  - type: # type of the connector (s3, mediawiki, serpapi)
    name: # arbitrary name for the connector, will be stored in metadata
    config:
      # connector specific configuration
      schedules: "${S3_ACCOUNT1_SCHEDULES}"

# configures models and dimensions for embeddings
embedding:
  provider: openrouter # `openrouter`/`openai` or `local` for local HuggingFace embeddings
  model_config: text-embedding-3-small # model to use
  embedding_dim: 1536 # dimensions (check with the model docs)

# configures the LLM provider and model
inference:
  provider: openrouter # `openrouter`/`openai`
  model_config: gpt-4o # model to use

# vector store configuration
vector_store:
  table_name: embeddings
  hybrid_search: true # whether to use hybrid search or not
  chunk_size: 512 # chunk size for vector indexing
  chunk_overlap: 50 # overlap between chunks
  # hnsw indexes settings
  hnsw:
    hnsw_m: 16 # number of neighbors
    hnsw_ef_construction: 64 # ef construction parameter for HNSW
    hnsw_ef_search: 40 # ef search parameter for HNSW
    hnsw_dist_method: vector_cosine_ops # distance metric for HNSW

Tech Stack

Rag-Of-All-Trades v0.1 as a RAG backend
OpenWebUI v0.6.5 as a front-end

Troubleshooting

OpenWebUI does not start

The openwebui service depends on the api service healthiness and will remain pending until the API service is online. Check the api container for any errors, review the config.yaml and .env.rag for typos.

HuggingFace connection timeout

requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: da122313-e11f-4d54-b4f3-187abfea0ca3)')

OpenWebUI downloads some HuggingFace models during first boot. Sometimes HuggingFace endpoints may timeout. In this scenario just run docker compose down -v to wipe the stack and start over with docker compose up -d

✨ Features

✨ Use-cases

🌐 Connectors included

🌐 Extra connectors

Quick start

Requirements

Setup

Connectors configuration

S3 Connector

MediaWiki Connector

SerpAPI Connector

Web Connector

Jira Connector

Pipedrive Connector

Embeddings and Inference

Embeddings support

Inference support

Embeddings-only HuggingFace local model

Embeddings-only OpenRouter/OpenAI model

Embeddings and inference OpenRouter/OpenAI model

Reference of the config.yaml

Tech Stack

Troubleshooting

OpenWebUI does not start

HuggingFace connection timeout

Star History

Reference of the `config.yaml`