GitHub - tezh404/Document-categorizer: AI-Powered Document Categorizer & Organizer

📂 AI-Powered Document Categorizer & Organizer

Supports: PDF, Markdown (.md), and Images

This project uses local LLMs (via LM Studio) to categorize files (PDFs, Markdown, and Images), then organizes them into folders based on their content.

🧠 Powered by LM Studio (Local AI)

Make sure LM Studio is running and a model is served using the OpenAI-compatible API.

To enable the API in LM Studio:

Launch LM Studio.
Load a model (e.g., gemma-3, mistral, etc.).
Go to the Server tab.
Copy the model name and API URL, and add them to your config.json.

🔧 Requirements

pip install -r requirements.txt

⚙️ Configuration (`config.json`)

⚠️ change the name of the config.example.json to config.json

{
    "path": "path-to-your-files",
    "json_path": "Path/output_file_info.json",
    "pages": 3,
    "model_name": "your-model-name",
    "api_url": "http://localhost:1234/v1/chat/completions",
    "api_key": "<API_KEY>",
    "prompt": "\"{file_name}\" , \"{title}\" , \"{text}\" Based on this information, determine the category for this document. It should be a single word in English. Example: Engineering, Computer etc."
}

📁 Step 1: Categorize Files

Run one or more of the following categorizers depending on the file type:

📄 PDF Files

→ Generates pdf_file_info.json

📝 Markdown Files

→ Generates md_file_info.json

🖼️ Image Files

⚠️ Requires a model that supports image input (e.g., gemma-3-12b or gemma-3-4b)

→ Generates img_file_info.json

📂 Step 2: Organize Files

After categorizing, set "json_path" in your config.json to the relevant .json file created, then run:

Files will be moved into folders based on their detected category.

✅ Example Workflow

Start LM Studio and load your model.
Categorize files:
- python pdfCategorizer.py
- python mdCategorizer.py
- python imgCategorizer.py
Run the organizer for each JSON output:
- Update json_path to match (pdf_file_info.json, etc.)
- python Organizer.py