📂 AI-Powered Document Categorizer & Organizer
Supports: PDF, Markdown (.md), and Images
This project uses local LLMs (via LM Studio) to categorize files (PDFs, Markdown, and Images), then organizes them into folders based on their content.
🧠 Powered by LM Studio (Local AI)
Make sure LM Studio is running and a model is served using the OpenAI-compatible API.
To enable the API in LM Studio:
- Launch LM Studio.
- Load a model (e.g.,
gemma-3,mistral, etc.). - Go to the Server tab.
- Copy the model name and API URL, and add them to your
config.json.
🔧 Requirements
pip install -r requirements.txt
⚙️ Configuration (config.json)
⚠️ change the name of the config.example.json to config.json
{
"path": "path-to-your-files",
"json_path": "Path/output_file_info.json",
"pages": 3,
"model_name": "your-model-name",
"api_url": "http://localhost:1234/v1/chat/completions",
"api_key": "<API_KEY>",
"prompt": "\"{file_name}\" , \"{title}\" , \"{text}\" Based on this information, determine the category for this document. It should be a single word in English. Example: Engineering, Computer etc."
}📁 Step 1: Categorize Files
Run one or more of the following categorizers depending on the file type:
📄 PDF Files
→ Generates pdf_file_info.json
📝 Markdown Files
→ Generates md_file_info.json
🖼️ Image Files
gemma-3-12b or gemma-3-4b)
→ Generates img_file_info.json
📂 Step 2: Organize Files
After categorizing, set "json_path" in your config.json to the relevant .json file created, then run:
Files will be moved into folders based on their detected category.
✅ Example Workflow
-
Start LM Studio and load your model.
-
Categorize files:
python pdfCategorizer.pypython mdCategorizer.pypython imgCategorizer.py
-
Run the organizer for each JSON output:
- Update
json_pathto match (pdf_file_info.json, etc.) python Organizer.py
- Update