
MyOCR - Advanced OCR Pipeline Builder

MyOCR is a highly extensible and customizable framework for building OCR systems. Engineers can train deep learning models and integrate them into custom OCR pipelines for real-world applications.

Try the online demo on HuggingFace or ModelScope

🌟 Key Features:

⚡️ End-to-End OCR Development Framework – Designed for developers to build and integrate detection, recognition, and custom OCR models in a unified and flexible pipeline.

🛠️ Modular & Extensible – Mix and match components: swap models, predictors, or input/output processors with minimal changes.

🔌 Developer-Friendly by Design – Clean Python APIs, prebuilt pipelines and processors, and straightforward customization for training and inference.

🚀 Production-Ready Performance – ONNX Runtime support for fast CPU/GPU inference, with several deployment options.

📣 Updates

🔥2025.05.17 MyOCR v0.1.1 released

🛠️ Installation

📦 Requirements

Python 3.11+
CUDA: Version 12.6 or higher is recommended for GPU acceleration. CPU-only mode is also supported.
Operating System: Linux, macOS, or Windows.

📥 Install Dependencies

# Clone the code from GitHub
git clone https://github.com/robbyzhaox/myocr.git
cd myocr

# You can create your own venv before the following steps
# Install dependencies
pip install -e .

# Development environment installation
pip install -e ".[dev]"

# Download the pre-trained model weights into the models directory
# For Linux and macOS:
mkdir -p ~/.MyOCR/models/
# For Windows, the "models" directory can be created in the current path
# Download weights from: https://drive.google.com/drive/folders/1RXppgx4XA_pBX9Ll4HFgWyhECh5JtHnY
# Alternative download link: https://pan.baidu.com/s/122p9zqepWfbEmZPKqkzGBA?pwd=yq6j

🚀 Quick Start

🖥️ Local Inference

Basic OCR Recognition

from myocr.pipelines import CommonOCRPipeline

# Initialize common OCR pipeline (using GPU)
pipeline = CommonOCRPipeline("cuda:0")  # Use "cpu" for CPU mode

# Perform OCR recognition on an image
result = pipeline("path/to/your/image.jpg")
print(result)

Structured OCR Output (Example: Invoice Information Extraction)

Configure chat_bot in myocr.pipelines.config.structured_output_pipeline.yaml:

chat_bot:
  model: qwen2.5:14b
  base_url: http://127.0.0.1:11434/v1
  api_key: 'key'

Note: the chat bot currently supports:

Ollama API
OpenAI API

from pydantic import BaseModel, Field
from myocr.pipelines import StructuredOutputOCRPipeline

# Define the output data model first; see InvoiceModel in main.py for an example

# Initialize structured OCR pipeline
pipeline = StructuredOutputOCRPipeline("cuda:0", InvoiceModel)

# Process image and get structured data
result = pipeline("path/to/invoice.jpg")
print(result.to_dict())
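The snippet above leaves InvoiceModel undefined and points to main.py. As a rough sketch of what such a model could look like, assuming the pipeline accepts any Pydantic model class: the class InvoiceItem and all field names below are hypothetical and are not taken from the project's main.py.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class InvoiceItem(BaseModel):
    # Hypothetical line item; field names are illustrative only.
    name: str = Field(description="Item name as printed on the invoice")
    quantity: float = Field(description="Quantity of the item")
    amount: float = Field(description="Line total for the item")


class InvoiceModel(BaseModel):
    # Hypothetical invoice schema, not the InvoiceModel from main.py.
    invoice_number: str = Field(description="Invoice number")
    invoice_date: Optional[str] = Field(default=None, description="Issue date")
    total_amount: float = Field(description="Total amount on the invoice")
    items: List[InvoiceItem] = Field(default_factory=list, description="Line items")
```

The field descriptions are included on the assumption that they help the configured chat bot map recognized text onto the schema; whether MyOCR actually forwards them to the model is not confirmed by this README.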

🐳 Docker Deployment

The framework supports Docker deployment. The published image can be run with the following commands:

Run the Docker Container

docker run -d -p 8000:8000 robbyzhaox/myocr:latest

# To use StructuredOutputOCRPipeline, set the following environment variables with the -e option of docker run
docker run -d \
  -p 8000:8000 \
  -e CHAT_BOT_MODEL="qwen2.5:14b" \
  -e CHAT_BOT_BASEURL="http://127.0.0.1:11434/v1" \
  -e CHAT_BOT_APIKEY="key" \
  robbyzhaox/myocr:latest

Accessing API Endpoints (Docker)

IMAGE_PATH="your_image.jpg"

BASE64_IMAGE=$(base64 -w 0 "$IMAGE_PATH")  # Linux
#BASE64_IMAGE=$(base64 -i "$IMAGE_PATH" | tr -d '\n') # macOS

curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"${BASE64_IMAGE}\"}" \
  http://localhost:8000/ocr
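For callers that prefer Python over curl, the same request can be sketched with the standard library alone. This assumes, as in the curl example, that POST /ocr accepts a JSON body with a base64-encoded "image" field. The helper names build_ocr_payload and post_ocr are hypothetical, and the assumption that the response body is JSON is not confirmed by this README.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body used in the curl example."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def post_ocr(image_path: str, url: str = "http://localhost:8000/ocr"):
    """Send an image file to the OCR endpoint and return the decoded reply."""
    with open(image_path, "rb") as f:
        payload = build_ocr_payload(f.read())
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumption: the service replies with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

With the container running, post_ocr("your_image.jpg") would send the same request as the curl command above.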

🔗 Using the REST API

The framework provides a simple Flask API service that can be called over HTTP:

# Start the service (default port: 5000)
python main.py

API endpoints:

GET /ping: Check if the service is running properly
POST /ocr: Basic OCR recognition
POST /ocr-json: Structured OCR output
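When scripting against the service, it can help to wait until GET /ping answers before sending OCR requests. A minimal sketch, assuming the Flask service listens on http://localhost:5000 as noted above and that /ping returns HTTP 200 when healthy. The function names, retry count, and delay are hypothetical choices, not part of MyOCR.

```python
import time
import urllib.error
import urllib.request


def endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def wait_until_ready(base_url: str = "http://localhost:5000",
                     attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll GET /ping until it returns HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(endpoint(base_url, "ping"), timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Service not reachable yet; retry after a pause.
        time.sleep(delay)
    return False
```

The same endpoint helper can build the /ocr and /ocr-json URLs, for example endpoint("http://localhost:5000", "ocr-json").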

A UI for these endpoints is also available; see doc-insight-ui.

🎖 Contribution Guidelines

We welcome any form of contribution, including but not limited to:

Submitting bug reports
Adding new features
Improving documentation
Optimizing performance

📄 License

This project is open-sourced under the Apache 2.0 License; see the LICENSE file for details.