
Universal DeepSeek-OCR 2 – CPU, MPS, CUDA Support

Hugging Face

This repository uses the weights from the original DeepSeek-OCR 2 and modifies the model to support inference on different devices such as CPU and MPS (Apple Metal GPU). By default, it runs on the CPU.


Original model: DeepSeek AI, DeepSeek-OCR 2: Visual Causal Flow – "Explore more human-like visual encoding."

Usage

Unlike the original DeepSeek-OCR-2 repository, the Universal version works on different device types such as CPU, MPS, and CUDA.
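The device-specific sample scripts below differ mainly in where the model tensors are placed. As a rough sketch (not code from this repository; pick_device is a purely hypothetical helper), automatic device selection with PyTorch could look like this:

import torch

def pick_device() -> str:
    # Hypothetical helper: prefer CUDA, then Apple MPS, then fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print("Selected device:", pick_device())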

  1. Clone this repository and navigate to the Universal-DeepSeek-OCR-2 folder
git clone https://github.com/Dogacel/Universal-DeepSeek-OCR-2.git
cd Universal-DeepSeek-OCR-2
  2. Install dependencies
conda create -n deepseek-ocr2 python=3.12.9 -y
conda activate deepseek-ocr2
pip install -r requirements.txt
  3. Run

Choose the sample that matches your device:

python sample_cpu.py  # For CPU inference

export PYTORCH_ENABLE_MPS_FALLBACK=1
python sample_mps.py  # For Apple Metal GPU inference

python sample_cuda.py # For NVIDIA GPU inference

Note: if you want to use CUDA, you may need to install torch from a wheel built with CUDA support, e.g. pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118.
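After installing, you can confirm which backends your torch build actually supports. This is a generic sanity check, not part of the repository:

import torch

# Report which accelerators this torch build can see.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())        # False on CPU-only wheels
print("CUDA build:", torch.version.cuda)                    # None unless installed from a CUDA wheel
print("MPS available:", torch.backends.mps.is_available()) # Apple Silicon only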

Sample Code

from transformers import AutoModel, AutoTokenizer
import torch

model_name = 'Dogacel/Universal-DeepSeek-OCR-2'

# Load the tokenizer and the patched model (trust_remote_code pulls in the custom model code).
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
model = model.eval().to("cpu").to(torch.float16)  # CPU inference in half precision

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'sample/paper.png'
output_path = 'output'

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
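The MPS and CUDA samples presumably follow the same pattern, changing only the device the model is moved to. A hedged variant of the loading step (the repository's own sample_mps.py and sample_cuda.py may differ in details such as precision):

from transformers import AutoModel, AutoTokenizer
import torch

model_name = 'Dogacel/Universal-DeepSeek-OCR-2'

# Assumed variant: pick MPS or CUDA when available, otherwise fall back to CPU.
if torch.backends.mps.is_available():
    device = "mps"   # remember: export PYTORCH_ENABLE_MPS_FALLBACK=1
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
model = model.eval().to(device).to(torch.float16)

# model.infer(...) is then called exactly as in the CPU sample above.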

vLLM Inference

For vLLM inference support, refer to the original DeepSeek-OCR-2 repository.