# Universal DeepSeek-OCR 2 – CPU, MPS, CUDA Support
This repository uses the weights from the original DeepSeek-OCR 2 and modifies the model to support inference on different devices such as CPU and MPS (Apple Metal GPU). By default, inference runs on the CPU.
Explore more human-like visual encoding.
## Usage
Unlike the original DeepSeek-OCR-2 repository, the Universal version works on different device types: CPU, MPS, and CUDA.
- Clone this repository and navigate to the Universal-DeepSeek-OCR-2 folder

```bash
git clone https://github.com/Dogacel/Universal-DeepSeek-OCR-2.git
cd Universal-DeepSeek-OCR-2
```

- Install Dependencies
```bash
conda create -n deepseek-ocr2 python=3.12.9 -y
conda activate deepseek-ocr2
pip install -r requirements.txt
```
- Run

Choose the sample that matches your device:
```bash
# For CPU inference
python sample_cpu.py

# For Apple Metal GPU inference; the fallback flag lets PyTorch run
# operators not yet implemented on MPS on the CPU instead
export PYTORCH_ENABLE_MPS_FALLBACK=1
python sample_mps.py

# For NVIDIA GPU inference
python sample_cuda.py
```
Note: if you want to use CUDA, you may need to install a torch wheel built with CUDA support, for example `pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118`.
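If you are unsure which backend your machine supports, you can detect one at runtime. A minimal sketch using standard PyTorch APIs (the `device` variable is illustrative and not part of the sample scripts):

```python
import torch

# Prefer CUDA, then Apple Metal (MPS), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Running inference on: {device}")
```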
## Sample Code
```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = 'Dogacel/Universal-DeepSeek-OCR-2'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)

# Move the model to the CPU and cast it to half precision.
model = model.eval().to("cpu").to(torch.float16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'sample/paper.png'
output_path = 'output'

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
```
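The same script can target the other backends by moving the model to the corresponding device before calling `infer`. A hedged sketch (the dtype choices are assumptions, not tested defaults from this repository):

```python
# For NVIDIA GPUs (assumed dtype; recent GPUs often prefer bfloat16):
model = model.eval().to("cuda").to(torch.bfloat16)

# For Apple Metal GPUs (set PYTORCH_ENABLE_MPS_FALLBACK=1 before running):
model = model.eval().to("mps").to(torch.float16)
```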
## vLLM Inference
For vLLM inference support, refer to the original DeepSeek-OCR-2 repository.
