Document OCR with vision-language model - leveraging DeepSeek Vision for intelligent document understanding and text extraction powered by vLLM inference server.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
subgraph Input["๐ฅ DOCUMENT INPUT"]
DOC[/"๐ Document Image
PDF / PNG / JPG"/]
CFG[/"โ๏ธ OCR Config
Language, Mode"/]
end
subgraph Preprocessing["๐ง IMAGE PREPROCESSING"]
LOAD["๐ฅ Load Image
PIL / OpenCV"]
RESIZE["๐ Resize & Normalize
Optimal Resolution"]
ENCODE["๐ข Encode to Base64
Model Input Format"]
end
subgraph VisionModel["๐ง DEEPSEEK VISION MODEL"]
direction TB
PROMPT["๐ Craft OCR Prompt
Extract all text..."]
VL[("๐ง DeepSeek VL
Vision-Language Model")]
INFER["โก Model Inference
Text Extraction"]
end
subgraph Output["๐ค TEXT OUTPUT"]
RAW["๐ Raw Extracted Text"]
CLEAN["โจ Clean & Format"]
RESULT["๐ Final OCR Result"]
end
DOC --> LOAD
CFG --> PROMPT
LOAD --> RESIZE
RESIZE --> ENCODE
ENCODE --> VL
PROMPT --> VL
VL --> INFER
INFER --> RAW
RAW --> CLEAN
CLEAN --> RESULT
DeepSeek Vision: Multimodal vision-language model capable of understanding document layouts, handwriting, tables, and complex formatting for accurate text extraction.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart LR
subgraph Client["๐ฑ CLIENT"]
REQ["๐ HTTP Request
Image + Prompt"]
RESP["๐ฅ Response
Extracted Text"]
end
subgraph Gateway["๐ API GATEWAY"]
AUTH["๐ Authentication"]
RATE["โฑ๏ธ Rate Limiting"]
QUEUE["๐ Request Queue"]
end
subgraph vLLMServer["โก vLLM SERVER"]
direction TB
ENGINE[("โก vLLM Engine
PagedAttention")]
BATCH["๐ฆ Continuous Batching
Dynamic Batching"]
KV["๐พ KV Cache
Memory Optimization"]
GPU["๐ฎ GPU Inference
CUDA Acceleration"]
end
subgraph Model["๐ง DEEPSEEK VL"]
WEIGHTS[("๐ง Model Weights
Vision + Language")]
end
REQ --> AUTH
AUTH --> RATE
RATE --> QUEUE
QUEUE --> ENGINE
ENGINE --> BATCH
BATCH --> KV
KV --> GPU
GPU --> WEIGHTS
WEIGHTS --> GPU
GPU --> RESP
vLLM Performance: High-throughput inference with PagedAttention for efficient KV cache management, continuous batching for optimal GPU utilization, and memory-efficient serving.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
subgraph Input["๐ฅ RAW INPUT"]
PDF[/"๐ PDF Document"/]
IMG[/"๐ผ๏ธ Image File"/]
SCAN[/"๐ท Scanned Doc"/]
end
subgraph Detection["๐ FORMAT DETECTION"]
DETECT["๐ File Type Detection
MIME / Magic Bytes"]
ROUTE{"๐ Route by Type"}
end
subgraph PDFProcess["๐ PDF PROCESSING"]
EXTRACT["๐ Extract Pages
pdf2image"]
DPI["๐ Set DPI
300 DPI Default"]
end
subgraph ImageProcess["๐ผ๏ธ IMAGE PROCESSING"]
LOAD2["๐ฅ Load Image"]
ORIENT["๐ Auto-Orient
EXIF Rotation"]
DESKEW["๐ Deskew
Angle Correction"]
end
subgraph Normalize["๐ NORMALIZATION"]
RESIZE2["๐ Resize
Max 2048px"]
CONTRAST["๐จ Enhance Contrast"]
DENOISE["๐ Denoise
Gaussian Blur"]
SHARP["โจ Sharpen
Edge Enhancement"]
end
subgraph Output["๐ค MODEL INPUT"]
TENSOR["๐ข To Tensor
Normalized Array"]
BASE64["๐ Base64 Encode"]
READY["โ
Ready for VL Model"]
end
PDF --> DETECT
IMG --> DETECT
SCAN --> DETECT
DETECT --> ROUTE
ROUTE -->|"PDF"| EXTRACT
ROUTE -->|"Image"| LOAD2
EXTRACT --> DPI
DPI --> ORIENT
LOAD2 --> ORIENT
ORIENT --> DESKEW
DESKEW --> RESIZE2
RESIZE2 --> CONTRAST
CONTRAST --> DENOISE
DENOISE --> SHARP
SHARP --> TENSOR
TENSOR --> BASE64
BASE64 --> READY
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
subgraph Input["๐ฅ BATCH INPUT"]
DOCS[/"๐ Multi-Page PDF
or Document Batch"/]
CONFIG[/"โ๏ธ Batch Config
Concurrency, Priority"/]
end
subgraph Splitter["โ๏ธ PAGE SPLITTER"]
SPLIT["๐ Split to Pages"]
INDEX["๐ข Index Pages
Maintain Order"]
QUEUE2["๐ Page Queue"]
end
subgraph ParallelOCR["โก PARALLEL OCR"]
direction LR
subgraph Worker1["Worker 1"]
W1["๐ง DeepSeek VL"]
end
subgraph Worker2["Worker 2"]
W2["๐ง DeepSeek VL"]
end
subgraph Worker3["Worker 3"]
W3["๐ง DeepSeek VL"]
end
subgraph WorkerN["Worker N"]
WN["๐ง DeepSeek VL"]
end
end
subgraph Aggregator["๐ RESULT AGGREGATOR"]
COLLECT["๐ฅ Collect Results"]
ORDER["๐ข Restore Order"]
MERGE["๐ Merge Text
Page Separators"]
end
subgraph Output["๐ค FINAL OUTPUT"]
COMBINED["๐ Combined Document"]
META[("๐ Metadata
Page Count, Confidence")]
end
DOCS --> SPLIT
CONFIG --> QUEUE2
SPLIT --> INDEX
INDEX --> QUEUE2
QUEUE2 --> W1
QUEUE2 --> W2
QUEUE2 --> W3
QUEUE2 --> WN
W1 --> COLLECT
W2 --> COLLECT
W3 --> COLLECT
WN --> COLLECT
COLLECT --> ORDER
ORDER --> MERGE
MERGE --> COMBINED
MERGE --> META
Parallel Processing: Distribute pages across multiple vLLM workers for high-throughput batch document processing with automatic load balancing and result aggregation.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
subgraph Input["๐ฅ RAW OCR OUTPUT"]
RAW2[/"๐ Raw Extracted Text"/]
SCHEMA[/"๐ Target Schema
JSON Template"/]
end
subgraph Analysis["๐ TEXT ANALYSIS"]
SEGMENT["๐ Segment Text
Headers, Body, Tables"]
DETECT2["๐ท๏ธ Entity Detection
Dates, Names, Numbers"]
PATTERN["๐ Pattern Matching
Regex Extraction"]
end
subgraph LLMParsing["๐ง LLM STRUCTURED PARSING"]
PROMPT2["๐ Parsing Prompt
Schema-Guided"]
VL2[("๐ง DeepSeek VL
JSON Mode")]
VALIDATE["โ
JSON Validation
Schema Check"]
end
subgraph Transform["๐ TRANSFORMATION"]
NORMALIZE2["๐ Normalize Values
Dates, Currency"]
ENRICH["โจ Enrich Data
Computed Fields"]
CLEAN2["๐งน Clean Nulls
Default Values"]
end
subgraph Output["๐ค STRUCTURED OUTPUT"]
JSON2[("๐ JSON Document")]
CSV["๐ CSV Export"]
DB[("๐๏ธ Database
Insert/Update")]
end
RAW2 --> SEGMENT
SCHEMA --> PROMPT2
SEGMENT --> DETECT2
DETECT2 --> PATTERN
PATTERN --> PROMPT2
PROMPT2 --> VL2
VL2 --> VALIDATE
VALIDATE --> NORMALIZE2
NORMALIZE2 --> ENRICH
ENRICH --> CLEAN2
CLEAN2 --> JSON2
JSON2 --> CSV
JSON2 --> DB
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart LR
subgraph Intake["๐ฅ DOCUMENT INTAKE"]
UPLOAD["๐ค Upload API
REST / gRPC"]
STORAGE[("โ๏ธ Object Storage
S3 / GCS")]
TRIGGER["โก Event Trigger
New Document"]
end
subgraph Pipeline["๐ OCR PIPELINE"]
direction TB
PRE["๐ง Preprocessing
Normalize Images"]
SPLIT2["โ๏ธ Page Splitting
Multi-Page Support"]
subgraph InferenceCluster["โก vLLM CLUSTER"]
LB["โ๏ธ Load Balancer"]
N1["๐ง DeepSeek VL #1"]
N2["๐ง DeepSeek VL #2"]
N3["๐ง DeepSeek VL #3"]
end
AGG["๐ Aggregator
Merge Results"]
end
subgraph PostProcess["โจ POST-PROCESSING"]
STRUCT["๐ Structure Extraction
Tables, Forms"]
VALIDATE2["โ
Confidence Check
Quality Score"]
FORMAT["๐ Output Formatting
JSON, Markdown"]
end
subgraph Delivery["๐ค DELIVERY"]
WEBHOOK["๐ Webhook
Callback URL"]
QUEUE3["๐ฌ Message Queue
Kafka / RabbitMQ"]
API["๐ REST API
Poll Results"]
ARCHIVE[("๐๏ธ Archive
Long-term Storage")]
end
UPLOAD --> STORAGE
STORAGE --> TRIGGER
TRIGGER --> PRE
PRE --> SPLIT2
SPLIT2 --> LB
LB --> N1
LB --> N2
LB --> N3
N1 --> AGG
N2 --> AGG
N3 --> AGG
AGG --> STRUCT
STRUCT --> VALIDATE2
VALIDATE2 --> FORMAT
FORMAT --> WEBHOOK
FORMAT --> QUEUE3
FORMAT --> API
FORMAT --> ARCHIVE
Production Architecture: Scalable document processing pipeline with load-balanced vLLM inference cluster, async processing via message queues, and multiple delivery options.