OCI PIPELINE

2 Squirrels AI

Technology Stack

Document OCR with vision-language model - leveraging DeepSeek Vision for intelligent document understanding and text extraction powered by vLLM inference server.

Vision Model DeepSeek VL
Inference Server vLLM
Processing Document OCR
Output Structured Data

Workflow 1: DeepSeek Vision OCR

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
    subgraph Input["๐Ÿ“ฅ DOCUMENT INPUT"]
        DOC[/"๐Ÿ“„ Document Image
PDF / PNG / JPG"/] CFG[/"โš™๏ธ OCR Config
Language, Mode"/] end subgraph Preprocessing["๐Ÿ”ง IMAGE PREPROCESSING"] LOAD["๐Ÿ“ฅ Load Image
PIL / OpenCV"] RESIZE["๐Ÿ“ Resize & Normalize
Optimal Resolution"] ENCODE["๐Ÿ”ข Encode to Base64
Model Input Format"] end subgraph VisionModel["๐Ÿง  DEEPSEEK VISION MODEL"] direction TB PROMPT["๐Ÿ“ Craft OCR Prompt
Extract all text..."] VL[("๐Ÿง  DeepSeek VL
Vision-Language Model")] INFER["โšก Model Inference
Text Extraction"] end subgraph Output["๐Ÿ“ค TEXT OUTPUT"] RAW["๐Ÿ“œ Raw Extracted Text"] CLEAN["โœจ Clean & Format"] RESULT["๐Ÿ“‹ Final OCR Result"] end DOC --> LOAD CFG --> PROMPT LOAD --> RESIZE RESIZE --> ENCODE ENCODE --> VL PROMPT --> VL VL --> INFER INFER --> RAW RAW --> CLEAN CLEAN --> RESULT

DeepSeek Vision: Multimodal vision-language model capable of understanding document layouts, handwriting, tables, and complex formatting for accurate text extraction.

Workflow 2: vLLM Inference Server

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart LR
    subgraph Client["๐Ÿ“ฑ CLIENT"]
        REQ["๐ŸŒ HTTP Request
Image + Prompt"] RESP["๐Ÿ“ฅ Response
Extracted Text"] end subgraph Gateway["๐Ÿ” API GATEWAY"] AUTH["๐Ÿ”‘ Authentication"] RATE["โฑ๏ธ Rate Limiting"] QUEUE["๐Ÿ“‹ Request Queue"] end subgraph vLLMServer["โšก vLLM SERVER"] direction TB ENGINE[("โšก vLLM Engine
PagedAttention")] BATCH["๐Ÿ“ฆ Continuous Batching
Dynamic Batching"] KV["๐Ÿ’พ KV Cache
Memory Optimization"] GPU["๐ŸŽฎ GPU Inference
CUDA Acceleration"] end subgraph Model["๐Ÿง  DEEPSEEK VL"] WEIGHTS[("๐Ÿง  Model Weights
Vision + Language")] end REQ --> AUTH AUTH --> RATE RATE --> QUEUE QUEUE --> ENGINE ENGINE --> BATCH BATCH --> KV KV --> GPU GPU --> WEIGHTS WEIGHTS --> GPU GPU --> RESP

vLLM Performance: High-throughput inference with PagedAttention for efficient KV cache management, continuous batching for optimal GPU utilization, and memory-efficient serving.

Workflow 3: Document Preprocessing

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
    subgraph Input["๐Ÿ“ฅ RAW INPUT"]
        PDF[/"๐Ÿ“„ PDF Document"/]
        IMG[/"๐Ÿ–ผ๏ธ Image File"/]
        SCAN[/"๐Ÿ“ท Scanned Doc"/]
    end

    subgraph Detection["๐Ÿ” FORMAT DETECTION"]
        DETECT["๐Ÿ”Ž File Type Detection
MIME / Magic Bytes"] ROUTE{"๐Ÿ”€ Route by Type"} end subgraph PDFProcess["๐Ÿ“„ PDF PROCESSING"] EXTRACT["๐Ÿ“‘ Extract Pages
pdf2image"] DPI["๐Ÿ“ Set DPI
300 DPI Default"] end subgraph ImageProcess["๐Ÿ–ผ๏ธ IMAGE PROCESSING"] LOAD2["๐Ÿ“ฅ Load Image"] ORIENT["๐Ÿ”„ Auto-Orient
EXIF Rotation"] DESKEW["๐Ÿ“ Deskew
Angle Correction"] end subgraph Normalize["๐Ÿ“Š NORMALIZATION"] RESIZE2["๐Ÿ“ Resize
Max 2048px"] CONTRAST["๐ŸŽจ Enhance Contrast"] DENOISE["๐Ÿ”‡ Denoise
Gaussian Blur"] SHARP["โœจ Sharpen
Edge Enhancement"] end subgraph Output["๐Ÿ“ค MODEL INPUT"] TENSOR["๐Ÿ”ข To Tensor
Normalized Array"] BASE64["๐Ÿ“ Base64 Encode"] READY["โœ… Ready for VL Model"] end PDF --> DETECT IMG --> DETECT SCAN --> DETECT DETECT --> ROUTE ROUTE -->|"PDF"| EXTRACT ROUTE -->|"Image"| LOAD2 EXTRACT --> DPI DPI --> ORIENT LOAD2 --> ORIENT ORIENT --> DESKEW DESKEW --> RESIZE2 RESIZE2 --> CONTRAST CONTRAST --> DENOISE DENOISE --> SHARP SHARP --> TENSOR TENSOR --> BASE64 BASE64 --> READY

Workflow 4: Multi-Page Processing

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
    subgraph Input["๐Ÿ“ฅ BATCH INPUT"]
        DOCS[/"๐Ÿ“š Multi-Page PDF
or Document Batch"/] CONFIG[/"โš™๏ธ Batch Config
Concurrency, Priority"/] end subgraph Splitter["โœ‚๏ธ PAGE SPLITTER"] SPLIT["๐Ÿ“‘ Split to Pages"] INDEX["๐Ÿ”ข Index Pages
Maintain Order"] QUEUE2["๐Ÿ“‹ Page Queue"] end subgraph ParallelOCR["โšก PARALLEL OCR"] direction LR subgraph Worker1["Worker 1"] W1["๐Ÿง  DeepSeek VL"] end subgraph Worker2["Worker 2"] W2["๐Ÿง  DeepSeek VL"] end subgraph Worker3["Worker 3"] W3["๐Ÿง  DeepSeek VL"] end subgraph WorkerN["Worker N"] WN["๐Ÿง  DeepSeek VL"] end end subgraph Aggregator["๐Ÿ”— RESULT AGGREGATOR"] COLLECT["๐Ÿ“ฅ Collect Results"] ORDER["๐Ÿ”ข Restore Order"] MERGE["๐Ÿ”— Merge Text
Page Separators"] end subgraph Output["๐Ÿ“ค FINAL OUTPUT"] COMBINED["๐Ÿ“„ Combined Document"] META[("๐Ÿ“Š Metadata
Page Count, Confidence")] end DOCS --> SPLIT CONFIG --> QUEUE2 SPLIT --> INDEX INDEX --> QUEUE2 QUEUE2 --> W1 QUEUE2 --> W2 QUEUE2 --> W3 QUEUE2 --> WN W1 --> COLLECT W2 --> COLLECT W3 --> COLLECT WN --> COLLECT COLLECT --> ORDER ORDER --> MERGE MERGE --> COMBINED MERGE --> META

Parallel Processing: Distribute pages across multiple vLLM workers for high-throughput batch document processing with automatic load balancing and result aggregation.

Workflow 5: Structured Data Extraction

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart TB
    subgraph Input["๐Ÿ“ฅ RAW OCR OUTPUT"]
        RAW2[/"๐Ÿ“œ Raw Extracted Text"/]
        SCHEMA[/"๐Ÿ“‹ Target Schema
JSON Template"/] end subgraph Analysis["๐Ÿ” TEXT ANALYSIS"] SEGMENT["๐Ÿ“Š Segment Text
Headers, Body, Tables"] DETECT2["๐Ÿท๏ธ Entity Detection
Dates, Names, Numbers"] PATTERN["๐Ÿ”Ž Pattern Matching
Regex Extraction"] end subgraph LLMParsing["๐Ÿง  LLM STRUCTURED PARSING"] PROMPT2["๐Ÿ“ Parsing Prompt
Schema-Guided"] VL2[("๐Ÿง  DeepSeek VL
JSON Mode")] VALIDATE["โœ… JSON Validation
Schema Check"] end subgraph Transform["๐Ÿ”„ TRANSFORMATION"] NORMALIZE2["๐Ÿ“ Normalize Values
Dates, Currency"] ENRICH["โœจ Enrich Data
Computed Fields"] CLEAN2["๐Ÿงน Clean Nulls
Default Values"] end subgraph Output["๐Ÿ“ค STRUCTURED OUTPUT"] JSON2[("๐Ÿ“‹ JSON Document")] CSV["๐Ÿ“Š CSV Export"] DB[("๐Ÿ—„๏ธ Database
Insert/Update")] end RAW2 --> SEGMENT SCHEMA --> PROMPT2 SEGMENT --> DETECT2 DETECT2 --> PATTERN PATTERN --> PROMPT2 PROMPT2 --> VL2 VL2 --> VALIDATE VALIDATE --> NORMALIZE2 NORMALIZE2 --> ENRICH ENRICH --> CLEAN2 CLEAN2 --> JSON2 JSON2 --> CSV JSON2 --> DB

Workflow 6: Full OCR Architecture

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#C17852', 'primaryTextColor': '#F0F6FC', 'primaryBorderColor': '#4A5E32', 'lineColor': '#E6C98F', 'secondaryColor': '#161B22', 'tertiaryColor': '#0D1117', 'background': '#0D1117', 'mainBkg': '#161B22', 'nodeBorder': '#4A5E32', 'clusterBkg': '#161B22', 'clusterBorder': '#4A5E32', 'titleColor': '#E6C98F', 'edgeLabelBackground': '#161B22'}}}%%
flowchart LR
    subgraph Intake["๐Ÿ“ฅ DOCUMENT INTAKE"]
        UPLOAD["๐Ÿ“ค Upload API
REST / gRPC"] STORAGE[("โ˜๏ธ Object Storage
S3 / GCS")] TRIGGER["โšก Event Trigger
New Document"] end subgraph Pipeline["๐Ÿ”„ OCR PIPELINE"] direction TB PRE["๐Ÿ”ง Preprocessing
Normalize Images"] SPLIT2["โœ‚๏ธ Page Splitting
Multi-Page Support"] subgraph InferenceCluster["โšก vLLM CLUSTER"] LB["โš–๏ธ Load Balancer"] N1["๐Ÿง  DeepSeek VL #1"] N2["๐Ÿง  DeepSeek VL #2"] N3["๐Ÿง  DeepSeek VL #3"] end AGG["๐Ÿ”— Aggregator
Merge Results"] end subgraph PostProcess["โœจ POST-PROCESSING"] STRUCT["๐Ÿ“‹ Structure Extraction
Tables, Forms"] VALIDATE2["โœ… Confidence Check
Quality Score"] FORMAT["๐Ÿ“ Output Formatting
JSON, Markdown"] end subgraph Delivery["๐Ÿ“ค DELIVERY"] WEBHOOK["๐Ÿ”” Webhook
Callback URL"] QUEUE3["๐Ÿ“ฌ Message Queue
Kafka / RabbitMQ"] API["๐ŸŒ REST API
Poll Results"] ARCHIVE[("๐Ÿ—„๏ธ Archive
Long-term Storage")] end UPLOAD --> STORAGE STORAGE --> TRIGGER TRIGGER --> PRE PRE --> SPLIT2 SPLIT2 --> LB LB --> N1 LB --> N2 LB --> N3 N1 --> AGG N2 --> AGG N3 --> AGG AGG --> STRUCT STRUCT --> VALIDATE2 VALIDATE2 --> FORMAT FORMAT --> WEBHOOK FORMAT --> QUEUE3 FORMAT --> API FORMAT --> ARCHIVE

Production Architecture: Scalable document processing pipeline with load-balanced vLLM inference cluster, async processing via message queues, and multiple delivery options.