doc-processing
TypeScript
Async document processing service with OCR, validation, and queue management using Node.js/TypeScript.
Document processing app
Tech stack: Node.js, Express, TypeScript, Redis, BullMQ, Zod, Docker.
What this does
- Upload a document (PDF, JPEG, PNG) via HTTP.
- Enqueue async processing with BullMQ (OCR -> Extract -> Validate -> Persist).
- Maintain per-document status in Redis: uploaded, processing, processed, validated, validation_failed, done, failed.
- Persist raw file on disk and metadata in Redis.
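The status values above can be modelled as a string union with a table of allowed transitions. This is a minimal sketch: the transition table is an assumption inferred from the pipeline described in this README rather than taken from the service's code, and `canTransition` is a hypothetical helper.

```typescript
// Statuses listed in this README.
type DocStatus =
  | "uploaded"
  | "processing"
  | "processed"
  | "validated"
  | "validation_failed"
  | "done"
  | "failed";

// Assumed transitions, inferred from the pipeline description (not from the service code).
const transitions: Record<DocStatus, DocStatus[]> = {
  uploaded: ["processing"],
  processing: ["processed", "failed"],
  processed: ["validated", "validation_failed", "failed"],
  validated: ["done", "failed"],
  validation_failed: ["processing", "failed"], // a retry starts processing again
  done: [],
  failed: [],
};

// Hypothetical helper: true if a document may move from one status to another.
function canTransition(from: DocStatus, to: DocStatus): boolean {
  return transitions[from].includes(to);
}
```

A guard like this could run before each status write so that a late retry cannot overwrite a document that has already reached done.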
Quick start
Docker (recommended):
- Build and start services
docker compose build
docker compose up
- API available at http://localhost:3000
- Upload a file
curl -F "document=@assets/untitled.pdf" http://localhost:3000/upload
- List docs (IDs)
curl http://localhost:3000/documents
- List summaries
curl "http://localhost:3000/documents?summary=1"
- Get a document by id
curl http://localhost:3000/documents/<id>
Env vars:
- PORT: API port (default 3000)
- REDIS_URL: Redis connection URL, e.g. redis://localhost:6379 (Docker Compose sets it to redis://redis:6379)
- STORAGE_DIR: local file storage dir (default ./storage)
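One way to read these variables is a small loader that applies the documented defaults. The sketch below is illustrative only: `AppConfig` and `loadConfig` are hypothetical names, and the fallback for REDIS_URL is an assumption, since the README gives that URL as an example rather than as a default.

```typescript
interface AppConfig {
  port: number;
  redisUrl: string;
  storageDir: string;
}

// Hypothetical loader; in a Node.js process, call it as loadConfig(process.env).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // Assumption: this URL is only given as an example in the README.
    redisUrl: env.REDIS_URL ?? "redis://localhost:6379",
    storageDir: env.STORAGE_DIR ?? "./storage",
  };
}
```

Taking the environment as a parameter keeps the loader easy to test without touching the real process environment.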
API
- GET /health
  - returns: { success, redis }
- POST /upload
  - form-data field: document (file)
  - returns: { documentId, status, createdAt }
- GET /documents
  - returns: { ids: string[] }
- GET /documents?summary=1
  - returns: { documents: { id, status, originalFilename, size }[] }
- GET /documents/:id
  - returns: all metadata for a document (status, filenames, metadata when available)
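The endpoints above can be called from TypeScript with the built-in `fetch`, `FormData`, and `Blob` globals (Node.js 18 or later). This is a sketch under assumptions: the field types (`createdAt` as a string, `size` as a number) are guesses, since the README lists only field names, and `documentsUrl` and `uploadDocument` are hypothetical helpers.

```typescript
// Response shapes from the API list; the field types are assumptions.
interface UploadResponse {
  documentId: string;
  status: string;
  createdAt: string;
}

interface DocumentSummary {
  id: string;
  status: string;
  originalFilename: string;
  size: number;
}

// Builds the list URL, with ?summary=1 when summaries are requested.
function documentsUrl(baseUrl: string, summary = false): string {
  const url = new URL("/documents", baseUrl);
  if (summary) url.searchParams.set("summary", "1");
  return url.toString();
}

// Sends the file in the "document" form field, as POST /upload expects.
async function uploadDocument(
  baseUrl: string,
  file: Blob,
  filename: string,
): Promise<UploadResponse> {
  const form = new FormData();
  form.append("document", file, filename);
  const res = await fetch(new URL("/upload", baseUrl), { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return (await res.json()) as UploadResponse;
}
```

For example, `uploadDocument("http://localhost:3000", new Blob([bytes]), "untitled.pdf")` would mirror the curl upload from the quick start, where `bytes` is the file content read by the caller.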
Processing pipeline
- Upload: saves the file to STORAGE_DIR and creates a Redis hash under the doc: key prefix with status "uploaded".
- Queue: adds a job with attempts=3 and exponential backoff.
- Worker steps:
  - processing -> simulateOCR -> processed
  - extractInvoiceData -> validate (Zod)
  - if invalid: status validation_failed, metadata is saved, and the job fails (so BullMQ retries it)
  - if valid: validated -> save metadata -> done
  - once all attempts are exhausted: status is set to failed and the job is moved to the dead-letter queue
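The pipeline steps above can be sketched as a small in-memory simulation. It is not the service's implementation: a plain loop stands in for BullMQ, a hand-written check stands in for the Zod schema, and the code is synchronous where the real worker would be async. The `Invoice` fields, the 1000 ms base delay, the doubling rule used for exponential backoff, and all function names are assumptions made for illustration.

```typescript
type Status =
  | "uploaded" | "processing" | "processed" | "validated"
  | "validation_failed" | "done" | "failed";

interface Doc {
  id: string;
  status: Status;
  history: Status[]; // every status the document has passed through
  metadata?: unknown;
}

// Hypothetical invoice shape; the README does not list the extracted fields.
interface Invoice {
  invoiceNumber: string;
  total: number;
}

type Ocr = (docId: string) => string;
type Extract = (text: string) => unknown;

function setStatus(doc: Doc, status: Status): void {
  doc.status = status;
  doc.history.push(status);
}

// Stand-in for the Zod schema: returns the typed invoice, or null when invalid.
function validateInvoice(data: unknown): Invoice | null {
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  return typeof d.invoiceNumber === "string" && typeof d.total === "number" && d.total >= 0
    ? { invoiceNumber: d.invoiceNumber, total: d.total }
    : null;
}

// One job attempt: OCR -> extract -> validate -> persist. Throws on a failed attempt.
function runAttempt(doc: Doc, ocr: Ocr, extract: Extract): void {
  setStatus(doc, "processing");
  const text = ocr(doc.id);
  setStatus(doc, "processed");
  const data = extract(text);
  doc.metadata = data; // metadata is saved in both the valid and invalid branch
  if (validateInvoice(data) === null) {
    setStatus(doc, "validation_failed");
    throw new Error("validation failed");
  }
  setStatus(doc, "validated");
  setStatus(doc, "done");
}

interface JobResult {
  deadLettered: boolean;
  delaysMs: number[]; // backoff delays a real queue would have waited
}

// Retry loop standing in for the queue: up to `attempts` tries, then dead-letter.
function runJob(doc: Doc, ocr: Ocr, extract: Extract, attempts = 3, baseDelayMs = 1000): JobResult {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      runAttempt(doc, ocr, extract);
      return { deadLettered: false, delaysMs };
    } catch {
      // The delay is only recorded here; it doubles after each failed attempt.
      if (attempt < attempts) delaysMs.push(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  setStatus(doc, "failed");
  return { deadLettered: true, delaysMs };
}
```

With three attempts there are at most two retries, so a document that never validates records delays of 1000 ms and 2000 ms before it is marked failed.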