Get OCR Extraction
GET /api/sales-orders/ocr/:extraction_id
Retrieve a single OCR extraction record by ID. Returns the extracted header (customer PO number, dates, customer match, totals), per-line items with quantities/prices/match status, and the current review status.
Statuses: processing, pending_review, confirmed, failed, rejected, duplicate_detected
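As a rough client-side sketch, the request for this endpoint could be built and the returned status checked as follows. The base URL, the Accept header, and the assumption that status sits at the top level of the response body are illustrative and not confirmed by this page; authentication is omitted because it is not described here.

```python
from urllib.parse import quote
from urllib.request import Request

# Statuses listed in this endpoint's description.
KNOWN_STATUSES = {
    "processing",
    "pending_review",
    "confirmed",
    "failed",
    "rejected",
    "duplicate_detected",
}


def build_get_extraction_request(base_url: str, extraction_id) -> Request:
    """Build (but do not send) a GET request for a single OCR extraction.

    base_url is a hypothetical host such as "https://example.test".
    """
    path = "/api/sales-orders/ocr/" + quote(str(extraction_id), safe="")
    return Request(
        base_url.rstrip("/") + path,
        method="GET",
        headers={"Accept": "application/json"},
    )


def read_status(extraction: dict) -> str:
    """Return the status, rejecting values this page does not list.

    Assumes the decoded JSON body exposes `status` at the top level.
    """
    status = extraction.get("status")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return status
```

Passing the result of build_get_extraction_request to urllib.request.urlopen would send the request; rejecting unknown statuses makes it visible when the server adds a value that the client does not yet handle.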
Source-related fields:
- source: manual, email, or email_body (new). email_body indicates the extraction came from email body text rather than a PDF attachment, processed via Azure OpenAI gpt-4o-mini.
- source_email_from, source_email_subject: set when source is email or email_body.
- source_body_text, source_body_html: set when source is email_body. The frontend renders these in a body preview pane instead of the PDF viewer.
- sales_order_id: the linked sales order created from this extraction (set when status is confirmed). When the linked sales order is later deleted, the FK nullOnDelete() clears this field, leaving the extraction in an "orphaned confirmed" state which the rescan/delete/clone-from/reclassify guards treat as unlocked (see those endpoints' descriptions).
- sales_order: eager-loaded summary { id, number, order_date, customer_po_number } of the linked sales order, present when sales_order_id is set. The OCR review page uses it to render a click-through link to the created sales order's detail page. Omitted from list responses where the relation is not loaded.
- clone_of_sales_order_id: set when this extraction was cloned from a prior sales order (via the LLM clone_last_order flag, or manually via the Clone Lines from Sales Order endpoint).
- clone_of_sales_order: eager-loaded summary { id, number, order_date, customer_po_number } of the source order when clone_of_sales_order_id is set.
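A consumer might branch on these fields roughly as sketched below. The function names and the "body" / "pdf" return values are hypothetical, and the sketch assumes source, status, and sales_order_id sit at the top level of the decoded response.

```python
def preview_pane(extraction: dict) -> str:
    """Pick the preview to render: "body" for email-body extractions, else "pdf".

    The two return values are illustrative labels, not API values.
    """
    return "body" if extraction.get("source") == "email_body" else "pdf"


def is_orphaned_confirmed(extraction: dict) -> bool:
    """True when the extraction is confirmed but its sales order link was cleared.

    This mirrors the "orphaned confirmed" state described above: status is
    confirmed while sales_order_id is null.
    """
    return (
        extraction.get("status") == "confirmed"
        and extraction.get("sales_order_id") is None
    )
```

For example, an extraction with status confirmed and sales_order_id null would be reported as orphaned, which is the case the rescan/delete/clone-from/reclassify guards treat as unlocked.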
Header field normalization (added 2026-05-06): The extractors store header keys with names that vary by source:
- Azure Document Intelligence (PDF) returns po_number and invoice_date.
- OpenAI (email body) returns po_number and order_date.
The API response layers canonical UI keys on top of these so the frontend has stable field names to bind to:
- customer_po_number: derived from po_number (or already-set canonical value).
- po_date: derived from order_date → invoice_date (or already-set canonical value).
- deliver_by_date: the customer-requested delivery date. The PDF flow stores it under due_date (Azure DueDate / the LLM header extractor maps deliver_by_date → due_date); the email-body flow emits deliver_by_date directly. The response bridges due_date → deliver_by_date so the UI always has the key.
The original extractor keys are preserved alongside the canonical ones, so consumers that read po_number / invoice_date / order_date / due_date directly continue to work.
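The layering described above can be pictured with a small sketch. It is an illustration of the key precedence only, not the server's implementation. It assumes an already-set canonical value wins over the extractor keys, that order_date is tried before invoice_date, and that empty strings count as unset; first_set and with_canonical_keys are hypothetical names.

```python
def first_set(header: dict, *keys):
    """Return the first value among keys that is neither None nor empty."""
    for key in keys:
        value = header.get(key)
        if value not in (None, ""):
            return value
    return None


def with_canonical_keys(header: dict) -> dict:
    """Add canonical UI keys on top of the extractor keys, keeping the originals."""
    out = dict(header)  # original extractor keys are preserved
    out["customer_po_number"] = first_set(header, "customer_po_number", "po_number")
    out["po_date"] = first_set(header, "po_date", "order_date", "invoice_date")
    out["deliver_by_date"] = first_set(header, "deliver_by_date", "due_date")
    return out
```

With a PDF-style header containing po_number, invoice_date, and due_date, the result carries customer_po_number, po_date, and deliver_by_date next to the three original keys.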
Per-PO ship-to address (added 2026-05-18):
The header includes ship_to_address — a nullable object { name, company, address1, address2, city, province, zip, country_code } extracted from the customer PO's "Ship To" / "Deliver To" block. A customer can specify a different shipping destination on each PO, so this is captured per-extraction. It is null when the PO states no separate shipping destination.
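Because ship_to_address is nullable, consumers need to handle the null case before reading its parts. A minimal sketch of turning it into display lines follows; format_ship_to and the line layout are illustrative choices, and individual fields are assumed to be possibly null or empty.

```python
from __future__ import annotations


def format_ship_to(header: dict) -> list[str]:
    """Return printable address lines, or an empty list when no ship-to was given."""
    address = header.get("ship_to_address")
    if address is None:
        return []  # the PO states no separate shipping destination

    # Join city and province, then append the postal code when present.
    city_line = ", ".join(
        part for part in (address.get("city"), address.get("province")) if part
    )
    if address.get("zip"):
        city_line = f"{city_line} {address['zip']}".strip()

    lines = [
        address.get("name"),
        address.get("company"),
        address.get("address1"),
        address.get("address2"),
        city_line,
        address.get("country_code"),
    ]
    # Drop parts that are null or empty.
    return [line for line in lines if line]
```

Returning an empty list for the null case lets a caller fall back to whatever shipping address it would otherwise use without a separate branch at every call site.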
Request
Responses
- 200
Successful response