
Get OCR Extraction

GET /api/sales-orders/ocr/:extraction_id

Retrieve a single OCR extraction record by ID. Returns the extracted header (customer PO number, dates, customer match, totals), per-line items with quantities/prices/match status, and the current review status.

Statuses: processing, pending_review, confirmed, failed, rejected, duplicate_detected
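A minimal Python sketch of calling this endpoint with only the standard library. The base URL, the bearer-token Authorization header, and the assumption that the record is the top-level JSON object (rather than wrapped in an envelope) are not documented here. The helper names build_extraction_url, parse_extraction, and get_ocr_extraction are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Union

# Status values documented for this endpoint.
KNOWN_STATUSES = frozenset({
    "processing",
    "pending_review",
    "confirmed",
    "failed",
    "rejected",
    "duplicate_detected",
})


def build_extraction_url(base_url: str, extraction_id: Union[int, str]) -> str:
    """Substitute :extraction_id into the documented path (base_url is an assumption)."""
    path_id = urllib.parse.quote(str(extraction_id), safe="")
    return f"{base_url.rstrip('/')}/api/sales-orders/ocr/{path_id}"


def parse_extraction(body: bytes) -> dict:
    """Decode the body, assuming the record is the top-level JSON object."""
    record = json.loads(body)
    status = record.get("status")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return record


def get_ocr_extraction(base_url: str, extraction_id: Union[int, str], token: str) -> dict:
    """Fetch one extraction. Bearer auth is an assumed scheme, not documented here."""
    request = urllib.request.Request(
        build_extraction_url(base_url, extraction_id),
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses (e.g. an unknown ID).
    with urllib.request.urlopen(request) as response:
        return parse_extraction(response.read())
```

Rejecting unknown statuses makes a newly added status fail loudly instead of being silently misrendered; drop that check if the client should tolerate additions.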

Source-related fields:

  • source: manual, email, or email_body (new). email_body indicates that the extraction came from email body text rather than a PDF attachment and was processed via Azure OpenAI gpt-4o-mini.
  • source_email_from, source_email_subject — set when source is email or email_body.
  • source_body_text, source_body_html — set when source is email_body. The frontend renders these in a body preview pane instead of the PDF viewer.
  • sales_order_id — the linked sales order created from this extraction (set when status is confirmed). When the linked sales order is later deleted, the FK nullOnDelete() clears this field, leaving the extraction in an "orphaned confirmed" state which the rescan/delete/clone-from/reclassify guards treat as unlocked (see those endpoints' descriptions).
  • sales_order — eager-loaded summary { id, number, order_date, customer_po_number } of the linked sales order, present when sales_order_id is set. The OCR review page uses it to render a click-through link to the created sales order's detail page. Omitted from list responses where the relation is not loaded.
  • clone_of_sales_order_id — set when this extraction was cloned from a prior sales order (via the LLM clone_last_order flag, or manually via the Clone Lines from Sales Order endpoint).
  • clone_of_sales_order — eager-loaded summary { id, number, order_date, customer_po_number } of the source order when clone_of_sales_order_id is set.
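The source-related fields above drive a few client-side decisions. The sketch below shows one way a consumer might read them. The helper names are hypothetical, as is the choice to prefer source_body_html over source_body_text and to fall back to the PDF viewer for every source other than email_body. The orphan check follows the description of sales_order_id being nulled on delete.

```python
from typing import Optional


def preview_mode(extraction: dict) -> str:
    """email_body extractions get the body preview pane; others (assumed) the PDF viewer."""
    if extraction.get("source") == "email_body":
        return "body_preview"
    return "pdf_viewer"


def preview_body(extraction: dict) -> Optional[str]:
    """Body content for email_body extractions; preferring HTML is an assumption."""
    if extraction.get("source") != "email_body":
        return None
    return extraction.get("source_body_html") or extraction.get("source_body_text")


def is_orphaned_confirmed(extraction: dict) -> bool:
    """Confirmed, but the linked sales order was deleted and sales_order_id was nulled."""
    return extraction.get("status") == "confirmed" and extraction.get("sales_order_id") is None


def linked_order_label(extraction: dict) -> Optional[str]:
    """Link text for the created order; None when sales_order is absent or not loaded."""
    order = extraction.get("sales_order")
    if not order:
        return None
    return f"Sales order {order['number']}"
```

Using .get() throughout matters because sales_order is omitted from list responses, so the same helpers work on list items and single-record responses.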

Header field normalization (added 2026-05-06): The extractors store header keys with names that vary by source:

  • Azure Document Intelligence (PDF) returns po_number and invoice_date.
  • OpenAI (email body) returns po_number and order_date.

The API response layers canonical UI keys on top of these so the frontend has stable field names to bind to:

  • customer_po_number: derived from po_number (or the already-set canonical value).
  • po_date: derived from order_date or invoice_date (or the already-set canonical value).
  • deliver_by_date: the customer-requested delivery date. The PDF flow stores it under due_date (from Azure DueDate, or from the LLM header extractor, which maps deliver_by_date to due_date); the email-body flow emits deliver_by_date directly. The response bridges due_date to deliver_by_date so the UI always has the key.

The original extractor keys are preserved alongside the canonical ones, so consumers that read po_number / invoice_date / order_date / due_date directly continue to work.
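The bridging described above can be mirrored client-side, for example when handling a raw extractor header that has not passed through this endpoint. This is a sketch of the documented mapping, not the server's implementation. The function name normalize_header is hypothetical, and giving order_date precedence over invoice_date when both are present is an assumption.

```python
def normalize_header(raw: dict) -> dict:
    """Layer canonical UI keys over extractor keys, keeping the originals."""
    header = dict(raw)  # shallow copy; original extractor keys are preserved

    # An already-set canonical value wins over the derived one.
    if header.get("customer_po_number") is None:
        header["customer_po_number"] = header.get("po_number")

    if header.get("po_date") is None:
        # Assumed precedence: order_date (email body) before invoice_date (PDF).
        header["po_date"] = header.get("order_date") or header.get("invoice_date")

    if header.get("deliver_by_date") is None:
        # The PDF flow stores the requested delivery date under due_date.
        header["deliver_by_date"] = header.get("due_date")

    return header
```

Because the originals are copied rather than renamed, consumers reading po_number, invoice_date, order_date, or due_date keep working alongside the canonical keys.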

Per-PO ship-to address (added 2026-05-18): The header includes ship_to_address — a nullable object { name, company, address1, address2, city, province, zip, country_code } extracted from the customer PO's "Ship To" / "Deliver To" block. A customer can specify a different shipping destination on each PO, so this is captured per-extraction. It is null when the PO states no separate shipping destination.
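A sketch of typing ship_to_address and turning it into display lines. The field names come from the description above; treating each sub-field as individually nullable and the line layout (city, province, and zip joined on one line) are assumptions, and ShipToAddress and format_ship_to are hypothetical names.

```python
from typing import List, Optional, TypedDict


class ShipToAddress(TypedDict, total=False):
    # Sub-fields are assumed to be individually nullable.
    name: Optional[str]
    company: Optional[str]
    address1: Optional[str]
    address2: Optional[str]
    city: Optional[str]
    province: Optional[str]
    zip: Optional[str]
    country_code: Optional[str]


def format_ship_to(address: Optional[ShipToAddress]) -> List[str]:
    """Display lines for a ship-to block; empty when the PO states no destination."""
    if address is None:
        return []
    lines = [address.get("name"), address.get("company"),
             address.get("address1"), address.get("address2")]
    locality_parts = (address.get("city"), address.get("province"), address.get("zip"))
    lines.append(" ".join(part for part in locality_parts if part))
    lines.append(address.get("country_code"))
    # Drop missing or empty parts so null sub-fields leave no blank lines.
    return [line for line in lines if line]
```

Returning an empty list for a null ship_to_address lets the UI fall back to another address source without special-casing None at every call site.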

Request

Responses

Successful response