Extract Proposal Field

Module · Stable

Extract one requirement value from a proposal PDF using PyMuPDF text extraction and an LLM structured call.

Given a proposal_id and requirement_id, the skill loads the proposal PDF from document_store, extracts text per page, selects the three most relevant pages by keyword overlap, and calls the LLM to produce a structured result containing value, confidence (0–1), source_page, and a verbatim source_excerpt. It then normalises the value according to the requirement kind (number, currency, date, bool, enum, or text), upserts the result into the extracted_field table, and returns an ExtractedField dataclass. It never raises: on any failure it returns not_found_reason instead.
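
The page-selection step can be sketched roughly as follows. This is a minimal illustration that assumes a simple bag-of-words overlap between a query string and each page's text. The query (for example, the requirement's name and description), the tokeniser, the function names keywords and select_pages, and the tie-breaking rule are all hypothetical, because the skill's actual scoring is not documented. With PyMuPDF, the per-page texts could come from `[page.get_text() for page in fitz.open(path)]`. The sketch takes them as a plain list so it runs without the library.

```python
from __future__ import annotations

import re

# Hypothetical tokeniser: lower-cased alphanumeric runs of 3+ characters.
_TOKEN = re.compile(r"[a-z0-9]+")


def keywords(text: str) -> set[str]:
    """Return the set of keyword tokens in text."""
    return {tok for tok in _TOKEN.findall(text.lower()) if len(tok) >= 3}


def select_pages(page_texts: list[str], query: str, k: int = 3) -> list[int]:
    """Return 0-based indices of the k pages sharing the most keywords with query."""
    wanted = keywords(query)
    scores = [(len(wanted & keywords(text)), idx) for idx, text in enumerate(page_texts)]
    # Highest overlap first; ties go to the earlier page.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]
```

Only the text of the selected pages would then go into the LLM prompt, which keeps the call small. Plain keyword overlap is cheap, but it can miss pages that phrase the requirement with different words.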

When to use

Use after a trigger or a For each step that surfaces a proposal_id and a requirement_id, for example when looping over a proposal’s requirement list to populate an evaluation brief. The value_normalized and confidence outputs can feed directly into a Branch step (for a confidence threshold check) or a Set values step that assembles the comparison table.
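
As a rough illustration of that pattern, the following Python sketch plays the roles of the For each, Branch, and Set values steps in code. Everything here is hypothetical: FieldResult is a stand-in for ExtractedField whose field names follow the documented outputs (the real dataclass may differ), the extract callable stands in for this skill, and the 0.7 threshold and row keys are illustrative. In a real workflow these would be configured steps rather than code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FieldResult:
    """Hypothetical stand-in for ExtractedField, mirroring the documented outputs."""
    value: Optional[str]
    value_normalized: Any
    confidence: float
    source_page: Optional[int]
    source_excerpt: Optional[str]
    not_found_reason: Optional[str] = None


Extractor = Callable[[str, str], FieldResult]


def build_comparison_rows(proposal_id: str, requirement_ids: list[str],
                          extract: Extractor, threshold: float = 0.7) -> list[dict[str, Any]]:
    rows = []
    for requirement_id in requirement_ids:  # "For each" step
        field = extract(proposal_id, requirement_id)
        # "Branch" step: flag missing or low-confidence values for human review.
        needs_review = field.not_found_reason is not None or field.confidence < threshold
        # "Set values" step: one row of the comparison table.
        rows.append({
            "requirement_id": requirement_id,
            "value": field.value_normalized,
            "confidence": field.confidence,
            "source_page": field.source_page,
            "needs_review": needs_review,
        })
    return rows
```

Because the skill never raises, the loop needs no exception handling; a failed extraction simply arrives as a result with not_found_reason set and is routed to review.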

When not to use

If you need to pull arbitrary text or structured data from a non-proposal PDF with no pre-defined requirement schema, use OCR instead. If the value you need is already a stored field on a proposal record rather than buried in the PDF body, use Ask AI with the field value passed in context to avoid an unnecessary PDF round-trip.

Inputs

Configured per use: proposal_id, requirement_id.

Outputs

value
value_normalized
confidence
source_page
source_excerpt
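
The per-kind normalisation that produces value_normalized might look roughly like the sketch below. The accepted date formats, boolean spellings, and currency handling (which strips symbols and thousands separators and assumes a period as the decimal mark) are assumptions, and the names normalize and _parse_date are hypothetical. The sketch follows the documented never-raise contract by returning a (value_normalized, not_found_reason) pair instead of throwing.

```python
from __future__ import annotations

import re
from datetime import date, datetime
from decimal import Decimal
from typing import Optional, Sequence

# Hypothetical parsing rules; the skill's real rules are not documented.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y")
TRUE_WORDS = {"yes", "y", "true", "1", "compliant"}
FALSE_WORDS = {"no", "n", "false", "0", "non-compliant"}


def _parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize(raw: Optional[str], kind: str,
              choices: Sequence[str] = ()) -> tuple[object, Optional[str]]:
    """Return (value_normalized, not_found_reason). Never raises."""
    if raw is None or not raw.strip():
        return None, "no value extracted"
    text = raw.strip()
    try:
        if kind in ("number", "currency"):
            # Keep digits, '.' and '-': "$1,250.50" -> "1250.50" (assumes '.' decimal mark).
            return Decimal(re.sub(r"[^0-9.\-]", "", text)), None
        if kind == "date":
            parsed = _parse_date(text)
            if parsed is not None:
                return parsed, None
            return None, f"unrecognised date: {text!r}"
        if kind == "bool":
            word = text.lower()
            if word in TRUE_WORDS:
                return True, None
            if word in FALSE_WORDS:
                return False, None
            return None, f"not a yes/no answer: {text!r}"
        if kind == "enum":
            # Case-insensitive match; returns the canonical spelling of the choice.
            for choice in choices:
                if choice.lower() == text.lower():
                    return choice, None
            return None, f"{text!r} is not one of {list(choices)}"
        if kind == "text":
            return text, None
        return None, f"unknown requirement kind: {kind!r}"
    except Exception as exc:  # e.g. decimal.InvalidOperation for "n/a"
        return None, f"could not parse {kind} value {text!r}: {exc}"
```

Returning Decimal rather than float for number and currency avoids binary rounding artefacts when values are later compared or summed in a comparison table.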

Auto-generated from the skill registry (load_skills()). Do not edit by hand.