Talk, 25 minutes

Structured Document Extraction with LLMs - From First Principles to Production

Speaker

Louis de Benoist, CEO at Retab

About This Session

Structured generation with LLMs makes it easier to extract standardized data from complex documents, but the process for building and evaluating the right schema is still largely artisanal. Retab has built an end-to-end platform for building these schema-driven pipelines rigorously. In this talk, Louis will share technical insights and best practices.

He will cover the whole process:

  • Going from business problem to JSON schema
  • Context engineering for large documents
  • Designing evals for first-order and second-order extractions
  • Using k-LLM consensus to boost performance and quantify uncertainty
  • Retab’s agent for auto-optimizing JSON schema descriptions
  • Human-in-the-loop for schema-driven automations
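
To make the k-LLM consensus item above concrete, here is a minimal sketch of one common way to do it: run the same extraction k times, take a per-field majority vote, and report the share of runs that agreed as a rough uncertainty signal. This is an illustration only, not Retab's implementation; the function name `consensus`, the sample field names, and the agreement score are all assumptions made for the example.

```python
import json
from collections import Counter
from typing import Any


def consensus(extractions: list[dict[str, Any]]) -> dict[str, tuple[Any, float]]:
    """Majority-vote each field across k extractions.

    Returns {field: (winning_value, agreement)}, where agreement is the
    fraction of the k runs that produced the winning value.
    Hypothetical helper for illustration; not part of any real library.
    """
    if not extractions:
        raise ValueError("need at least one extraction")
    k = len(extractions)
    fields = sorted({name for e in extractions for name in e})
    result: dict[str, tuple[Any, float]] = {}
    for name in fields:
        # Serialize values so lists and nested objects can be counted too.
        # A run that omitted the field votes for null (None).
        keys = [json.dumps(e.get(name), sort_keys=True) for e in extractions]
        winner, count = Counter(keys).most_common(1)[0]
        result[name] = (json.loads(winner), count / k)
    return result


# Three hypothetical extraction runs over the same invoice.
runs = [
    {"invoice_number": "INV-001", "total": 120.5},
    {"invoice_number": "INV-001", "total": 120.5},
    {"invoice_number": "INV-001", "total": 125.0},
]
for field, (value, agreement) in consensus(runs).items():
    print(f"{field}: {value!r} (agreement {agreement:.2f})")
```

Fields with low agreement are natural candidates for human review, which is one way the consensus and human-in-the-loop items could connect in practice.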