@ba_ababa

This channel is going to be big some day. Very clearly explained and gives so much value.

@demetriusvaux

Ser, your content is pure gold. Subscribed and turned alerts ON

@KhenBoKan

Unfortunately, only a few get to see this gem, as YouTube recommends sensationalized thumbnails and titles

@wolpumba4099

Structured Output from LLMs: Grammars, Regex, and State Machines

* 0:00 Introduction: LLMs can transform natural language into structured formats like JSON for various applications, but challenges arise in ensuring consistent formatting and avoiding extraneous information.
* 1:06 OpenAI API Example: OpenAI's API supports structured output generation using Pydantic and Zod schemas for type validation, increasing format consistency. However, current support is limited to basic features, excluding regex and custom validators (first sketch below).
* 3:02 Outlines Library Example: The open-source Outlines library enables structured output with any LLM, converting schemas to regex for matching. It also allows for direct regex-based output generation (second sketch below).
* 4:57 Finite State Machines and Regex: Regular expressions are linked to finite state machines (FSMs), which are used to validate output against defined patterns. Each regex has a corresponding FSM.
* 5:58 Regex Matching with LLMs: LLMs can generate regex-compliant outputs by tracking the current state in the FSM during token generation, filtering invalid tokens, and sampling from valid ones. Pre-computing valid tokens for each state enhances efficiency (third sketch below).
* 8:41 Context-Free Grammars:  For complex nested structures, context-free grammars (CFGs) and pushdown automata (PDA) are necessary, offering greater expressiveness than regular languages.
* 9:40 Incremental Parsing of CFGs:  Incremental parsing checks the validity of generated prefixes against the CFG at each step, but it's computationally expensive due to per-token checks.
* 11:22 Pushdown Automata: PDAs provide a faster approach to CFG-based generation, similar to FSMs but with a stack for tracking nested structures. This allows pre-computation of valid tokens for state-stack combinations (last sketch below).
* 12:18 Token-Terminal Mismatch Problem: Combining CFGs with LLMs introduces a mismatch between grammar terminals and LLM tokens, as tokenization boundaries don't align with grammar elements.
* 14:26 Vocabulary-Aligned Subgrammars & State Machine Composition:  Recent research addresses the mismatch problem through vocabulary-aligned subgrammars (tracking both scanner and parser positions) and state machine composition (combining PDA and tokenizer FSMs).
* 16:06 Format Restriction and LLM Performance: Overly strict format restrictions can negatively impact LLM accuracy. A less restrictive approach to formatting is recommended for better performance.
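
A minimal sketch of the structured-output flow mentioned at 1:06, assuming the official openai Python SDK and a made-up CalendarEvent schema (the exact parse helper has changed names across SDK versions):

```python
from pydantic import BaseModel
from openai import OpenAI

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # the Pydantic model is converted to a JSON schema
)
event = completion.choices[0].message.parsed  # typed CalendarEvent instance (or None on refusal)
```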
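
A sketch of regex-constrained generation with Outlines (3:02), assuming the 0.x API and a small Hugging Face model; the model name and pattern are only placeholders, and newer releases have changed the API surface:

```python
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain the output to an IPv4 address purely by masking tokens during decoding.
ip_pattern = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
generator = outlines.generate.regex(model, ip_pattern)
answer = generator("The IP address of the loopback interface is ")
print(answer)  # e.g. 127.0.0.1
```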
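
A toy illustration of the FSM-guided sampling idea at 5:58: precompute, for each automaton state, which vocabulary tokens keep the automaton alive, then mask the logits so only those tokens can be sampled. The tiny vocabulary and digit-only FSM here are made up for illustration:

```python
import math

VOCAB = ["1", "23", "4a", "abc", "7", "56"]

def fsm_step(state, char):
    """Single-state FSM accepting digit strings: stay in state 0 on digits, dead otherwise."""
    return 0 if char.isdigit() else None

def advance(state, token):
    # Run the FSM over every character of a multi-character token.
    for ch in token:
        state = fsm_step(state, ch)
        if state is None:
            return None
    return state

# Precompute state -> {token_id: next_state} once; at generation time this is a dict lookup.
allowed = {0: {i: advance(0, tok) for i, tok in enumerate(VOCAB) if advance(0, tok) is not None}}

def mask_logits(logits, state):
    # Invalid tokens get -inf so softmax assigns them zero probability.
    return [x if i in allowed[state] else -math.inf for i, x in enumerate(logits)]

print(allowed[0])  # only "1", "23", "7", "56" survive; "4a" and "abc" are masked out
```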
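
And a toy pushdown check for the nesting problem at 11:22: the same masking idea, but the automaton carries a stack so brackets can nest arbitrarily deep (this only tracks brackets, not a full JSON grammar):

```python
def pda_advance(stack, char):
    """Return the new stack after reading one character, or None if the prefix is invalid."""
    pairs = {"]": "[", "}": "{"}
    if char in "[{":
        return stack + [char]          # push the opener
    if char in "]}":
        if not stack or stack[-1] != pairs[char]:
            return None                # closer with no matching opener
        return stack[:-1]              # pop the matched opener
    return stack                       # other characters leave the stack unchanged

def prefix_valid(text):
    stack = []
    for ch in text:
        stack = pda_advance(stack, ch)
        if stack is None:
            return False
    return True

print(prefix_valid('{"a": [1, {"b": 2}'))  # True: a valid prefix, brackets still open
print(prefix_valid('{"a": ]'))             # False: the closer has nothing to match
```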


I used gemini-exp-1121 on rocketrecap dot com to summarize the transcript.
Cost (if I didn't use the free tier): $-0.0191
Input tokens: 18540
Output tokens: 526

@GAGANDEEPSINGH-pb8uu

Great explanation! Thanks for making these videos šŸ‘šŸ‘šŸ’ÆšŸ’Æ

@chenhsu3581

I tried to build a CFG enforcer about a year ago and it was way more complicated than we expected. The number of states that need to be unwrapped is enormous, and the misalignment of tokens and grammar characters is another big issue.

@davidro00

Amazing

@alexm2716

Thanks for this video. I am using Outlines for parsing PDF tables from financial documents into a standard format. I have noticed a very slow operation every time I switch to a new JSON schema. A log message says it is "compiling FSM", and it sometimes takes upwards of 30 seconds per schema, depending on complexity. Do you know what the purpose of this operation is and why it could be slow?

@qwerty_and_azerty

Great overview, Bai! I’m curious what strategies you use to minimize the constraints on the output. My current strategy is to ask the model to output a specific JSON schema as part of the instruction prompt, while keeping the schema as simple as possible (e.g., just the names of the expected keys, with no constraints on the values). Then I implement a complicated cascade of output parsers to try to make sense of the output. It mostly works.

@lazizhamdi4535

Hello, thank you for your work, this is very interesting. I'm new to constrained decoding for LLMs, and I wonder whether there is work that does the same for HTML generation, specifically HTML table representations. I would like your opinion on the best method for this problem. Personally, I think vocabulary-aligned subgrammars may be a good solution since I'm using a tokenizer, but I would be glad to hear your thoughts. Thank you again.

@erniea5843

Does this change with tool use and MCP?