The Mixture-of-Experts Approach to Construction Document Extraction
A construction bid set is not one document. It is a collection of radically different visual documents that happen to come in the same PDF.
Page 1 might be an architectural floor plan showing room layouts and wall types. Page 15 is an electrical power plan with device symbols and circuit designations. Page 42 is a panel schedule — a structured table. Page 67 is a single-line diagram showing power distribution topology. Page 89 is a mechanical duct layout with rectangular sections and flow arrows. Page 112 is a plumbing riser diagram showing vertical pipe routing.
Each of these page types has its own visual language, its own extraction requirements, and its own failure modes. Treating them all the same is the single biggest mistake in construction document AI.
This post describes the mixture-of-experts architectural pattern — how it works, why it is necessary for construction extraction, and what makes it outperform monolithic approaches.
The Problem with Single-Model Extraction
The intuitive approach to construction extraction is straightforward: take a powerful vision model, give it a comprehensive prompt describing all the things you want extracted, and run it on every page.
This approach fails for a structural reason: the extraction task varies too much across drawing types for a single set of instructions to handle well.
The Instruction Conflict Problem
Consider what you would need to tell a single model:
- "On floor plans, count every device symbol and read its designation label"
- "On panel schedules, parse the table rows and extract circuit data"
- "On duct plans, measure rectangular sections and note their dimensions"
- "On schematics, trace connection topology and identify components"
- "On detail drawings, read dimension callouts and material specifications"
These are fundamentally different tasks. A model trying to follow all of them simultaneously on any given page will spend attention on irrelevant instructions, increasing the chance of errors and hallucinations.
It is the equivalent of giving an estimator a single instruction sheet that covers electrical takeoff, mechanical measurement, plumbing counting, structural steel detailing, and earthwork calculation — and asking them to figure out which instructions apply to whatever drawing they are looking at. Even a skilled estimator would perform worse than one who receives only the instructions relevant to the drawing type they are working on.
The Vocabulary Collision Problem
Different drawing types use overlapping visual elements with different meanings:
- A circle on an electrical plan is a junction box. On a fire alarm plan, it is a smoke detector. On a plumbing plan, it is a cleanout. On a site plan, it is a manhole.
- A rectangle on a mechanical plan is an air handling unit. On an electrical plan, it is a panelboard. On an architectural plan, it is a room.
- A line with tick marks on a duct plan represents insulation. On a structural plan, it represents reinforcement.
A single model processing all page types gets no explicit signal about which vocabulary applies, so it must resolve these ambiguities implicitly on every page. When it guesses wrong, the result is systematic misclassification that contaminates the takeoff.
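To make the collision concrete, here is a small sketch that encodes the examples above as a symbol table keyed by shape and discipline. The names `SYMBOL_MEANINGS`, `candidate_meanings`, and `interpret_symbol` are hypothetical, and on real projects symbol meanings come from each sheet's legend rather than a fixed table. Without the discipline, a circle has four plausible readings; with it, there is exactly one.

```python
from __future__ import annotations

# Hypothetical table built from the examples above: the same shape means
# different things depending on the discipline of the sheet it appears on.
SYMBOL_MEANINGS: dict[tuple[str, str], str] = {
    ("circle", "electrical"): "junction box",
    ("circle", "fire_alarm"): "smoke detector",
    ("circle", "plumbing"): "cleanout",
    ("circle", "site"): "manhole",
    ("rectangle", "mechanical"): "air handling unit",
    ("rectangle", "electrical"): "panelboard",
    ("rectangle", "architectural"): "room",
}


def candidate_meanings(shape: str) -> set[str]:
    """Every meaning a shape could have when the discipline is unknown."""
    return {meaning for (s, _), meaning in SYMBOL_MEANINGS.items() if s == shape}


def interpret_symbol(shape: str, discipline: str) -> str | None:
    """Resolve a shape once the page's discipline is known (None if unmapped)."""
    return SYMBOL_MEANINGS.get((shape, discipline))


print(sorted(candidate_meanings("circle")))  # ['cleanout', 'junction box', 'manhole', 'smoke detector']
print(interpret_symbol("circle", "fire_alarm"))  # smoke detector
```

The router described next is, in effect, what supplies the `discipline` argument.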
The Mixture-of-Experts Pattern
The mixture-of-experts (MoE) approach solves this by decomposing the extraction problem into specialized sub-problems:
Step 1: Classification (The Router)
Before any extraction happens, each page in the drawing set is classified by type and discipline. This is the "routing" layer that determines which experts handle each page.
Classification uses two complementary approaches:
Deterministic signals: Drawing titles, sheet numbering conventions, and text patterns provide strong classification signals. A sheet titled "E-201 FIRST FLOOR POWER PLAN" is unambiguously an electrical power plan. A sheet with "PANEL SCHEDULE" in the title block is a panel schedule. These rules are fast, reliable, and catch the majority of well-labeled drawings.
AI classification: For ambiguous pages — unlabeled sheets, cover pages with mixed content, or drawings where the title does not clearly indicate the type — a vision model examines the visual content and assigns type labels. This handles the cases that deterministic rules cannot.
The key insight: classification is multi-label, not single-label. A single page can contain both architectural and electrical information (an overlay), or both lighting and power information (a combined plan). The router identifies all trades present on a page, not just the primary one.
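A minimal sketch of this hybrid, multi-label router follows. It assumes the common convention that the sheet-number prefix encodes the discipline (E for electrical, M for mechanical, and so on) and that a few title keywords identify drawing types; the prefix table, keyword list, `classify_by_rules`, `route_page`, and the `vision_classifier` callable are all hypothetical stand-ins. Rules return every label they can justify, so a combined sheet gets several labels, and the vision model is called only when the rules find nothing.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumed sheet-number prefixes; real conventions vary between firms.
PREFIX_DISCIPLINE = {"FA": "fire_alarm", "A": "architectural", "E": "electrical",
                     "M": "mechanical", "P": "plumbing", "S": "structural"}
# Assumed title keywords; several may match one combined sheet.
TITLE_KEYWORDS = {"PANEL SCHEDULE": "panel_schedule", "POWER": "electrical_power",
                  "LIGHTING": "electrical_lighting", "FIRE ALARM": "fire_alarm",
                  "SINGLE LINE": "single_line_diagram", "RISER": "riser_diagram"}
SHEET_NUMBER = re.compile(r"^([A-Z]{1,2})-?\d")


def classify_by_rules(title: str) -> set[str]:
    """Every label the deterministic rules can justify; empty means ambiguous."""
    text = title.upper().strip()
    labels: set[str] = set()
    match = SHEET_NUMBER.match(text)
    if match:
        prefix = match.group(1)
        # Try the full prefix first ("FA"), then its first letter ("EP" -> "E").
        for candidate in (prefix, prefix[0]):
            if candidate in PREFIX_DISCIPLINE:
                labels.add(PREFIX_DISCIPLINE[candidate])
                break
    labels.update(label for key, label in TITLE_KEYWORDS.items() if key in text)
    return labels


def route_page(title: str, image: bytes,
               vision_classifier: Callable[[bytes], set[str]]) -> set[str]:
    """Rules first; fall back to the (hypothetical) vision classifier if they are silent."""
    labels = classify_by_rules(title)
    return labels if labels else vision_classifier(image)


def fake_vision_classifier(image: bytes) -> set[str]:
    """Stand-in for a real model call; pretends every unlabeled page is an overlay."""
    return {"architectural", "electrical_power"}


print(sorted(route_page("E-301 POWER AND FIRE ALARM PLAN", b"", fake_vision_classifier)))
# ['electrical', 'electrical_power', 'fire_alarm']
```

Keeping the rules as the first tier is also what lets well-labeled sheets skip the model call entirely.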
Step 2: Expert Selection
Based on the classification, the system selects which extraction experts to invoke for each page. Each expert is a specialized extraction module with:
- A defined vocabulary of items it knows how to extract
- A blocked vocabulary of items it should never report (preventing cross-contamination)
- Drawing-type-specific extraction instructions tuned to the failure modes of that drawing type
- Appropriate measurement logic (counting for discrete devices, linear measurement for runs, table parsing for schedules)
- Expected output patterns that serve as sanity checks
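The five properties listed above map naturally onto a small, immutable configuration object. This is a sketch with hypothetical names (`ExpertSpec`, `Measurement`, `expected_range`); the vocabularies shown are illustrative, not a complete electrical list.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Measurement(Enum):
    COUNT = "count"    # discrete devices, counted by type
    LINEAR = "linear"  # runs such as conduit, pipe, or duct, in metres
    TABLE = "table"    # schedules, parsed row by row


@dataclass(frozen=True)
class ExpertSpec:
    name: str
    allowed: frozenset[str]   # vocabulary the expert may report
    blocked: frozenset[str]   # vocabulary it must never report
    instructions: str         # drawing-type-specific prompt text
    measurement: Measurement  # how quantities are taken
    expected_range: tuple[float, float] = (0.0, float("inf"))  # per-page sanity bounds

    def plausible(self, quantity: float) -> bool:
        """Sanity check against the expert's expected output pattern."""
        low, high = self.expected_range
        return low <= quantity <= high


# Illustrative configuration; the vocabularies are examples, not complete lists.
ELECTRICAL_FLOOR_PLAN = ExpertSpec(
    name="electrical_floor_plan",
    allowed=frozenset({"receptacle", "switch", "luminaire", "junction box"}),
    blocked=frozenset({"vav box", "diffuser", "cleanout", "sprinkler head"}),
    instructions="Count every device symbol and read its designation label.",
    measurement=Measurement.COUNT,
    expected_range=(0.0, 2000.0),
)
```

Making the spec frozen means an expert's configuration cannot drift at runtime; changing it requires creating a new spec.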
A single page might invoke multiple experts if it contains multiple trades — for example, a combined electrical/fire alarm plan would invoke both the electrical expert and the fire alarm expert, each extracting only the items in its domain.
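Given the router's labels, expert selection can be as simple as a lookup. The label names and the `EXPERTS_BY_LABEL` mapping below are assumptions for illustration. The point is that a combined plan yields more than one expert, and an expert shared by two labels (power and lighting here) still runs only once.

```python
from __future__ import annotations

# Hypothetical mapping from router labels to expert names.
EXPERTS_BY_LABEL: dict[str, list[str]] = {
    "electrical_power": ["electrical_floor_plan"],
    "electrical_lighting": ["electrical_floor_plan"],
    "fire_alarm": ["fire_alarm"],
    "panel_schedule": ["panel_schedule"],
    "mechanical_duct": ["mechanical_duct_plan"],
    "plumbing": ["plumbing_fixture"],
}


def select_experts(labels: set[str]) -> list[str]:
    """Return each expert needed for the page once, in a stable order."""
    selected: list[str] = []
    for label in sorted(labels):
        for expert in EXPERTS_BY_LABEL.get(label, []):
            if expert not in selected:
                selected.append(expert)
    return selected


print(select_experts({"electrical_power", "electrical_lighting", "fire_alarm"}))
# ['electrical_floor_plan', 'fire_alarm']
```

Unknown labels are silently skipped in this sketch; a production router would more likely flag them for review.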
Step 3: Parallel Extraction
Selected experts run on the page in parallel (or sequentially, depending on resource constraints). Each expert sees the same page image but interprets it through its specialized lens:
- The electrical floor plan expert looks for receptacles, switches, luminaires, and junction boxes, reading designation labels and counting by type
- The panel schedule expert parses the tabular structure, extracting circuit numbers, breaker sizes, loads, and connected equipment
- The mechanical duct plan expert identifies duct sections, measures dimensions, and catalogs equipment and terminal devices
- The plumbing fixture expert counts fixtures by type and specification
- The fire alarm expert identifies detectors, manual call points, notification appliances, and control panels
Each expert produces a structured output with item descriptions, quantities, units, and confidence scores.
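A sketch of the fan-out, assuming each expert is a callable that takes the page image and returns structured items. `ExtractedItem`, `run_experts`, and the toy experts are hypothetical; real experts would wrap model calls, which are I/O-bound and therefore a reasonable fit for threads. `ThreadPoolExecutor` is from Python's standard library, and `max_workers` is where resource constraints come in: setting it to 1 gives the sequential variant.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExtractedItem:
    expert: str
    description: str
    quantity: float
    unit: str
    confidence: float


# Hypothetical expert interface: page image in, structured items out.
Expert = Callable[[bytes], list[ExtractedItem]]


def run_experts(page_image: bytes, experts: dict[str, Expert],
                max_workers: int = 4) -> list[ExtractedItem]:
    """Run every selected expert on the same page image and pool the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, page_image) for name, fn in experts.items()}
        results: list[ExtractedItem] = []
        for name in sorted(futures):  # collect in a deterministic order
            results.extend(futures[name].result())  # re-raises an expert's exception
    return results


# Toy experts standing in for model-backed extractors.
def electrical_expert(image: bytes) -> list[ExtractedItem]:
    return [ExtractedItem("electrical", "duplex receptacle", 24, "ea", 0.93)]


def fire_alarm_expert(image: bytes) -> list[ExtractedItem]:
    return [ExtractedItem("fire_alarm", "smoke detector", 11, "ea", 0.88)]


items = run_experts(b"<page image>", {"fire_alarm": fire_alarm_expert,
                                      "electrical": electrical_expert})
```

Both experts see the same bytes; only their instructions and vocabularies differ.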
Step 4: Merge and Reconciliation
The outputs from all experts across all pages are merged into a unified extraction result. This step handles:
- Deduplication — when the same item is extracted by multiple experts (because it appears on overlapping drawing types), the system resolves the duplicate rather than double-counting
- Discipline assignment — each item is assigned to the correct discipline based on the expert that extracted it, with override rules for items that are commonly mis-categorized
- Quantity normalization — ensuring consistent units (linear items in linear meters, areas in square meters, discrete items counted) and correcting obvious measurement errors
- Confidence aggregation — items extracted with high confidence by their specialized expert carry higher overall confidence than items at the boundary between two experts' domains
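The four reconciliation steps can be sketched over a flat list of extracted items. Everything concrete here is an assumption made for the example: duplicates are keyed on page plus normalised description, the highest-confidence report wins (rather than summing or averaging), `DISCIPLINE_OVERRIDES` holds one hypothetical rule, and "boundary" items are those claimed by more than one expert, discounted by an arbitrary `BOUNDARY_PENALTY`.

```python
from __future__ import annotations

# All tables below are illustrative assumptions, not a standard.
UNIT_TO_METRE = {"m": 1.0, "ft": 0.3048, "lf": 0.3048}  # linear units only
DISCIPLINE_BY_EXPERT = {
    "electrical_floor_plan": "electrical",
    "fire_alarm": "fire_alarm",
    "mechanical_duct_plan": "mechanical",
}
DISCIPLINE_OVERRIDES = {"fire alarm cable": "electrical"}  # hypothetical override rule
BOUNDARY_PENALTY = 0.8  # assumed discount for items claimed by more than one expert


def reconcile(items: list[dict]) -> list[dict]:
    """Deduplicate, assign disciplines, normalise units, and aggregate confidence."""
    merged: dict[tuple[int, str], dict] = {}
    claimants: dict[tuple[int, str], set[str]] = {}
    for item in items:
        key = (item["page"], item["description"].strip().lower())
        claimants.setdefault(key, set()).add(item["expert"])
        # Deduplication: keep the most confident report rather than summing.
        if key not in merged or item["confidence"] > merged[key]["confidence"]:
            merged[key] = dict(item)

    result = []
    for key, item in merged.items():
        # Discipline assignment: from the expert, unless an override rule applies.
        item["discipline"] = DISCIPLINE_OVERRIDES.get(
            key[1], DISCIPLINE_BY_EXPERT.get(item["expert"], "unassigned"))
        # Quantity normalisation: linear units to metres; counts are left alone.
        unit = item["unit"].lower()
        if unit in UNIT_TO_METRE:
            item["quantity"] = round(item["quantity"] * UNIT_TO_METRE[unit], 2)
            item["unit"] = "m"
        # Confidence aggregation: boundary items carry less confidence.
        if len(claimants[key]) > 1:
            item["confidence"] = round(item["confidence"] * BOUNDARY_PENALTY, 3)
        result.append(item)
    return result


report = reconcile([
    {"page": 15, "expert": "electrical_floor_plan", "description": "Smoke Detector",
     "quantity": 11, "unit": "ea", "confidence": 0.60},
    {"page": 15, "expert": "fire_alarm", "description": "smoke detector",
     "quantity": 11, "unit": "ea", "confidence": 0.90},
    {"page": 15, "expert": "electrical_floor_plan", "description": "EMT conduit 3/4 in",
     "quantity": 100, "unit": "ft", "confidence": 0.85},
])
```

The smoke detector reported by both experts survives once, attributed to the fire alarm expert with a discounted confidence of 0.72. The conduit run is converted from 100 ft to 30.48 m. Which duplicate policy is right in practice depends on why the duplicate arose; keeping the most confident report is just one defensible choice.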
Why This Architecture Outperforms
Reason 1: Reduced Attention Competition
Each expert only needs to focus on the items in its domain. The electrical expert is not distracted by ductwork symbols. The mechanical expert is not confused by electrical designations. This narrow focus directly improves accuracy because the model's attention is not split across irrelevant tasks.
Reason 2: Hard Boundary Enforcement
The blocked vocabulary for each expert acts as a hard filter. If an electrical extraction module reports finding a "VAV box" (a mechanical component), the system rejects it — because that item is on the blocked list for electrical extractors. This prevents the most common cross-discipline errors that plague single-model approaches.
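A minimal version of that hard filter, assuming items arrive as `(expert, description)` pairs and that blocked terms are matched as lowercase substrings. The vocabularies and the `apply_blocklists` name are hypothetical. Rejected items are returned rather than silently dropped so they can be logged and used to tune the expert.

```python
from __future__ import annotations

Item = tuple[str, str]  # (expert name, item description)

# Hypothetical blocked vocabularies, lowercase, per expert.
BLOCKED: dict[str, set[str]] = {
    "electrical_floor_plan": {"vav box", "diffuser", "cleanout", "ductwork"},
    "plumbing_fixture": {"receptacle", "luminaire", "panelboard"},
}


def apply_blocklists(items: list[Item]) -> tuple[list[Item], list[Item]]:
    """Split items into (accepted, rejected) using each expert's blocked terms."""
    accepted: list[Item] = []
    rejected: list[Item] = []
    for expert, description in items:
        text = description.lower()
        if any(term in text for term in BLOCKED.get(expert, set())):
            rejected.append((expert, description))  # keep for logging and tuning
        else:
            accepted.append((expert, description))
    return accepted, rejected


accepted, rejected = apply_blocklists([
    ("electrical_floor_plan", "Duplex receptacle, 20A"),
    ("electrical_floor_plan", "VAV Box VB-3"),
    ("plumbing_fixture", "Water closet, floor mounted"),
])
print(rejected)  # [('electrical_floor_plan', 'VAV Box VB-3')]
```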
Reason 3: Specialized Failure Mode Handling
Each drawing type has characteristic failure modes:
- Panel schedules often have merged cells and inconsistent formatting
- Floor plans have symbol legend variations between consultants
- Duct plans have dimension text that overlaps with duct lines
- Schematics have symbols that look different at different zoom levels
A specialized expert can include instructions and logic to handle the failure modes specific to its drawing type. A general-purpose model cannot anticipate and handle all failure modes across all drawing types.
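As one example of drawing-type-specific handling, consider merged cells in a panel schedule. Table extraction often returns a merged cell as a value in the first row and blanks beneath it, for example a three-pole breaker spanning circuits 1, 3, and 5. A panel schedule expert can restore those blanks by carrying the last value down. The row format, column names, and `fill_merged_cells` are assumptions for this sketch. The rule here only fills a row when *all* merge-candidate columns are blank, because forward-filling a genuinely empty cell would invent data; which columns may be treated this way has to be decided per schedule layout.

```python
from __future__ import annotations


def fill_merged_cells(rows: list[dict[str, str]],
                      columns: tuple[str, ...]) -> list[dict[str, str]]:
    """Forward-fill vertically merged cells in the given columns.

    A row counts as the continuation of a merged block only when every one of
    `columns` is blank; any value in those columns starts a new entry.
    """
    filled: list[dict[str, str]] = []
    previous: dict[str, str] | None = None
    for row in rows:
        row = dict(row)  # do not mutate the caller's data
        if previous is not None and all(not row.get(c, "").strip() for c in columns):
            for c in columns:
                row[c] = previous.get(c, "")
        filled.append(row)
        previous = row
    return filled


schedule = [
    {"circuit": "1", "breaker": "30/3", "load": "RTU-1"},
    {"circuit": "3", "breaker": "", "load": ""},
    {"circuit": "5", "breaker": "", "load": ""},
    {"circuit": "7", "breaker": "20/1", "load": "RECEPT RM 101"},
]
fixed = fill_merged_cells(schedule, ("breaker", "load"))
```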
Reason 4: Incremental Improvement
When a new drawing type is encountered or an existing expert underperforms, the fix is localized: update one expert's instructions, vocabulary, or logic without affecting the others. In a single-model system, changing the prompt to fix one drawing type's extraction often breaks another's.
This modularity means the system improves steadily over time as each expert is refined independently.
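One way to get that isolation is to keep each expert's configuration as a separate, versioned registry entry, so that a fix replaces exactly one entry. The registry shape, the `version` field, and `update_expert` are hypothetical; the sketch only illustrates that the change is localized. `dataclasses.replace` is from Python's standard library and builds a modified copy of a frozen dataclass.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExpertConfig:
    instructions: str
    blocked: frozenset[str]
    version: int = 1


REGISTRY: dict[str, ExpertConfig] = {
    "panel_schedule": ExpertConfig("Parse each circuit row.", frozenset({"diffuser"})),
    "mechanical_duct_plan": ExpertConfig("Read duct sizes.", frozenset({"receptacle"})),
}


def update_expert(registry: dict[str, ExpertConfig], name: str,
                  **changes) -> dict[str, ExpertConfig]:
    """Return a new registry in which only `name` is changed (and re-versioned)."""
    current = registry[name]
    updated = replace(current, version=current.version + 1, **changes)
    return {**registry, name: updated}  # every other entry is the same object


new_registry = update_expert(
    REGISTRY, "panel_schedule",
    instructions="Parse each circuit row; forward-fill merged breaker cells.",
)
```

Because the other entries are carried over untouched, a regression test suite for the mechanical expert has nothing new to check after a panel schedule fix.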
Reason 5: Cost and Latency Control
Not every page needs the same level of AI processing. A well-titled sheet that deterministic rules can classify with certainty skips the AI classification step. A panel schedule that is clearly tabular goes straight to the table parser. Only ambiguous or complex pages invoke the full AI pipeline.
This selective invocation keeps processing costs proportional to drawing complexity rather than to raw page count.
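The effect on cost can be illustrated with simple arithmetic. The per-tier costs below are placeholder relative units, not measured prices, and the three tiers are an assumed simplification of the routing described above.

```python
from __future__ import annotations

# Placeholder relative costs per page tier; these are not measured prices.
COST = {
    "rule_classified": 1,  # classifier skipped, one expert call
    "table_parser": 2,     # clearly tabular schedule sent straight to the parser
    "full_pipeline": 10,   # AI classification plus several experts
}


def pipeline_cost(tiers: list[str]) -> int:
    """Total cost of a drawing set, given the processing tier chosen for each page."""
    return sum(COST[tier] for tier in tiers)


# Two hypothetical 100-page sets with very different labeling quality.
well_labeled = ["rule_classified"] * 85 + ["table_parser"] * 10 + ["full_pipeline"] * 5
ambiguous = ["rule_classified"] * 20 + ["table_parser"] * 10 + ["full_pipeline"] * 70

print(pipeline_cost(well_labeled))             # 155
print(pipeline_cost(ambiguous))                # 740
print(pipeline_cost(["full_pipeline"] * 100))  # 1000: every page treated as complex
```

Both sets have the same page count, but under these assumed costs the well-labeled set costs roughly a fifth as much as the ambiguous one.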
Lessons from Building This in Production
Lesson 1: Classification Accuracy Compounds
Errors in page classification propagate to extraction. If a fire alarm plan is misclassified as an electrical power plan, the wrong expert runs, and the output is full of misidentified items. Getting classification right — through the hybrid deterministic + AI approach — is the highest-leverage investment in the pipeline.
Lesson 2: The Blocked List Is as Important as the Allowed List
Telling an expert what NOT to extract is often more valuable than telling it what to extract. Cross-discipline contamination (reporting HVAC items on electrical plans, or electrical items on plumbing plans) is the most common accuracy problem, and hard-blocked vocabularies are the most effective countermeasure.
Lesson 3: Multi-Label Classification Is Essential
Real construction drawings do not always respect discipline boundaries. Combined plans, overlay sheets, and multi-trade details are common. A classification system that forces a single label per page will miss content. Multi-label classification with per-label expert invocation handles the reality of how bid sets are structured.
Lesson 4: Post-Processing Catches Systematic Errors
Even well-tuned experts produce systematic errors — a linear item reported as a discrete count, a discipline mis-assigned because the item description is ambiguous, or a quantity that does not make physical sense. Rule-based post-processing catches these patterns across the entire output, providing a safety net that the individual experts cannot.
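A sketch of rule-based post-processing for the three error patterns named above. The keyword lists, the discipline rules, and the 10,000 m ceiling are illustrative assumptions; real rules would be derived from the systematic errors actually observed in a pipeline's output. Substring rules like these are blunt ("duct smoke detector" contains "duct"), so the unit check only flags items for review instead of rewriting them.

```python
from __future__ import annotations

LINEAR_KEYWORDS = ("conduit", "cable", "pipe", "duct", "tray")  # assumed
DISCIPLINE_KEYWORDS = {"sprinkler": "fire_protection", "vav": "mechanical",
                       "cleanout": "plumbing"}  # assumed reassignment rules
MAX_LINEAR_M = 10_000  # assumed ceiling for one linear item on one sheet


def post_process(item: dict) -> tuple[dict, list[str]]:
    """Apply rule-based fixes and return the corrected item plus any flags."""
    item, flags = dict(item), []
    desc = item["description"].lower()

    # Linear item reported as a discrete count: flag for review.
    if any(k in desc for k in LINEAR_KEYWORDS) and item["unit"] == "ea":
        flags.append("linear item reported as a count")

    # Discipline mis-assigned: reassign when a keyword rule applies.
    for keyword, discipline in DISCIPLINE_KEYWORDS.items():
        if keyword in desc and item["discipline"] != discipline:
            flags.append(f"discipline {item['discipline']} -> {discipline}")
            item["discipline"] = discipline
            break

    # Physically implausible quantities.
    if item["quantity"] <= 0:
        flags.append("non-positive quantity")
    elif item["unit"] == "m" and item["quantity"] > MAX_LINEAR_M:
        flags.append("implausible linear quantity")
    return item, flags


conduit, conduit_flags = post_process({"description": "EMT conduit 3/4 in",
                                       "quantity": 240, "unit": "ea",
                                       "discipline": "electrical"})
vav, vav_flags = post_process({"description": "VAV box VB-3", "quantity": 4,
                               "unit": "ea", "discipline": "electrical"})
```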
The Broader Principle
The mixture-of-experts pattern is not unique to construction. It reflects a general principle in applied AI: domain-specific problems are best solved by compositions of specialized components, not by general-purpose models alone.
In construction extraction, this means:
- The router (page classifier) determines which specialists to consult
- Each specialist (drawing-type expert) has deep knowledge of its narrow domain
- The reconciliation layer (merge and post-processing) ensures the combined output is consistent and accurate
This mirrors how an estimating department works: sheets go to the estimators for the relevant trades, each works within their own scope, and the combined takeoff is reconciled before it goes out. It is also how the most effective AI extraction systems work at a technical level. The model's raw capability matters, but the architecture that channels that capability toward the right sub-problem is what makes the system reliable enough for production use.
Aginera uses specialized extraction experts for 15+ construction drawing types — each with its own vocabulary, measurement logic, and validation rules. Upload a drawing set and see the difference.