AI Capabilities Roadmap
$140 billion in public benefits go unclaimed every year. This roadmap tracks the AI capabilities funded by the GitLab Foundation AI for Economic Opportunity grant to address this gap.
The Four Pillars
Document Extraction: Extract structured data from paystubs, IDs, and bank statements. Turn documents into verified information.
Fraud Detection: Identify coordinated fraud while maintaining low false-positive rates for legitimate applicants.
Program Generation: Generate intake flows, eligibility logic, and operational frameworks from plain language descriptions.
Multilingual Outreach: Inform applicants about opportunities and deadlines in their own language at scale.
This roadmap assumes a shared AI platform layer that powers all four pillars. It is both an internal architecture layer and a product capability: model routing, evaluations, safety, and auditability are built once and reused everywhere.
Core Capabilities
Model Router: Multi-provider routing, fallback, and cost/latency-aware selection.
Prompt + Policy Registry: Versioned prompts, policy templates, and safe defaults per workflow.
Evaluation Harness: Golden datasets, regression tests, and confidence calibration per document type.
Observability: Per-request traces, latency, cost, and error analytics tied to submissions.
PII Handling: Redaction, field-level access controls, and content filtering by program.
Human Review Queue: Shared reviewer tooling, SLAs, and audit trails across pillars.
Feature Store: Verified fields and risk signals shared between eligibility, fraud, and outreach.
Data Lineage & Human-in-the-Loop
Track lineage for every AI-derived field: source document, model/version, confidence, and reviewer action.
All overrides require a reason code and persist an immutable audit log.
Reviewer corrections feed back into evaluation datasets and model improvements.
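The lineage and override requirements above can be pictured as a record type plus an append-only log. This is a minimal sketch under stated assumptions: `LineageRecord`, `AuditEntry`, `applyOverride`, and the reason codes are hypothetical names chosen for illustration, not the production schema.

```typescript
// Hypothetical shapes for illustration; field names and reason codes are assumptions.
type ReasonCode = "ocr_misread" | "document_outdated" | "applicant_correction";

interface LineageRecord {
  field: string;            // e.g. "gross_income"
  value: string;
  sourceDocumentId: string; // document the value was derived from
  model: string;
  modelVersion: string;
  confidence: number;       // 0..1
  reviewerAction: "none" | "confirmed" | "overridden";
}

interface AuditEntry {
  readonly at: string;      // ISO timestamp
  readonly reviewerId: string;
  readonly field: string;
  readonly previousValue: string;
  readonly newValue: string;
  readonly reasonCode: ReasonCode;
}

// Append-only in this sketch: entries are frozen and never edited or removed.
const auditLog: AuditEntry[] = [];

function applyOverride(
  record: LineageRecord,
  newValue: string,
  reviewerId: string,
  reasonCode: ReasonCode | undefined,
): LineageRecord {
  if (!reasonCode) {
    throw new Error("Override rejected: a reason code is required");
  }
  auditLog.push(
    Object.freeze({
      at: new Date().toISOString(),
      reviewerId,
      field: record.field,
      previousValue: record.value,
      newValue,
      reasonCode,
    }),
  );
  // Return a new record instead of mutating the original.
  return { ...record, value: newValue, reviewerAction: "overridden" };
}
```

Returning a new record while logging the previous value keeps both the before and after states available for evaluation datasets.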
Compliance Readiness (Future)
Data retention policies and deletion workflows per program.
Least-privilege access to AI outputs and raw documents.
Vendor model logging controls and secure transport defaults.
Model Router Options (Decision Matrix)
| Option | Strengths | Tradeoffs | When to Choose |
| --- | --- | --- | --- |
| External Router (e.g., OpenRouter) | Fast access to many models; quick experimentation | External dependency; data handling constraints may evolve | Early-stage velocity and multi-model evaluation |
| Internal Router + Direct Providers | Full control of policies, logging, and routing logic | More engineering and vendor integrations | When auditability and custom controls matter most |
| Cloud Provider Router | Enterprise governance and data residency controls | Potential vendor lock-in and narrower model mix | When compliance and regulated data become primary |
Router Decision Criteria
Data class (PII sensitivity), residency, and retention requirements.
Cost targets and latency SLAs per use case.
Model coverage (vision, OCR, structured extraction, reasoning).
Observability depth (per-request traces, prompt/version tracking).
Contracting complexity and vendor support timelines.
Internal Router Interface (Draft)
request:
  use_case: document_extraction
  inputs:
    document_id: doc_123
  constraints:
    max_latency_ms: 6000
    max_cost_usd: 0.05
    data_class: pii_high
  policy:
    model_family: vision
    fallback: [primary, secondary]
    prompt_version: doc_extract_v3
response:
  provider: primary
  model: vision-large
  cost_usd: 0.032
  latency_ms: 2810
  confidence: 0.91
  safety_flags: []
  trace_id: trc_abc123
Notes:
All AI calls must emit trace_id, prompt_version, and policy_id.
Router decisions should be auditable and replayable for evaluation.
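As a rough illustration of how a router might honor the draft's `constraints` and `fallback` list, the sketch below walks the fallback order and picks the first provider that fits. The `Candidate` catalog, its estimate fields, and the `route` function are assumptions made for this example; they do not describe any real provider or the final interface.

```typescript
// Illustrative only: catalog entries and estimates are hypothetical.
type DataClass = "pii_high" | "pii_low" | "public";

interface RouteConstraints {
  max_latency_ms: number;
  max_cost_usd: number;
  data_class: DataClass;
}

interface Candidate {
  provider: string;
  model: string;
  est_cost_usd: number;
  est_latency_ms: number;
  allowed_data_classes: DataClass[];
}

interface RouteMeta {
  trace_id: string;
  prompt_version: string;
  policy_id: string;
}

interface RouteDecision extends RouteMeta {
  provider: string;
  model: string;
  skipped: string[]; // rejected providers with reasons, kept for audit and replay
}

function route(
  fallback: string[],
  catalog: Record<string, Candidate>,
  constraints: RouteConstraints,
  meta: RouteMeta,
): RouteDecision {
  const skipped: string[] = [];
  for (const name of fallback) {
    const c = catalog[name];
    if (!c) {
      skipped.push(`${name}: unknown provider`);
    } else if (c.allowed_data_classes.indexOf(constraints.data_class) === -1) {
      skipped.push(`${name}: data class not permitted`);
    } else if (c.est_cost_usd > constraints.max_cost_usd) {
      skipped.push(`${name}: over cost cap`);
    } else if (c.est_latency_ms > constraints.max_latency_ms) {
      skipped.push(`${name}: over latency cap`);
    } else {
      return { provider: c.provider, model: c.model, ...meta, skipped };
    }
  }
  throw new Error(`No provider satisfies constraints: ${skipped.join("; ")}`);
}
```

Recording why each provider was skipped alongside `trace_id`, `prompt_version`, and `policy_id` is one way to make a routing decision replayable later.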
Roadmap Dependencies (Q1 2026)
| Dependency | Enables | Notes |
| --- | --- | --- |
| Extraction confidence scoring | Fraud risk scoring, eligibility logic, outreach targeting | Confidence gates prevent low-quality auto-approvals |
| Verified fields (income, identity, address) | Program eligibility + fraud vectors | Single source of truth reduces duplicate logic |
| Rules engine + reason codes | Case management + applicant appeals | Necessary for transparency and trust |
| Translation system + language preference | Multilingual outreach + form UX | Shared across Terra + Pathfinder |
Most benefit programs struggle to turn documentation into structured information. When someone submits a paystub to verify income, staff manually review it, interpret what they’re seeing, and enter data into systems. Human perception varies. Mistakes happen. Processing takes hours.
What Exists
| Component | Status | Location / Details |
| --- | --- | --- |
| Reducto API Integration | ✅ Complete | terra/src/lib/form-import/document-parser.ts |
| Claude Vision Fallback | ✅ Complete | terra/src/lib/form-import/ai-parser.ts |
| PDF/Image Parsing | ✅ Complete | Multi-page, table extraction, 3-120 pages |
| Document Repository | ✅ Complete | pathfinder/src/lib/dal/repositories/documents.repository.ts |
| Document Type Enum | ✅ Complete | Driver’s License, Passport, Paystub, W2, 1099, Tax Return, Bank Statement, Lease, Utility Bill |
| Plaid ID Verification | ✅ Complete | terra/src/lib/plaid.ts - KYC, liveness, document upload |
| Plaid Income Verification | ✅ Complete | Payroll + bank income, employer extraction |
| Extracted Data Storage | ✅ Complete | documents.extracted_data JSON field |
What’s Missing
| Component | Priority | Effort | Scope |
| --- | --- | --- | --- |
| Domain-Specific Extractors | P1 | 2-3 weeks | Paystub OCR (gross income, deductions, employer, pay period); Tax return parsing (AGI, filing status, dependents); Bank statement analysis (balance, transactions); ID card standardized extraction |
| Extraction Confidence Scoring | P1 | 1 week | Per-field confidence scores (0-100); Low-confidence flagging for manual review |
| Document Classification | P2 | 1 week | Auto-detect document type from content; Quality/eligibility checks before processing |
| Form Field Auto-Population | P2 | 1-2 weeks | Map extracted fields to form questions; Pre-fill forms from verified documents |
| Manual Review Workflow | P2 | 1-2 weeks | UI for reviewing/correcting extractions; Override interface with audit trail |
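The planned per-field confidence scores (0-100) and low-confidence flagging could work roughly as follows. This is a sketch, not the planned implementation: the `ExtractedField` shape and the default threshold of 85 are assumptions, and a real threshold would likely be calibrated per document type.

```typescript
// Assumption: one review threshold for all fields; 85 is a placeholder value.
interface ExtractedField {
  name: string;
  value: string;
  confidence: number; // 0-100
}

interface GateResult {
  autoAccepted: ExtractedField[];
  needsReview: ExtractedField[];
}

function gateByConfidence(fields: ExtractedField[], reviewBelow = 85): GateResult {
  const result: GateResult = { autoAccepted: [], needsReview: [] };
  for (const f of fields) {
    // The negated range check also rejects NaN.
    if (!(f.confidence >= 0 && f.confidence <= 100)) {
      throw new RangeError(`confidence out of range for ${f.name}: ${f.confidence}`);
    }
    if (f.confidence < reviewBelow) {
      result.needsReview.push(f);
    } else {
      result.autoAccepted.push(f);
    }
  }
  return result;
}
```

Fields routed to `needsReview` would land in the manual review workflow listed in the same table.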
Architecture
Downstream Implementation Notes
Document processing pipeline is downstream of upload and classification.
Pipeline stages are event-driven and re-runnable (classification → extraction → confidence → review).
All outputs are stored with lineage metadata for auditability and retraining.
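To make "event-driven and re-runnable" concrete, here is a small in-memory sketch in which each stage handles an event, stores its output under a document-and-stage key, and emits the next stage's event. The handler signature, the `outputs` store, and `runFrom` are hypothetical; a real pipeline would use a queue and durable storage.

```typescript
// Stage order taken from the notes above; everything else is an assumption.
const STAGES = ["classification", "extraction", "confidence", "review"] as const;
type Stage = (typeof STAGES)[number];

interface StageEvent {
  documentId: string;
  stage: Stage;
}

type Handler = (documentId: string) => Record<string, unknown>;

// Keyed by "documentId:stage", so re-running a stage replaces only its own output.
const outputs: Record<string, Record<string, unknown>> = {};

function handle(event: StageEvent, handlers: Record<Stage, Handler>): StageEvent | null {
  const out = handlers[event.stage](event.documentId);
  outputs[`${event.documentId}:${event.stage}`] = {
    ...out,
    ranAt: new Date().toISOString(), // minimal lineage metadata
  };
  const next = STAGES[STAGES.indexOf(event.stage) + 1];
  return next ? { documentId: event.documentId, stage: next } : null;
}

// Start (or restart) the pipeline at any stage; later stages re-run in order.
function runFrom(documentId: string, start: Stage, handlers: Record<Stage, Handler>): Stage[] {
  const ran: Stage[] = [];
  let current: StageEvent | null = { documentId, stage: start };
  while (current) {
    ran.push(current.stage);
    current = handle(current, handlers);
  }
  return ran;
}
```

For example, `runFrom("doc_123", "extraction", handlers)` would redo extraction, confidence, and review without reclassifying the document.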
2. Fraud Detection That Protects Legitimate Applicants
We will analyze patterns across our more than 400,000 existing applications to build a comprehensive understanding of fraudulent submission behavior. As generative AI makes fake documents increasingly realistic, detection models need to develop deeper fraud vectors that identify coordinated attempts and synthetic identities.
What Exists
| Component | Status | Location / Details |
| --- | --- | --- |
| Geolocation Tracking | ✅ Complete | terra/src/lib/geolocation.ts; IP address (anonymized), country, city, region, user agent; Migration 073 |
| Duplicate Detection | ✅ Complete | find_potential_duplicates() stored procedure; SSN4+DOB (95%), Name+DOB (90%), Email (80%), Phone (70%), Address (75%); Migration 022 |
| Applicant Identity Linking | ✅ Complete | applicants, applicant_pii, applicant_profiles tables; Cross-app tracking, verification scores, crisis flags; Migration 022 |
| Audit Logging | ✅ Complete | 28 action types, 13 entity types |
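The match confidences listed for duplicate detection can be read as a rule table. The sketch below is not the `find_potential_duplicates()` stored procedure; it is a TypeScript approximation that assumes exact matching on normalized strings and assumes the highest-confidence matching rule wins. Both assumptions may differ from the real procedure.

```typescript
interface ApplicantKey {
  ssn4?: string;
  dob?: string;
  name?: string;
  email?: string;
  phone?: string;
  address?: string;
}

// Normalize for comparison; empty or missing values never match.
const norm = (s: string | undefined): string | undefined => {
  const t = s?.trim().toLowerCase();
  return t ? t : undefined;
};
const same = (a: string | undefined, b: string | undefined): boolean => {
  const x = norm(a);
  return x !== undefined && x === norm(b);
};

interface DuplicateRule {
  name: string;
  confidence: number; // percent, from the table above
  matches: (a: ApplicantKey, b: ApplicantKey) => boolean;
}

const RULES: DuplicateRule[] = [
  { name: "SSN4+DOB", confidence: 95, matches: (a, b) => same(a.ssn4, b.ssn4) && same(a.dob, b.dob) },
  { name: "Name+DOB", confidence: 90, matches: (a, b) => same(a.name, b.name) && same(a.dob, b.dob) },
  { name: "Email", confidence: 80, matches: (a, b) => same(a.email, b.email) },
  { name: "Address", confidence: 75, matches: (a, b) => same(a.address, b.address) },
  { name: "Phone", confidence: 70, matches: (a, b) => same(a.phone, b.phone) },
];

// Assumption: report the single strongest matching rule, or null if none match.
function duplicateConfidence(
  a: ApplicantKey,
  b: ApplicantKey,
): { rule: string; confidence: number } | null {
  let best: { rule: string; confidence: number } | null = null;
  for (const r of RULES) {
    if (r.matches(a, b) && (best === null || r.confidence > best.confidence)) {
      best = { rule: r.name, confidence: r.confidence };
    }
  }
  return best;
}
```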
What’s Missing
| Component | Priority | Effort | Scope |
| --- | --- | --- | --- |
| Risk Scoring System | P1 | 2-3 weeks | risk_score column on submissions; Weighted scoring from multiple vectors; Configurable thresholds (auto-approve, review, block) |
| Sentinel Rules Engine | P1 | 2-3 weeks | sentinel_rules table with configurable rules; Rule types: velocity, geographic, behavioral, financial; Enable/disable rules per program |
| Blocklist Management | P1 | 1-2 weeks | IP blocklist, email blocklist, phone blocklist; Expiration dates, reason tracking; Admin UI for managing blocklists |
| Velocity Checks | P2 | 1 week | Submissions per IP per hour/day; Submissions per user per form; Geographic impossibility (location changes) |
| IP Clustering | P2 | 1-2 weeks | Group submissions by IP ranges; Identify coordinated submission patterns; VPN/proxy detection |
| Behavioral Analysis | P2 | 2 weeks | Form fill time tracking; Field navigation patterns; Submission timing analysis |
| Case Management | P3 | 2-3 weeks | Investigation workflow UI; Flag status tracking (active, cleared, confirmed); Cross-program fraud actor database |
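A "submissions per IP per hour" velocity check might look like the sliding-window count below. It is a sketch only: the `Submission` shape and the limit passed in are assumptions, and the per-user and geographic checks from the table are not covered.

```typescript
interface Submission {
  ip: string;
  submittedAt: number; // epoch milliseconds
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// True when the incoming submission pushes its IP over the limit for the window.
function exceedsVelocity(
  history: Submission[],
  incoming: Submission,
  maxPerWindow: number,
  windowMs: number = ONE_HOUR_MS,
): boolean {
  const windowStart = incoming.submittedAt - windowMs;
  let count = 1; // the incoming submission counts toward the limit
  for (const s of history) {
    const inWindow = s.submittedAt > windowStart && s.submittedAt <= incoming.submittedAt;
    if (s.ip === incoming.ip && inWindow) {
      count++;
    }
  }
  return count > maxPerWindow;
}
```

A daily limit would reuse the same function with `windowMs` set to 24 hours.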
Architecture
Risk Scoring Model
| Score Range | Action | False Positive Target |
| --- | --- | --- |
| 0-30 | Auto-approve | N/A |
| 31-50 | Low priority review | <5% |
| 51-70 | Standard review | <10% |
| 71-85 | High priority review | <15% |
| 86-100 | Critical + escalation | <20% |
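The score bands above translate directly into a lookup. The following sketch assumes scores are on a 0-100 scale and that a fractional score falls into the band whose upper bound it does not exceed; the action identifiers are hypothetical names for the actions in the table.

```typescript
type ReviewAction =
  | "auto_approve"
  | "low_priority_review"
  | "standard_review"
  | "high_priority_review"
  | "critical_escalation";

// Upper bound (inclusive) of each band, in ascending order.
const BANDS: { max: number; action: ReviewAction }[] = [
  { max: 30, action: "auto_approve" },
  { max: 50, action: "low_priority_review" },
  { max: 70, action: "standard_review" },
  { max: 85, action: "high_priority_review" },
  { max: 100, action: "critical_escalation" },
];

function actionForScore(score: number): ReviewAction {
  // The negated range check also rejects NaN.
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError(`risk score must be between 0 and 100, got ${score}`);
  }
  for (const band of BANDS) {
    if (score <= band.max) {
      return band.action;
    }
  }
  throw new Error("unreachable: score was validated against the last band");
}
```

Keeping the bands in a data table rather than in branching code fits the roadmap's goal of configurable thresholds.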
3. Program Template Generation from Proven Models
Good benefit programs share common patterns in eligibility criteria, documentation requirements, workflow design, and fraud controls. By analyzing our existing programs alongside publicly available benefit programs that people love, AI can identify what makes programs work well. Administrators describe their goals in plain language. The goal is to generate customized intake flows, eligibility logic, and operational frameworks based on proven models.
What Exists
| Component | Status | Location / Details |
| --- | --- | --- |
| Form Template System | ✅ Complete | terra/src/app/actions/templates.ts; 4 tables + materialized view, 8 categories; Migrations 082-083 |
| Pre-built Templates | ✅ Complete | Building Permit, Business License, FOIA, Noise Complaint, Park Reservation |
| Template API | ✅ Complete | getTemplates, searchTemplates, createFormFromTemplate |
| Form Duplication | ✅ Complete | duplicateForm() with full audit logging |
| AI Form Import | ✅ Complete | Claude Opus 4.5 + Gemini Flash; HTML/PDF parsing, platform detection; terra/src/lib/form-import/ |
What’s Missing
| Component | Priority | Effort | Scope |
| --- | --- | --- | --- |
| Template Gallery UI | P1 | 1-2 weeks | /templates grid view with filters, search; /templates/[slug] detail page with preview; "Use This Template" flow |
| Program Generation AI | P1 | 3-4 weeks | Plain language → eligibility rules; Goal description → intake flow; Reference programs → customized templates |
| Eligibility Logic Generator | P2 | 2-3 weeks | AI-generated conditional logic; Income thresholds, household rules; Document requirements inference |
| Program Cloning with Variants | P2 | 1 week | Clone with modifications; Version/variant tracking |
| Template Marketplace | P3 | 2-3 weeks | Community templates; Ratings and reviews UI; Featured/popular sections |
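One plausible target for "plain language → eligibility rules" is a small declarative rule format that generated logic must conform to, so it can be reviewed and evaluated deterministically. The sketch below is an assumption about what such a format could look like; the `Condition` shape, the reason codes, and the threshold values are placeholders, not real program limits.

```typescript
interface ApplicantFacts {
  householdSize: number;
  monthlyIncomeUsd: number;
}

type Comparison = "lte" | "gte" | "eq";

// A generated rule would be data like this, not free-form code.
interface Condition {
  field: keyof ApplicantFacts;
  op: Comparison;
  value: number;
  reasonCode: string; // reported when the condition fails
}

function holds(facts: ApplicantFacts, c: Condition): boolean {
  const actual = facts[c.field];
  if (c.op === "lte") return actual <= c.value;
  if (c.op === "gte") return actual >= c.value;
  return actual === c.value;
}

function evaluateEligibility(
  facts: ApplicantFacts,
  conditions: Condition[],
): { eligible: boolean; failedReasonCodes: string[] } {
  const failedReasonCodes = conditions
    .filter((c) => !holds(facts, c))
    .map((c) => c.reasonCode);
  return { eligible: failedReasonCodes.length === 0, failedReasonCodes };
}

// Placeholder rules for illustration only.
const exampleRules: Condition[] = [
  { field: "monthlyIncomeUsd", op: "lte", value: 3000, reasonCode: "INCOME_OVER_LIMIT" },
  { field: "householdSize", op: "gte", value: 1, reasonCode: "HOUSEHOLD_SIZE_INVALID" },
];
```

Returning reason codes for every failed condition lines up with the dependency on a rules engine with reason codes for case management and appeals.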
Architecture
Families miss opportunities because they don’t know about deadlines, timelines, or programs they qualify for. AI enables personalized outreach at scale, informing applicants about relevant opportunities, upcoming deadlines, and next steps in their own language.
What Exists
| Component | Status | Location / Details |
| --- | --- | --- |
| i18n Infrastructure | ✅ Complete | terra/src/lib/i18n.ts; 31 languages defined, DeepL integration ready |
| Language Preference Field | ✅ Complete | language-preference-field.tsx; Flag emojis, native names, form integration |
| Language-Aware Notifications | ✅ Complete | terra/src/lib/notifications.ts; Email + SMS with language parameter; Template variable interpolation |
| Multi-Provider Support | ✅ Complete | Resend, SendGrid, Twilio, SMTP, Postmark |
| Notification Templates | ✅ Complete | notification-templates.ts; Per-event defaults, preview, test send |
| Deadline Reminders | ✅ Partial | pathfinder/src/app/actions/calendar-reminders.ts; Scheduled notifications, priority levels; Pathfinder only |
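Language-aware notifications with template variable interpolation could work along these lines. This sketch does not reflect the code in terra/src/lib/notifications.ts: the `{{variable}}` placeholder syntax, the per-language template map, and the English fallback are all assumptions.

```typescript
// Assumptions: templates keyed by language code, "{{name}}" placeholders, "en" fallback.
type TemplatesByLanguage = Record<string, string>;

function renderNotification(
  templates: TemplatesByLanguage,
  language: string,
  vars: Record<string, string>,
  fallbackLanguage = "en",
): string {
  const template = templates[language] ?? templates[fallbackLanguage];
  if (template === undefined) {
    throw new Error(`No template for "${language}" or fallback "${fallbackLanguage}"`);
  }
  // Replace each {{key}} with its value; fail loudly rather than send a broken message.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    const value = vars[key];
    if (value === undefined) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return value;
  });
}
```

Throwing on a missing variable is a deliberate choice here: an applicant should never receive a message with a literal placeholder in it.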
What’s Missing
| Component | Priority | Effort | Scope |
| --- | --- | --- | --- |
| Bulk Outreach System | P1 | 2-3 weeks | Send to multiple applicants at once; Filter by status, language, deadline; Campaign management UI |
| Deadline Tracking (Terra) | P1 | 1-2 weeks | Deadline fields on forms table; Reminder window configuration; Auto-send X days before deadline |
| Job Processor/Scheduler | P1 | 1-2 weeks | Cron-based notification queue; Time-zone aware scheduling; Retry logic for failed sends |
| Language Auto-Detection | P2 | 1 week | Browser Accept-Language header; IP geolocation fallback; Save preference across sessions |
| Translation Completion | P2 | 2-3 weeks | Complete translations for top 10 languages; Admin UI for translation management; DeepL auto-translate integration |
| Outreach Analytics | P2 | 1-2 weeks | Delivery metrics by language; Campaign performance tracking; A/B testing support |
| Conditional Notifications | P3 | 1-2 weeks | Different messages by status/language; Branching logic in templates; Audience segmentation rules |
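For the planned language auto-detection, the browser's Accept-Language header carries a comma-separated list of language tags with optional `q` weights. The sketch below shows one way to pick a supported language from it; the `pickLanguage` name, the supported-language list, and the English default are assumptions, and the IP geolocation fallback is not covered.

```typescript
// Choose the best supported language from an Accept-Language header value.
function pickLanguage(
  acceptLanguage: string | undefined,
  supported: string[],
  fallback = "en",
): string {
  if (!acceptLanguage) return fallback;

  const ranked = acceptLanguage
    .split(",")
    .map((part, index) => {
      const [tag = "", ...params] = part.trim().split(";");
      let q = 1; // weight defaults to 1 when no q parameter is given
      for (const p of params) {
        const m = /^\s*q=([0-9.]+)\s*$/i.exec(p);
        if (m) {
          const parsed = parseFloat(m[1] ?? "");
          if (!isNaN(parsed)) q = parsed;
        }
      }
      return { tag: tag.trim().toLowerCase(), q, index };
    })
    .filter((e) => e.tag !== "" && e.tag !== "*" && e.q > 0)
    // Highest weight first; ties keep the order the browser sent.
    .sort((a, b) => b.q - a.q || a.index - b.index);

  for (const { tag } of ranked) {
    const base = tag.split("-")[0];
    const hit =
      supported.find((s) => s.toLowerCase() === tag) ??
      supported.find((s) => s.toLowerCase() === base);
    if (hit) return hit;
  }
  return fallback;
}
```

A regional tag such as `es-MX` falls back to its base language `es` when only the base is supported, which matters for programs that translate into a limited set of languages.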
Architecture
Implementation Timeline
Success Metrics
| Metric | Current | Target |
| --- | --- | --- |
| Document extraction | Manual (hours per app) | Minutes per application |
| Fraud detection false positive rate | N/A | <15% |
| Language coverage | 2 languages | 20+ languages |
| Program launch speed | Months | <2 weeks |
| Geographic expansion | 16 states | 25 states |
| Distribution capacity | $5M annually | $15M annually |
Platform Vision: How Terra, Pathfinder, Forge, Sentinel, and Hub work together
Engineering Planning: Quarterly roadmap and capability audit
Sentinel Introduction: Fraud analysis platform deep dive
Hub Introduction: Unified applicant view deep dive