AI Capabilities Roadmap
$140 billion in public benefits goes unclaimed every year. This roadmap tracks the AI capabilities funded by the GitLab Foundation AI for Economic Opportunity grant to address this gap.
The Four Pillars
Document Extraction
Extract structured data from paystubs, IDs, bank statements. Turn documents into verified information.
Fraud Detection
Identify coordinated fraud while maintaining low false-positive rates for legitimate applicants.
Program Generation
Generate intake flows, eligibility logic, and operational frameworks from plain language descriptions.
Multilingual Outreach
Inform applicants about opportunities and deadlines in their own language at scale.
Shared AI Platform Layer (Cross-Cutting)
This roadmap assumes a shared AI platform layer that powers all four pillars. It is both an internal architecture layer and a product capability: model routing, evaluations, safety, and auditability are built once and reused everywhere.
Core Capabilities
- Model Router: Multi-provider routing, fallback, and cost/latency-aware selection.
- Prompt + Policy Registry: Versioned prompts, policy templates, and safe defaults per workflow.
- Evaluation Harness: Golden datasets, regression tests, and confidence calibration per document type.
- Observability: Per-request traces, latency, cost, and error analytics tied to submissions.
- PII Handling: Redaction, field-level access controls, and content filtering by program.
- Human Review Queue: Shared reviewer tooling, SLAs, and audit trails across pillars.
- Feature Store: Verified fields and risk signals shared between eligibility, fraud, and outreach.
Data Lineage & Human-in-the-Loop
- Track lineage for every AI-derived field: source document, model/version, confidence, and reviewer action.
- All overrides require a reason code and persist an immutable audit log.
- Reviewer corrections feed back into evaluation datasets and model improvements.
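As a concrete illustration, the lineage and override records could take a shape like the TypeScript sketch below. The interface and field names are assumptions for illustration, not the actual schema.

```typescript
// Illustrative lineage record for a single AI-derived field.
// Names are hypothetical; the real schema may differ.
interface AiFieldLineage {
  fieldName: string;          // e.g. "gross_income"
  sourceDocumentId: string;   // document the value was extracted from
  model: string;              // provider/model identifier
  modelVersion: string;
  promptVersion: string;
  confidence: number;         // 0-100 per-field confidence
  reviewerAction?: "accepted" | "corrected" | "rejected";
}

// Overrides carry a reason code and are appended to an immutable log.
interface OverrideAuditEntry {
  fieldName: string;
  previousValue: unknown;
  newValue: unknown;
  reasonCode: string;         // required for every override
  reviewerId: string;
  recordedAt: string;         // ISO timestamp; entries are never mutated
}
```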
Compliance Readiness (Future)
- Data retention policies and deletion workflows per program.
- Least-privilege access to AI outputs and raw documents.
- Vendor model logging controls and secure transport defaults.
Model Router Options (Decision Matrix)
| Option | Strengths | Tradeoffs | When to Choose |
|---|---|---|---|
| External Router (e.g., OpenRouter) | Fast access to many models; quick experimentation | External dependency; data handling constraints may evolve | Early-stage velocity and multi-model evaluation |
| Internal Router + Direct Providers | Full control of policies, logging, and routing logic | More engineering and vendor integrations | When auditability and custom controls matter most |
| Cloud Provider Router | Enterprise governance and data residency controls | Potential vendor lock-in and narrower model mix | When compliance and regulated data become primary |
Router Decision Criteria
- Data class (PII sensitivity), residency, and retention requirements.
- Cost targets and latency SLAs per use case.
- Model coverage (vision, OCR, structured extraction, reasoning).
- Observability depth (per-request traces, prompt/version tracking).
- Contracting complexity and vendor support timelines.
Internal Router Interface (Draft)
- All AI calls must emit `trace_id`, `prompt_version`, and `policy_id`.
- Router decisions should be auditable and replayable for evaluation.
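One way to express this contract is sketched below in TypeScript. The interface names, the fields beyond `trace_id`/`prompt_version`/`policy_id`, and the task list are assumptions, not the implemented router API.

```typescript
// Hypothetical request/decision contract for the internal router.
interface AiCallRequest {
  traceId: string;        // emitted with every call (trace_id)
  promptVersion: string;  // prompt_version from the registry
  policyId: string;       // policy_id governing safety/PII rules
  task: "extraction" | "classification" | "translation" | "generation";
  input: unknown;
}

interface RoutingDecision {
  provider: string;           // e.g. "anthropic", "reducto"
  model: string;
  reason: string;             // why this model was chosen (cost, latency, capability)
  fallbackChain: string[];    // models to try if the primary call fails
  decidedAt: string;          // ISO timestamp, persisted so decisions can be replayed
}

// The router records every decision alongside the request so evaluation
// runs can replay the same inputs against different models.
interface AuditedCall {
  request: AiCallRequest;
  decision: RoutingDecision;
  latencyMs?: number;
  costUsd?: number;
}
```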
Roadmap Dependencies (Q1 2026)
| Dependency | Enables | Notes |
|---|---|---|
| Extraction confidence scoring | Fraud risk scoring, eligibility logic, outreach targeting | Confidence gates prevent low-quality auto-approvals |
| Verified fields (income, identity, address) | Program eligibility + fraud vectors | Single source of truth reduces duplicate logic |
| Rules engine + reason codes | Case management + applicant appeals | Necessary for transparency and trust |
| Translation system + language preference | Multilingual outreach + form UX | Shared across Terra + Pathfinder |
1. Automated Document Review & Data Extraction
Most benefit programs struggle to turn documentation into structured information. When someone submits a paystub to verify income, staff manually review it, interpret what they’re seeing, and enter data into systems. Human perception varies. Mistakes happen. Processing takes hours.
What Exists
| Component | Status | Location |
|---|---|---|
| Reducto API Integration | ✅ Complete | terra/src/lib/form-import/document-parser.ts |
| Claude Vision Fallback | ✅ Complete | terra/src/lib/form-import/ai-parser.ts |
| PDF/Image Parsing | ✅ Complete | Multi-page, table extraction, 3-120 pages |
| Document Repository | ✅ Complete | pathfinder/src/lib/dal/repositories/documents.repository.ts |
| Document Type Enum | ✅ Complete | Driver’s License, Passport, Paystub, W2, 1099, Tax Return, Bank Statement, Lease, Utility Bill |
| Plaid ID Verification | ✅ Complete | terra/src/lib/plaid.ts - KYC, liveness, document upload |
| Plaid Income Verification | ✅ Complete | Payroll + bank income, employer extraction |
| Extracted Data Storage | ✅ Complete | documents.extracted_data JSON field |
What’s Missing
| Component | Priority | Effort |
|---|---|---|
| Domain-Specific Extractors | P1 | 2-3 weeks |
| Paystub OCR (gross income, deductions, employer, pay period) | ||
| Tax return parsing (AGI, filing status, dependents) | ||
| Bank statement analysis (balance, transactions) | ||
| ID card standardized extraction | ||
| Extraction Confidence Scoring | P1 | 1 week |
| Per-field confidence scores (0-100) | ||
| Low-confidence flagging for manual review | ||
| Document Classification | P2 | 1 week |
| Auto-detect document type from content | ||
| Quality/eligibility checks before processing | ||
| Form Field Auto-Population | P2 | 1-2 weeks |
| Map extracted fields to form questions | ||
| Pre-fill forms from verified documents | ||
| Manual Review Workflow | P2 | 1-2 weeks |
| UI for reviewing/correcting extractions | ||
| Override interface with audit trail |
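The Extraction Confidence Scoring item above could work roughly like this minimal TypeScript sketch; the field names, the 0-100 scale from the table, and the 80-point review threshold are illustrative assumptions.

```typescript
// Hypothetical extracted-field shape with a 0-100 confidence score.
interface ExtractedField {
  name: string;        // e.g. "gross_income"
  value: string;
  confidence: number;  // 0-100, as proposed above
}

const REVIEW_THRESHOLD = 80; // assumed cutoff; real gates would be tuned per document type

// Flag any field below the threshold for manual review instead of auto-populating it.
function fieldsNeedingReview(fields: ExtractedField[]): ExtractedField[] {
  return fields.filter((f) => f.confidence < REVIEW_THRESHOLD);
}

// Example: a paystub extraction where the pay period was hard to read.
const paystub: ExtractedField[] = [
  { name: "employer", value: "Acme Staffing LLC", confidence: 97 },
  { name: "gross_income", value: "2431.50", confidence: 91 },
  { name: "pay_period", value: "2025-11-01 to 2025-11-15", confidence: 62 },
];

console.log(fieldsNeedingReview(paystub).map((f) => f.name)); // ["pay_period"]
```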
Architecture
Downstream Implementation Notes
- Document processing pipeline is downstream of upload and classification.
- Pipeline stages are event-driven and re-runnable (classification → extraction → confidence → review).
- All outputs are stored with lineage metadata for auditability and retraining.
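A minimal sketch of how these re-runnable stages could be wired, assuming hypothetical handler and result types; only the stage names come from the list above.

```typescript
// Pipeline stages in order, matching the notes above.
type Stage = "classification" | "extraction" | "confidence" | "review";

interface StageResult {
  stage: Stage;
  output: unknown;
  modelVersion?: string;   // lineage metadata stored with every output
  completedAt: string;
}

// Each stage is a function of the document id plus prior results,
// so any stage can be re-run without re-uploading the document.
type StageHandler = (documentId: string, prior: StageResult[]) => Promise<StageResult>;

async function runPipeline(
  documentId: string,
  handlers: Record<Stage, StageHandler>,
  startFrom: Stage = "classification",
): Promise<StageResult[]> {
  const order: Stage[] = ["classification", "extraction", "confidence", "review"];
  const results: StageResult[] = [];
  for (const stage of order.slice(order.indexOf(startFrom))) {
    results.push(await handlers[stage](documentId, results));
  }
  return results;
}
```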
2. Fraud Detection That Protects Legitimate Applicants
We will analyze patterns across our more than 400,000 existing applications to build a comprehensive understanding of fraudulent submission behavior. As generative AI makes fake documents increasingly realistic, we need models with deeper fraud vectors that can identify coordinated attempts and synthetic identities.
What Exists
| Component | Status | Location |
|---|---|---|
| Geolocation Tracking | ✅ Complete | terra/src/lib/geolocation.ts |
| IP address (anonymized), country, city, region, user agent | Migration 073 | |
| Duplicate Detection | ✅ Complete | find_potential_duplicates() stored procedure |
| SSN4+DOB (95%), Name+DOB (90%), Email (80%), Phone (70%), Address (75%) | Migration 022 | |
| Applicant Identity Linking | ✅ Complete | applicants, applicant_pii, applicant_profiles tables |
| Cross-app tracking, verification scores, crisis flags | Migration 022 | |
| Audit Logging | ✅ Complete | 28 action types, 13 entity types |
What’s Missing
| Component | Priority | Effort |
|---|---|---|
| Risk Scoring System | P1 | 2-3 weeks |
| `risk_score` column on submissions | | |
| Weighted scoring from multiple vectors | ||
| Configurable thresholds (auto-approve, review, block) | ||
| Sentinel Rules Engine | P1 | 2-3 weeks |
| `sentinel_rules` table with configurable rules | | |
| Rule types: velocity, geographic, behavioral, financial | ||
| Enable/disable rules per program | ||
| Blocklist Management | P1 | 1-2 weeks |
| IP blocklist, email blocklist, phone blocklist | ||
| Expiration dates, reason tracking | ||
| Admin UI for managing blocklists | ||
| Velocity Checks | P2 | 1 week |
| Submissions per IP per hour/day | ||
| Submissions per user per form | ||
| Geographic impossibility (location changes) | ||
| IP Clustering | P2 | 1-2 weeks |
| Group submissions by IP ranges | ||
| Identify coordinated submission patterns | ||
| VPN/proxy detection | ||
| Behavioral Analysis | P2 | 2 weeks |
| Form fill time tracking | ||
| Field navigation patterns | ||
| Submission timing analysis | ||
| Case Management | P3 | 2-3 weeks |
| Investigation workflow UI | ||
| Flag status tracking (active, cleared, confirmed) | ||
| Cross-program fraud actor database |
Architecture
Risk Scoring Model
| Score Range | Action | False Positive Target |
|---|---|---|
| 0-30 | Auto-approve | N/A |
| 31-50 | Low priority review | <5% |
| 51-70 | Standard review | <10% |
| 71-85 | High priority review | <15% |
| 86-100 | Critical + escalation | <20% |
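A minimal sketch of how weighted vectors could roll up into the score ranges above. The signal names and weights are hypothetical; real vectors and thresholds would be configured per program.

```typescript
// Hypothetical fraud signals; weights are illustrative only.
interface FraudSignal {
  name: string;      // e.g. "ip_velocity", "duplicate_identity", "impossible_travel"
  weight: number;    // contribution to the 0-100 score when the signal fires
  fired: boolean;
}

type RiskAction =
  | "auto_approve"
  | "low_priority_review"
  | "standard_review"
  | "high_priority_review"
  | "critical_escalation";

function riskScore(signals: FraudSignal[]): number {
  const raw = signals.filter((s) => s.fired).reduce((sum, s) => sum + s.weight, 0);
  return Math.min(100, raw);
}

// Map the score onto the ranges in the table above.
function actionFor(score: number): RiskAction {
  if (score <= 30) return "auto_approve";
  if (score <= 50) return "low_priority_review";
  if (score <= 70) return "standard_review";
  if (score <= 85) return "high_priority_review";
  return "critical_escalation";
}

const signals: FraudSignal[] = [
  { name: "ip_velocity", weight: 25, fired: true },
  { name: "duplicate_identity", weight: 40, fired: false },
  { name: "impossible_travel", weight: 20, fired: true },
];

console.log(actionFor(riskScore(signals))); // "low_priority_review" (score 45)
```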
3. Program Template Generation from Proven Models
Good benefit programs share common patterns in eligibility criteria, documentation requirements, workflow design, and fraud controls. By analyzing our existing programs alongside well-regarded, publicly available benefit programs, AI can identify what makes programs work well. Administrators describe their goals in plain language, and the system generates customized intake flows, eligibility logic, and operational frameworks based on proven models.
What Exists
| Component | Status | Location |
|---|---|---|
| Form Template System | ✅ Complete | terra/src/app/actions/templates.ts |
| 4 tables + materialized view, 8 categories | Migrations 082-083 | |
| Pre-built Templates | ✅ Complete | Building Permit, Business License, FOIA, Noise Complaint, Park Reservation |
| Template API | ✅ Complete | getTemplates, searchTemplates, createFormFromTemplate |
| Form Duplication | ✅ Complete | duplicateForm() with full audit logging |
| AI Form Import | ✅ Complete | Claude Opus 4.5 + Gemini Flash |
| HTML/PDF parsing, platform detection | terra/src/lib/form-import/ |
What’s Missing
| Component | Priority | Effort |
|---|---|---|
| Template Gallery UI | P1 | 1-2 weeks |
| `/templates` grid view with filters, search | | |
| `/templates/[slug]` detail page with preview | | |
| "Use This Template" flow | | |
| Program Generation AI | P1 | 3-4 weeks |
| Plain language → eligibility rules | ||
| Goal description → intake flow | ||
| Reference programs → customized templates | ||
| Eligibility Logic Generator | P2 | 2-3 weeks |
| AI-generated conditional logic | ||
| Income thresholds, household rules | ||
| Document requirements inference | ||
| Program Cloning with Variants | P2 | 1 week |
| Clone with modifications | ||
| Version/variant tracking | ||
| Template Marketplace | P3 | 2-3 weeks |
| Community templates | ||
| Ratings and reviews UI | ||
| Featured/popular sections |
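The Program Generation AI and Eligibility Logic Generator items above could emit structured rules shaped like this hypothetical TypeScript example; the field names, reason codes, and document requirements are illustrative only.

```typescript
// Hypothetical structured output of the eligibility logic generator.
// A plain-language goal ("households under 200% of the federal poverty line
// with at least one dependent") would be compiled into rules like these.
interface EligibilityRule {
  field: string;                       // intake field the rule reads
  operator: "lt" | "lte" | "gt" | "gte" | "eq" | "in";
  value: number | string | string[];
  reasonCode: string;                  // surfaced in case management and appeals
  requiredDocuments: string[];         // documents that verify the field
}

const generatedRules: EligibilityRule[] = [
  {
    field: "household_income_pct_fpl",
    operator: "lte",
    value: 200,
    reasonCode: "INCOME_OVER_LIMIT",
    requiredDocuments: ["Paystub", "Tax Return"],
  },
  {
    field: "dependent_count",
    operator: "gte",
    value: 1,
    reasonCode: "NO_DEPENDENTS",
    requiredDocuments: ["Tax Return"],
  },
];
```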
Architecture
4. Personalized Outreach in Community Languages
Families miss opportunities because they don’t know about deadlines, timelines, or programs they qualify for. AI enables personalized outreach at scale, informing applicants about relevant opportunities, upcoming deadlines, and next steps in their own language.
What Exists
| Component | Status | Location |
|---|---|---|
| i18n Infrastructure | ✅ Complete | terra/src/lib/i18n.ts |
| 31 languages defined, DeepL integration ready | ||
| Language Preference Field | ✅ Complete | language-preference-field.tsx |
| Flag emojis, native names, form integration | ||
| Language-Aware Notifications | ✅ Complete | terra/src/lib/notifications.ts |
| Email + SMS with language parameter | ||
| Template variable interpolation | ||
| Multi-Provider Support | ✅ Complete | Resend, SendGrid, Twilio, SMTP, Postmark |
| Notification Templates | ✅ Complete | notification-templates.ts |
| Per-event defaults, preview, test send | ||
| Deadline Reminders | ✅ Partial | pathfinder/src/app/actions/calendar-reminders.ts |
| Scheduled notifications, priority levels | Pathfinder only |
What’s Missing
| Component | Priority | Effort |
|---|---|---|
| Bulk Outreach System | P1 | 2-3 weeks |
| Send to multiple applicants at once | ||
| Filter by status, language, deadline | ||
| Campaign management UI | ||
| Deadline Tracking (Terra) | P1 | 1-2 weeks |
| Deadline fields on `forms` table | | |
| Reminder window configuration | ||
| Auto-send X days before deadline | ||
| Job Processor/Scheduler | P1 | 1-2 weeks |
| Cron-based notification queue | ||
| Time-zone aware scheduling | ||
| Retry logic for failed sends | ||
| Language Auto-Detection | P2 | 1 week |
| Browser Accept-Language header | ||
| IP geolocation fallback | ||
| Save preference across sessions | ||
| Translation Completion | P2 | 2-3 weeks |
| Complete translations for top 10 languages | ||
| Admin UI for translation management | ||
| DeepL auto-translate integration | ||
| Outreach Analytics | P2 | 1-2 weeks |
| Delivery metrics by language | ||
| Campaign performance tracking | ||
| A/B testing support | ||
| Conditional Notifications | P3 | 1-2 weeks |
| Different messages by status/language | ||
| Branching logic in templates | ||
| Audience segmentation rules |
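The Deadline Tracking and Job Processor items above could combine along these lines. This is a hedged sketch with hypothetical names; a real scheduler would live alongside the existing notification system.

```typescript
// Hypothetical reminder scheduling: send a notification N days before the
// form deadline, in the applicant's preferred language.
interface ReminderConfig {
  daysBefore: number;          // e.g. auto-send 7 days before the deadline
}

interface Applicant {
  id: string;
  languagePreference: string;  // e.g. "es", "vi"
  timeZone: string;            // IANA zone; a fuller scheduler would localize send windows with this
}

// Compute when the reminder should fire. Time-zone correctness here comes from
// the offset carried in the deadline's ISO string.
function reminderTime(deadlineIso: string, config: ReminderConfig): Date {
  const deadline = new Date(deadlineIso);
  return new Date(deadline.getTime() - config.daysBefore * 24 * 60 * 60 * 1000);
}

function scheduleReminder(applicant: Applicant, deadlineIso: string, config: ReminderConfig) {
  return {
    applicantId: applicant.id,
    language: applicant.languagePreference,   // notification rendered in this language
    sendAt: reminderTime(deadlineIso, config).toISOString(),
    maxRetries: 3,                            // retry logic for failed sends
  };
}

console.log(scheduleReminder(
  { id: "a1", languagePreference: "es", timeZone: "America/Los_Angeles" },
  "2026-03-15T23:59:00-07:00",
  { daysBefore: 7 },
));
```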
Architecture
Implementation Timeline
Success Metrics
| Metric | Current | Target |
|---|---|---|
| Document extraction | Manual (hours per application) | Minutes per application |
| Fraud detection false positive rate | N/A | <15% |
| Language coverage | 2 languages | 20+ languages |
| Program launch speed | Months | <2 weeks |
| Geographic expansion | 16 states | 25 states |
| Distribution capacity | $5M annually | $15M annually |