AI Capabilities Roadmap

$140 billion in public benefits go unclaimed every year. This roadmap tracks the AI capabilities funded by the GitLab Foundation AI for Economic Opportunity grant to address this gap.

The Four Pillars


Shared AI Platform Layer (Cross-Cutting)

This roadmap assumes a shared AI platform layer that powers all four pillars. It is both an internal architecture layer and a product capability: model routing, evaluations, safety, and auditability are built once and reused everywhere.

Core Capabilities

  • Model Router: Multi-provider routing, fallback, and cost/latency-aware selection.
  • Prompt + Policy Registry: Versioned prompts, policy templates, and safe defaults per workflow.
  • Evaluation Harness: Golden datasets, regression tests, and confidence calibration per document type.
  • Observability: Per-request traces, latency, cost, and error analytics tied to submissions.
  • PII Handling: Redaction, field-level access controls, and content filtering by program.
  • Human Review Queue: Shared reviewer tooling, SLAs, and audit trails across pillars.
  • Feature Store: Verified fields and risk signals shared between eligibility, fraud, and outreach.

Data Lineage & Human-in-the-Loop

  • Track lineage for every AI-derived field: source document, model/version, confidence, and reviewer action.
  • All overrides require a reason code and persist an immutable audit log.
  • Reviewer corrections feed back into evaluation datasets and model improvements.
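
The lineage and override rules above can be sketched roughly as follows. This is a minimal illustration, not the platform's schema: every type, field, and function name here (`FieldLineage`, `AuditLog`, `overrideField`) is hypothetical, and the in-memory log stands in for whatever durable store is eventually chosen.

```typescript
// Hypothetical shapes; names are illustrative, not an existing schema.
type ReviewerAction = "accepted" | "corrected" | "rejected";

interface FieldLineage {
  field: string;
  value: string;
  sourceDocumentId: string;
  model: string;
  modelVersion: string;
  confidence: number; // 0..1, as in the draft router response
  reviewerAction?: ReviewerAction;
}

interface AuditEntry {
  readonly at: string; // ISO timestamp
  readonly reviewerId: string;
  readonly field: string;
  readonly previousValue: string;
  readonly newValue: string;
  readonly reasonCode: string;
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: entries are frozen on write and never updated or removed.
  append(entry: AuditEntry): void {
    this.entries.push(Object.freeze({ ...entry }));
  }

  // Returns a copy so callers cannot mutate the underlying list.
  all(): readonly AuditEntry[] {
    return [...this.entries];
  }
}

// Applies a reviewer override; rejects it when no reason code is given.
function overrideField(
  lineage: FieldLineage,
  newValue: string,
  reviewerId: string,
  reasonCode: string,
  log: AuditLog,
): FieldLineage {
  if (reasonCode.trim() === "") {
    throw new Error("An override requires a reason code");
  }
  log.append({
    at: new Date().toISOString(),
    reviewerId,
    field: lineage.field,
    previousValue: lineage.value,
    newValue,
    reasonCode,
  });
  return { ...lineage, value: newValue, reviewerAction: "corrected" };
}
```

Returning a new lineage record instead of mutating the old one keeps the pre-override value available for evaluation datasets.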

Compliance Readiness (Future)

  • Data retention policies and deletion workflows per program.
  • Least-privilege access to AI outputs and raw documents.
  • Vendor model logging controls and secure transport defaults.

Model Router Options (Decision Matrix)

| Option | Strengths | Tradeoffs | When to Choose |
|---|---|---|---|
| External Router (e.g., OpenRouter) | Fast access to many models; quick experimentation | External dependency; data handling constraints may evolve | Early-stage velocity and multi-model evaluation |
| Internal Router + Direct Providers | Full control of policies, logging, and routing logic | More engineering and vendor integrations | When auditability and custom controls matter most |
| Cloud Provider Router | Enterprise governance and data residency controls | Potential vendor lock-in and narrower model mix | When compliance and regulated data become primary |

Router Decision Criteria

  • Data class (PII sensitivity), residency, and retention requirements.
  • Cost targets and latency SLAs per use case.
  • Model coverage (vision, OCR, structured extraction, reasoning).
  • Observability depth (per-request traces, prompt/version tracking).
  • Contracting complexity and vendor support timelines.

Internal Router Interface (Draft)

request:
  use_case: document_extraction
  inputs:
    document_id: doc_123
  constraints:
    max_latency_ms: 6000
    max_cost_usd: 0.05
    data_class: pii_high
  policy:
    model_family: vision
    fallback: [primary, secondary]
    prompt_version: doc_extract_v3

response:
  provider: primary
  model: vision-large
  cost_usd: 0.032
  latency_ms: 2810
  confidence: 0.91
  safety_flags: []
  trace_id: trc_abc123

Notes:
  • All AI calls must emit trace_id, prompt_version, and policy_id.
  • Router decisions should be auditable and replayable for evaluation.
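
One way the draft interface could translate into routing logic is sketched below. It assumes each provider advertises estimated cost, estimated latency, and the data classes it may receive; those fields, the `DataClass` values other than `pii_high`, and the function `selectProvider` are assumptions for illustration. The sketch walks the fallback list in order and records why each skipped provider was rejected, so that a decision can be audited and replayed.

```typescript
// Hypothetical types loosely mirroring the draft request/response above.
type DataClass = "public" | "pii_low" | "pii_high";

interface RouteConstraints {
  maxLatencyMs: number;
  maxCostUsd: number;
  dataClass: DataClass;
}

interface ProviderCandidate {
  provider: string;
  model: string;
  estCostUsd: number;
  estLatencyMs: number;
  allowedDataClasses: DataClass[];
}

interface RouteDecision {
  selected: { provider: string; model: string } | null;
  skipped: { provider: string; reason: "data_class" | "cost" | "latency" }[];
}

// Returns the first candidate (in fallback order) that satisfies every constraint.
function selectProvider(
  fallbackOrder: ProviderCandidate[],
  constraints: RouteConstraints,
): RouteDecision {
  const skipped: RouteDecision["skipped"] = [];
  for (const candidate of fallbackOrder) {
    if (candidate.allowedDataClasses.indexOf(constraints.dataClass) === -1) {
      skipped.push({ provider: candidate.provider, reason: "data_class" });
    } else if (candidate.estCostUsd > constraints.maxCostUsd) {
      skipped.push({ provider: candidate.provider, reason: "cost" });
    } else if (candidate.estLatencyMs > constraints.maxLatencyMs) {
      skipped.push({ provider: candidate.provider, reason: "latency" });
    } else {
      return {
        selected: { provider: candidate.provider, model: candidate.model },
        skipped,
      };
    }
  }
  // No candidate qualified; the caller decides whether to queue for human review.
  return { selected: null, skipped };
}
```

The `skipped` list would be stored alongside `trace_id` so that replaying a request explains the routing outcome, not just the final provider.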

Roadmap Dependencies (Q1 2026)

| Dependency | Enables | Notes |
|---|---|---|
| Extraction confidence scoring | Fraud risk scoring, eligibility logic, outreach targeting | Confidence gates prevent low-quality auto-approvals |
| Verified fields (income, identity, address) | Program eligibility + fraud vectors | Single source of truth reduces duplicate logic |
| Rules engine + reason codes | Case management + applicant appeals | Necessary for transparency and trust |
| Translation system + language preference | Multilingual outreach + form UX | Shared across Terra + Pathfinder |

1. Automated Document Review & Data Extraction

Most benefit programs struggle to turn documentation into structured information. When someone submits a paystub to verify income, staff manually review it, interpret what they’re seeing, and enter data into systems. Human perception varies. Mistakes happen. Processing takes hours.

What Exists

| Component | Status | Location / Notes |
|---|---|---|
| Reducto API Integration | ✅ Complete | terra/src/lib/form-import/document-parser.ts |
| Claude Vision Fallback | ✅ Complete | terra/src/lib/form-import/ai-parser.ts |
| PDF/Image Parsing | ✅ Complete | Multi-page, table extraction, 3-120 pages |
| Document Repository | ✅ Complete | pathfinder/src/lib/dal/repositories/documents.repository.ts |
| Document Type Enum | ✅ Complete | Driver’s License, Passport, Paystub, W2, 1099, Tax Return, Bank Statement, Lease, Utility Bill |
| Plaid ID Verification | ✅ Complete | terra/src/lib/plaid.ts - KYC, liveness, document upload |
| Plaid Income Verification | ✅ Complete | Payroll + bank income, employer extraction |
| Extracted Data Storage | ✅ Complete | documents.extracted_data JSON field |

What’s Missing

| Component | Priority | Effort | Scope |
|---|---|---|---|
| Domain-Specific Extractors | P1 | 2-3 weeks | Paystub OCR (gross income, deductions, employer, pay period); tax return parsing (AGI, filing status, dependents); bank statement analysis (balance, transactions); ID card standardized extraction |
| Extraction Confidence Scoring | P1 | 1 week | Per-field confidence scores (0-100); low-confidence flagging for manual review |
| Document Classification | P2 | 1 week | Auto-detect document type from content; quality/eligibility checks before processing |
| Form Field Auto-Population | P2 | 1-2 weeks | Map extracted fields to form questions; pre-fill forms from verified documents |
| Manual Review Workflow | P2 | 1-2 weeks | UI for reviewing/correcting extractions; override interface with audit trail |
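
As an illustration of the planned per-field confidence scores (0-100) and low-confidence flagging, a gate might look like the sketch below. The `ExtractedField` shape, the function name, and any specific threshold passed in are assumptions; the roadmap does not yet specify a cutoff.

```typescript
// Hypothetical field shape; confidence uses the 0-100 scale from the roadmap item.
interface ExtractedField {
  name: string;
  value: string;
  confidence: number;
}

interface GateResult {
  autoAccept: ExtractedField[];
  needsReview: ExtractedField[];
}

// Splits fields into those at or above the threshold and those routed to manual review.
function gateByConfidence(fields: ExtractedField[], threshold: number): GateResult {
  if (threshold < 0 || threshold > 100) {
    throw new RangeError("threshold must be between 0 and 100");
  }
  const result: GateResult = { autoAccept: [], needsReview: [] };
  for (const field of fields) {
    if (field.confidence >= threshold) {
      result.autoAccept.push(field);
    } else {
      result.needsReview.push(field);
    }
  }
  return result;
}
```

A threshold per document type, calibrated by the evaluation harness, is likely more useful than a single global value.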

Architecture

Downstream Implementation Notes

  • Document processing pipeline is downstream of upload and classification.
  • Pipeline stages are event-driven and re-runnable (classification → extraction → confidence → review).
  • All outputs are stored with lineage metadata for auditability and retraining.
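
The re-runnable stage order described above (classification → extraction → confidence → review) could be modeled along these lines. This is a synchronous sketch under simplifying assumptions: it omits the event bus entirely, and `runFrom`, `StageHandler`, and the stored-output map are hypothetical names. It shows only how a run can restart from a later stage while reusing stored outputs of earlier ones.

```typescript
// Stage order taken from the note above; everything else is illustrative.
const STAGES = ["classification", "extraction", "confidence", "review"] as const;
type Stage = typeof STAGES[number];

type StageOutputs = Partial<Record<Stage, unknown>>;
type StageHandler = (documentId: string, prior: StageOutputs) => unknown;

// Runs `start` and every later stage, reusing stored outputs for earlier stages.
function runFrom(
  documentId: string,
  start: Stage,
  handlers: Record<Stage, StageHandler>,
  stored: StageOutputs,
): StageOutputs {
  const startIndex = STAGES.indexOf(start);
  const results: StageOutputs = {};
  for (const stage of STAGES.slice(0, startIndex)) {
    if (!(stage in stored)) {
      throw new Error(`Missing stored output for stage: ${stage}`);
    }
    results[stage] = stored[stage];
  }
  for (const stage of STAGES.slice(startIndex)) {
    results[stage] = handlers[stage](documentId, results);
  }
  return results;
}
```

For example, after a confidence model update, calling `runFrom(id, "confidence", handlers, previousOutputs)` would recompute confidence and review without re-running classification or extraction.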

2. Fraud Detection That Protects Legitimate Applicants

We will analyze patterns across our more than 400,000 existing applications to build a comprehensive understanding of fraudulent submission behavior. As generative AI makes fake documents increasingly realistic, detection models need richer fraud vectors that can identify coordinated attempts and synthetic identities.

What Exists

| Component | Status | Location / Notes | Details |
|---|---|---|---|
| Geolocation Tracking | ✅ Complete | terra/src/lib/geolocation.ts; Migration 073 | IP address (anonymized), country, city, region, user agent |
| Duplicate Detection | ✅ Complete | find_potential_duplicates() stored procedure; Migration 022 | SSN4+DOB (95%), Name+DOB (90%), Email (80%), Phone (70%), Address (75%) |
| Applicant Identity Linking | ✅ Complete | applicants, applicant_pii, applicant_profiles tables; Migration 022 | Cross-app tracking, verification scores, crisis flags |
| Audit Logging | ✅ Complete | 28 action types, 13 entity types | |
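
The duplicate-detection percentages listed above can be read as per-rule match confidences. The sketch below illustrates that reading in application code; it is not the logic of the `find_potential_duplicates()` stored procedure, whose implementation is not shown here. The `ApplicantKeys` shape, the case-insensitive exact comparison, and the "highest matching rule wins" policy are all assumptions.

```typescript
// Hypothetical applicant keys; real matching would likely normalize more carefully.
interface ApplicantKeys {
  ssn4?: string;
  dob?: string;
  fullName?: string;
  email?: string;
  phone?: string;
  address?: string;
}

interface MatchRule {
  rule: string;
  confidence: number; // percentages taken from the list above
  matches: (a: ApplicantKeys, b: ApplicantKeys) => boolean;
}

// Two values match only when both are present and equal after trimming and lowercasing.
const sameValue = (x?: string, y?: string): boolean =>
  x !== undefined && y !== undefined && x.trim().toLowerCase() === y.trim().toLowerCase();

const MATCH_RULES: MatchRule[] = [
  { rule: "ssn4+dob", confidence: 95, matches: (a, b) => sameValue(a.ssn4, b.ssn4) && sameValue(a.dob, b.dob) },
  { rule: "name+dob", confidence: 90, matches: (a, b) => sameValue(a.fullName, b.fullName) && sameValue(a.dob, b.dob) },
  { rule: "email", confidence: 80, matches: (a, b) => sameValue(a.email, b.email) },
  { rule: "address", confidence: 75, matches: (a, b) => sameValue(a.address, b.address) },
  { rule: "phone", confidence: 70, matches: (a, b) => sameValue(a.phone, b.phone) },
];

// Returns the highest-confidence rule that matches, or null when nothing matches.
function bestDuplicateMatch(
  a: ApplicantKeys,
  b: ApplicantKeys,
): { rule: string; confidence: number } | null {
  let best: { rule: string; confidence: number } | null = null;
  for (const candidate of MATCH_RULES) {
    if (candidate.matches(a, b) && (best === null || candidate.confidence > best.confidence)) {
      best = { rule: candidate.rule, confidence: candidate.confidence };
    }
  }
  return best;
}
```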

What’s Missing

| Component | Priority | Effort | Scope |
|---|---|---|---|
| Risk Scoring System | P1 | 2-3 weeks | risk_score column on submissions; weighted scoring from multiple vectors; configurable thresholds (auto-approve, review, block) |
| Sentinel Rules Engine | P1 | 2-3 weeks | sentinel_rules table with configurable rules; rule types: velocity, geographic, behavioral, financial; enable/disable rules per program |
| Blocklist Management | P1 | 1-2 weeks | IP blocklist, email blocklist, phone blocklist; expiration dates, reason tracking; admin UI for managing blocklists |
| Velocity Checks | P2 | 1 week | Submissions per IP per hour/day; submissions per user per form; geographic impossibility (location changes) |
| IP Clustering | P2 | 1-2 weeks | Group submissions by IP ranges; identify coordinated submission patterns; VPN/proxy detection |
| Behavioral Analysis | P2 | 2 weeks | Form fill time tracking; field navigation patterns; submission timing analysis |
| Case Management | P3 | 2-3 weeks | Investigation workflow UI; flag status tracking (active, cleared, confirmed); cross-program fraud actor database |
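
The planned velocity check (submissions per IP per hour/day) amounts to counting recent events in a time window. A minimal sketch, assuming submissions are available as an in-memory list of `{ ip, at }` records; the type and function names are hypothetical, and a production version would more likely query an indexed table or a counter store.

```typescript
// Hypothetical event shape; `at` is a Unix timestamp in milliseconds.
interface SubmissionEvent {
  ip: string;
  at: number;
}

// True when `ip` has more than `maxInWindow` submissions in the window ending at `now`.
function exceedsVelocity(
  events: SubmissionEvent[],
  ip: string,
  now: number,
  windowMs: number,
  maxInWindow: number,
): boolean {
  const windowStart = now - windowMs;
  let count = 0;
  for (const event of events) {
    if (event.ip === ip && event.at > windowStart && event.at <= now) {
      count++;
    }
  }
  return count > maxInWindow;
}
```

The same function covers the hourly and daily variants by changing `windowMs`; per-program limits would come from the configurable rules table.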

Architecture

Risk Scoring Model

| Score Range | Action | False Positive Target |
|---|---|---|
| 0-30 | Auto-approve | N/A |
| 31-50 | Low priority review | <5% |
| 51-70 | Standard review | <10% |
| 71-85 | High priority review | <15% |
| 86-100 | Critical + escalation | <20% |
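
The score bands above map directly onto a lookup function. The sketch below assumes the action identifiers shown (they are illustrative names, not existing enum values) and treats a fractional score as belonging to the next band up once it passes an upper bound, for example 30.5 is handled as low priority review.

```typescript
// Illustrative action names for the five bands in the table above.
type RiskAction =
  | "auto_approve"
  | "low_priority_review"
  | "standard_review"
  | "high_priority_review"
  | "critical_escalation";

// Maps a 0-100 risk score to its review action.
function actionForScore(score: number): RiskAction {
  if (!isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("risk score must be between 0 and 100");
  }
  if (score <= 30) return "auto_approve";
  if (score <= 50) return "low_priority_review";
  if (score <= 70) return "standard_review";
  if (score <= 85) return "high_priority_review";
  return "critical_escalation";
}
```

Since thresholds are meant to be configurable, the band boundaries would eventually be loaded per program instead of hard-coded.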

3. Program Template Generation from Proven Models

Good benefit programs share common patterns in eligibility criteria, documentation requirements, workflow design, and fraud controls. By analyzing our existing programs alongside publicly available benefit programs that people love, AI can identify what makes programs work well. Administrators describe their goals in plain language; the aim is to generate customized intake flows, eligibility logic, and operational frameworks from those descriptions, based on proven models.

What Exists

| Component | Status | Location / Notes | Details |
|---|---|---|---|
| Form Template System | ✅ Complete | terra/src/app/actions/templates.ts; Migrations 082-083 | 4 tables + materialized view, 8 categories |
| Pre-built Templates | ✅ Complete | Building Permit, Business License, FOIA, Noise Complaint, Park Reservation | |
| Template API | ✅ Complete | getTemplates, searchTemplates, createFormFromTemplate | |
| Form Duplication | ✅ Complete | duplicateForm() with full audit logging | |
| AI Form Import | ✅ Complete | Claude Opus 4.5 + Gemini Flash; terra/src/lib/form-import/ | HTML/PDF parsing, platform detection |

What’s Missing

| Component | Priority | Effort | Scope |
|---|---|---|---|
| Template Gallery UI | P1 | 1-2 weeks | /templates grid view with filters, search; /templates/[slug] detail page with preview; “Use This Template” flow |
| Program Generation AI | P1 | 3-4 weeks | Plain language → eligibility rules; goal description → intake flow; reference programs → customized templates |
| Eligibility Logic Generator | P2 | 2-3 weeks | AI-generated conditional logic; income thresholds, household rules; document requirements inference |
| Program Cloning with Variants | P2 | 1 week | Clone with modifications; version/variant tracking |
| Template Marketplace | P3 | 2-3 weeks | Community templates; ratings and reviews UI; featured/popular sections |
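
For the Eligibility Logic Generator item, one plausible design is to have the AI emit rules as plain data that a deterministic evaluator then applies, so every decision carries a reason code. The sketch below assumes a simple income rule of the form "base limit plus an increment per additional household member"; the rule shape, names, and any dollar figures used with it are hypothetical and not drawn from a real program.

```typescript
// Hypothetical data shapes for a generated income rule.
interface Household {
  size: number;
  annualIncomeUsd: number;
}

interface IncomeRule {
  reasonCode: string;
  baseLimitUsd: number; // limit for a one-person household
  perAdditionalMemberUsd: number; // added for each member beyond the first
}

interface RuleResult {
  eligible: boolean;
  reasonCode: string;
  limitUsd: number;
}

// Evaluates the rule deterministically; the AI only proposes the rule, never the decision.
function evaluateIncomeRule(rule: IncomeRule, household: Household): RuleResult {
  if (household.size < 1) {
    throw new RangeError("Household size must be at least 1");
  }
  const limitUsd =
    rule.baseLimitUsd + rule.perAdditionalMemberUsd * (household.size - 1);
  return {
    eligible: household.annualIncomeUsd <= limitUsd,
    reasonCode: rule.reasonCode,
    limitUsd,
  };
}
```

Keeping generated logic as reviewable data also lets administrators inspect and version a rule before it reaches applicants.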

Architecture


4. Personalized Outreach in Community Languages

Families miss opportunities because they don’t know about deadlines, timelines, or programs they qualify for. AI enables personalized outreach at scale, informing applicants about relevant opportunities, upcoming deadlines, and next steps in their own language.

What Exists

| Component | Status | Location / Notes | Details |
|---|---|---|---|
| i18n Infrastructure | ✅ Complete | terra/src/lib/i18n.ts | 31 languages defined, DeepL integration ready |
| Language Preference Field | ✅ Complete | language-preference-field.tsx | Flag emojis, native names, form integration |
| Language-Aware Notifications | ✅ Complete | terra/src/lib/notifications.ts | Email + SMS with language parameter; template variable interpolation |
| Multi-Provider Support | ✅ Complete | Resend, SendGrid, Twilio, SMTP, Postmark | |
| Notification Templates | ✅ Complete | notification-templates.ts | Per-event defaults, preview, test send |
| Deadline Reminders | ✅ Partial | pathfinder/src/app/actions/calendar-reminders.ts; Pathfinder only | Scheduled notifications, priority levels |
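
To make the language parameter and template variable interpolation concrete, here is a minimal sketch. It does not reflect the actual API of notifications.ts or notification-templates.ts: the `{{variable}}` placeholder syntax, the `NotificationTemplates` shape, and the fallback-to-default-language behavior are assumptions.

```typescript
// Hypothetical layout: event key -> language code -> template text.
type NotificationTemplates = Record<string, Record<string, string>>;

// Renders a template in the requested language, falling back to a default language.
function renderNotification(
  templates: NotificationTemplates,
  eventKey: string,
  language: string,
  variables: Record<string, string>,
  defaultLanguage = "en",
): string {
  const byLanguage = templates[eventKey];
  if (byLanguage === undefined) {
    throw new Error(`Unknown notification event: ${eventKey}`);
  }
  const template =
    byLanguage[language] !== undefined ? byLanguage[language] : byLanguage[defaultLanguage];
  if (template === undefined) {
    throw new Error(`No template for ${eventKey} in ${language} or ${defaultLanguage}`);
  }
  // Unknown placeholders are left intact so missing data is visible in previews.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(variables, key) ? variables[key] : match,
  );
}
```

Leaving unresolved placeholders in the output is one option; rejecting the send outright may be safer for applicant-facing messages.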

What’s Missing

| Component | Priority | Effort | Scope |
|---|---|---|---|
| Bulk Outreach System | P1 | 2-3 weeks | Send to multiple applicants at once; filter by status, language, deadline; campaign management UI |
| Deadline Tracking (Terra) | P1 | 1-2 weeks | Deadline fields on forms table; reminder window configuration; auto-send X days before deadline |
| Job Processor/Scheduler | P1 | 1-2 weeks | Cron-based notification queue; time-zone aware scheduling; retry logic for failed sends |
| Language Auto-Detection | P2 | 1 week | Browser Accept-Language header; IP geolocation fallback; save preference across sessions |
| Translation Completion | P2 | 2-3 weeks | Complete translations for top 10 languages; admin UI for translation management; DeepL auto-translate integration |
| Outreach Analytics | P2 | 1-2 weeks | Delivery metrics by language; campaign performance tracking; A/B testing support |
| Conditional Notifications | P3 | 1-2 weeks | Different messages by status/language; branching logic in templates; audience segmentation rules |
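
For the Language Auto-Detection item, the browser's Accept-Language header lists language tags with optional quality weights (for example `es-MX,es;q=0.9,en;q=0.8`). The sketch below picks the best supported language from that header. The function name and the assumption that supported codes are lowercase are illustrative, and the IP geolocation fallback is left out.

```typescript
// Picks the highest-weighted supported language, trying the base language of regional tags.
// `supported` is assumed to contain lowercase codes such as "en" or "es".
function pickLanguage(
  header: string | undefined,
  supported: string[],
  fallback: string,
): string {
  if (!header) return fallback;
  const ranked = header
    .split(",")
    .map((part, index) => {
      const pieces = part.trim().split(";");
      const tag = pieces[0].trim().toLowerCase();
      let q = 1; // a missing q parameter means full weight
      for (const piece of pieces.slice(1)) {
        const found = /^\s*q=([0-9.]+)\s*$/.exec(piece);
        if (found) q = parseFloat(found[1]);
      }
      return { tag, q: isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag !== "" && entry.tag !== "*" && entry.q > 0)
    // Higher weight first; ties keep the order the browser sent.
    .sort((a, b) => b.q - a.q || a.index - b.index);

  for (const entry of ranked) {
    if (supported.indexOf(entry.tag) !== -1) return entry.tag;
    const base = entry.tag.split("-")[0];
    if (supported.indexOf(base) !== -1) return base;
  }
  return fallback;
}
```

An explicit choice saved from the language preference field should still take precedence over anything detected this way.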

Architecture


Implementation Timeline


Success Metrics

| Metric | Current | Target |
|---|---|---|
| Document extraction | Manual (hours per app) | Minutes per application |
| Fraud detection false positive rate | N/A | <15% |
| Language coverage | 2 languages | 20+ languages |
| Program launch speed | Months | <2 weeks |
| Geographic expansion | 16 states | 25 states |
| Distribution capacity | $5M annually | $15M annually |