ReqIF -> OPA -> SARIF Compliance Gate
Deterministic requirements ingestion and policy evaluation with traceable evidence.
Status
Executive view:
- This repo turns governed source artifacts into traceable compliance decisions.
- It keeps extraction deterministic, keeps ReqIF derived, and keeps policy judgement in OPA.
Engineer view:
- There are two active surfaces: reqif_mcp/ for ReqIF-centric gate operations and reqif_ingest_cli/ for source-to-ReqIF derivation.
flowchart LR
SRC[Governed source artifacts] --> DERIVE[Deterministic derivation]
DERIVE --> REQIF[Derived ReqIF baseline]
REQIF --> OPA[OPA policy gate]
OPA --> EVIDENCE[Traceable evidence output]
Surface summary:
- reqif_mcp/ - FastMCP server for parsing, validating, querying, and evaluating existing ReqIF baselines.
  - Verification events and decision logs are written to the evidence store.
- reqif_ingest_cli/ - Standalone deterministic ingestion pipeline for source artifacts.
  - First-class XLSX support, offline PDF text extraction, derived ReqIF emission, and optional Azure Foundry review hooks.
What exists today:
- ReqIF parse/normalize/query/verification via MCP
- Deterministic source-to-ReqIF derivation for XLSX, text-layer PDF, DOCX, and Markdown
- Standards sample baselines for OWASP ASVS and NIST SSDF repo self-testing
- Selective compliance-gate filtering by requirement key, attribute, text fragment, or limit
- Typed test coverage for parser, normalization, SARIF mapping, compliance gate, and ingest
What is still future work:
- ingest surfaced as MCP tools
- normalized diffing between successive source versions
- persistent baseline storage beyond in-memory handles
- richer PDF structure extraction with pre-seeded offline Docling models
- externalized profile/config mapping instead of code-first profile logic
Architecture
The system has two deliberate pipelines:
- ingestion pipeline
  - source artifact -> artifact record -> document graph -> requirement candidate -> derived ReqIF
- gate pipeline
  - derived ReqIF + agent facts -> OPA decision -> SARIF + verification events
The control boundary is equally deliberate:
- ingestion is standalone and deterministic first
- ReqIF remains a derived artifact
- agents produce facts, not pass/fail decisions
- OPA remains the gate and policy authority
Gate Criteria Model (beyond pass/fail)
The compliance gate is not a single boolean check. It evaluates three layers:
- meta_policy - baseline sanity, subtype selection, filter results, facts shape, and bundle/package alignment
- processing - OPA execution, SARIF generation, verification-event writes, and other runtime failures
- policy - the actual requirement decision returned by OPA
Each OPA decision carries:
- overall status
- weighted criteria[]
- human-readable reasons[]
- policy provenance in policy
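To make the decision shape concrete, here is a minimal Python sketch of a record with the four fields listed above. Only status, criteria, reasons, and policy come from this README; the Criterion fields (name, weight, passed) and the weighted_score helper are hypothetical and are not taken from the repo's bundles. The tracked contract in samples/contracts/opa-output.example.json remains the authoritative shape.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    # Hypothetical fields: the README only states that criteria are weighted.
    name: str
    weight: float
    passed: bool


@dataclass
class Decision:
    status: str                                              # overall status
    criteria: list[Criterion] = field(default_factory=list)  # weighted criteria[]
    reasons: list[str] = field(default_factory=list)         # human-readable reasons[]
    policy: dict[str, str] = field(default_factory=dict)     # policy provenance


def weighted_score(decision: Decision) -> float:
    """Return the share of total criterion weight that passed (0.0 if no criteria)."""
    total = sum(c.weight for c in decision.criteria)
    if total == 0:
        return 0.0
    return sum(c.weight for c in decision.criteria if c.passed) / total


decision = Decision(
    status="fail",
    criteria=[
        Criterion("facts_present", 3.0, True),
        Criterion("evidence_linked", 1.0, False),
    ],
    reasons=["evidence link missing"],
    policy={"bundle": "example"},  # placeholder provenance value
)
print(weighted_score(decision))  # 0.75
```

A score like this is only a reading aid for reviewers; in this design the pass/fail judgement stays with OPA.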
flowchart TD
R[Selected requirement] --> MP{Meta-policy valid?}
MP -- no --> MFAIL[Hard gate failure<br/>EMPTY_BASELINE / EMPTY_SELECTION / etc.]
MP -- yes --> OPA[OPA decision]
OPA --> CRIT[criteria, reasons, and policy provenance]
CRIT --> PROC{Processing healthy?}
PROC -- no --> PFAIL[Hard gate failure<br/>OPA / SARIF / verification error]
PROC -- yes --> DEC{Decision status}
DEC -- fail/high severity --> GFAIL[Gate failure]
DEC -- pass or acceptable --> GPASS[Gate passes]
DEC -- conditional / blocked / inconclusive --> GREVIEW[Review path]
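The flow above can be read as a small decision function. The sketch below is illustrative only: the function name, the outcome strings, and the severity parameter are assumptions, and "fail/high severity" is interpreted here as "status is fail or severity is high". Unknown statuses are routed to review as a conservative choice that the README does not specify.

```python
REVIEW_STATUSES = {"conditional", "blocked", "inconclusive"}
PASS_STATUSES = {"pass", "acceptable"}  # assumed spellings of "pass or acceptable"


def gate_outcome(meta_policy_ok: bool, processing_ok: bool,
                 status: str, severity: str = "low") -> str:
    """Map the three gate layers onto one outcome label."""
    # Layer 1: meta-policy problems (empty baseline, empty selection, ...)
    # stop the gate before any policy result is considered.
    if not meta_policy_ok:
        return "hard_fail:meta_policy"
    # Layer 2: runtime failures in OPA, SARIF generation, or evidence writes.
    if not processing_ok:
        return "hard_fail:processing"
    # Layer 3: the policy decision itself.
    if status == "fail" or severity == "high":
        return "gate_fail"
    if status in PASS_STATUSES:
        return "gate_pass"
    # conditional / blocked / inconclusive, plus anything unrecognised.
    return "review"


print(gate_outcome(True, True, "conditional"))  # review
```

Checking the layers in this order means a policy "pass" can never mask a broken baseline or a failed evidence write.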
Useful filter controls during self-test runs or triage:
- --requirement-key
- --attribute-filter
- --text-contains
- --limit
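As a rough illustration of what these four filters select, the sketch below applies them to an in-memory list. The Requirement class and the matching rules (exact key match, exact attribute equality, case-insensitive substring match) are assumptions for illustration; the gate's actual semantics may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirement:
    # Hypothetical record; see samples/contracts/ for the tracked contract.
    key: str
    text: str
    attributes: dict[str, str] = field(default_factory=dict)


def select_requirements(
    requirements: list[Requirement],
    requirement_key: Optional[str] = None,
    attribute_filter: Optional[dict[str, str]] = None,
    text_contains: Optional[str] = None,
    limit: Optional[int] = None,
) -> list[Requirement]:
    """Return requirements matching every supplied filter, capped at limit."""
    selected: list[Requirement] = []
    for req in requirements:
        if limit is not None and len(selected) >= limit:
            break
        if requirement_key is not None and req.key != requirement_key:
            continue
        if attribute_filter and any(
            req.attributes.get(name) != value
            for name, value in attribute_filter.items()
        ):
            continue
        if text_contains is not None and text_contains.lower() not in req.text.lower():
            continue
        selected.append(req)
    return selected


baseline = [
    Requirement("REQ-1", "Validate all input", {"severity": "high"}),
    Requirement("REQ-2", "Log authentication events", {"severity": "low"}),
    Requirement("REQ-3", "Validate output encoding", {"severity": "high"}),
]
hits = select_requirements(baseline, text_contains="validate", limit=1)
print([r.key for r in hits])  # ['REQ-1']
```

An empty result from a selection like this is the kind of condition the meta_policy layer reports as EMPTY_SELECTION.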
Quick Start
Root repo:
    just install
    just check
    just serve
Ingest and derived ReqIF smoke:
    just -f reqif_ingest_cli/justfile check
    just selftest-ingest
Repo security self-test:
    just selftest-asvs
    just selftest-asvs-cwe CWE-20
    just selftest-ssdf
Emit a derived ReqIF from the tracked AESCSF core workbook:
    uv run python -m reqif_ingest_cli emit-reqif \
      "samples/aemo/The AESCSF v2 Core.xlsx" \
      --title "AESCSF Core Derived Baseline" \
      --output "/tmp/aescsf-core.reqif" \
      --pretty
Repo Map
flowchart TD
ROOT[repo root] --> MCP[reqif_mcp/]
ROOT --> ING[reqif_ingest_cli/]
ROOT --> AGENTS[agents/]
ROOT --> BUNDLES[opa-bundles/]
ROOT --> SAMPLES[samples/]
ROOT --> TESTS[tests/]
AGENTS --> GATE[repo-security-agent]
BUNDLES --> GATE
SAMPLES --> ING
SAMPLES --> TESTS
- reqif_mcp/ - FastMCP server and current ReqIF evaluation surface
- reqif_ingest_cli/ - deterministic artifact intake, extraction, distillation, and ReqIF emission
- agents/ - deterministic facts producers, including repo-security self-testing
- opa-bundles/ - example and standards sample policy bundles
- samples/ - tracked source artifacts, contracts, and standards fixtures
- tests/ - parser, normalization, SARIF, gate, and ingest tests
ReqIF MCP Server
sequenceDiagram
participant Client
participant MCP as reqif_mcp
participant Store as in-memory baseline store
participant OPA as opa
participant Evidence as evidence_store
Client->>MCP: reqif_parse / reqif_validate / reqif_query
MCP->>Store: normalize and store baseline handle
Client->>MCP: reqif_write_verification
MCP->>OPA: evaluate requirement against facts
OPA-->>MCP: decision
MCP->>Evidence: verification event + decision log
Current tool surface in reqif_mcp/server.py:
- reqif_parse
- reqif_validate
- reqif_query
- reqif_export_req_set
- reqif_write_verification
Current implementation notes:
- baselines are stored in memory by handle
- ReqIF is accepted as base64 XML input
- HTTP and STDIO transports are supported
- evaluation and evidence are separate from ingestion
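Because ReqIF is accepted as base64 XML input, a client has to encode the file before calling a tool such as reqif_parse. The sketch below uses only the Python standard library; the helper names are hypothetical, the sample document is a placeholder rather than a valid ReqIF baseline, and the argument name the tool expects is not shown here because this README does not state it.

```python
import base64
import xml.etree.ElementTree as ET


def encode_reqif(xml_bytes: bytes) -> str:
    """Check that the payload is well-formed XML, then base64-encode it."""
    ET.fromstring(xml_bytes)  # raises xml.etree.ElementTree.ParseError if malformed
    return base64.b64encode(xml_bytes).decode("ascii")


def decode_reqif(payload: str) -> bytes:
    """Reverse of encode_reqif; rejects characters outside the base64 alphabet."""
    return base64.b64decode(payload, validate=True)


# Placeholder document: well-formed XML, not a complete ReqIF file.
sample = b"<REQ-IF><CORE-CONTENT/></REQ-IF>"
payload = encode_reqif(sample)
print(decode_reqif(payload) == sample)  # True
```

Checking well-formedness on the client side is optional; it only moves an obvious failure ahead of the round trip to the server.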
Standalone Ingest CLI
flowchart LR
REG[register-artifact] --> EXT[extract]
EXT --> DG[document_graph]
DG --> DIST[distill]
DIST --> RC[requirement_candidate]
RC --> EMIT[emit-reqif]
CFG[foundry-config] -. optional review hook .-> DIST
Available commands in reqif_ingest_cli/__main__.py:
- register-artifact
- extract
- distill
- emit-reqif
- foundry-config
Current extraction profiles:
- aescsf_core_v2
- aescsf_toolkit_v1_1
- generic_xlsx_table
- pdf_docling_v1
- docx_docling_v1
- markdown_docling_v1
Current implementation notes:
- XLSX is first-class and deterministic
- PDF prefers pypdf for offline text-layer extraction
- docling remains the richer path for DOCX and Markdown
- Azure Foundry integration is optional and not part of the deterministic first pass
- Install ingest-lite with uv sync --extra ingest-lite for XLSX plus text-layer PDF support
- Install ingest-full with uv sync --extra ingest-full for docling-backed extraction
- Install llm-review with uv sync --extra llm-review for optional Foundry quality-eval hooks
See README-reqif-ingest-cli.md for command details. See README-azure.md for Azure DevOps CI, deployment, and demo guidance.
Samples and Fixtures
The main README now links to sample indexes instead of carrying inline sample payloads.
flowchart TD
S[samples/] --> A[aemo/]
S --> C[contracts/]
S --> ST[standards/]
A --> ING[ingest smoke inputs]
C --> DOCS[README + tests]
ST --> DOGFOOD[ASVS / SSDF gate samples]
Start here:
- samples/README.md - sample inventory and navigation
- samples/aemo/README.md - tracked AEMO source artifacts used by ingest
- samples/contracts/README.md - JSON contracts referenced by docs and tests
- samples/standards/README.md - upstream standards material and derived self-test baselines
Repo Self-Testing With GitHub and Copilot
flowchart LR
PR[Pull request] --> QC[just check]
PR --> INGEST[just selftest-ingest]
PR --> ASVS[just selftest-asvs]
PR --> SSDF[just selftest-ssdf]
ASVS --> SARIF[SARIF artifacts]
SSDF --> SUMMARY[gate summary]
QC --> COPILOT[Copilot reviews current command surface]
Recommended CI/CD workflow shape for this repo:
- pull request quality gate
  - run just check
  - run just check-ingest
  - fail fast on parser, schema, or ingest regressions
- sample artifact smoke run
  - trigger when samples/**, reqif_ingest_cli/**, or README-reqif-ingest-cli.md changes
  - emit derived ReqIF from the tracked AESCSF workbooks
  - upload the derived ReqIF as a build artifact for review
- security standards self-test
  - trigger when reqif_mcp/**, opa-bundles/**, agents/**, or samples/standards/** changes
  - evaluate repo facts against OWASP ASVS and NIST SSDF sample bundles
  - upload gate summaries and SARIF artifacts
- baseline drift check
  - trigger on changes to tracked source documents
  - emit ReqIF and compare normalized requirements against the last known baseline
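The baseline drift check could be approximated along the following lines. Normalized diffing is listed as future work in this README, so this is a sketch under stated assumptions and not the repo's implementation: requirements are modelled as a plain mapping of key to text, normalization only collapses whitespace, and all function names are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout-only edits do not count as drift."""
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(requirements: dict[str, str]) -> dict[str, str]:
    """Map each requirement key to a SHA-256 digest of its normalized text."""
    return {
        key: hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        for key, text in requirements.items()
    }


def diff_baselines(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report added, removed, and changed requirement keys between two baselines."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "changed": sorted(
            key for key in old_fp.keys() & new_fp.keys() if old_fp[key] != new_fp[key]
        ),
    }


previous = {"REQ-1": "Validate all input.", "REQ-2": "Log  auth events."}
current = {"REQ-1": "Validate all untrusted input.", "REQ-2": "Log auth events.", "REQ-3": "Encrypt backups."}
drift = diff_baselines(previous, current)
print(drift)  # {'added': ['REQ-3'], 'removed': [], 'changed': ['REQ-1']}
```

A CI job could fail or request review when any of the three lists is non-empty, depending on how strict the drift policy should be.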
Azure DevOps deployment and demo details live in README-azure.md.
Recommended Copilot usage in this repo:
- treat justfile and reqif_ingest_cli/justfile as the public command surface
- reuse samples/contracts/*.json rather than inventing ad hoc JSON examples
- update tests first when changing parser, gate, or ingest behavior
- keep raw ReqIF XML generation behind the emitter instead of hand-editing XML
Data Contracts
flowchart LR
ART[artifact/1] --> DG[document_graph/1]
DG --> CAND[requirement_candidate/1]
CAND --> REQ[requirement record]
FACTS[agent facts] --> OPA[OPA decision]
REQ --> OPA
OPA --> SARIF[SARIF result]
Use the tracked sample files instead of copying large JSON blobs into the README:
- samples/contracts/requirement-record.example.json
- samples/contracts/requirement-candidate.example.json
- samples/contracts/agent-facts.example.json
- samples/contracts/opa-output.example.json
Contract descriptions and file-level notes live in samples/contracts/README.md.
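For orientation, the last hop of the contract chain (OPA decision to SARIF result) might look like the sketch below. The SARIF property names (version, runs, tool.driver.name, results, ruleId, kind, level, message.text) follow SARIF v2.1.0; everything else is an assumption: the status-to-kind mapping, the tool name, and the decision dictionaries are illustrative and do not reproduce the repo's SARIF mapping or its tracked contract files.

```python
import json

# Assumed mapping from decision status to SARIF (kind, level).
# SARIF v2.1.0 requires level "none" whenever kind is not "fail".
KIND_AND_LEVEL = {
    "fail": ("fail", "error"),
    "pass": ("pass", "none"),
}


def to_sarif(decisions: dict[str, dict], tool_name: str = "example-compliance-gate") -> dict:
    """Build a minimal SARIF log with one result per requirement decision."""
    results = []
    for requirement_key, decision in decisions.items():
        # Anything other than pass/fail is surfaced as a review item.
        kind, level = KIND_AND_LEVEL.get(decision["status"], ("review", "none"))
        text = "; ".join(decision.get("reasons", [])) or decision["status"]
        results.append({
            "ruleId": requirement_key,
            "kind": kind,
            "level": level,
            "message": {"text": text},
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


sarif_log = to_sarif({
    "REQ-1": {"status": "fail", "reasons": ["evidence link missing"]},
    "REQ-2": {"status": "pass", "reasons": []},
    "REQ-3": {"status": "inconclusive", "reasons": ["facts incomplete"]},
})
print(json.dumps(sarif_log, indent=2))
```

Keeping the requirement key in ruleId preserves the trace from a SARIF finding back to the derived ReqIF requirement.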
Standards and Formats
flowchart LR
ASVS[OWASP ASVS] --> SREQ[Derived ReqIF sample]
SSDF[NIST SSDF] --> SREQ
SREQ --> OPA[OPA bundles]
OPA --> SARIF[SARIF / gate output]
REQIF[ReqIF 1.2] --> MCP[reqif_mcp]
SARIF --> GITHUB[GitHub code scanning]
Implemented and tracked here:
- ReqIF 1.2
- SARIF v2.1.0
- OPA bundles and decision logs
- FastMCP
- OWASP ASVS sample baseline and bundle
- NIST SSDF sample baseline and bundle
Roadmap
flowchart LR
NOW[Current] --> NEXT1[Expose ingest as MCP tools]
NEXT1 --> NEXT2[Externalize profile mappings]
NEXT2 --> NEXT3[Normalized baseline diffing]
NEXT3 --> LATER1[Persistent baseline storage]
NEXT3 --> LATER2[Richer offline PDF structure extraction]
Near term:
- expose ingest as MCP tools
- externalize document profiles and header maps from Python into config
- implement normalized requirement diffing across source versions
- publish derived ReqIF, gate summaries, and SARIF artifacts in CI
Later:
- persistent baseline storage
- richer PDF structure extraction with preloaded offline models
- document-family specific distillation profiles beyond AESCSF
- broader repo-security fact extraction and policy coverage