Visible task input only.
The evaluated system receives only visible task input: exam descriptor, provided findings, locale, and allowed public context.
In official evaluation, the primary verdict is Strict PASS: a case passes only when every clinically decisive gate holds.
Synthetic harness demo, not the official benchmark.
The current public package exposes the harness, scoring contract, synthetic demo suite, preprint materials, and submission rules.
The synthetic demo is not used to judge clinical model performance. No official clinical leaderboard is published in this technical preview.
Official rows require controlled or hosted evaluation with frozen outputs, suite hashes, disclosure metadata, and eligibility review.
The public demo verifies the harness; controlled evaluation scores the model.
The evaluated system never receives gold labels, hidden criteria, reference answers, judge prompts, private scoring rules, leakage markers, or hidden test metadata.
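One way to enforce this input boundary is an allowlist: build the model-facing payload only from fields explicitly approved as visible, so any hidden field is dropped by default. The sketch below assumes a case is a plain dictionary. The field names (`exam_descriptor`, `findings`, `locale`, `public_context`, `gold_labels`, `judge_prompt`) and the helper `build_visible_input` are hypothetical and are not taken from the published scoring contract.

```python
from typing import Any

# Hypothetical field names; the public scoring contract may name these differently.
VISIBLE_FIELDS = ("exam_descriptor", "findings", "locale", "public_context")


def build_visible_input(case: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields the evaluated system may see.

    An allowlist (rather than a denylist of hidden keys) means any field not
    explicitly approved, such as gold labels, judge prompts, or leakage
    markers, is dropped by default, including fields added later.
    """
    return {key: case[key] for key in VISIBLE_FIELDS if key in case}


case = {
    "exam_descriptor": "CT head without contrast",
    "findings": "Hyperdense collection along the left convexity.",
    "locale": "en-US",
    "gold_labels": ["subdural hematoma"],  # hidden: must never reach the model
    "judge_prompt": "...",                 # hidden
}
print(build_visible_input(case))  # keeps only exam_descriptor, findings, locale
```

An allowlist is preferred over a denylist here because a newly added hidden field then fails closed instead of leaking.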
Strict PASS is binary per case. Per-dimension means explain failures; they do not erase clinically decisive gates.
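The separation between a binary per-case verdict and diagnostic averages can be sketched as follows. This is a minimal illustration, not the harness's scoring code: `CaseResult`, the gate name `urgent_finding_preserved`, and the `terminology` dimension are hypothetical. A case with a perfect dimension score still fails if any decisive gate fails.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    case_id: str
    gates: dict[str, bool]              # clinically decisive gates (hypothetical names)
    dimension_scores: dict[str, float]  # per-dimension scores in [0, 1]


def strict_pass(result: CaseResult) -> bool:
    # Binary per case: a single failed gate fails the case, whatever the dimension scores say.
    # Note: all() over an empty dict is True, so a case with no gates passes vacuously.
    return all(result.gates.values())


def strict_pass_rate(results: list[CaseResult]) -> float:
    return sum(strict_pass(r) for r in results) / len(results)


def dimension_means(results: list[CaseResult]) -> dict[str, float]:
    # Diagnostic only: these averages show where failures cluster; they never flip a verdict.
    dims = sorted({d for r in results for d in r.dimension_scores})
    return {
        d: mean(r.dimension_scores[d] for r in results if d in r.dimension_scores)
        for d in dims
    }


results = [
    CaseResult("case-1", {"urgent_finding_preserved": True}, {"terminology": 0.5}),
    CaseResult("case-2", {"urgent_finding_preserved": False}, {"terminology": 1.0}),
]
print(strict_pass_rate(results))  # 0.5
print(dimension_means(results))   # {'terminology': 0.75}, yet case-2 still fails
```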
Scores explain failure modes. Strict PASS decides whether a case survives.
Decisive imaging findings are checked with negation handling where implemented. In controlled evaluation, hidden labels may be used to verify preservation of urgent or clinically decisive findings. The public mirror does not ship the full clinical category set.
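To see why negation handling matters, note that a plain substring check would count "No intracranial hemorrhage" as preserving a hemorrhage finding. The sketch below is a deliberately simplified, NegEx-style pre-negation check. It is not the harness's implementation: the cue list, the 40-character look-back window, and the helper `finding_asserted` are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately minimal cue list (English only, pre-negation only).
# A real harness would need locale-specific cues (e.g. pt-BR "sem", "ausência de")
# and post-negation patterns such as "... is not seen".
NEGATION_CUE = re.compile(r"\b(no|not|without|negative for|absence of)\b")


def finding_asserted(report: str, finding: str, window: int = 40) -> bool:
    """Return True if `finding` occurs at least once without a negation cue
    earlier in the same sentence (looking back at most `window` characters)."""
    text = report.lower()
    for match in re.finditer(re.escape(finding.lower()), text):
        prefix = text[max(0, match.start() - window):match.start()]
        sentence_prefix = re.split(r"[.;\n]", prefix)[-1]  # stay inside the current sentence
        if not NEGATION_CUE.search(sentence_prefix):
            return True
    return False


report = "No intracranial hemorrhage. Small left subdural hematoma."
print(finding_asserted(report, "subdural hematoma"))        # True
print(finding_asserted(report, "intracranial hemorrhage"))  # False (negated)
```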
Severity-aware matching checks whether clinically relevant findings are preserved without rewarding unsupported normality. In controlled evaluation, comparison may use hidden labels, adjudicated references, or controlled reference reports, never plain string overlap alone.
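A minimal sketch of what "severity-aware, without rewarding unsupported normality" could mean in practice: weight reference findings by severity when measuring recall, and score zero when the report claims normality while the reference contains a relevant or urgent finding. The 0/1/2 severity scale, the weights, the normality phrases, and `severity_aware_score` are hypothetical. The `preserved` set is assumed to come from an upstream, negation-aware comparison rather than string overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceFinding:
    name: str
    severity: int  # hypothetical scale: 0 = incidental, 1 = relevant, 2 = urgent


# Hypothetical normality phrases; a real module would be locale- and modality-aware.
NORMALITY_CLAIMS = ("no acute findings", "normal study", "unremarkable study")


def severity_aware_score(report: str, preserved: set[str],
                         reference: list[ReferenceFinding]) -> float:
    """Severity-weighted recall of reference findings, zeroed by unsupported normality.

    `preserved` holds the names of reference findings judged to be present
    (non-negated) in the report; how that set is produced is out of scope here.
    """
    claims_normal = any(p in report.lower() for p in NORMALITY_CLAIMS)
    if claims_normal and any(f.severity >= 1 for f in reference):
        return 0.0  # a "normal" claim contradicts a relevant or urgent reference finding
    if not reference:
        return 1.0
    weights = {f.name: f.severity + 1 for f in reference}  # urgent findings weigh the most
    kept = sum(w for name, w in weights.items() if name in preserved)
    return kept / sum(weights.values())


reference = [ReferenceFinding("pneumothorax", 2), ReferenceFinding("granuloma", 0)]
print(severity_aware_score("Right pneumothorax.", {"pneumothorax"}, reference))  # 0.75
print(severity_aware_score("Normal study.", set(), reference))                    # 0.0
```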
Terminology checks detect modality drift, section drift, forbidden openers, local-style violations, and modality-specific vocabulary errors where implemented.
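Two of the listed terminology checks, forbidden openers and modality drift, reduce to simple rule lookups. The sketch below is illustrative only. The opener list, the per-modality vocabularies, the flag format, and `terminology_flags` are invented for this example and do not reflect the harness's actual rule tables or local-style rules.

```python
import re

# Hypothetical rule tables for illustration only.
FORBIDDEN_OPENERS = ("as an ai", "sure,", "here is the report")
MODALITY_TERMS = {
    "CT": {"hounsfield", "hypodense", "hyperdense"},
    "MR": {"flair", "diffusion restriction", "t2 hyperintense"},
    "US": {"echogenic", "hypoechoic", "anechoic"},
}


def terminology_flags(report: str, modality: str) -> list[str]:
    text = report.lower().lstrip()
    flags = []
    if text.startswith(FORBIDDEN_OPENERS):  # str.startswith accepts a tuple
        flags.append("forbidden_opener")
    for other, terms in MODALITY_TERMS.items():
        if other == modality:
            continue
        # Whole-word matches of another modality's vocabulary suggest modality drift.
        hits = sorted(t for t in terms if re.search(rf"\b{re.escape(t)}\b", text))
        if hits:
            flags.append(f"modality_drift:{other}:{','.join(hits)}")
    return flags


print(terminology_flags("Sure, here is the report. Hypoechoic liver lesion.", "CT"))
# ['forbidden_opener', 'modality_drift:US:hypoechoic']
```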
Guideline modules run only when the rule is applicable to the case.
Retrieval fidelity is evaluated only for retrieval-enabled agents.
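Conditional checks like these can be modeled as modules that carry their own applicability predicate, plus an agent-level flag for retrieval. In this sketch the module name `pulmonary_nodule_followup`, its trigger rule, and the `Case`/`GuidelineModule` types are hypothetical placeholders, not the benchmark's actual modules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    exam: str
    findings: list[str]


@dataclass
class GuidelineModule:
    name: str
    applies: Callable[[Case], bool]  # applicability predicate


# Hypothetical example; real modules and their trigger rules are not specified here.
MODULES = [
    GuidelineModule(
        "pulmonary_nodule_followup",
        lambda c: c.exam.startswith("CT chest") and "pulmonary nodule" in c.findings,
    ),
]


def checks_to_run(case: Case, retrieval_enabled: bool) -> list[str]:
    selected = [m.name for m in MODULES if m.applies(case)]
    if retrieval_enabled:  # retrieval fidelity only applies to retrieval-enabled agents
        selected.append("retrieval_fidelity")
    return selected


print(checks_to_run(Case("CT chest", ["pulmonary nodule"]), retrieval_enabled=False))
# ['pulmonary_nodule_followup']
print(checks_to_run(Case("MR brain", ["white matter lesion"]), retrieval_enabled=True))
# ['retrieval_fidelity']
```

One reasonable design choice is to record skipped checks as "not applicable" rather than as passes, so that inapplicable modules neither inflate nor deflate a system's results.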
The method is public. The official test set is not.
The method can be public while the official test set remains controlled. What is published must be reproducible; whatever protects privacy, guards against contamination, or preserves benchmark integrity stays gated.
Request controlled evaluation for official scoring.
Use lite-public.pt-BR for local smoke testing and contract inspection. Every public case is flagged synthetic:true.
Official rows require frozen outputs, a suite hash, disclosure metadata, and eligibility review.
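Before a local smoke test, it can help to confirm that every case carries the synthetic flag and to record a content hash of the suite, so that a later run can be shown to use the same inputs. The canonicalization below (cases sorted by `id`, sorted keys, compact separators, UTF-8, SHA-256) is a hypothetical scheme chosen for this sketch. The official suite-hash definition is part of the submission contract and may differ. The same hashing idea applies to frozen outputs.

```python
import hashlib
import json


def canonical_sha256(records: list[dict]) -> str:
    """SHA-256 over a canonical JSON form: records sorted by "id", keys sorted,
    compact separators, UTF-8. Hypothetical scheme, not the official contract."""
    ordered = sorted(records, key=lambda r: r["id"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_public_suite(cases: list[dict]) -> str:
    # Refuse to treat a suite as public unless every case is flagged synthetic: true.
    non_synthetic = [c["id"] for c in cases if c.get("synthetic") is not True]
    if non_synthetic:
        raise ValueError(f"non-synthetic cases in public suite: {non_synthetic}")
    return canonical_sha256(cases)


suite = [
    {"id": "c2", "locale": "pt-BR", "synthetic": True},
    {"id": "c1", "locale": "pt-BR", "synthetic": True},
]
print(check_public_suite(suite))  # 64 hex characters; stable across key and case order
```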
Formal and architectural contribution. Not a clinical-use claim.
This preprint is a formal and architectural contribution. It does not claim clinical validation, regulatory approval, autonomous diagnosis, product clearance, or replacement of radiologist oversight.
The public site contains the method, paper materials, public-safe synthetic demo, and submission contract.
The clinical corpus, raw reports, hidden test set, answer keys, and private scoring criteria are not distributed.