DICOM File Support -- Phase 1¶

Status: Proposed
Author: SD
Created: 2026-04-10
GitHub Issue: #147
Branch: feat/dicom-support

1. Overview and Motivation¶

What¶

Accept DICOM (.dcm/.dicom) medical imaging files in the Curaway upload flow, extract structured metadata (body part, modality, study date, laterality, patient demographics), parse DICOM-SR (Structured Reports) for radiology findings, de-identify patient PII from headers before provider forwarding, auto-match imaging metadata against procedure requirements, and render a thumbnail preview in the EHR drawer.

Why¶

Patients have DICOM files. Radiology departments frequently provide imaging on CD/USB as DICOM, not PDF. Patients currently cannot upload these -- they must convert to PDF first, losing structured metadata.
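Structured metadata > OCR. DICOM headers contain machine-readable body part, modality, study date, and laterality. When these tags are populated, they are read directly instead of being inferred from free text by the OCR + LLM pipeline with lower confidence. Some tags, notably BodyPartExamined, are optional and can be missing or wrong at the source, so the parser must tolerate gaps.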
DICOM-SR carries radiology findings. Structured Reports embedded in DICOM files contain the radiologist's findings, measurements, and impressions in a parseable format -- no OCR needed.
Procedure requirement matching. Auto-matching BodyPartExamined=KNEE + Modality=MR against a TKR procedure's "Knee MRI" requirement is deterministic and instant.
De-identification is mandatory. DICOM headers contain patient name, DOB, and medical record numbers. These must be stripped before forwarding to international providers (GDPR + HIPAA Safe Harbor).
Cross-border value. Patients traveling from US/UK/UAE to India/Turkey/Thailand often carry imaging on disc. Supporting DICOM directly removes a friction point in the onboarding flow.

What This Is Not¶

Not a full PACS viewer. No windowing, level adjustment, or multi-frame cine playback.
Not multi-file DICOM series handling (Phase 2).
Not DICOM networking (C-STORE, C-FIND, WADO-RS) -- that is post-Series A.
Not DICOM-RT (radiation therapy) support.

2. Architecture Decisions¶

Where `dicom_parser.py` Fits¶

New service at app/services/dicom_parser.py -- sits alongside document_processing.py in the services layer. It is a pure utility module: takes bytes in, returns structured dict out. No DB access, no side effects.

Upload flow (existing)
  |
  v
document_service.confirm_upload()
  |
  +-- file_type == "dicom" ?
  |     |
  |     YES --> dicom_parser.parse_dicom(file_bytes)
  |     |         returns: {metadata, sr_findings, thumbnail_bytes, deidentified_bytes}
  |     |
  |     +-- Store deidentified_bytes to R2 (overwrite original)
  |     +-- Store thumbnail to R2 ({storage_key}_thumb.png)
  |     +-- Save metadata + sr_findings to doc.extracted_data
  |     +-- Skip OCR, go straight to run_post_ocr_pipeline() with extracted text from SR
  |     |
  |     NO --> existing OCR pipeline (unchanged)
  |
  v
run_post_ocr_pipeline()  <-- receives SR text or metadata-derived text
  |
  v
Clinical Context Agent --> FHIR resources --> requirement matching --> EHR rebuild
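The branch in the diagram can be sketched as a small routing function. This is a minimal illustration, not the actual `confirm_upload()` code: `ParsedDicom`, `process_confirmed_upload`, and the injected `parse_dicom`, `put_object`, and `run_ocr` callables are hypothetical stand-ins for the parser, the R2 client, and the existing OCR path.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParsedDicom:
    """Hypothetical stand-in for the structure returned by parse_dicom()."""
    metadata: dict
    sr_findings: list
    thumbnail_bytes: Optional[bytes]
    deidentified_bytes: bytes
    text: str


def process_confirmed_upload(
    file_type: str,
    file_bytes: bytes,
    storage_key: str,
    parse_dicom: Callable[[bytes], ParsedDicom],
    put_object: Callable[[str, bytes], None],
    run_ocr: Callable[[bytes], str],
) -> dict:
    """Route an upload down the DICOM branch or the unchanged OCR branch."""
    if file_type != "dicom":
        return {"ocr_method": "ocr", "text": run_ocr(file_bytes), "extracted_data": {}}

    parsed = parse_dicom(file_bytes)
    # Overwrite the original object so PII-bearing bytes do not linger in storage.
    put_object(storage_key, parsed.deidentified_bytes)
    extracted = {
        "source_type": "dicom",
        "dicom_metadata": parsed.metadata,
        "sr_findings": parsed.sr_findings,
    }
    if parsed.thumbnail_bytes is not None:
        thumb_key = f"{storage_key}_thumb.png"
        put_object(thumb_key, parsed.thumbnail_bytes)
        extracted["thumbnail_key"] = thumb_key
    # The returned text is what run_post_ocr_pipeline() would receive.
    return {"ocr_method": "dicom_metadata", "text": parsed.text, "extracted_data": extracted}
```

Passing the storage and OCR functions in as arguments keeps the parser itself free of side effects, which matches the "bytes in, structured dict out" constraint above.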

Key Design Choices¶

Decision	Choice	Rationale
DICOM library	pydicom 2.4+	Industry standard, pure Python, no C dependencies, active maintenance
Thumbnail generation	pydicom + Pillow	pydicom decodes pixel data (its `pixel_array` accessor requires NumPy), Pillow converts to PNG. No GDAL/VTK heavyweight dependencies.
De-identification	Custom tag stripper (Safe Harbor)	Full DICOM anonymization suites (deid, DicomAnonymizer) are overkill. We strip a known tag list -- deterministic, auditable, <50 lines.
SR parsing	pydicom ContentSequence traversal	No need for highdicom at MVP. Walk the SR tree, extract TEXT/NUM/CODE value types.
Storage	De-identified bytes overwrite original in R2	Never persist PII-bearing DICOM on our infrastructure. Original is replaced.
Thumbnail format	PNG, 256x256 max, 8-bit grayscale	Small enough for inline preview, good enough for "is this the right scan?"
Feature flag	`dicom_support_enabled` (Flagsmith)	Kill switch for the entire feature. Default off until tested.
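The thumbnail row implies two small calculations: fitting the image inside 256x256 without distorting it, and reducing 16-bit samples to 8-bit grayscale. The sketch below shows both in plain Python. The function names are hypothetical, and the linear min/max rescale is an assumed choice (the document does not specify a windowing method).

```python
from typing import Sequence


def fit_within(rows: int, columns: int, max_side: int = 256) -> tuple[int, int]:
    """Return (height, width) scaled to fit max_side, keeping aspect ratio.

    Images already smaller than max_side are left at their original size.
    """
    if rows <= 0 or columns <= 0:
        raise ValueError("rows and columns must be positive")
    scale = min(1.0, max_side / max(rows, columns))
    return max(1, round(rows * scale)), max(1, round(columns * scale))


def rescale_to_8bit(samples: Sequence[int]) -> bytes:
    """Linearly map the observed min..max sample range onto 0..255."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return bytes(len(samples))  # flat image: every pixel becomes 0
    span = hi - lo
    return bytes((s - lo) * 255 // span for s in samples)
```

For MONOCHROME1 images the output would additionally need inverting, since that photometric interpretation displays the minimum sample value as white.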

3. Implementation Checklist¶

Tier 1: Opus (Architecture, Clinical Logic, Security)¶

[ ] O1: app/services/dicom_parser.py -- core parser module [NEW FILE]
parse_dicom(file_bytes: bytes) -> DicomParseResult -- main entry point
extract_metadata(ds: pydicom.Dataset) -> dict -- pull structured tags
extract_sr_findings(ds: pydicom.Dataset) -> list[SRFinding] -- walk SR content tree
deidentify(ds: pydicom.Dataset) -> pydicom.Dataset -- strip PII tags per Safe Harbor
generate_thumbnail(ds: pydicom.Dataset) -> bytes | None -- pixel data to PNG
build_text_representation(metadata: dict, findings: list) -> str -- synthesize text for Clinical Context Agent
[ ] O2: De-identification tag list and logic [IN dicom_parser.py]
Implement Safe Harbor tag stripping (see Section 6)
Replace stripped values with safe placeholders (e.g., "DEIDENTIFIED")
Preserve all clinical tags (BodyPartExamined, Modality, StudyDate, etc.)
Write to new Dataset, never modify-in-place on the input
Return a de-identification summary (tag count stripped, no PII values) so the caller can log the event to the audit table; the parser itself stays free of side effects
[ ] O3: DICOM-SR parsing logic [IN dicom_parser.py]
Walk ContentSequence recursively
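Extract TEXT, NUM, CODE, DATE, and TIME value types; de-identify PNAME and UIDREF instead of extracting them; log COMPOSITE and IMAGE references without following them (see Section 7)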
Map SR concept names to clinical meaning (see Section 7)
Return structured findings list
[ ] O4: Wire into document_processing.py [MODIFY]
Add DICOM branch in the processing pipeline
Skip OCR for DICOM files
Pass SR text + metadata-derived text to run_post_ocr_pipeline()
Set ocr_method = "dicom_metadata" for DICOM files
[ ] O5: Wire into attachment_handler.py [MODIFY]
Add DICOM detection in process_attachments()
When file_type == "dicom", skip inline OCR, use extracted_data directly
Set report_type = "imaging" for Clinical Context Agent
[ ] O6: Auto-match body part + modality against procedure requirements [MODIFY requirement_matcher.py]
New function: match_dicom_metadata_to_requirements(metadata, proc_reqs) -> list[dict]
Match BodyPartExamined + Modality against requirement descriptions
Higher confidence than LLM matching (deterministic, 0.95+ confidence)
[ ] O7: FHIR ImagingStudy resource generation [MODIFY clinical_context.py prompts]
When DICOM metadata is available, generate FHIR ImagingStudy (not just Condition/Observation)
Map modality, body part, study date, accession number to ImagingStudy fields
Store via fhir_service.create_fhir_resource()

Tier 2: Sonnet (Mechanical Implementation, Config, Tests)¶

[ ] S1: Update config/guardrails.yaml [MODIFY]
Add .dcm, .dicom to frontend.allowed_extensions
Confirm application/dicom is in backend.allowed_mime_types (believed to be present already; add it if missing)
Add DICOM-specific medical keywords: dicom, modality, body part, series
[ ] S2: Update app/services/file_validator.py [MODIFY]
Add .dcm, .dicom to extension validation
Add application/dicom MIME type mapping
DICOM files skip medical keyword check (metadata provides clinical context natively)
[ ] S3: Update presign endpoint [MODIFY app/routers/documents.py]
Accept .dcm/.dicom extensions in presign request validation
Set file_type = "dicom" in DocumentReference creation
[ ] S4: Update confirm_upload flow [MODIFY app/services/document_service.py]
Detect DICOM file type
Call dicom_parser.parse_dicom() instead of queuing OCR
Store deidentified bytes back to R2
Store thumbnail to R2
Populate extracted_data with DICOM metadata
Set document_category = "imaging" automatically
Queue run_post_ocr_pipeline() with SR text
[ ] S5: Update QStash OCR callback [MODIFY app/routers/internal.py]
In process_ocr(), detect DICOM file type before starting OCR tiers
If DICOM: call dicom_parser.parse_dicom(), skip all OCR tiers
Feed results into run_post_ocr_pipeline() as normal
[ ] S6: Pydantic schemas for DICOM data [NEW FILE app/schemas/dicom.py]
DicomMetadata -- body_part, modality, study_date, laterality, etc.
SRFinding -- concept_name, value_type, value, unit, finding_type
DicomParseResult -- metadata, sr_findings, thumbnail_bytes, deidentified_bytes, text_representation, deidentification_summary (thumbnail_key is set by the caller after the R2 write)
[ ] S7: Frontend -- accept DICOM file types [MODIFY ConversationApp.tsx]
Update accept= attribute: add .dcm,.dicom
Update frontend file validation (extension list)
Add DICOM icon/badge for file attachment cards
[ ] S8: Frontend -- DICOM thumbnail preview [MODIFY EHR drawer components]
When document has thumbnail_key in extracted_data, fetch and display
Fallback to generic imaging icon when no thumbnail available
256x256 max, grayscale, with body part + modality label overlay
[ ] S9: Flagsmith feature flag [CONFIG]
Create dicom_support_enabled flag in Flagsmith
Gate all DICOM-specific code paths behind this flag
Default: OFF until integration testing passes
[ ] S10: Add pydicom and Pillow to requirements [MODIFY requirements.txt]
pydicom>=2.4.0,<3.0
Pillow>=10.0.0,<11.0 (likely already present for other image handling)
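For the validation items in S2 and S3, an extension check can be backed by a content check: a DICOM Part 10 file starts with a 128-byte preamble followed by the four bytes `DICM`. The sketch below is one possible shape for that check; `looks_like_dicom`, `validate_dicom_upload`, and the reason strings are hypothetical names, and the `flag_enabled` argument stands in for the `dicom_support_enabled` lookup.

```python
DICOM_PREAMBLE_LENGTH = 128
DICOM_MAGIC = b"DICM"
DICOM_EXTENSIONS = (".dcm", ".dicom")


def looks_like_dicom(data: bytes) -> bool:
    """True when bytes 128..131 hold the 'DICM' prefix of a Part 10 file."""
    start = DICOM_PREAMBLE_LENGTH
    return data[start:start + 4] == DICOM_MAGIC


def validate_dicom_upload(filename: str, head: bytes, flag_enabled: bool) -> tuple[bool, str]:
    """Return (accepted, reason). Only the first 132 bytes of the file are needed."""
    if not flag_enabled:
        return False, "dicom_support_disabled"
    if not filename.lower().endswith(DICOM_EXTENSIONS):
        return False, "unsupported_extension"
    if not looks_like_dicom(head):
        return False, "missing_dicm_prefix"
    return True, "ok"
```

Some older files omit the preamble entirely. Whether to accept those (pydicom can read them with `force=True`) is a policy choice this sketch does not make; it rejects them.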

4. File-by-File Changes¶

New Files¶

File	Purpose
`app/services/dicom_parser.py`	Core DICOM parsing, de-identification, thumbnail generation, SR extraction
`app/schemas/dicom.py`	Pydantic models for DICOM metadata and SR findings
`tests/test_dicom_parser.py`	Unit tests for parser, de-identification, SR parsing
`tests/test_dicom_pipeline.py`	Integration tests for DICOM flow through document pipeline
`tests/fixtures/sample.dcm`	Minimal valid DICOM file for testing (synthetic, no real PHI)
`tests/fixtures/sample_sr.dcm`	Minimal DICOM-SR file for SR parsing tests

Modified Files¶

File	Change
`config/guardrails.yaml`	Add `.dcm`, `.dicom` to frontend allowed_extensions
`app/services/document_processing.py`	DICOM branch before OCR, `ocr_method="dicom_metadata"`
`app/agents/attachment_handler.py`	DICOM detection, skip inline OCR, set report_type="imaging"
`app/routers/internal.py`	DICOM detection in `process_ocr()`, skip OCR tiers
`app/services/document_service.py`	DICOM handling in `confirm_upload()`
`app/routers/documents.py`	Accept `.dcm`/`.dicom` in presign
`app/services/file_validator.py`	DICOM extension + MIME validation
`app/services/requirement_matcher.py`	Deterministic DICOM metadata matching
`app/agents/clinical_context.py`	FHIR ImagingStudy generation from DICOM metadata
`requirements.txt`	Add `pydicom>=2.4.0,<3.0`; confirm `Pillow>=10.0.0,<11.0` is present
`curaway-health-navigator/src/pages/ConversationApp.tsx`	Accept `.dcm,.dicom` in file input

5. Data Model¶

`extracted_data` JSONB Schema for DICOM Documents¶

When a DICOM file is processed, document_references.extracted_data will contain:

{
  "source_type": "dicom",
  "dicom_metadata": {
    "modality": "MR",
    "modality_description": "Magnetic Resonance",
    "body_part_examined": "KNEE",
    "laterality": "L",
    "study_date": "2026-03-15",
    "study_description": "MRI LEFT KNEE WITHOUT CONTRAST",
    "series_description": "SAG PD FAT SAT",
    "institution_name": "DEIDENTIFIED",
    "referring_physician": "DEIDENTIFIED",
    "accession_number": "DEIDENTIFIED",
    "manufacturer": "SIEMENS",
    "station_name": "MRC35273",
    "slice_thickness": 3.0,
    "pixel_spacing": [0.35, 0.35],
    "rows": 512,
    "columns": 512,
    "bits_allocated": 16,
    "photometric_interpretation": "MONOCHROME2",
    "number_of_frames": 1
  },
  "sr_findings": [
    {
      "concept_name": "Finding",
      "value_type": "TEXT",
      "value": "Moderate tricompartmental osteoarthritis with near-complete loss of medial compartment cartilage",
      "finding_type": "impression"
    },
    {
      "concept_name": "Measurement",
      "value_type": "NUM",
      "value": 2.3,
      "unit": "mm",
      "finding_type": "measurement",
      "measurement_site": "Medial meniscus tear"
    }
  ],
  "thumbnail_key": "tenant-001/patient-abc/doc-xyz_thumb.png",
  "deidentification_summary": {
    "tags_stripped": 18,
    "tags_preserved": 45,
    "method": "safe_harbor_v1"
  },
  "analysis_status": "completed",
  "extracted_entities": [],
  "observations": [],
  "matched_requirements": []
}
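The example stores `study_date` as an ISO date, while DICOM's DA value representation is `YYYYMMDD` (the fixture in Section 9 uses "20260315"). A small normalizer could bridge the two; `dicom_date_to_iso` is a hypothetical helper name, and returning None for unparseable values is an assumption that follows the "graceful None values" testing goal.

```python
from datetime import datetime
from typing import Optional


def dicom_date_to_iso(value: Optional[str]) -> Optional[str]:
    """Convert a DICOM DA value (YYYYMMDD) to ISO 8601, or None if absent or invalid."""
    if not value:
        return None
    # Legacy ACR-NEMA files wrote dates as YYYY.MM.DD; accept that form too.
    text = str(value).strip().replace(".", "")
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date().isoformat()
    except ValueError:
        return None  # e.g. month 13 or day 40
```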

FHIR Mapping¶

DICOM Tag	FHIR Resource	FHIR Field
Modality (0008,0060)	ImagingStudy	`series[0].modality.code`
BodyPartExamined (0018,0015)	ImagingStudy	`series[0].bodySite.display`
StudyDate (0008,0020)	ImagingStudy	`started`
StudyDescription (0008,1030)	ImagingStudy	`description`
AccessionNumber (0008,0050)	ImagingStudy	`identifier[0].value` (deidentified)
Laterality (0020,0060)	ImagingStudy	`series[0].laterality.code`
NumberOfFrames (0028,0008)	ImagingStudy	Not mapped: frames are not instances. `numberOfInstances` is 1 for a single-file Phase 1 upload
SR Finding (impression)	Condition or DiagnosticReport	`conclusion` / `code`
SR Finding (measurement)	Observation	`valueQuantity`

FHIR ImagingStudy resource will be created with status: "available" and linked to the patient via subject reference. SR findings feed into the existing Clinical Context Agent flow, which generates Condition and Observation resources.
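As an illustration of the mapping table, the sketch below assembles an ImagingStudy as a plain dict from the `dicom_metadata` keys shown in Section 5. It assumes FHIR R4 (where `series.modality`, `bodySite`, and `laterality` are Codings); `build_imaging_study` and its parameters are hypothetical, and the laterality Coding carries only the raw DICOM code because the table does not name a code system for it.

```python
from typing import Any

# FHIR's URI for the DICOM controlled terminology code system.
DCM_SYSTEM = "http://dicom.nema.org/resources/ontology/DCM"


def build_imaging_study(metadata: dict, patient_ref: str, series_uid: str) -> dict[str, Any]:
    """Build a minimal ImagingStudy dict; optional fields are omitted when absent."""
    series: dict[str, Any] = {
        "uid": series_uid,
        "modality": {"system": DCM_SYSTEM, "code": metadata["modality"]},
    }
    if metadata.get("body_part_examined"):
        series["bodySite"] = {"display": metadata["body_part_examined"]}
    if metadata.get("laterality"):
        series["laterality"] = {"code": metadata["laterality"]}

    study: dict[str, Any] = {
        "resourceType": "ImagingStudy",
        "status": "available",
        "subject": {"reference": patient_ref},
        "series": [series],
    }
    if metadata.get("study_date"):
        study["started"] = metadata["study_date"]  # ISO date, already normalized
    if metadata.get("study_description"):
        study["description"] = metadata["study_description"]
    return study
```

The `series_uid` passed in would be the regenerated SeriesInstanceUID from de-identification, not the original one.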

6. De-Identification: Safe Harbor DICOM Tag Stripping¶

Tags to Strip (DICOM PS3.15 Table E.1-1, Safe Harbor subset)¶

These tags contain direct patient identifiers and must be removed or replaced with "DEIDENTIFIED" before storage or forwarding.

Tag	Name	Action
(0010,0010)	PatientName	Replace with "DEIDENTIFIED"
(0010,0020)	PatientID	Replace with "DEIDENTIFIED"
(0010,0030)	PatientBirthDate	Remove
(0010,0040)	PatientSex	Preserve (clinically relevant, not identifying alone)
(0010,1000)	OtherPatientIDs	Remove
(0010,1001)	OtherPatientNames	Remove
(0010,1010)	PatientAge	Preserve (clinically relevant; Safe Harbor requires ages over 89 to be reported as 90 or older)
(0010,1020)	PatientSize	Preserve (height, clinically relevant)
(0010,1030)	PatientWeight	Preserve (clinically relevant)
(0010,21B0)	AdditionalPatientHistory	Remove (may contain narrative PII)
(0008,0050)	AccessionNumber	Replace with "DEIDENTIFIED"
(0008,0080)	InstitutionName	Replace with "DEIDENTIFIED"
(0008,0081)	InstitutionAddress	Remove
(0008,0090)	ReferringPhysicianName	Replace with "DEIDENTIFIED"
(0008,1048)	PhysiciansOfRecord	Remove
(0008,1050)	PerformingPhysicianName	Remove
(0008,1060)	NameOfPhysiciansReadingStudy	Remove
(0008,1070)	OperatorsName	Remove
(0010,0050)	PatientInsurancePlanCodeSequence	Remove
(0010,2154)	PatientTelephoneNumbers	Remove
(0010,2160)	EthnicGroup	Remove
(0010,21F0)	PatientReligiousPreference	Remove
(0020,000D)	StudyInstanceUID	Replace with generated UID
(0020,000E)	SeriesInstanceUID	Replace with generated UID
(0008,0018)	SOPInstanceUID	Replace with generated UID
(0040,A123)	PersonName (in SR)	Replace with "DEIDENTIFIED"
(0032,1032)	RequestingPhysician	Remove
(0032,1060)	RequestedProcedureDescription	Preserve (clinically relevant)

Tags to Preserve (Clinical Value)¶

All of these stay intact -- they carry clinical meaning without identifying the patient:

Modality (0008,0060)
BodyPartExamined (0018,0015)
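StudyDate (0008,0020) -- Note: study date is preserved because knowing when the imaging was done is critical for procedure requirement matching (validity windows). HIPAA Safe Harbor treats date elements more specific than the year as identifiers, so keeping the full date is a deliberate exception to strict Safe Harbor and needs compliance sign-off before provider forwarding.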
StudyDescription (0008,1030)
SeriesDescription (0008,103E)
Laterality (0020,0060)
ImageLaterality (0020,0062)
All pixel data tags
All acquisition parameter tags (SliceThickness, PixelSpacing, etc.)
Manufacturer, StationName, SoftwareVersions

Implementation Notes¶

Use pydicom's Dataset.walk() to traverse all sequences (including nested SR content)
After stripping, call ds.save_as() to produce clean bytes
Generate new UIDs using pydicom.uid.generate_uid() (maintains DICOM validity)
Log an audit event: {"event_type": "DICOM_DEIDENTIFIED", "tags_stripped": N, "document_id": "..."}
Never log the stripped values themselves
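The stripping logic can be driven entirely by a policy table. The sketch below shows the idea on a plain nested dict keyed by DICOM keyword, which is an assumed stand-in for a pydicom Dataset so the example stays dependency-free; in the real module the same table would drive a `Dataset.walk()` callback. `POLICY` lists only a subset of the tags above, and `deidentify` here is illustrative. The `2.25.<uuid>` form is the standard UUID-derived UID format.

```python
import copy
import uuid

REPLACE, REMOVE, NEW_UID = "replace", "remove", "new_uid"

# Subset of the tag table, keyed by keyword for readability.
POLICY = {
    "PatientName": REPLACE,
    "PatientID": REPLACE,
    "PatientBirthDate": REMOVE,
    "AccessionNumber": REPLACE,
    "InstitutionName": REPLACE,
    "OperatorsName": REMOVE,
    "PersonName": REPLACE,
    "StudyInstanceUID": NEW_UID,
    "SeriesInstanceUID": NEW_UID,
    "SOPInstanceUID": NEW_UID,
}


def deidentify(dataset: dict) -> tuple[dict, int]:
    """Return a de-identified deep copy plus the number of elements changed."""
    clean = copy.deepcopy(dataset)  # never modify the input in place
    changed = _scrub(clean, uid_map={})
    return clean, changed


def _scrub(node: dict, uid_map: dict) -> int:
    changed = 0
    for keyword in list(node):
        value = node[keyword]
        action = POLICY.get(keyword)
        if action == REMOVE:
            del node[keyword]
            changed += 1
        elif action == REPLACE:
            node[keyword] = "DEIDENTIFIED"
            changed += 1
        elif action == NEW_UID:
            # Map each original UID to one replacement so repeated references agree.
            node[keyword] = uid_map.setdefault(value, f"2.25.{uuid.uuid4().int}")
            changed += 1
        elif isinstance(value, list):
            # Sequence attribute: recurse into nested items (e.g. SR content).
            changed += sum(_scrub(item, uid_map) for item in value if isinstance(item, dict))
    return changed
```

The returned count is what would feed `tags_stripped` in the audit event; the stripped values themselves never leave the function.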

7. DICOM-SR Parsing¶

Supported SR Template Types (Phase 1)¶

SR Type	SOP Class UID	IOD	Priority
Basic Text SR	1.2.840.10008.5.1.4.1.1.88.11	Basic Text SR	Must have
Enhanced SR	1.2.840.10008.5.1.4.1.1.88.22	Enhanced SR	Must have
Comprehensive SR	1.2.840.10008.5.1.4.1.1.88.33	Comprehensive SR	Must have
Comprehensive 3D SR	1.2.840.10008.5.1.4.1.1.88.34	Comprehensive 3D SR	Nice to have
Key Object Selection	1.2.840.10008.5.1.4.1.1.88.59	Key Object Selection Document	Skip (no clinical text)

SR Content Tree Traversal¶

DICOM-SR stores findings as a tree of Content Items. Each item has a ValueType and a ConceptNameCodeSequence. The parser walks this tree recursively:

ContentSequence
  +-- Container: "Imaging Report"
       +-- Container: "Findings"
       |    +-- TEXT: "Moderate tricompartmental osteoarthritis..."
       |    +-- NUM: 2.3 mm (meniscus tear measurement)
       |    +-- CODE: SNOMED 396275006 (osteoarthritis)
       +-- Container: "Impression"
            +-- TEXT: "Near-complete loss of medial compartment cartilage"

Value Type Handling¶

ValueType	Extraction	Maps To
TEXT	Direct string extraction	Finding text, impression, recommendation
NUM	Value + MeasurementUnitsCodeSequence	Observation with valueQuantity
CODE	CodingSchemeDesignator + CodeValue + CodeMeaning	Condition code (ICD/SNOMED if present)
PNAME	De-identify, do not extract	Stripped
DATE	Parse as ISO date	Finding date
TIME	Parse as ISO time	Finding time
UIDREF	De-identify	Stripped
COMPOSITE	Reference to another DICOM object	Log reference, do not follow
IMAGE	Reference to image frame	Log reference, do not follow
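A recursive walk that applies the value-type table might look like the sketch below. To stay dependency-free it operates on nested dicts that use the real DICOM attribute keywords (`ValueType`, `ConceptNameCodeSequence`, `TextValue`, `MeasuredValueSequence`, `ContentSequence`); that dict shape is an assumed stand-in for pydicom datasets, and `walk_sr` is a hypothetical name. DATE and TIME handling is left out for brevity.

```python
from typing import Iterator


def _first(item: dict, keyword: str) -> dict:
    """First item of a sequence attribute, or an empty dict if it is absent."""
    seq = item.get(keyword) or [{}]
    return seq[0]


def walk_sr(item: dict, path: tuple = ()) -> Iterator[dict]:
    """Depth-first walk yielding one flat record per extractable content item."""
    value_type = item.get("ValueType")
    name = _first(item, "ConceptNameCodeSequence").get("CodeMeaning", "")
    record = {"concept_name": name, "value_type": value_type, "path": path}

    if value_type == "TEXT":
        yield {**record, "value": item.get("TextValue")}
    elif value_type == "NUM":
        measured = _first(item, "MeasuredValueSequence")
        if "NumericValue" in measured:
            unit = _first(measured, "MeasurementUnitsCodeSequence").get("CodeValue")
            yield {**record, "value": float(measured["NumericValue"]), "unit": unit}
    elif value_type == "CODE":
        code = _first(item, "ConceptCodeSequence")
        yield {
            **record,
            "value": code.get("CodeValue"),
            "scheme": code.get("CodingSchemeDesignator"),
            "meaning": code.get("CodeMeaning"),
        }
    # PNAME, UIDREF, COMPOSITE and IMAGE items yield nothing here.

    # Only containers extend the path that children are reported under.
    child_path = path + (name,) if value_type == "CONTAINER" else path
    for child in item.get("ContentSequence", []):
        yield from walk_sr(child, child_path)
```

Each record carries the container path it was found under, which is the input the finding classification step needs.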

Finding Classification¶

SR findings are classified into types for downstream processing:

Finding Type	Heuristic	FHIR Mapping
`impression`	Under "Impression" or "Conclusion" container	DiagnosticReport.conclusion
`finding`	Under "Findings" container	Condition or Observation
`measurement`	NUM value type with unit	Observation.valueQuantity
`recommendation`	Under "Recommendation" container	CarePlan.activity (future)
`coded_diagnosis`	CODE value type with SNOMED/ICD	Condition.code

Fallback for Non-SR DICOM¶

Most DICOM files patients upload will be image instances, not SR. When no SR content is found:

1. Build text from metadata: "{Modality} of {BodyPartExamined}, {Laterality}, Study Date: {StudyDate}. {StudyDescription}"
2. Pass this text to Clinical Context Agent as report_type="imaging"
3. The agent will not extract diagnoses from metadata alone (correct behavior -- a knee MRI file does not contain findings, the radiology report does)
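The metadata template from step 1 needs to cope with missing tags. A minimal sketch follows, using the snake_case keys from the `dicom_metadata` example; `build_metadata_text` is a hypothetical name, and spelling out L/R as "left"/"right" and falling back to "Imaging" when the modality is absent are assumed choices.

```python
LATERALITY_LABELS = {"L": "left", "R": "right"}


def build_metadata_text(metadata: dict) -> str:
    """Render available metadata as one sentence, skipping missing fields."""
    modality = metadata.get("modality") or "Imaging"
    body_part = metadata.get("body_part_examined")
    parts = [f"{modality} of {body_part}" if body_part else modality]

    side = LATERALITY_LABELS.get(metadata.get("laterality") or "")
    if side:
        parts.append(side)
    if metadata.get("study_date"):
        parts.append(f"Study Date: {metadata['study_date']}")

    text = ", ".join(parts) + "."
    if metadata.get("study_description"):
        text += f" {metadata['study_description']}"
    return text


example = build_metadata_text({
    "modality": "MR",
    "body_part_examined": "KNEE",
    "laterality": "L",
    "study_date": "2026-03-15",
    "study_description": "MRI LEFT KNEE WITHOUT CONTRAST",
})
# example == "MR of KNEE, left, Study Date: 2026-03-15. MRI LEFT KNEE WITHOUT CONTRAST"
```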

8. Frontend Changes¶

File Input¶

ConversationApp.tsx -- update the accept attribute:

Current:  accept=".pdf,.jpg,.jpeg,.png,.doc,.docx"
New:      accept=".pdf,.jpg,.jpeg,.png,.doc,.docx,.dcm,.dicom"

Frontend File Validation¶

Update the client-side extension check to include .dcm and .dicom. DICOM files can be large (50-200MB for multi-frame), but Phase 1 only supports single-frame files. Keep the 20MB limit for now; revisit in Phase 2 for series support.

File Attachment Card¶

When file_type === "dicom", show:
- A medical imaging icon (Lucide ScanLine or FileImage) instead of the generic file icon
- Badge: "DICOM" in teal
- After processing: body part + modality label (e.g., "MR - Left Knee")

EHR Drawer / Documents Tab¶

When a document has extracted_data.thumbnail_key:
- Fetch thumbnail via presigned URL
- Display as a small preview (128x128) in the document row
- Click to expand to 256x256 in a lightbox
- Overlay: modality + body part + study date

When no thumbnail (e.g., DICOM-SR without pixel data):
- Show generic imaging icon
- Display metadata summary: modality, body part, study date

9. Testing Plan¶

Unit Tests (`tests/test_dicom_parser.py`)¶

Test	Description	Assertion
`test_parse_valid_dicom`	Parse a synthetic DICOM file	Returns metadata with body_part, modality, study_date
`test_parse_dicom_missing_tags`	DICOM with minimal tags	Graceful None values, no crash
`test_parse_non_dicom_file`	Pass a PDF to the parser	Raises `DicomParseError`, does not crash
`test_deidentify_strips_patient_name`	Check PatientName tag	Replaced with "DEIDENTIFIED"
`test_deidentify_strips_patient_id`	Check PatientID tag	Replaced with "DEIDENTIFIED"
`test_deidentify_preserves_modality`	Check clinical tags	Modality, BodyPartExamined unchanged
`test_deidentify_preserves_study_date`	Check StudyDate	Preserved
`test_deidentify_replaces_uids`	Check instance UIDs	New UIDs generated, old ones gone
`test_deidentify_nested_sr`	SR with PersonName in content	PersonName in ContentSequence stripped
`test_sr_extraction_basic_text`	Basic Text SR	Extracts TEXT findings with concept names
`test_sr_extraction_measurements`	SR with NUM values	Extracts value + unit
`test_sr_extraction_coded_diagnosis`	SR with CODE values	Extracts SNOMED/ICD codes
`test_sr_no_content_sequence`	Image DICOM, no SR	Returns empty findings, no crash
`test_thumbnail_generation`	DICOM with pixel data	Returns PNG bytes, dimensions <= 256x256
`test_thumbnail_no_pixel_data`	DICOM-SR or metadata-only	Returns None
`test_build_text_representation`	Metadata + findings	Produces coherent text string for the Clinical Context Agent
`test_body_part_modality_matching`	KNEE + MR vs TKR requirements	Returns match with 0.95+ confidence

Integration Tests (`tests/test_dicom_pipeline.py`)¶

Test	Description
`test_dicom_upload_confirm_flow`	Upload .dcm via presign, confirm, verify extracted_data populated
`test_dicom_skips_ocr`	Upload .dcm, verify OCR not attempted, `ocr_method="dicom_metadata"`
`test_dicom_deidentified_in_r2`	Upload .dcm, download from R2, verify PatientName absent
`test_dicom_thumbnail_stored`	Upload .dcm with pixel data, verify `_thumb.png` key exists in R2
`test_dicom_sr_feeds_clinical_context`	Upload DICOM-SR, verify Clinical Context Agent produces FHIR resources
`test_dicom_requirement_matching`	Upload knee MRI DICOM for TKR case, verify requirement auto-matched
`test_dicom_feature_flag_off`	Disable `dicom_support_enabled`, upload .dcm, verify rejection

Test Fixtures¶

Create synthetic DICOM files using pydicom (no real patient data):

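# tests/conftest.py
from io import BytesIO

import pytest
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import ExplicitVRLittleEndian, generate_uid


@pytest.fixture
def sample_dicom_bytes() -> bytes:
    """Synthetic single-frame MR image; contains no real patient data."""
    ds = Dataset()
    ds.PatientName = "TEST^PATIENT"
    ds.PatientID = "TEST123"
    ds.Modality = "MR"
    ds.BodyPartExamined = "KNEE"
    ds.Laterality = "L"
    ds.StudyDate = "20260315"
    ds.StudyDescription = "MRI LEFT KNEE W/O CONTRAST"
    ds.SOPClassUID = "1.2.840.10008.5.1.4.1.1.4"  # MR Image Storage
    ds.SOPInstanceUID = generate_uid()
    # Minimal image pixel attributes so the 64x64 frame can be decoded
    ds.SamplesPerPixel = 1
    ds.PhotometricInterpretation = "MONOCHROME2"
    ds.BitsAllocated = 16
    ds.BitsStored = 16
    ds.HighBit = 15
    ds.PixelRepresentation = 0
    ds.Rows = 64
    ds.Columns = 64
    ds.PixelData = b"\x00" * (64 * 64 * 2)
    # File meta needed for a valid Part 10 file (pydicom 2.x API)
    ds.file_meta = FileMetaDataset()
    ds.file_meta.MediaStorageSOPClassUID = ds.SOPClassUID
    ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
    ds.file_meta.TransferSyntaxUID = ExplicitVRLittleEndian
    ds.is_little_endian = True
    ds.is_implicit_VR = False
    buffer = BytesIO()
    ds.save_as(buffer, write_like_original=False)
    return buffer.getvalue()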

10. Risks and Mitigations¶

Risk	Severity	Likelihood	Mitigation
Large DICOM files (>20MB)	Medium	High	Phase 1 keeps the 20MB limit. Multi-frame/series support in Phase 2 with chunked upload. Warn user at frontend if file too large.
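Compressed pixel data	Medium	Medium	DICOM supports JPEG2000, RLE, JPEG-LS compression. pydicom decodes RLE itself and can decode most JPEG variants when an optional handler (Pillow, pylibjpeg, or GDCM) is installed; pixel decoding also requires NumPy. If decompression fails, skip thumbnail but still extract metadata.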
DICOM files without standard tags	Low	Medium	Graceful degradation: missing tags return None, parser still succeeds. Minimum viable: if file parses as DICOM at all, accept it.
PII leak via pixel data	High	Low	Burned-in patient annotations on images (name/MRN in pixel data). Phase 1 does NOT scrub pixel data (that requires OCR on the image itself). Risk accepted for MVP; flagged for Phase 2.
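Malicious DICOM files	High	Low	pydicom is a parser, not an executor. Enforce the 20MB upload limit before parsing, and check Rows x Columns x NumberOfFrames against a cap before decoding pixel data so a small compressed file cannot expand without bound. Reject files that fail to parse.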
DICOM-SR without standard templates	Low	Medium	Non-standard SR trees are walked generically. Unknown containers logged but not classified. TEXT values always extracted regardless of container name.
Railway memory limits	Medium	Low	pydicom loads entire file into memory. 20MB DICOM is manageable. Phase 2 multi-frame series (500MB+) would need streaming -- out of scope.
R2 overwrite race condition	Low	Low	De-identified file overwrites original atomically via R2 PUT. If the write fails, original stays (acceptable -- retry will deidentify again).
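One way to bound decode memory for the malicious-file and memory-limit rows, independent of any library setting, is to estimate the decoded size from header values before touching pixel data. The sketch below is illustrative: `estimated_pixel_bytes`, `safe_to_decode`, the snake_case header keys, and the 64MB `MAX_DECODED_BYTES` cap are all assumptions, not values from this document.

```python
MAX_DECODED_BYTES = 64 * 1024 * 1024  # hypothetical cap; tune to the worker's memory


def estimated_pixel_bytes(
    rows: int,
    columns: int,
    bits_allocated: int,
    samples_per_pixel: int = 1,
    number_of_frames: int = 1,
) -> int:
    """Decoded size implied by the header, regardless of on-disk compression."""
    bytes_per_sample = (bits_allocated + 7) // 8
    return rows * columns * samples_per_pixel * number_of_frames * bytes_per_sample


def safe_to_decode(header: dict, limit: int = MAX_DECODED_BYTES) -> bool:
    """False when required fields are missing, malformed, or imply too much memory."""
    try:
        size = estimated_pixel_bytes(
            int(header["rows"]),
            int(header["columns"]),
            int(header["bits_allocated"]),
            int(header.get("samples_per_pixel", 1)),
            int(header.get("number_of_frames", 1)),
        )
    except (KeyError, TypeError, ValueError):
        return False
    return 0 < size <= limit
```

When the check fails, the thumbnail step can be skipped while metadata extraction still proceeds, mirroring the fallback described for decompression failures.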

11. Dependencies¶

Python Packages¶

Package	Version	Purpose	Size
`pydicom`	`>=2.4.0,<3.0`	DICOM file parsing, tag access, SR traversal, UID generation	~8MB
`Pillow`	`>=10.0.0,<11.0`	Pixel data to PNG thumbnail conversion	~3MB (likely already installed)

Optional (Not Required for Phase 1)¶

Package	Purpose	When
`pylibjpeg` + `pylibjpeg-libjpeg`	JPEG compressed transfer syntax decompression	If users upload JPEG-compressed DICOM
`pylibjpeg-openjpeg`	JPEG2000 compressed transfer syntax	If users upload J2K-compressed DICOM
`highdicom`	Higher-level SR parsing, Measurement Report templates	Phase 2 if SR parsing needs more sophistication
`gdcm`	Handles exotic transfer syntaxes	Post-MVP if decompression failures are common

No New Infrastructure¶

No new database tables (uses existing document_references.extracted_data JSONB)
No new R2 buckets (thumbnails stored alongside documents)
No new services or containers
No new external API calls

Appendix A: DICOM Modality Codes to Procedure Requirement Mapping¶

Modality Code	Modality Name	Typical Procedure Match
MR	Magnetic Resonance	Knee MRI, Brain MRI, Spine MRI
CT	Computed Tomography	CT Angiogram, CT Scan
CR / DX	Computed/Digital Radiography	X-Ray, Chest X-Ray
US	Ultrasound	Cardiac Echo, Abdominal Ultrasound
NM	Nuclear Medicine	Bone Scan, PET Scan
PT	PET	PET Scan
XA	X-Ray Angiography	Cardiac Catheterization
MG	Mammography	Mammogram
ECG	Electrocardiography	ECG/EKG
SR	Structured Report	Radiology Report (text, not imaging)

Appendix B: Body Part to Procedure Code Mapping¶

BodyPartExamined	Procedure Codes
KNEE	knee_replacement (27447), acl_reconstruction
HIP	hip_replacement (27130)
SPINE / LSPINE / CSPINE	spinal_fusion, laminectomy
HEART / CHEST	cabg (33533), valve_replacement
SHOULDER	shoulder_replacement, rotator_cuff
ABDOMEN	bariatric (43775), hernia_repair
BRAIN / HEAD	craniotomy
BREAST	mastectomy

These mappings are used by the deterministic matcher in match_dicom_metadata_to_requirements() and will be stored in config/dicom_mappings.yaml for configurability.
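A deterministic matcher along these lines could look like the sketch below. The function name and the 0.95 confidence come from this document; everything else is assumed for illustration: requirements are taken to be dicts with `id` and `description`, the `MODALITY_TERMS` and `BODY_PART_TERMS` vocabularies are a small hand-written stand-in for `config/dicom_mappings.yaml`, and matching is whole-word so that "ct" does not match inside "fracture".

```python
import re

# Illustrative vocabularies; the real ones would be loaded from configuration.
MODALITY_TERMS = {
    "MR": ("mri", "mr"),
    "CT": ("ct",),
    "CR": ("x-ray", "xray"),
    "DX": ("x-ray", "xray"),
    "US": ("ultrasound", "echo"),
    "MG": ("mammogram",),
}
BODY_PART_TERMS = {
    "KNEE": ("knee",),
    "HIP": ("hip",),
    "SPINE": ("spine",),
    "LSPINE": ("spine", "lumbar"),
    "CSPINE": ("spine", "cervical"),
    "CHEST": ("chest",),
    "HEART": ("heart", "cardiac"),
    "BRAIN": ("brain",),
}


def _mentions(description: str, terms: tuple) -> bool:
    """Whole-word, case-insensitive match of any term in the description."""
    words = set(re.findall(r"[a-z0-9-]+", description.lower()))
    return any(term in words for term in terms)


def match_dicom_metadata_to_requirements(metadata: dict, proc_reqs: list) -> list:
    """Return requirements whose description names both the modality and body part."""
    modality_terms = MODALITY_TERMS.get((metadata.get("modality") or "").upper(), ())
    body_terms = BODY_PART_TERMS.get((metadata.get("body_part_examined") or "").upper(), ())
    if not modality_terms or not body_terms:
        return []  # without both tags there is nothing deterministic to match on
    matches = []
    for req in proc_reqs:
        description = req.get("description", "")
        if _mentions(description, modality_terms) and _mentions(description, body_terms):
            matches.append(
                {"requirement_id": req.get("id"), "confidence": 0.95, "method": "dicom_metadata"}
            )
    return matches
```

Requirements that name only a modality (for example "CT Scan") are not matched by this sketch; whether they should be is a product decision left open here.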