
DICOM File Support -- Phase 1

Status: Proposed
Author: SD
Created: 2026-04-10
GitHub Issue: #147
Branch: feat/dicom-support


1. Overview and Motivation

What

Accept DICOM (.dcm/.dicom) medical imaging files in the Curaway upload flow, extract structured metadata (body part, modality, study date, laterality, patient demographics), parse DICOM-SR (Structured Reports) for radiology findings, de-identify patient PII from headers before provider forwarding, auto-match imaging metadata against procedure requirements, and render a thumbnail preview in the EHR drawer.

Why

  • Patients have DICOM files. Radiology departments frequently provide imaging on CD/USB as DICOM, not PDF. Patients currently cannot upload these -- they must convert to PDF first, losing structured metadata.
  • Structured metadata > OCR. DICOM headers contain machine-readable body part, modality, study date, and laterality. When these tags are populated they can be read directly, whereas the OCR + LLM pipeline would have to infer the same fields from free text with lower confidence.
  • DICOM-SR carries radiology findings. Structured Reports embedded in DICOM files contain the radiologist's findings, measurements, and impressions in a parseable format -- no OCR needed.
  • Procedure requirement matching. Auto-matching BodyPartExamined=KNEE + Modality=MR against a TKR procedure's "Knee MRI" requirement is deterministic and instant.
  • De-identification is mandatory. DICOM headers contain patient name, DOB, and medical record numbers. These must be stripped before forwarding to international providers (GDPR + HIPAA Safe Harbor).
  • Cross-border value. Patients traveling from US/UK/UAE to India/Turkey/Thailand often carry imaging on disc. Supporting DICOM directly removes a friction point in the onboarding flow.

What This Is Not

  • Not a full PACS viewer. No windowing, level adjustment, or multi-frame cine playback.
  • Not multi-file DICOM series handling (Phase 2).
  • Not DICOM networking (C-STORE, C-FIND, WADO-RS) -- that is post-Series A.
  • Not DICOM-RT (radiation therapy) support.

2. Architecture Decisions

Where dicom_parser.py Fits

New service at app/services/dicom_parser.py -- sits alongside document_processing.py in the services layer. It is a pure utility module: takes bytes in, returns structured dict out. No DB access, no side effects.

Upload flow (existing)
  |
  v
document_service.confirm_upload()
  |
  +-- file_type == "dicom" ?
  |     |
  |     YES --> dicom_parser.parse_dicom(file_bytes)
  |     |         returns: {metadata, sr_findings, thumbnail_bytes, deidentified_bytes}
  |     |
  |     +-- Store deidentified_bytes to R2 (overwrite original)
  |     +-- Store thumbnail to R2 ({storage_key}_thumb.png)
  |     +-- Save metadata + sr_findings to doc.extracted_data
  |     +-- Skip OCR, go straight to run_post_ocr_pipeline() with extracted text from SR
  |     |
  |     NO --> existing OCR pipeline (unchanged)
  |
  v
run_post_ocr_pipeline()  <-- receives SR text or metadata-derived text
  |
  v
Clinical Context Agent --> FHIR resources --> requirement matching --> EHR rebuild

Key Design Choices

| Decision | Choice | Rationale |
|---|---|---|
| DICOM library | pydicom 2.4+ | De facto standard Python DICOM library, pure Python core, no C dependencies, active maintenance |
| Thumbnail generation | pydicom + Pillow | pydicom reads pixel data (its pixel_array accessor requires NumPy), Pillow converts to PNG. No GDAL/VTK heavyweight dependencies. |
| De-identification | Custom tag stripper (Safe Harbor) | Full DICOM anonymization suites (deid, DicomAnonymizer) are overkill. We strip a known tag list -- deterministic, auditable, <50 lines. |
| SR parsing | pydicom ContentSequence traversal | No need for highdicom at MVP. Walk the SR tree, extract TEXT/NUM/CODE value types. |
| Storage | De-identified bytes overwrite original in R2 | Never persist PII-bearing DICOM on our infrastructure. Original is replaced. |
| Thumbnail format | PNG, 256x256 max, 8-bit grayscale | Small enough for inline preview, good enough for "is this the right scan?" |
| Feature flag | dicom_support_enabled (Flagsmith) | Kill switch for the entire feature. Default off until tested. |
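
The thumbnail rows above imply two small computations before Pillow encodes the PNG: fitting the image inside 256x256 without distorting it, and reducing 16-bit samples to 8-bit grayscale. A minimal sketch in plain Python follows. It assumes simple min-max scaling (this document does not specify a windowing strategy), and the helper names thumbnail_size and to_8bit are hypothetical.

```python
from typing import Sequence

MAX_THUMB = 256  # from the "256x256 max" design choice


def thumbnail_size(rows: int, columns: int, max_side: int = MAX_THUMB) -> tuple[int, int]:
    """Return (width, height) fitting inside max_side x max_side, keeping aspect ratio."""
    if rows <= 0 or columns <= 0:
        raise ValueError("rows and columns must be positive")
    scale = min(1.0, max_side / max(rows, columns))  # never upscale small images
    return max(1, round(columns * scale)), max(1, round(rows * scale))


def to_8bit(samples: Sequence[int], invert: bool = False) -> list[int]:
    """Min-max scale raw samples to 0..255; pass invert=True for MONOCHROME1 images."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        scaled = [0] * len(samples)  # flat image: avoid division by zero
    else:
        scaled = [round((s - lo) * 255 / (hi - lo)) for s in samples]
    return [255 - v for v in scaled] if invert else scaled
```

MONOCHROME1 stores the minimum sample value as white, so inverting it keeps thumbnails visually consistent with MONOCHROME2 images.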

3. Implementation Checklist

Tier 1: Opus (Architecture, Clinical Logic, Security)

  • [ ] O1: app/services/dicom_parser.py -- core parser module [NEW FILE]
  • parse_dicom(file_bytes: bytes) -> DicomParseResult -- main entry point
  • extract_metadata(ds: pydicom.Dataset) -> dict -- pull structured tags
  • extract_sr_findings(ds: pydicom.Dataset) -> list[SRFinding] -- walk SR content tree
  • deidentify(ds: pydicom.Dataset) -> pydicom.Dataset -- strip PII tags per Safe Harbor
  • generate_thumbnail(ds: pydicom.Dataset) -> bytes | None -- pixel data to PNG
  • build_text_representation(metadata: dict, findings: list) -> str -- synthesize text for Clinical Context Agent

  • [ ] O2: De-identification tag list and logic [IN dicom_parser.py]

  • Implement Safe Harbor tag stripping (see Section 6)
  • Replace stripped values with safe placeholders (e.g., "DEIDENTIFIED")
  • Preserve all clinical tags (BodyPartExamined, Modality, StudyDate, etc.)
  • Write to new Dataset, never modify-in-place on the input
  • Log deidentification event to audit table (tag count stripped, no PII values)

  • [ ] O3: DICOM-SR parsing logic [IN dicom_parser.py]

  • Walk ContentSequence recursively
  • Extract TEXT, NUM, CODE, DATE, and TIME value types; strip PNAME and UIDREF; log COMPOSITE and IMAGE references without following them (see Section 7)
  • Map SR concept names to clinical meaning (see Section 7)
  • Return structured findings list

  • [ ] O4: Wire into document_processing.py [MODIFY]

  • Add DICOM branch in the processing pipeline
  • Skip OCR for DICOM files
  • Pass SR text + metadata-derived text to run_post_ocr_pipeline()
  • Set ocr_method = "dicom_metadata" for DICOM files

  • [ ] O5: Wire into attachment_handler.py [MODIFY]

  • Add DICOM detection in process_attachments()
  • When file_type == "dicom", skip inline OCR, use extracted_data directly
  • Set report_type = "imaging" for Clinical Context Agent

  • [ ] O6: Auto-match body part + modality against procedure requirements [MODIFY requirement_matcher.py]

  • New function: match_dicom_metadata_to_requirements(metadata, proc_reqs) -> list[dict]
  • Match BodyPartExamined + Modality against requirement descriptions
  • Higher confidence than LLM matching (deterministic, 0.95+ confidence)

  • [ ] O7: FHIR ImagingStudy resource generation [MODIFY clinical_context.py prompts]

  • When DICOM metadata is available, generate FHIR ImagingStudy (not just Condition/Observation)
  • Map modality, body part, study date, accession number to ImagingStudy fields
  • Store via fhir_service.create_fhir_resource()

Tier 2: Sonnet (Mechanical Implementation, Config, Tests)

  • [ ] S1: Update config/guardrails.yaml [MODIFY]
  • Add .dcm, .dicom to frontend.allowed_extensions
  • Verify application/dicom is present in backend.allowed_mime_types; add it if missing
  • Add DICOM-specific medical keywords: dicom, modality, body part, series

  • [ ] S2: Update app/services/file_validator.py [MODIFY]

  • Add .dcm, .dicom to extension validation
  • Add application/dicom MIME type mapping
  • DICOM files skip medical keyword check (metadata provides clinical context natively)
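
Extension and MIME checks rely on what the client claims. A content check can back them up: DICOM Part 10 files carry a 128-byte preamble followed by the ASCII marker DICM. A minimal sketch, where looks_like_dicom is a hypothetical helper name rather than an existing function in file_validator.py:

```python
DICOM_PREAMBLE_LENGTH = 128
DICOM_MAGIC = b"DICM"


def looks_like_dicom(head: bytes) -> bool:
    """True if the leading bytes match the DICOM Part 10 header layout."""
    end = DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC)
    return len(head) >= end and head[DICOM_PREAMBLE_LENGTH:end] == DICOM_MAGIC
```

Only the first 132 bytes are needed, so the check can run before the full file is read. Files written without the Part 10 header would fail it and would need a parser-level fallback if they are to be accepted.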

  • [ ] S3: Update presign endpoint [MODIFY app/routers/documents.py]

  • Accept .dcm/.dicom extensions in presign request validation
  • Set file_type = "dicom" in DocumentReference creation

  • [ ] S4: Update confirm_upload flow [MODIFY app/services/document_service.py]

  • Detect DICOM file type
  • Call dicom_parser.parse_dicom() instead of queuing OCR
  • Store deidentified bytes back to R2
  • Store thumbnail to R2
  • Populate extracted_data with DICOM metadata
  • Set document_category = "imaging" automatically
  • Queue run_post_ocr_pipeline() with SR text

  • [ ] S5: Update QStash OCR callback [MODIFY app/routers/internal.py]

  • In process_ocr(), detect DICOM file type before starting OCR tiers
  • If DICOM: call dicom_parser.parse_dicom(), skip all OCR tiers
  • Feed results into run_post_ocr_pipeline() as normal

  • [ ] S6: Pydantic schemas for DICOM data [NEW FILE app/schemas/dicom.py]

  • DicomMetadata -- body_part, modality, study_date, laterality, etc.
  • SRFinding -- concept_name, value_type, value, unit, finding_type
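  • DicomParseResult -- metadata, sr_findings, thumbnail_bytes, deidentified_bytes, text_representation, deidentification_summary (thumbnail_key is assigned by the caller after the R2 write, since the parser has no storage access)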

  • [ ] S7: Frontend -- accept DICOM file types [MODIFY ConversationApp.tsx]

  • Update accept= attribute: add .dcm,.dicom
  • Update frontend file validation (extension list)
  • Add DICOM icon/badge for file attachment cards

  • [ ] S8: Frontend -- DICOM thumbnail preview [MODIFY EHR drawer components]

  • When document has thumbnail_key in extracted_data, fetch and display
  • Fallback to generic imaging icon when no thumbnail available
  • 256x256 max, grayscale, with body part + modality label overlay

  • [ ] S9: Flagsmith feature flag [CONFIG]

  • Create dicom_support_enabled flag in Flagsmith
  • Gate all DICOM-specific code paths behind this flag
  • Default: OFF until integration testing passes

  • [ ] S10: Add pydicom and Pillow to requirements [MODIFY requirements.txt]

  • pydicom>=2.4.0,<3.0
  • Pillow>=10.0.0,<11.0 (likely already present for other image handling)

4. File-by-File Changes

New Files

| File | Purpose |
|---|---|
| app/services/dicom_parser.py | Core DICOM parsing, de-identification, thumbnail generation, SR extraction |
| app/schemas/dicom.py | Pydantic models for DICOM metadata and SR findings |
| tests/test_dicom_parser.py | Unit tests for parser, de-identification, SR parsing |
| tests/test_dicom_pipeline.py | Integration tests for DICOM flow through document pipeline |
| tests/fixtures/sample.dcm | Minimal valid DICOM file for testing (synthetic, no real PHI) |
| tests/fixtures/sample_sr.dcm | Minimal DICOM-SR file for SR parsing tests |

Modified Files

| File | Change |
|---|---|
| config/guardrails.yaml | Add .dcm, .dicom to frontend allowed_extensions |
| app/services/document_processing.py | DICOM branch before OCR, ocr_method="dicom_metadata" |
| app/agents/attachment_handler.py | DICOM detection, skip inline OCR, set report_type="imaging" |
| app/routers/internal.py | DICOM detection in process_ocr(), skip OCR tiers |
| app/services/document_service.py | DICOM handling in confirm_upload() |
| app/routers/documents.py | Accept .dcm/.dicom in presign |
| app/services/file_validator.py | DICOM extension + MIME validation |
| app/services/requirement_matcher.py | Deterministic DICOM metadata matching |
| app/agents/clinical_context.py | FHIR ImagingStudy generation from DICOM metadata |
| requirements.txt | Add pydicom>=2.4.0 |
| curaway-health-navigator/src/pages/ConversationApp.tsx | Accept .dcm,.dicom in file input |

5. Data Model

extracted_data JSONB Schema for DICOM Documents

When a DICOM file is processed, document_references.extracted_data will contain:

{
  "source_type": "dicom",
  "dicom_metadata": {
    "modality": "MR",
    "modality_description": "Magnetic Resonance",
    "body_part_examined": "KNEE",
    "laterality": "L",
    "study_date": "2026-03-15",
    "study_description": "MRI LEFT KNEE WITHOUT CONTRAST",
    "series_description": "SAG PD FAT SAT",
    "institution_name": "DEIDENTIFIED",
    "referring_physician": "DEIDENTIFIED",
    "accession_number": "DEIDENTIFIED",
    "manufacturer": "SIEMENS",
    "station_name": "MRC35273",
    "slice_thickness": 3.0,
    "pixel_spacing": [0.35, 0.35],
    "rows": 512,
    "columns": 512,
    "bits_allocated": 16,
    "photometric_interpretation": "MONOCHROME2",
    "number_of_frames": 1
  },
  "sr_findings": [
    {
      "concept_name": "Finding",
      "value_type": "TEXT",
      "value": "Moderate tricompartmental osteoarthritis with near-complete loss of medial compartment cartilage",
      "finding_type": "impression"
    },
    {
      "concept_name": "Measurement",
      "value_type": "NUM",
      "value": 2.3,
      "unit": "mm",
      "finding_type": "measurement",
      "measurement_site": "Medial meniscus tear"
    }
  ],
  "thumbnail_key": "tenant-001/patient-abc/doc-xyz_thumb.png",
  "deidentification_summary": {
    "tags_stripped": 18,
    "tags_preserved": 45,
    "method": "safe_harbor_v1"
  },
  "analysis_status": "completed",
  "extracted_entities": [],
  "observations": [],
  "matched_requirements": []
}
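
Note that study_date in the example is ISO 8601, while DICOM stores DA values as YYYYMMDD (the test fixture in Section 9 uses "20260315"). A small normalizer could bridge the two. This is a sketch with a hypothetical function name; it returns None for missing or malformed values, in line with the graceful-degradation behavior described in Section 10.

```python
from datetime import datetime
from typing import Optional


def dicom_date_to_iso(value: Optional[str]) -> Optional[str]:
    """Convert a DICOM DA string (YYYYMMDD) to YYYY-MM-DD, or None if unusable."""
    if not value:
        return None
    text = str(value).strip()
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date().isoformat()
    except ValueError:  # digits that do not form a real date, e.g. month 13
        return None
```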

FHIR Mapping

| DICOM Tag | FHIR Resource | FHIR Field |
|---|---|---|
| Modality (0008,0060) | ImagingStudy | series[0].modality.code |
| BodyPartExamined (0018,0015) | ImagingStudy | series[0].bodySite.display |
| StudyDate (0008,0020) | ImagingStudy | started |
| StudyDescription (0008,1030) | ImagingStudy | description |
| AccessionNumber (0008,0050) | ImagingStudy | identifier[0].value (deidentified) |
| Laterality (0020,0060) | ImagingStudy | series[0].laterality.code |
| NumberOfFrames (0028,0008) | ImagingStudy | numberOfInstances (caution: this field counts SOP instances, so a single uploaded file is 1 regardless of its frame count) |
| SR Finding (impression) | Condition or DiagnosticReport | conclusion / code |
| SR Finding (measurement) | Observation | valueQuantity |

FHIR ImagingStudy resource will be created with status: "available" and linked to the patient via subject reference. SR findings feed into the existing Clinical Context Agent flow, which generates Condition and Observation resources.
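
The mapping above can be sketched as a plain function that turns the dicom_metadata block into an ImagingStudy dictionary. This assumes FHIR R4 (where series.modality and series.bodySite are Coding elements and series.uid is required). The function name and the patient_ref and series_uid parameters are hypothetical, and no code system is set for laterality because this document does not name one.

```python
from typing import Any

DCM_SYSTEM = "http://dicom.nema.org/resources/ontology/DCM"  # DICOM controlled terminology


def build_imaging_study(metadata: dict[str, Any], patient_ref: str, series_uid: str) -> dict[str, Any]:
    """Map the dicom_metadata block to a FHIR R4 ImagingStudy dict (sketch)."""
    series: dict[str, Any] = {
        "uid": series_uid,  # assumption: the regenerated SeriesInstanceUID
        "modality": {"system": DCM_SYSTEM, "code": metadata["modality"]},
    }
    if metadata.get("body_part_examined"):
        series["bodySite"] = {"display": metadata["body_part_examined"]}
    if metadata.get("laterality"):
        series["laterality"] = {"code": metadata["laterality"]}

    study: dict[str, Any] = {
        "resourceType": "ImagingStudy",
        "status": "available",
        "subject": {"reference": patient_ref},
        "series": [series],
    }
    # Optional fields are omitted rather than emitted as null
    if metadata.get("study_date"):
        study["started"] = metadata["study_date"]
    if metadata.get("study_description"):
        study["description"] = metadata["study_description"]
    return study
```

The resulting dictionary would then be handed to fhir_service.create_fhir_resource(); that call is not shown because its signature is not specified here.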


6. De-Identification: Safe Harbor DICOM Tag Stripping

Tags to Strip (DICOM PS3.15 Table E.1-1, Safe Harbor subset)

These tags contain direct patient identifiers and must be removed or replaced with "DEIDENTIFIED" before storage or forwarding.

| Tag | Name | Action |
|---|---|---|
| (0010,0010) | PatientName | Replace with "DEIDENTIFIED" |
| (0010,0020) | PatientID | Replace with "DEIDENTIFIED" |
| (0010,0030) | PatientBirthDate | Remove |
| (0010,0040) | PatientSex | Preserve (clinically relevant, not identifying alone) |
| (0010,1000) | OtherPatientIDs | Remove |
| (0010,1001) | OtherPatientNames | Remove |
| (0010,1010) | PatientAge | Preserve (clinically relevant) |
| (0010,1020) | PatientSize | Preserve (height, clinically relevant) |
| (0010,1030) | PatientWeight | Preserve (clinically relevant) |
| (0010,21B0) | AdditionalPatientHistory | Remove (may contain narrative PII) |
| (0008,0050) | AccessionNumber | Replace with "DEIDENTIFIED" |
| (0008,0080) | InstitutionName | Replace with "DEIDENTIFIED" |
| (0008,0081) | InstitutionAddress | Remove |
| (0008,0090) | ReferringPhysicianName | Replace with "DEIDENTIFIED" |
| (0008,1048) | PhysiciansOfRecord | Remove |
| (0008,1050) | PerformingPhysicianName | Remove |
| (0008,1060) | NameOfPhysiciansReadingStudy | Remove |
| (0008,1070) | OperatorsName | Remove |
| (0010,0050) | PatientInsurancePlanCodeSequence | Remove |
| (0010,2154) | PatientTelephoneNumbers | Remove |
| (0010,2160) | EthnicGroup | Remove |
| (0010,21F0) | PatientReligiousPreference | Remove |
| (0020,000D) | StudyInstanceUID | Replace with generated UID |
| (0020,000E) | SeriesInstanceUID | Replace with generated UID |
| (0008,0018) | SOPInstanceUID | Replace with generated UID |
| (0040,A123) | PersonName (in SR) | Replace with "DEIDENTIFIED" |
| (0032,1032) | RequestingPhysician | Remove |
| (0032,1060) | RequestedProcedureDescription | Preserve (clinically relevant) |

Tags to Preserve (Clinical Value)

All of these stay intact -- they carry clinical meaning without identifying the patient:

  • Modality (0008,0060)
  • BodyPartExamined (0018,0015)
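  • StudyDate (0008,0020) -- Note: study date is preserved because knowing when the imaging was done is critical for procedure requirement matching (validity windows). Caveat: HIPAA Safe Harbor treats all date elements more specific than the year as identifiers when they relate directly to an individual, so keeping the full StudyDate is a deliberate deviation that needs compliance sign-off (for example via Expert Determination) rather than something Safe Harbor itself permits.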
  • StudyDescription (0008,1030)
  • SeriesDescription (0008,103E)
  • Laterality (0020,0060)
  • ImageLaterality (0020,0062)
  • All pixel data tags
  • All acquisition parameter tags (SliceThickness, PixelSpacing, etc.)
  • Manufacturer, StationName, SoftwareVersions

Implementation Notes

  • Use pydicom's Dataset.walk() to traverse all sequences (including nested SR content)
  • After stripping, call ds.save_as() to produce clean bytes
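  • Generate new UIDs using pydicom.uid.generate_uid() (maintains DICOM validity). Map each original UID to a single replacement so cross-references stay consistent, and set file_meta.MediaStorageSOPInstanceUID to the new SOPInstanceUID.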
  • Log an audit event: {"event_type": "DICOM_DEIDENTIFIED", "tags_stripped": N, "document_id": "..."}
  • Never log the stripped values themselves
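
The remove / replace / regenerate policy can be sketched independently of pydicom. The example below uses nested plain dicts as a stand-in for a dataset (keyword to value, with lists of dicts standing in for sequences) so that it runs with the standard library alone; in the real module the same decisions would be applied to pydicom elements. The three sets are only a subset of the table above, and all names are hypothetical.

```python
import uuid
from typing import Any, Optional

PLACEHOLDER = "DEIDENTIFIED"
REMOVE = {"PatientBirthDate", "OtherPatientIDs", "OtherPatientNames", "InstitutionAddress"}
REPLACE = {"PatientName", "PatientID", "AccessionNumber", "InstitutionName", "PersonName"}
REGENERATE_UID = {"StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"}


def new_uid() -> str:
    """UUID-derived UID under the 2.25 root."""
    return f"2.25.{uuid.uuid4().int}"


def deidentify(dataset: dict[str, Any],
               uid_map: Optional[dict[str, str]] = None) -> tuple[dict[str, Any], int]:
    """Return (clean_copy, tags_stripped); the input dict is never modified."""
    uid_map = {} if uid_map is None else uid_map
    clean: dict[str, Any] = {}
    stripped = 0
    for keyword, value in dataset.items():
        if keyword in REMOVE:
            stripped += 1  # dropped entirely
        elif keyword in REPLACE:
            clean[keyword] = PLACEHOLDER
            stripped += 1
        elif keyword in REGENERATE_UID:
            # The same original UID always maps to the same replacement
            clean[keyword] = uid_map.setdefault(value, new_uid())
            stripped += 1
        elif isinstance(value, list):  # sequence: recurse into nested items
            items = []
            for item in value:
                if isinstance(item, dict):
                    child, count = deidentify(item, uid_map)
                    stripped += count
                    items.append(child)
                else:
                    items.append(item)
            clean[keyword] = items
        else:
            clean[keyword] = value  # clinical tag: preserved as-is
    return clean, stripped
```

The returned count is what would feed tags_stripped in the audit event, and nothing in the function ever logs or returns the stripped values.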

7. DICOM-SR Parsing

Supported SR Template Types (Phase 1)

| SR Type | SOP Class UID | IOD | Priority |
|---|---|---|---|
| Basic Text SR | 1.2.840.10008.5.1.4.1.1.88.11 | Comprehensive | Must have |
| Enhanced SR | 1.2.840.10008.5.1.4.1.1.88.22 | Comprehensive | Must have |
| Comprehensive SR | 1.2.840.10008.5.1.4.1.1.88.33 | Comprehensive | Must have |
| Comprehensive 3D SR | 1.2.840.10008.5.1.4.1.1.88.34 | Comprehensive | Nice to have |
| Key Object Selection | 1.2.840.10008.5.1.4.1.1.88.59 | N/A | Skip (no clinical text) |

SR Content Tree Traversal

DICOM-SR stores findings as a tree of Content Items. Each item has a ValueType and a ConceptNameCodeSequence. The parser walks this tree recursively:

ContentSequence
  +-- Container: "Imaging Report"
       +-- Container: "Findings"
       |    +-- TEXT: "Moderate tricompartmental osteoarthritis..."
       |    +-- NUM: 2.3 mm (meniscus tear measurement)
       |    +-- CODE: SNOMED 396275006 (osteoarthritis)
       +-- Container: "Impression"
            +-- TEXT: "Near-complete loss of medial compartment cartilage"

Value Type Handling

| ValueType | Extraction | Maps To |
|---|---|---|
| TEXT | Direct string extraction | Finding text, impression, recommendation |
| NUM | Value + MeasurementUnitsCodeSequence | Observation with valueQuantity |
| CODE | CodingSchemeDesignator + CodeValue + CodeMeaning | Condition code (ICD/SNOMED if present) |
| PNAME | De-identify, do not extract | Stripped |
| DATE | Parse as ISO date | Finding date |
| TIME | Parse as ISO time | Finding time |
| UIDREF | De-identify | Stripped |
| COMPOSITE | Reference to another DICOM object | Log reference, do not follow |
| IMAGE | Reference to image frame | Log reference, do not follow |
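
The traversal and the value type rules can be sketched as a recursive generator. The code below reads only standard SR attribute keywords (ValueType, ConceptNameCodeSequence, TextValue, MeasuredValueSequence, ConceptCodeSequence, ContentSequence), so it should work on pydicom datasets; to stay runnable without pydicom, the demo builds the tree from SimpleNamespace stand-ins. Function names and the output dict shape are hypothetical.

```python
from types import SimpleNamespace as Item  # stand-in for pydicom content items
from typing import Any, Iterator

EXTRACTED_TYPES = {"TEXT", "NUM", "CODE", "DATE", "TIME"}


def concept_name(item: Any) -> str:
    """CodeMeaning of the item's concept name, or '' when absent."""
    seq = getattr(item, "ConceptNameCodeSequence", None) or []
    return getattr(seq[0], "CodeMeaning", "") if seq else ""


def item_value(item: Any) -> Any:
    """Pull the payload for one content item according to its ValueType."""
    if item.ValueType == "TEXT":
        return item.TextValue
    if item.ValueType == "NUM":
        measured = item.MeasuredValueSequence[0]
        unit = measured.MeasurementUnitsCodeSequence[0].CodeValue
        return {"value": float(measured.NumericValue), "unit": unit}
    if item.ValueType == "CODE":
        code = item.ConceptCodeSequence[0]
        return {"system": code.CodingSchemeDesignator,
                "code": code.CodeValue, "meaning": code.CodeMeaning}
    if item.ValueType == "DATE":
        return item.Date
    return item.Time  # TIME is the only remaining extracted type


def walk_sr(node: Any, path: tuple = ()) -> Iterator[dict]:
    """Depth-first walk of ContentSequence, yielding one dict per extracted item."""
    for item in getattr(node, "ContentSequence", None) or []:
        name = concept_name(item)
        if item.ValueType == "CONTAINER":
            yield from walk_sr(item, path + (name,))
        elif item.ValueType in EXTRACTED_TYPES:
            yield {"concept_name": name, "value_type": item.ValueType,
                   "value": item_value(item), "container_path": path}
        # PNAME, UIDREF, COMPOSITE and IMAGE are deliberately never yielded


def _name(meaning: str) -> list:
    return [Item(CodeMeaning=meaning)]


# Demo tree mirroring the "Findings" branch of the diagram above
report = Item(ContentSequence=[
    Item(ValueType="CONTAINER", ConceptNameCodeSequence=_name("Findings"), ContentSequence=[
        Item(ValueType="TEXT", ConceptNameCodeSequence=_name("Finding"),
             TextValue="Moderate tricompartmental osteoarthritis"),
        Item(ValueType="NUM", ConceptNameCodeSequence=_name("Measurement"),
             MeasuredValueSequence=[Item(NumericValue="2.3",
                                         MeasurementUnitsCodeSequence=[Item(CodeValue="mm")])]),
        Item(ValueType="PNAME", ConceptNameCodeSequence=_name("Observer"), PersonName="DOE^JANE"),
    ]),
])
findings = list(walk_sr(report))
```

Carrying container_path alongside each item keeps the later classification step independent of the traversal.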

Finding Classification

SR findings are classified into types for downstream processing:

| Finding Type | Heuristic | FHIR Mapping |
|---|---|---|
| impression | Under "Impression" or "Conclusion" container | DiagnosticReport.conclusion |
| finding | Under "Findings" container | Condition or Observation |
| measurement | NUM value type with unit | Observation.valueQuantity |
| recommendation | Under "Recommendation" container | CarePlan.activity (future) |
| coded_diagnosis | CODE value type with SNOMED/ICD | Condition.code |
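
The heuristics in this table can overlap (a NUM item inside a "Findings" container matches two rows), so an implementation has to pick a precedence. The sketch below assumes value type rules win over container rules and that the nearest enclosing container decides otherwise; both choices are assumptions, as are the function name, the "unclassified" fallback, and the set of coding scheme designators treated as SNOMED/ICD.

```python
from typing import Optional, Sequence

# Substring of the container name (lowercased) -> finding type
CONTAINER_TYPES = {
    "impression": "impression",
    "conclusion": "impression",
    "finding": "finding",
    "recommendation": "recommendation",
}
# Assumed designators; verify against the coding schemes actually seen in uploads
CODED_SCHEMES = {"SCT", "SRT", "I10"}


def classify_finding(value_type: str, container_path: Sequence[str],
                     coding_scheme: Optional[str] = None) -> str:
    """Classify one extracted SR item into a finding type."""
    if value_type == "NUM":
        return "measurement"
    if value_type == "CODE" and coding_scheme in CODED_SCHEMES:
        return "coded_diagnosis"
    for name in reversed(container_path):  # nearest enclosing container first
        lowered = name.lower()
        for key, finding_type in CONTAINER_TYPES.items():
            if key in lowered:
                return finding_type
    return "unclassified"
```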

Fallback for Non-SR DICOM

Most DICOM files patients upload will be image instances, not SR. When no SR content is found:

  1. Build text from metadata: "{Modality} of {BodyPartExamined}, {Laterality}, Study Date: {StudyDate}. {StudyDescription}"
  2. Pass this text to Clinical Context Agent as report_type="imaging"
  3. The agent will not extract diagnoses from metadata alone (correct behavior -- a knee MRI file does not contain findings, the radiology report does)
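
Because any of the template fields may be missing, a literal format string would leave stray commas or the text "None" in the output. A sketch of a tolerant renderer, using the snake_case keys from the Section 5 schema; the function name and the "Imaging study" fallback wording are assumptions:

```python
from typing import Any


def metadata_fallback_text(metadata: dict[str, Any]) -> str:
    """Render the metadata-only sentence, skipping fields that are missing."""
    modality = metadata.get("modality")
    body_part = metadata.get("body_part_examined")
    if modality and body_part:
        lead = f"{modality} of {body_part}"
    else:
        lead = modality or body_part or "Imaging study"

    parts = [lead]
    if metadata.get("laterality"):
        parts.append(str(metadata["laterality"]))
    if metadata.get("study_date"):
        parts.append(f"Study Date: {metadata['study_date']}")

    text = ", ".join(parts) + "."
    if metadata.get("study_description"):
        text += f" {metadata['study_description']}"
    return text
```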


8. Frontend Changes

File Input

ConversationApp.tsx -- update the accept attribute:

Current:  accept=".pdf,.jpg,.jpeg,.png,.doc,.docx"
New:      accept=".pdf,.jpg,.jpeg,.png,.doc,.docx,.dcm,.dicom"

Frontend File Validation

Update the client-side extension check to include .dcm and .dicom. DICOM files can be large (50-200MB for multi-frame), but Phase 1 only supports single-frame files. Keep the 20MB limit for now; revisit in Phase 2 for series support.

File Attachment Card

When file_type === "dicom", show:

  • A medical imaging icon (Lucide ScanLine or FileImage) instead of the generic file icon
  • Badge: "DICOM" in teal
  • After processing: body part + modality label (e.g., "MR - Left Knee")

EHR Drawer / Documents Tab

When a document has extracted_data.thumbnail_key:

  • Fetch thumbnail via presigned URL
  • Display as a small preview (128x128) in the document row
  • Click to expand to 256x256 in a lightbox
  • Overlay: modality + body part + study date

When no thumbnail (e.g., DICOM-SR without pixel data):

  • Show generic imaging icon
  • Display metadata summary: modality, body part, study date


9. Testing Plan

Unit Tests (tests/test_dicom_parser.py)

| Test | Description | Assertion |
|---|---|---|
| test_parse_valid_dicom | Parse a synthetic DICOM file | Returns metadata with body_part, modality, study_date |
| test_parse_dicom_missing_tags | DICOM with minimal tags | Graceful None values, no crash |
| test_parse_non_dicom_file | Pass a PDF to the parser | Raises DicomParseError, does not crash |
| test_deidentify_strips_patient_name | Check PatientName tag | Replaced with "DEIDENTIFIED" |
| test_deidentify_strips_patient_id | Check PatientID tag | Replaced with "DEIDENTIFIED" |
| test_deidentify_preserves_modality | Check clinical tags | Modality, BodyPartExamined unchanged |
| test_deidentify_preserves_study_date | Check StudyDate | Preserved |
| test_deidentify_replaces_uids | Check instance UIDs | New UIDs generated, old ones gone |
| test_deidentify_nested_sr | SR with PersonName in content | PersonName in ContentSequence stripped |
| test_sr_extraction_basic_text | Basic Text SR | Extracts TEXT findings with concept names |
| test_sr_extraction_measurements | SR with NUM values | Extracts value + unit |
| test_sr_extraction_coded_diagnosis | SR with CODE values | Extracts SNOMED/ICD codes |
| test_sr_no_content_sequence | Image DICOM, no SR | Returns empty findings, no crash |
| test_thumbnail_generation | DICOM with pixel data | Returns PNG bytes, dimensions <= 256x256 |
| test_thumbnail_no_pixel_data | DICOM-SR or metadata-only | Returns None |
| test_build_text_representation | Metadata + findings | Produces coherent text string for the Clinical Context Agent |
| test_body_part_modality_matching | KNEE + MR vs TKR requirements | Returns match with 0.95+ confidence |

Integration Tests (tests/test_dicom_pipeline.py)

| Test | Description |
|---|---|
| test_dicom_upload_confirm_flow | Upload .dcm via presign, confirm, verify extracted_data populated |
| test_dicom_skips_ocr | Upload .dcm, verify OCR not attempted, ocr_method="dicom_metadata" |
| test_dicom_deidentified_in_r2 | Upload .dcm, download from R2, verify PatientName absent |
| test_dicom_thumbnail_stored | Upload .dcm with pixel data, verify _thumb.png key exists in R2 |
| test_dicom_sr_feeds_clinical_context | Upload DICOM-SR, verify Clinical Context Agent produces FHIR resources |
| test_dicom_requirement_matching | Upload knee MRI DICOM for TKR case, verify requirement auto-matched |
| test_dicom_feature_flag_off | Disable dicom_support_enabled, upload .dcm, verify rejection |

Test Fixtures

Create synthetic DICOM files using pydicom (no real patient data):

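# tests/conftest.py
from io import BytesIO

import pytest
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import ExplicitVRLittleEndian, generate_uid

@pytest.fixture
def sample_dicom_bytes() -> bytes:
    """Synthetic 64x64 MR image with fake identifiers (no real PHI)."""
    ds = Dataset()
    ds.PatientName = "TEST^PATIENT"
    ds.PatientID = "TEST123"
    ds.Modality = "MR"
    ds.BodyPartExamined = "KNEE"
    ds.Laterality = "L"
    ds.StudyDate = "20260315"
    ds.StudyDescription = "MRI LEFT KNEE W/O CONTRAST"
    ds.SOPClassUID = "1.2.840.10008.5.1.4.1.1.4"  # MR Image Storage
    ds.SOPInstanceUID = generate_uid()
    ds.SamplesPerPixel = 1
    ds.PhotometricInterpretation = "MONOCHROME2"
    ds.PixelRepresentation = 0  # unsigned samples
    ds.BitsAllocated = 16
    ds.Rows = 64
    ds.Columns = 64
    ds.PixelData = b"\x00" * (64 * 64 * 2)  # 2 bytes per 16-bit sample

    # File meta information required for a valid Part 10 file
    ds.file_meta = FileMetaDataset()
    ds.file_meta.MediaStorageSOPClassUID = ds.SOPClassUID
    ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
    ds.file_meta.TransferSyntaxUID = ExplicitVRLittleEndian
    ds.is_little_endian = True
    ds.is_implicit_VR = False

    # Serialize in memory; write_like_original=False adds the preamble and "DICM" marker
    buffer = BytesIO()
    ds.save_as(buffer, write_like_original=False)
    return buffer.getvalue()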

10. Risks and Mitigations

| Risk | Severity | Likelihood | Mitigation |
|---|---|---|---|
| Large DICOM files (>20MB) | Medium | High | Phase 1 keeps the 20MB limit. Multi-frame/series support in Phase 2 with chunked upload. Warn user at frontend if file too large. |
| Compressed pixel data | Medium | Medium | DICOM supports JPEG2000, RLE, JPEG-LS compression. pydicom decodes RLE itself (with NumPy) and JPEG/JPEG 2000 through Pillow; JPEG-LS needs an extra handler such as pylibjpeg or GDCM (see Section 11). If decompression fails, skip thumbnail but still extract metadata. |
| DICOM files without standard tags | Low | Medium | Graceful degradation: missing tags return None, parser still succeeds. Minimum viable: if file parses as DICOM at all, accept it. |
| PII leak via pixel data | High | Low | Burned-in patient annotations on images (name/MRN in pixel data). Phase 1 does NOT scrub pixel data (that requires OCR on the image itself). Risk accepted for MVP; flagged for Phase 2. |
| Malicious DICOM files | High | Low | pydicom is a parser, not an executor. Enforce the 20MB upload limit before parsing, and cap decoded pixel data at 512MB by checking Rows x Columns x BitsAllocated x NumberOfFrames before decoding. Reject files that fail to parse. |
| DICOM-SR without standard templates | Low | Medium | Non-standard SR trees are walked generically. Unknown containers logged but not classified. TEXT values always extracted regardless of container name. |
| Railway memory limits | Medium | Low | pydicom loads entire file into memory. 20MB DICOM is manageable. Phase 2 multi-frame series (500MB+) would need streaming -- out of scope. |
| R2 overwrite race condition | Low | Low | De-identified file overwrites original atomically via R2 PUT. If the write fails, original stays (acceptable -- retry will deidentify again). |
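
For the malicious-file risk, a small file can still declare enormous image dimensions, so the header values are worth checking before any pixel data is decoded. A minimal sketch of such a guard, with hypothetical function names; the 512MB figure is the cap named in the table above:

```python
MAX_DECODED_BYTES = 512 * 1024 * 1024  # 512MB cap from the risk table


def decoded_pixel_bytes(rows: int, columns: int, bits_allocated: int,
                        samples_per_pixel: int = 1, number_of_frames: int = 1) -> int:
    """Size of the uncompressed pixel buffer implied by the header tags."""
    # Rounds up to whole bytes, so 1-bit data is overestimated (conservative)
    bytes_per_sample = (bits_allocated + 7) // 8
    return rows * columns * samples_per_pixel * number_of_frames * bytes_per_sample


def is_safe_to_decode(rows: int, columns: int, bits_allocated: int,
                      samples_per_pixel: int = 1, number_of_frames: int = 1,
                      limit: int = MAX_DECODED_BYTES) -> bool:
    """False when any dimension is non-positive or the buffer would exceed the limit."""
    if min(rows, columns, bits_allocated, samples_per_pixel, number_of_frames) <= 0:
        return False
    size = decoded_pixel_bytes(rows, columns, bits_allocated,
                               samples_per_pixel, number_of_frames)
    return size <= limit
```

When the guard fails, the thumbnail step can be skipped while metadata extraction still proceeds, matching the degradation strategy used for compressed pixel data.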

11. Dependencies

Python Packages

| Package | Version | Purpose | Size |
|---|---|---|---|
| pydicom | >=2.4.0,<3.0 | DICOM file parsing, tag access, SR traversal, UID generation | ~8MB |
| Pillow | >=10.0.0,<11.0 | Pixel data to PNG thumbnail conversion | ~3MB (likely already installed) |

Optional (Not Required for Phase 1)

| Package | Purpose | When |
|---|---|---|
| pylibjpeg + pylibjpeg-libjpeg | JPEG compressed transfer syntax decompression | If users upload JPEG-compressed DICOM |
| pylibjpeg-openjpeg | JPEG2000 compressed transfer syntax | If users upload J2K-compressed DICOM |
| highdicom | Higher-level SR parsing, Measurement Report templates | Phase 2 if SR parsing needs more sophistication |
| gdcm | Handles exotic transfer syntaxes | Post-MVP if decompression failures are common |

No New Infrastructure

  • No new database tables (uses existing document_references.extracted_data JSONB)
  • No new R2 buckets (thumbnails stored alongside documents)
  • No new services or containers
  • No new external API calls

Appendix A: DICOM Modality Codes to Procedure Requirement Mapping

| Modality Code | Modality Name | Typical Procedure Match |
|---|---|---|
| MR | Magnetic Resonance | Knee MRI, Brain MRI, Spine MRI |
| CT | Computed Tomography | CT Angiogram, CT Scan |
| CR / DX | Computed/Digital Radiography | X-Ray, Chest X-Ray |
| US | Ultrasound | Cardiac Echo, Abdominal Ultrasound |
| NM | Nuclear Medicine | Bone Scan, PET Scan |
| PT | Positron Emission Tomography | PET Scan |
| XA | X-Ray Angiography | Cardiac Catheterization |
| MG | Mammography | Mammogram |
| ECG | Electrocardiography | ECG/EKG |
| SR | Structured Report | Radiology Report (text, not imaging) |

Appendix B: Body Part to Procedure Code Mapping

| BodyPartExamined | Procedure Codes |
|---|---|
| KNEE | knee_replacement (27447), acl_reconstruction |
| HIP | hip_replacement (27130) |
| SPINE / LSPINE / CSPINE | spinal_fusion, laminectomy |
| HEART / CHEST | cabg (33533), valve_replacement |
| SHOULDER | shoulder_replacement, rotator_cuff |
| ABDOMEN | bariatric (43775), hernia_repair |
| BRAIN / HEAD | craniotomy |
| BREAST | mastectomy |

These mappings are used by the deterministic matcher in match_dicom_metadata_to_requirements() and will be stored in config/dicom_mappings.yaml for configurability.
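
A minimal sketch of how the deterministic matcher might use these mappings. The requirement shape (dicts with "id" and "description"), the output fields, and the inline MODALITY_TERMS table are all assumptions; in the real module the terms would come from config/dicom_mappings.yaml. Matching is done on whole words so that "ct" does not match inside a word like "fracture".

```python
import re
from typing import Any

# Modality code -> words expected in a requirement description (illustrative subset)
MODALITY_TERMS = {
    "MR": {"mri"},
    "CT": {"ct"},
    "CR": {"x-ray", "xray"},
    "DX": {"x-ray", "xray"},
    "US": {"ultrasound", "echo"},
}
DICOM_MATCH_CONFIDENCE = 0.95  # deterministic match, per the checklist in Section 3


def match_dicom_metadata_to_requirements(metadata: dict[str, Any],
                                         proc_reqs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one match per requirement whose description names both modality and body part."""
    modality_terms = MODALITY_TERMS.get(str(metadata.get("modality") or "").upper(), set())
    body_part = str(metadata.get("body_part_examined") or "").lower()
    if not modality_terms or not body_part:
        return []  # not enough metadata for a deterministic match

    matches = []
    for req in proc_reqs:
        words = set(re.findall(r"[a-z0-9-]+", req["description"].lower()))
        if body_part in words and modality_terms & words:
            matches.append({
                "requirement_id": req["id"],
                "confidence": DICOM_MATCH_CONFIDENCE,
                "method": "dicom_metadata",
            })
    return matches
```

Body part values such as LSPINE will not appear verbatim in a description like "Spine MRI", which is where the alias groups in the table above would need to be applied before the word comparison.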