
Plan: Sort Antique Rifle Photos and Extract Serial Numbers

A script that takes a folder of ~100–1,000 rifle photos shot in an assembly-line session and produces a CSV/JSON manifest mapping each image to a rifle group and an extracted serial number.

The shoot has four signals that simplify the problem:

  1. Each rifle’s photos are contiguous in capture order.
  2. Each group typically starts with a “hero” (full-rifle) shot.
  3. There is a ~10–20 second EXIF timestamp gap between groups, vs. ~2–5 seconds within a group.
  4. The background is consistent across rifles. This rules out scene change as a boundary signal but makes the photos easier for the LLM to read.

Because of (3), most of the grouping can be done with a simple timestamp-gap split rather than visual similarity or embeddings. A vision LLM is used only for serial-number OCR and for checking ambiguous group boundaries.

Pipeline

Phase 1 — Ingest and order

  1. Walk the input folder; collect all image files (jpg/jpeg/heic/png/tiff).
  2. Read EXIF DateTimeOriginal (fall back to DateTime, then file mtime) for each image.
  3. Sort images by capture timestamp ascending. Record the sorted order as the canonical sequence.
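
The ingest step might look like the following minimal Python sketch. It assumes Pillow for EXIF access (Image.getexif() and Exif.get_ifd()). The names capture_time and ingest are hypothetical. HEIC files only open if pillow-heif has registered its opener, so without it they fall back to mtime here, as does any file Pillow cannot read. EXIF times carry no time zone, so the sketch also assumes the camera clock and the machine running the script share one.

```python
from datetime import datetime
from pathlib import Path

try:
    from PIL import Image  # Pillow, used only for EXIF access
except ImportError:  # without Pillow every file falls back to mtime
    Image = None

IMAGE_EXTS = {".jpg", ".jpeg", ".heic", ".png", ".tif", ".tiff"}
EXIF_IFD_POINTER = 0x8769       # sub-IFD that holds DateTimeOriginal
TAG_DATETIME_ORIGINAL = 0x9003  # 36867
TAG_DATETIME = 0x0132           # 306, stored in IFD0
EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def capture_time(path: Path) -> float:
    """POSIX timestamp from DateTimeOriginal, else DateTime, else file mtime."""
    if Image is not None:
        try:
            with Image.open(path) as img:
                exif = img.getexif()
                raw = (exif.get_ifd(EXIF_IFD_POINTER).get(TAG_DATETIME_ORIGINAL)
                       or exif.get(TAG_DATETIME))
            if raw:
                # EXIF times have no zone; interpret them as local time.
                return datetime.strptime(raw.strip(), EXIF_TIME_FORMAT).timestamp()
        except Exception:
            pass  # unreadable image or malformed EXIF: fall through to mtime
    return path.stat().st_mtime


def ingest(folder: str) -> list[tuple[Path, float]]:
    """Recursively collect images and sort them by capture time (the canonical sequence)."""
    paths = [p for p in Path(folder).rglob("*")
             if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    stamped = [(p, capture_time(p)) for p in paths]
    # Ties (same-second burst shots) are broken by filename for a stable order.
    stamped.sort(key=lambda item: (item[1], item[0].name))
    return stamped
```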

Phase 2 — Group by timestamp gap

  1. Compute the delta in seconds between each consecutive pair of images.
  2. Split into groups wherever the delta exceeds a threshold (default 8 s, exposed as the --gap-seconds CLI flag).
  3. Assign each group a stable ID (rifle_001, rifle_002, …).
  4. Emit a preliminary manifest (rifle_id → list of image paths, with per-image timestamps and intra-group deltas) for inspection before any API spend.
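
Because the photos are already sorted, the split is a single linear pass. A sketch, assuming the (path, timestamp) pairs produced in Phase 1; the Group dataclass and both function names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    rifle_id: str
    images: list = field(default_factory=list)  # (path, timestamp) pairs in capture order
    deltas: list = field(default_factory=list)  # seconds since previous image; None for the first


def group_by_gap(stamped, gap_seconds=8.0):
    """Start a new group whenever consecutive timestamps differ by more than gap_seconds."""
    groups = []
    prev_t = None
    for path, t in stamped:
        delta = None if prev_t is None else t - prev_t
        if delta is None or delta > gap_seconds:
            groups.append(Group(rifle_id=f"rifle_{len(groups) + 1:03d}"))
            delta = None  # the gap between rifles is not an intra-group delta
        groups[-1].images.append((path, t))
        groups[-1].deltas.append(delta)
        prev_t = t
    return groups


def preliminary_manifest(groups):
    """rifle_id -> per-image records, suitable for json.dump before any API spend."""
    return {
        g.rifle_id: [
            {"path": str(path), "timestamp": t, "delta_s": d}
            for (path, t), d in zip(g.images, g.deltas)
        ]
        for g in groups
    }
```

The preview can then be written with json.dump(preliminary_manifest(groups), f, indent=2) and checked by eye before any LLM call.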

Phase 3 — Verify hero shots

  1. For each group, ask a vision LLM whether the first image is a hero shot (whole rifle visible, side-on, filling most of the frame).
  2. If the first image is not a hero shot, flag the group as needs_review in the manifest. Common causes:
    • User shot a close-up of the next rifle before its hero shot.
    • Gap threshold split a single rifle in two (no hero at the “new” group’s start).
  3. Do not auto-merge or auto-split at this stage — just flag. Manual review is faster than chasing edge cases programmatically.
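
The hero check reduces to one yes/no question per group plus conservative flagging. In this sketch the provider call is abstracted as a hypothetical ask(prompt, image_paths) function returning the model's raw text, so any vision API could be plugged in; the prompt wording is also an assumption. Replies that do not parse are flagged for review rather than trusted.

```python
import json
from typing import Callable, Dict, List

HERO_PROMPT = (
    "Is this photo a 'hero' shot: the whole rifle visible, side-on, filling most of "
    'the frame? Reply with JSON only: {"is_hero": true or false, "reason": "..."}'
)

# Hypothetical adapter: (prompt, image_paths) -> raw text reply from the configured provider.
AskFn = Callable[[str, List[str]], str]


def verify_heroes(groups: Dict[str, List[str]], ask: AskFn) -> Dict[str, dict]:
    """Flag groups whose first image is not a hero shot. Never merges or splits groups."""
    flags = {}
    for rifle_id, paths in groups.items():
        try:
            answer = json.loads(ask(HERO_PROMPT, paths[:1]))  # only the first image
            if answer.get("is_hero") is True:
                flags[rifle_id] = {"needs_review": False, "review_reason": ""}
                continue
            reason = "first image is not a hero shot: " + str(answer.get("reason", ""))
        except (TypeError, ValueError, AttributeError):
            # ValueError covers json.JSONDecodeError; AttributeError covers non-object JSON.
            reason = "hero check returned unparseable output"
        flags[rifle_id] = {"needs_review": True, "review_reason": reason}
    return flags
```

Because only the first image of each group is sent, this phase costs one small call per rifle.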

Phase 4 — Serial number extraction

  1. For each group, send all images in a single vision LLM call. Prompt asks for structured JSON:
    • serial_numbers: list of all stamped/engraved numbers visible across images (receiver, barrel, stock cartouche — antiques often have several).
    • maker / model / caliber / markings: best-effort identification.
    • notes: free-text observations.
    • confidence: low / medium / high per serial number.
  2. Use the API’s JSON-mode / response schema feature so output is parseable.
  3. Cache each response under a hash of the group’s image contents, model name, and prompt, so re-runs are free and a prompt or model change triggers a fresh call.
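
A sketch of the extraction call with on-disk caching. SERIAL_SCHEMA is a plain JSON Schema written for this example; how it is passed (and which schema features are honored) depends on the provider SDK. The ask(prompt, image_paths, schema) adapter is hypothetical, and the cache key here covers the model name and prompt as well as the image bytes, so changing either forces a fresh call.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

# JSON Schema for one group's answer; confidence is recorded per serial number.
SERIAL_SCHEMA = {
    "type": "object",
    "properties": {
        "serial_numbers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "value": {"type": "string"},
                    "location": {"type": "string"},  # e.g. receiver, barrel, stock
                    "confidence": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["value", "confidence"],
            },
        },
        "maker": {"type": "string"},
        "model": {"type": "string"},
        "caliber": {"type": "string"},
        "markings": {"type": "string"},
        "notes": {"type": "string"},
    },
    "required": ["serial_numbers"],
}

# Hypothetical adapter: (prompt, image_paths, schema) -> raw JSON text from the provider.
AskFn = Callable[[str, List[Path], dict], str]


def group_cache_key(paths, llm_model: str, prompt: str) -> str:
    """SHA-256 over model name, prompt, and each image's bytes in capture order."""
    h = hashlib.sha256()
    for part in (llm_model, prompt):
        h.update(part.encode("utf-8") + b"\0")
    for p in paths:
        h.update(hashlib.sha256(Path(p).read_bytes()).digest())
    return h.hexdigest()


def extract_serials(paths, llm_model: str, prompt: str, ask: AskFn, cache_dir) -> dict:
    """One LLM call per group, cached on disk so re-runs cost nothing."""
    cache_file = Path(cache_dir) / f"{group_cache_key(paths, llm_model, prompt)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = json.loads(ask(prompt, list(paths), SERIAL_SCHEMA))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```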

Phase 5 — Output

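  1. Write manifest.csv and manifest.json with one record per rifle group and these fields: rifle_id, image_count, image_files, first_timestamp, last_timestamp, serial_numbers, maker, model, caliber, markings, notes, confidence, needs_review, review_reason. In the CSV, list-valued fields such as image_files and serial_numbers are joined into a single cell.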
  2. Write a simple contact_sheet.html showing each group as a row of thumbnails with the extracted data — for fast human review and correction.
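
Writing both manifests needs only the standard library. This sketch assumes each group has already been assembled into a dict keyed by the column names above, with list-valued fields (image_files, serial_numbers, per-serial confidence) kept as lists of strings. The contact sheet is deliberately bare-bones HTML that links to the original files rather than generating real thumbnails; write_outputs and csv_cell are hypothetical names.

```python
import csv
import html
import json
from pathlib import Path

COLUMNS = [
    "rifle_id", "image_count", "image_files", "first_timestamp", "last_timestamp",
    "serial_numbers", "maker", "model", "caliber", "markings", "notes",
    "confidence", "needs_review", "review_reason",
]


def csv_cell(value) -> str:
    """Flatten one manifest value for CSV: lists become '; '-joined strings."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return "; ".join(str(v) for v in value)
    return str(value)


def write_outputs(rows, out_dir) -> None:
    """rows: one dict per rifle group, keyed by COLUMNS; list fields stay lists in JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "manifest.json").write_text(json.dumps(rows, indent=2), encoding="utf-8")

    with open(out / "manifest.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            writer.writerow({c: csv_cell(row.get(c)) for c in COLUMNS})

    # Contact sheet: a heading, a strip of images, and the extracted serials per group.
    parts = ["<!doctype html><meta charset='utf-8'><title>Contact sheet</title>"]
    for row in rows:
        title = html.escape(row["rifle_id"])
        if row.get("needs_review"):
            title += " (needs review: " + html.escape(csv_cell(row.get("review_reason"))) + ")"
        thumbs = "".join(
            f"<img src='{html.escape(Path(p).resolve().as_uri())}' height='120'>"
            for p in row.get("image_files", [])
        )
        serials = html.escape(csv_cell(row.get("serial_numbers")))
        parts.append(f"<h3>{title}</h3><div>{thumbs}</div><p>Serials: {serials}</p>")
    (out / "contact_sheet.html").write_text("\n".join(parts), encoding="utf-8")
```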

Tooling decisions

CLI shape

rifles-sort \
  --input ./photos \
  --output ./out \
  --gap-seconds 8 \
  --provider gemini \
  --model gemini-2.5-pro

# Optional phase flags:
#   --skip-llm         phases 1+2 only, for a cheap dry run
#   --verify-heroes    phase 3
#   --extract-serials  phase 4

Default behavior: run all phases.
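
The CLI shape maps directly onto argparse. A sketch: the defaults mirror the example invocation above, while the rule that --skip-llm conflicts with the two phase flags (and that phase 5 runs whenever an LLM phase is selected) is an assumption, not something the plan specifies. parse_args and phases_to_run are hypothetical names.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    p = argparse.ArgumentParser(
        prog="rifles-sort",
        description="Group rifle photos by capture-time gaps and extract serial numbers.",
    )
    p.add_argument("--input", required=True, help="folder of photos")
    p.add_argument("--output", required=True, help="folder for manifest and contact sheet")
    p.add_argument("--gap-seconds", type=float, default=8.0,
                   help="start a new rifle when consecutive shots are further apart than this")
    p.add_argument("--provider", default="gemini")
    p.add_argument("--model", default="gemini-2.5-pro")
    p.add_argument("--skip-llm", action="store_true", help="phases 1+2 only (cheap dry run)")
    p.add_argument("--verify-heroes", action="store_true", help="run phase 3")
    p.add_argument("--extract-serials", action="store_true", help="run phase 4")
    args = p.parse_args(argv)
    if args.skip_llm and (args.verify_heroes or args.extract_serials):
        p.error("--skip-llm cannot be combined with --verify-heroes/--extract-serials")
    return args


def phases_to_run(args: argparse.Namespace) -> list:
    if args.skip_llm:
        return [1, 2]
    selected = [n for n, on in ((3, args.verify_heroes), (4, args.extract_serials)) if on]
    if not selected:
        return [1, 2, 3, 4, 5]  # default: every phase
    return [1, 2, *selected, 5]
```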

Files to create

Verification

  1. Dry run on a small sample (10–20 images, 2–3 known rifles):
    • Confirm timestamp grouping matches reality.
    • Confirm at least one serial number is read correctly per group.
  2. Full run: spot-check 10% of groups in the contact sheet.
  3. Edge cases to deliberately test:
    • Group whose first image is a close-up (should flag needs_review).
    • A rifle with multiple visible serials.
    • A worn/illegible serial (should produce low confidence, not a guess).
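
Two small helpers can make the dry-run comparison and the 10% spot check mechanical. This sketch assumes hand labels are recorded as one label per image in capture order. compare_grouping only looks at where groups start, so the hand labels do not need to match rifle_001 and so on. The fixed seed keeps the sample reproducible between reviewers. All three function names are hypothetical.

```python
import random
from typing import List, Sequence


def split_points(labels: Sequence[str]) -> set:
    """Indices (in capture order) where a new group starts, excluding index 0."""
    return {i for i in range(1, len(labels)) if labels[i] != labels[i - 1]}


def compare_grouping(predicted: Sequence[str], truth: Sequence[str]) -> dict:
    """Compare per-image group labels from the script against hand labels."""
    if len(predicted) != len(truth):
        raise ValueError("label lists must cover the same images")
    got, want = split_points(predicted), split_points(truth)
    return {"missed_splits": sorted(want - got), "extra_splits": sorted(got - want)}


def spot_check_sample(rifle_ids: Sequence[str], percent: int = 10, seed: int = 0) -> List[str]:
    """Reproducible sample of about `percent`% of groups (at least one) for manual review."""
    k = min(len(rifle_ids), max(1, -(-len(rifle_ids) * percent // 100)))  # ceiling division
    return sorted(random.Random(seed).sample(list(rifle_ids), k))
```

A missed split usually means the gap threshold is too high; an extra split usually shows up as a needs_review group with no hero shot.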

Decisions and scope

Further considerations

  1. Gap threshold tuning: 8 s is a starting guess chosen to sit between the observed ~2–5 s within-group deltas and the ~10–20 s gaps between rifles. The dry-run (--skip-llm) output should make the actual deltas easy to inspect so the threshold can be adjusted. Option: auto-detect it by finding the split point in the bimodal distribution of observed deltas.
  2. HEIC handling: iPhone shots are often HEIC. pillow-heif handles this, but some LLM APIs prefer JPEG — may need a transcode step before upload.
  3. Cost guardrail: add a --max-images flag so that accidentally pointing the tool at a huge folder doesn’t burn through API credits.
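
For the auto-detect option in item 1, one standard way to find the split in a bimodal set of values is Otsu's method (maximize the between-class variance). It is swapped in here purely as an illustration, since the plan does not name a specific technique. A sketch over the consecutive-shot deltas from Phase 2; auto_gap_threshold is a hypothetical name.

```python
from typing import Optional, Sequence


def auto_gap_threshold(deltas: Sequence[float]) -> Optional[float]:
    """Split consecutive-shot deltas into 'same rifle' vs 'next rifle' using Otsu's method.

    Tries every cut between adjacent sorted values and keeps the one that maximizes
    the between-class variance w0 * w1 * (mean0 - mean1) ** 2. Returns the midpoint
    of the best cut, or None when there is nothing to split.
    """
    xs = sorted(deltas)
    n = len(xs)
    if n < 2 or xs[0] == xs[-1]:
        return None
    total = sum(xs)
    left_sum = 0.0
    best_score, best_threshold = -1.0, None
    for i in range(1, n):  # class 0 = xs[:i], class 1 = xs[i:]
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal values
        w0, w1 = i / n, (n - i) / n
        mean0, mean1 = left_sum / i, (total - left_sum) / (n - i)
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_threshold = score, (xs[i - 1] + xs[i]) / 2
    return best_threshold
```

With very few groups (or a single rifle) there is no real second mode, so the result is best printed in the dry-run output as a suggested --gap-seconds value rather than applied silently.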