Plan: Sort Antique Rifle Photos and Extract Serial Numbers
A script that takes a folder of ~100–1,000 rifle photos shot in an assembly-line session and produces a CSV/JSON manifest mapping each image to a rifle group and an extracted serial number.
The shoot has four signals that simplify the problem:
- Each rifle’s photos are contiguous in capture order.
- Each group typically starts with a “hero” (full-rifle) shot.
- There is a ~10–20 second EXIF timestamp gap between groups, vs. ~2–5 seconds within a group.
- Background is consistent across rifles (removes “scene change” as a signal but simplifies LLM reads).
Because of the timestamp gap (the third signal above), the bulk of the work is solved by a timestamp-gap split, not by visual similarity or embeddings. A vision LLM is used only for serial number OCR and for verifying ambiguous boundaries.
Pipeline
Phase 1 — Ingest and order
- Walk the input folder; collect all image files (jpg/jpeg/heic/png/tiff).
- Read EXIF DateTimeOriginal (fall back to DateTime, then file mtime) for each image.
- Sort images by capture timestamp ascending. Record the sorted order as the canonical sequence.
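The fallback chain and sort in Phase 1 might look like the sketch below. It assumes the raw EXIF strings have already been read by whichever library is chosen (exifread or Pillow), so no EXIF library call appears here. Shot, parse_exif_time, resolve_shot, and order_shots are hypothetical names used for illustration only.

```python
from datetime import datetime
from typing import NamedTuple, Optional

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF date/time layout, e.g. "2024:05:01 10:00:05"


class Shot(NamedTuple):
    path: str
    taken_at: datetime
    source: str  # which field supplied the timestamp


def parse_exif_time(value: Optional[str]) -> Optional[datetime]:
    """Return a datetime for an EXIF timestamp string, or None if unusable."""
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), EXIF_FORMAT)
    except ValueError:
        return None  # malformed or placeholder value


def resolve_shot(path, date_time_original, date_time, mtime) -> Shot:
    """Apply the fallback chain: DateTimeOriginal, then DateTime, then mtime."""
    candidates = (("DateTimeOriginal", date_time_original), ("DateTime", date_time))
    for source, raw in candidates:
        parsed = parse_exif_time(raw)
        if parsed is not None:
            return Shot(path, parsed, source)
    return Shot(path, datetime.fromtimestamp(mtime), "mtime")


def order_shots(shots):
    """Sort by capture time; the path breaks ties so the order is repeatable."""
    return sorted(shots, key=lambda s: (s.taken_at, s.path))
```

Recording the source field alongside the timestamp makes it easy to spot images that fell back to mtime, which is the least trustworthy of the three.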
Phase 2 — Group by timestamp gap
- Compute the delta in seconds between each consecutive pair of images.
- Split into groups wherever the delta exceeds a threshold (default 8s; make it a CLI flag).
- Assign each group a stable ID (rifle_001, rifle_002, …).
- Emit a preliminary manifest (rifle_id → list of image paths, with per-image timestamps and intra-group deltas) for inspection before any API spend.
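The gap split in Phase 2 is small enough to sketch in full. The version below assumes the timestamps are already sorted as in Phase 1; split_by_gap and assign_ids are hypothetical helper names.

```python
from datetime import datetime, timedelta


def split_by_gap(timestamps, gap_seconds=8.0):
    """Split a sorted list of datetimes into groups of list indices.

    A new group starts whenever the delta to the previous image
    exceeds gap_seconds.
    """
    groups = []
    for i, ts in enumerate(timestamps):
        if i == 0 or (ts - timestamps[i - 1]).total_seconds() > gap_seconds:
            groups.append([])
        groups[-1].append(i)
    return groups


def assign_ids(groups):
    """Map rifle_001, rifle_002, ... to each group in sequence order."""
    return {f"rifle_{n:03d}": members for n, members in enumerate(groups, start=1)}


# Example: deltas of 3, 3, 14, 3, 17 seconds split into three groups.
start = datetime(2024, 5, 1, 10, 0, 0)
stamps = [start + timedelta(seconds=s) for s in (0, 3, 6, 20, 23, 40)]
example = assign_ids(split_by_gap(stamps))
```

Because IDs follow sequence order, they stay stable across re-runs only as long as the threshold and the input folder are unchanged.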
Phase 3 — Hero-shot verification (optional but recommended)
- For each group, ask a vision LLM whether the first image is a hero shot (whole rifle visible, side-on, fills most of frame).
- If the first image is not a hero shot, flag the group as needs_review in the manifest. Common causes:
- User shot a close-up of the next rifle before its hero shot.
- Gap threshold split a single rifle in two (no hero at the “new” group’s start).
- Do not auto-merge or auto-split at this stage — just flag. Manual review is faster than chasing edge cases programmatically.
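A flag-only pass for Phase 3 could be sketched as follows. The vision LLM call is represented by a caller-supplied function, is_hero_shot, which is a hypothetical stand-in; the review_reason wording is also an assumption.

```python
from typing import Callable, Dict, List


def flag_non_hero_groups(
    groups: Dict[str, List[str]],
    is_hero_shot: Callable[[str], bool],
) -> Dict[str, dict]:
    """Mark groups whose first image is not a hero shot; never merge or split."""
    review = {}
    for rifle_id, images in groups.items():
        # An empty group cannot have a hero shot, so it is flagged as well.
        hero_ok = bool(images) and is_hero_shot(images[0])
        review[rifle_id] = {
            "needs_review": not hero_ok,
            "review_reason": "" if hero_ok else "first image is not a hero shot",
        }
    return review
```

Passing the check in as a function keeps this step testable without API spend: a stub can stand in for the model during the dry run.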
Phase 4 — Serial number extraction
- For each group, send all images in a single vision LLM call. Prompt asks for structured JSON:
- serial_numbers: list of all stamped/engraved numbers visible across images (receiver, barrel, stock cartouche; antiques often have several).
- maker / model / caliber / markings: best-effort identification.
- notes: free-text observations.
- confidence: low / medium / high per serial number.
- Use the API’s JSON-mode / response schema feature so output is parseable.
- Cache responses keyed by group content hash so re-runs are free.
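One way to key the cache by group content is sketched below. It assumes responses are stored as one JSON file per key; group_content_hash, cached_call, and the prompt_version tag are hypothetical additions, the last one included so that a changed prompt does not reuse stale answers.

```python
import hashlib
import json
from pathlib import Path


def group_content_hash(image_paths, prompt_version="v1"):
    """Hash the bytes of every image in the group, plus a prompt version tag."""
    # Hash each file on its own, then sort the digests so the key depends
    # only on image content, not on file names or listing order.
    file_digests = sorted(
        hashlib.sha256(Path(p).read_bytes()).digest() for p in image_paths
    )
    combined = hashlib.sha256(prompt_version.encode("utf-8"))
    for digest in file_digests:
        combined.update(digest)
    return combined.hexdigest()


def cached_call(cache_dir, key, compute):
    """Return the cached JSON for key, or run compute() and store its result."""
    entry = Path(cache_dir) / f"{key}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = compute()
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Here compute would wrap the actual vision LLM request; it runs only on a cache miss, so re-running after a threshold tweak re-queries only the groups whose membership changed.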
Phase 5 — Output
- Write manifest.csv and manifest.json with columns: rifle_id, image_count, image_files, first_timestamp, last_timestamp, serial_numbers, maker, model, caliber, markings, notes, confidence, needs_review, review_reason.
- Write a simple contact_sheet.html showing each group as a row of thumbnails with the extracted data, for fast human review and correction.
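The CSV half of Phase 5 could be written with the standard library alone, as in this sketch. The column list comes from the plan; joining list-valued fields with ";" is an assumption, and flatten and write_manifest_csv are hypothetical names.

```python
import csv
import io

COLUMNS = [
    "rifle_id", "image_count", "image_files", "first_timestamp",
    "last_timestamp", "serial_numbers", "maker", "model", "caliber",
    "markings", "notes", "confidence", "needs_review", "review_reason",
]


def flatten(value):
    """Join list values with ';' so each one fits in a single CSV cell."""
    if isinstance(value, (list, tuple)):
        return ";".join(str(v) for v in value)
    return "" if value is None else str(value)


def write_manifest_csv(rows, stream):
    """Write one row per rifle group, leaving missing fields blank."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: flatten(row.get(col)) for col in COLUMNS})


# Usage: write to an in-memory buffer; a real run would open manifest.csv.
buffer = io.StringIO()
write_manifest_csv(
    [{"rifle_id": "rifle_001", "image_count": 2,
      "image_files": ["a.jpg", "b.jpg"], "needs_review": False}],
    buffer,
)
```

Writing blanks for missing fields means the same function serves both the preliminary manifest (before any LLM call) and the final one.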
Tooling decisions
- Language: Python 3.11+. Mature EXIF / image libs and easy LLM SDKs.
- EXIF:
exifreadorPillow+pillow-heiffor HEIC support. - Vision LLM: Gemini 2.5 Pro or GPT-5 — both handle many images per call and engraved-metal OCR well. Pick whichever you already have an API key for. Estimated total cost at ~50 rifles: under $5.
- No embeddings / no CLIP / no Tesseract — the timestamp signal makes them unnecessary at this volume. Revisit only if the gap heuristic fails.
CLI shape
rifles-sort \
--input ./photos \
--output ./out \
--gap-seconds 8 \
--provider gemini \
--model gemini-2.5-pro \
[--skip-llm]         # phases 1+2 only, for a cheap dry run
[--verify-heroes]    # phase 3
[--extract-serials]  # phase 4
Default behavior: run all phases.
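The CLI shape above maps onto argparse roughly as follows. The flag names are taken from the plan; the "openai" provider value, the defaults for --provider and --model, and the rule that omitting both phase flags means "run everything" are assumptions, and llm_phases is a hypothetical helper.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="rifles-sort")
    parser.add_argument("--input", required=True, help="folder of photos")
    parser.add_argument("--output", required=True, help="folder for manifests")
    parser.add_argument("--gap-seconds", type=float, default=8.0)
    parser.add_argument("--provider", choices=["gemini", "openai"], default="gemini")
    parser.add_argument("--model", default="gemini-2.5-pro")
    parser.add_argument("--skip-llm", action="store_true", help="phases 1+2 only")
    parser.add_argument("--verify-heroes", action="store_true", help="phase 3")
    parser.add_argument("--extract-serials", action="store_true", help="phase 4")
    return parser


def llm_phases(args):
    """Decide which LLM phases run (assumed semantics, not fixed by the plan)."""
    if args.skip_llm:
        return {"verify_heroes": False, "extract_serials": False}
    if not (args.verify_heroes or args.extract_serials):
        # Neither phase flag given: default to running all phases.
        return {"verify_heroes": True, "extract_serials": True}
    return {
        "verify_heroes": args.verify_heroes,
        "extract_serials": args.extract_serials,
    }
```

Under these assumed semantics, --skip-llm wins over the other two flags, which keeps a dry run from spending API credits by accident.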
Files to create
- rifles/__init__.py
- rifles/cli.py: argparse entry point.
- rifles/ingest.py: file walk + EXIF read + sort.
- rifles/group.py: gap-based grouping.
- rifles/llm.py: provider-agnostic vision call (Gemini + OpenAI adapters).
- rifles/verify.py: hero-shot check.
- rifles/extract.py: serial extraction prompt + JSON parse.
- rifles/output.py: CSV/JSON/HTML writers.
- rifles/cache.py: disk cache keyed by content hash.
- pyproject.toml: deps are pillow, pillow-heif, exifread, google-generativeai or openai, jinja2 (contact sheet), rich (logs).
Verification
- Dry run on a small sample (10–20 images, 2–3 known rifles):
- Confirm timestamp grouping matches reality.
- Confirm at least one serial number is read correctly per group.
- Full run: spot-check 10% of groups in the contact sheet.
- Edge cases to deliberately test:
- Group whose first image is a close-up (should flag needs_review).
- A rifle with multiple visible serials.
- A worn/illegible serial (should produce low confidence, not a guess).
Decisions and scope
- In scope: grouping, serial extraction, manifest output, contact sheet.
- Out of scope (for v1):
- Visual-similarity-based grouping (embeddings/CLIP).
- Auto-merging / auto-splitting flagged groups.
- Web UI for review (HTML contact sheet only).
- Rifle identification beyond what the vision model returns from markings.
- Assumptions:
- EXIF timestamps are present and accurate.
- User has an API key for at least one major vision LLM provider.
- All images are from a single shoot session (no need to handle multi-session merging).
Further considerations
- Gap threshold tuning: 8s is a starting guess that sits between the ~2–5s intra-group deltas and the ~10–20s inter-group gaps. The preliminary manifest from the dry run (phases 1+2) shows the actual deltas, so the threshold can be adjusted before any API spend. Option: auto-detect by finding the bimodal split in observed deltas.
- HEIC handling: iPhone shots are often HEIC. pillow-heif handles this, but some LLM APIs prefer JPEG, so a transcode step may be needed before upload.
- Cost guardrail: add a --max-images flag so accidentally pointing the tool at a huge folder doesn’t burn API credits.
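The auto-detect option mentioned under gap threshold tuning could start from something as simple as the sketch below. It is a largest-jump heuristic over the sorted deltas, not a formal bimodality test, and suggest_gap_threshold is a hypothetical name.

```python
def suggest_gap_threshold(deltas):
    """Suggest a threshold at the midpoint of the widest jump between sorted deltas.

    Returns None when there are fewer than two distinct deltas.
    """
    ordered = sorted(set(deltas))
    if len(ordered) < 2:
        return None
    # Find the adjacent pair of sorted deltas that are furthest apart.
    low, high = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return (low + high) / 2


# Example: intra-group deltas of 2-5s and inter-group gaps of 12-18s.
suggested = suggest_gap_threshold([2, 3, 4, 5, 3, 12, 15, 18, 2])
```

A single very long pause (a lunch break, say) would dominate this heuristic, so the suggestion is best treated as a hint to confirm against the preliminary manifest rather than a value to apply blindly.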