Smartphone Photos to Synthetic Training Data: A 3D Reconstruction Pipeline
26th February 2026
This pipeline turns smartphone photos of your home into synthetic training data — RGB images, depth maps, and camera parameters from viewpoints that never existed. You capture photos, reconstruct a 3D model, then render unlimited novel views. Here’s how the five-stage pipeline works.
Stage 1: Validate Photos
The CaptureValidator scores every image on three metrics using OpenCV before anything touches COLMAP:
for each image:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.Laplacian(gray, cv2.CV_64F).var()   # higher = sharper
    bright = gray.mean()                           # 0-255
    contrast = gray.std()                          # pixel spread
    if blur < 30:     REJECTED     # too blurry
    elif blur < 60:   POOR         # marginal
    if bright < 40:   too dark
    if bright > 220:  overexposed
    if contrast < 20: flat scene
iPhone photos come as HEIC — the pipeline only reads JPG/PNG/TIFF/BMP/WebP. Convert first with macOS built-in sips:
for f in *.HEIC; do sips -s format jpeg "$f" --out "${f%.HEIC}.jpg"; done
You need at least 30 accepted images. The recommendation is 50-100 per room with 60-80% overlap between adjacent shots.
python home_3d_reconstruct.py validate --images ./photos/ --auto-reject
The --auto-reject flag moves rejected images to a rejected/ subfolder so COLMAP never sees them.
Stage 2: COLMAP Structure-from-Motion
COLMAP extracts camera poses and a sparse 3D point cloud. Three sequential steps:
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ Feature Extraction │────▶│ Feature Matching │────▶│ Sparse Mapper │
│ SIFT + affine shape │ │ Exhaustive all-pairs │ │ Bundle adjustment │
│ max 2000px per side │ │ 32768 max matches │ │ → camera poses │
└──────────────────────┘ └──────────────────────┘ └──────────────────────┘
Default config assumes all images come from the same camera (iPhone) with a PINHOLE model. The output lands in sparse/0/ — camera intrinsics, extrinsics, and 3D points in COLMAP’s binary format.
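The pipeline drives these steps as subprocess calls. As a rough sketch, the equivalent raw CLI looks like this (flag names are COLMAP's documented options; treat the exact values as assumptions mirroring the defaults quoted above):

```shell
DB=workspace/database.db

# 1. SIFT features with affine shape estimation, images capped at 2000px
colmap feature_extractor \
    --database_path "$DB" --image_path photos/ \
    --ImageReader.camera_model PINHOLE \
    --ImageReader.single_camera 1 \
    --SiftExtraction.max_image_size 2000 \
    --SiftExtraction.estimate_affine_shape 1

# 2. Exhaustive all-pairs matching
colmap exhaustive_matcher \
    --database_path "$DB" \
    --SiftMatching.max_num_matches 32768

# 3. Incremental mapping + bundle adjustment, writes sparse/0/
colmap mapper \
    --database_path "$DB" --image_path photos/ \
    --output_path workspace/sparse
```

Running the binaries directly is useful when you want to tweak a single stage without re-running the whole pipeline.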
python home_3d_reconstruct.py colmap --images ./photos/ --workspace ./workspace/
Stage 3: 3D Reconstruction
Two backends, four methods:
| Method | Backend | VRAM | Best For |
|---|---|---|---|
| splatfacto | Nerfstudio | 8-12 GB | Fast iterations, real-time rendering |
| nerfacto | Nerfstudio | 6-8 GB | High-quality mesh export |
| nerfacto-big | Nerfstudio | 12+ GB | Maximum quality NeRF |
| 3dgut-mcmc | NVIDIA 3DGUT | 12-16 GB | Isaac Sim integration |
Nerfstudio runs ns-train for 30,000 iterations (4-hour timeout). If you provide COLMAP data it uses those camera poses directly; otherwise it runs its own ns-process-data preprocessing first. 3DGUT requires pre-computed COLMAP output.
python home_3d_reconstruct.py reconstruct --data ./photos/ --method splatfacto --colmap-path ./workspace/colmap/sparse/0
Stage 4: Export
The trained neural representation gets exported to standard 3D formats via ns-export:
- pointcloud → PLY point cloud (1M points sampled from NeRF)
- poisson → watertight mesh via Poisson surface reconstruction
- tsdf → volumetric mesh (256³ voxel grid)
- marching-cubes → isosurface extraction
- gaussian-splat → raw 3DGS format
The PLY writer uses binary little-endian format — 3 floats for position, 3 floats for normals, 3 unsigned bytes for RGB per vertex. An optional decimation step uses vertex clustering to reduce face count.
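That vertex layout is simple enough to write by hand. A minimal sketch of such a writer, assuming the exact layout described above (the function name and signature are hypothetical, not the pipeline's API):

```python
import struct
import io

def write_ply(vertices, fh):
    """Write a binary little-endian PLY: per vertex, 3 float32 position,
    3 float32 normal, 3 uint8 RGB. `vertices` is a list of
    (position, normal, rgb) triples. Hypothetical helper, not the pipeline's API."""
    header = (
        "ply\n"
        "format binary_little_endian 1.0\n"
        f"element vertex {len(vertices)}\n"
        "property float x\nproperty float y\nproperty float z\n"
        "property float nx\nproperty float ny\nproperty float nz\n"
        "property uchar red\nproperty uchar green\nproperty uchar blue\n"
        "end_header\n"
    )
    fh.write(header.encode("ascii"))
    for pos, normal, rgb in vertices:
        # "<6f3B" = little-endian: 6 float32 then 3 unsigned bytes
        fh.write(struct.pack("<6f3B", *pos, *normal, *rgb))
```

Each vertex occupies exactly 27 bytes (6 × 4 + 3), which is what makes the binary format compact compared to ASCII PLY.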
python home_3d_reconstruct.py export --config outputs/home_3d/splatfacto/config.yml --format pointcloud
Stage 5: Generate Training Data
This is where the pipeline pays off. Given a point cloud, it generates novel-view synthetic data that never existed in the original photos.
Camera poses are distributed using a Fibonacci sphere — uniform angular spacing without clustering at the poles:
golden_ratio = (1 + √5) / 2
for view i (0 to n):
    θ = 2π * i / golden_ratio      # azimuth
    φ = acos(1 - 2*(i+0.5)/n)      # polar angle
    r = random(1.0, 5.0)           # orbital radius
    y = random(0.5, 2.5)           # camera height
    position = [r*sin(φ)*cos(θ), y, r*sin(φ)*sin(θ)]
    look_at(scene_center + jitter)
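A runnable version of that sampler, as a sketch (the function name, seed handling, and the jittered look-at being left out are my assumptions; the angle formulas and ranges follow the pseudocode):

```python
import math
import random

def fibonacci_views(n, seed=0):
    """Sample n camera positions on golden-ratio-spaced azimuths with
    pole-free polar angles, random orbital radius, and random height.
    Hypothetical helper illustrating the pseudocode above."""
    rng = random.Random(seed)
    golden_ratio = (1 + math.sqrt(5)) / 2
    positions = []
    for i in range(n):
        theta = 2 * math.pi * i / golden_ratio        # azimuth
        phi = math.acos(1 - 2 * (i + 0.5) / n)        # polar angle, uniform in cos
        r = rng.uniform(1.0, 5.0)                     # orbital radius
        y = rng.uniform(0.5, 2.5)                     # camera height
        positions.append((r * math.sin(phi) * math.cos(theta),
                          y,
                          r * math.sin(phi) * math.sin(theta)))
    return positions
```

The golden-ratio azimuth step is what prevents successive views from lining up, and the `acos` mapping keeps the polar angles uniform in area rather than bunched at the poles.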
Each view is rendered by z-buffer point-cloud splatting: project every 3D point through the camera intrinsics, depth-sort, and splat with cv2.circle(). Holes are filled by morphological closing with a 3x3 ellipse kernel.
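Stripped of the splat radius and hole filling, the core z-buffer step fits in a few lines. A minimal sketch, assuming camera-space points and intrinsics passed as a flat `(fx, fy, cx, cy)` tuple rather than the pipeline's 3x3 matrix:

```python
def render_depth(points, intrinsics, width, height):
    """Project camera-space 3D points through pinhole intrinsics and keep
    the nearest depth per pixel (the z-buffer rule). Simplified sketch:
    no splat radius, no morphological hole filling."""
    fx, fy, cx, cy = intrinsics
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:                       # behind the camera
            continue
        u = int(round(fx * x / z + cx))  # pinhole projection
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z              # nearer point wins
    return depth
```

Depth-sorting the full point list up front, as the pipeline does, achieves the same result as the per-pixel comparison here while letting the splats overwrite each other back to front.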
Three augmentations are applied per frame:
- Lighting — brightness α∈[0.7,1.3], contrast β∈[-20,20], per-channel color shift ±10
- Noise — Gaussian σ∈[0,10] on RGB, σ=0.02 on depth
- Viewpoint — small 2D rotation ±2° (simulates camera shake)
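The lighting augmentation is the simplest of the three to sketch. Assuming a plain gain/offset/shift model with the ranges quoted above (the function name and the NumPy generator interface are my choices, not the pipeline's):

```python
import numpy as np

def augment_lighting(img, rng):
    """Apply random brightness gain, contrast offset, and per-channel
    color shift to a uint8 HxWx3 image. Sketch of the ranges in the text."""
    alpha = rng.uniform(0.7, 1.3)          # brightness gain
    beta = rng.uniform(-20.0, 20.0)        # contrast offset
    shift = rng.uniform(-10.0, 10.0, 3)    # per-channel color shift
    out = img.astype(np.float32) * alpha + beta + shift
    return np.clip(out, 0, 255).astype(np.uint8)
```

The clip back to [0, 255] matters: without it the cast to uint8 wraps around and bright pixels turn into dark artifacts.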
python home_3d_reconstruct.py generate-data --pointcloud exports/point_cloud.ply --num-views 500
Output structure:
training_data/
├── images/ view_00000.png ... view_00499.png (augmented RGB)
├── depth/ depth_00000.npy + depth_viz_00000.png (per-pixel depth + Viridis colormap)
├── annotations/
│ ├── camera_00000.json ... camera_00499.json (4x4 extrinsics + 3x3 intrinsics)
│ └── instances.json (COCO format)
└── metadata.json (scene bounds, point count, config)
200 views is the default. For training, 500+ gives better coverage. Each view includes the full camera matrix so you can project between depth and world coordinates.
Full Pipeline
One command runs all five stages sequentially:
python home_3d_reconstruct.py full-pipeline \
--images ./photos/ \
--output ./home_3d_output/ \
--method splatfacto \
--num-training-views 500
Summary
| Stage | What It Does | Command |
|---|---|---|
| Validate | Score blur/brightness/contrast, reject bad images | python home_3d_reconstruct.py validate -i ./photos/ |
| COLMAP | SIFT features → exhaustive matching → sparse SfM | python home_3d_reconstruct.py colmap -i ./photos/ -w ./work/ |
| Reconstruct | Train NeRF or 3D Gaussian Splatting (30K iterations) | python home_3d_reconstruct.py reconstruct -d ./photos/ -m splatfacto |
| Export | Sample 1M points → PLY/mesh/splat | python home_3d_reconstruct.py export -c config.yml -f pointcloud |
| Generate | Fibonacci camera orbits → z-buffer render → augment | python home_3d_reconstruct.py generate-data -p scene.ply -n 500 |
Validate ensures COLMAP gets clean input. COLMAP gives you camera poses. The reconstruction learns the full 3D scene. Export converts it to geometry. Generate creates unlimited training views from angles you never photographed.
Links
- Repo: github.com/avparkhi/home-3d-reconstruct
- Based on: NVIDIA Isaac Sim Real-to-Sim Workflow
- Nerfstudio: docs.nerf.studio
- COLMAP: colmap.github.io
- 3DGUT: github.com/nv-tlabs/3dgrut