Assignment submission

Tracked footage for review.

GitHub: EdgeVTP visualization
Some notes on visualization

What I got working

A full pipeline that takes raw highway video, detects cars with YOLO plus ByteTrack, feeds their movement history into EdgeVTP, and draws where the model thinks each car will go over the next 5 seconds.

How frames turn into model inputs

I grab the bottom-center of each car's bounding box as its anchor point, sample positions at 5 Hz to match the training data, and build 15-step history windows (about 3 seconds of motion). Coordinates stay in pixels because the Carolinas model was trained that way.
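Here is a rough sketch of that windowing step, not the exact code in the repo. The 15-step length and 5 Hz rate come from the notes above; the HistoryBuffer class, the anchor_point helper, and the assumption that the tracker hands over a dict of track ID to (x1, y1, x2, y2) pixel boxes per frame are mine, for illustration only.

```python
from collections import defaultdict, deque

HISTORY_STEPS = 15  # history window length, from the notes above
SAMPLE_HZ = 5       # sampling rate, from the notes above


def anchor_point(box):
    """Bottom-center of an (x1, y1, x2, y2) pixel bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


class HistoryBuffer:
    """Hypothetical helper: keeps the last 15 sampled anchor points per track."""

    def __init__(self, video_fps):
        # Keep every `stride`-th frame so samples land at roughly SAMPLE_HZ.
        self.stride = max(1, round(video_fps / SAMPLE_HZ))
        self.tracks = defaultdict(lambda: deque(maxlen=HISTORY_STEPS))

    def update(self, frame_idx, detections):
        """detections: dict of track_id -> (x1, y1, x2, y2), in pixels."""
        if frame_idx % self.stride != 0:
            return  # not a sampling frame
        for track_id, box in detections.items():
            self.tracks[track_id].append(anchor_point(box))

    def ready_windows(self):
        """Return track_id -> list of (x, y) for tracks with a full window."""
        return {
            tid: list(pts)
            for tid, pts in self.tracks.items()
            if len(pts) == HISTORY_STEPS
        }
```

With a 30 fps video the stride works out to 6, so a car has to stay tracked for roughly 3 seconds before it gets its first prediction.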

Sanity checks for live video

  • Trajectory clipping keeps each predicted path's length within 60-135% of what the car's current speed implies.
  • Bearing alignment fixes predictions that suddenly flip direction.
  • Filters skip jittery or teleporting tracks.
  • An option limits predictions to cars coming toward the camera.
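To make the first two checks concrete, here is a minimal sketch under my own assumptions, not the repo's actual implementation. It assumes a prediction is a list of (x, y) pixel points, that "expected length" means current speed times the 5-second horizon, and that a "flip" means the predicted bearing is more than 90 degrees off the recent heading. The function names and the 90-degree threshold are hypothetical; only the 60-135% band comes from the list above.

```python
import math


def path_length(start, points):
    """Total polyline length from start through all predicted points."""
    total, prev = 0.0, start
    for p in points:
        total += math.dist(prev, p)
        prev = p
    return total


def clip_trajectory(start, pred, speed_px_s, horizon_s=5.0, lo=0.60, hi=1.35):
    """Rescale pred about start so its length stays in [lo, hi] x expected."""
    expected = speed_px_s * horizon_s
    length = path_length(start, pred)
    if length == 0 or expected == 0:
        return list(pred)  # nothing sensible to scale
    target = min(max(length, lo * expected), hi * expected)
    scale = target / length
    sx, sy = start
    return [(sx + (x - sx) * scale, sy + (y - sy) * scale) for x, y in pred]


def align_bearing(start, pred, heading_rad, max_dev_rad=math.pi / 2):
    """Rotate pred about start onto heading_rad if it points the wrong way."""
    if not pred:
        return []
    sx, sy = start
    ex, ey = pred[-1]
    pred_bearing = math.atan2(ey - sy, ex - sx)
    diff = heading_rad - pred_bearing
    # Signed smallest rotation from the predicted bearing to the heading.
    delta = math.atan2(math.sin(diff), math.cos(diff))
    if abs(delta) <= max_dev_rad:
        return list(pred)  # close enough, leave untouched
    c, s = math.cos(delta), math.sin(delta)
    return [
        (sx + (x - sx) * c - (y - sy) * s, sy + (x - sx) * s + (y - sy) * c)
        for x, y in pred
    ]
```

Scaling about the last observed position keeps the shape of the predicted path and only changes how far it reaches, which is also why clipping can make a bad raw prediction look more plausible than it is.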

Limitations

The model was trained on clean, sparse scenes with 2-3 cars, while live video has 10+ cars and noisy tracks. With isolate_agents=True the predictions get jittery; with isolate_agents=False, crowded scenes cause weird swerves. Trajectory clipping hides how bad the raw predictions really are, which is why I added --no-traj-clip to show the unclipped output. If YOLO misses a car or a bounding box jumps, the model thinks the car teleported. And since there is no ground truth for live video, I can only eyeball whether the predictions look right.
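One way to screen for the teleport case before a window reaches the model is a per-step displacement check. This is only a sketch of the idea, not necessarily how the filter in the repo works: the looks_teleported name, the 4x ratio, and the 20-pixel floor are all illustrative values I picked.

```python
import math
from statistics import median


def step_lengths(history):
    """Pixel distance between consecutive (x, y) samples in a history window."""
    return [math.dist(a, b) for a, b in zip(history, history[1:])]


def looks_teleported(history, ratio=4.0, min_jump_px=20.0):
    """Flag a window if any step is far larger than the track's typical step.

    ratio and min_jump_px are illustrative thresholds, not tuned values.
    """
    steps = step_lengths(history)
    if len(steps) < 2:
        return False  # too little history to judge
    typical = median(steps)
    return any(s > min_jump_px and s > ratio * typical for s in steps)
```

Comparing against the track's own median step, rather than a fixed pixel limit, keeps fast cars near the camera from being flagged just because they cover more pixels per sample.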

  1. Carolina highway (better detected)

  2. Carolina highway (fairly detected)