I'm building a video-based learning platform where users can explore a product's supply chain, environmental impact, and health effects, all in an interactive, gamified way. Think Alchemy, where players combine “elements” to create products, or multiplayer quizzes with leaderboards that keep engagement high.
Instead of scrolling through social media reels, people will watch product reels: bite-sized videos that uncover the stories behind everyday items and educate while they entertain.
Initially, I planned to generate videos with Veo 3, but at $15–$45 for a single one-minute video, mass adoption wasn't realistic. Video generation will eventually become cheaper, but I didn't want to wait. Instead, I designed a cost-optimized architecture that works today and can be upgraded to newer models later.
Rather than regenerating similar videos, I index every generated video with a detailed description. A semantic search system then retrieves and reuses relevant footage, so as the library grows, the marginal cost of generation approaches zero.
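A minimal sketch of the reuse lookup, assuming a toy bag-of-words vector and a linear scan purely for illustration; a real system would use an embedding model and a vector index. `ClipIndex`, `get_or_generate`, and the 0.8 similarity threshold are hypothetical names and values, not part of the actual pipeline.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ClipIndex:
    """Linear-scan index of generated clips, searched by their descriptions."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical minimum similarity for a reuse hit
        self.clips: list[tuple[str, Counter]] = []  # (clip_id, description vector)

    def add(self, clip_id: str, description: str) -> None:
        self.clips.append((clip_id, embed(description)))

    def find(self, description: str) -> Optional[str]:
        """Return the most similar clip id, or None if nothing clears the threshold."""
        query = embed(description)
        best_id, best_score = None, 0.0
        for clip_id, vec in self.clips:
            score = cosine(query, vec)
            if score > best_score:
                best_id, best_score = clip_id, score
        return best_id if best_score >= self.threshold else None


def get_or_generate(index: ClipIndex, description: str,
                    generate: Callable[[str], str]) -> tuple[str, bool]:
    """Reuse an indexed clip when possible; otherwise generate, index, and return it."""
    hit = index.find(description)
    if hit is not None:
        return hit, False
    clip_id = generate(description)  # the only step that costs money
    index.add(clip_id, description)
    return clip_id, True
```

The boolean returned by `get_or_generate` makes it easy to log hit rates, which is the number that shows whether generation spend is actually trending toward zero as the library grows.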
Most products share many steps. For example, two chocolate products might share 80% of their supply chain footage. By segmenting videos into reusable steps or effects, I can assemble complete videos from existing clips without regenerating them.
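A minimal sketch of segment-level assembly, assuming each product is described as an ordered list of normalized step keys. `SegmentLibrary` and the step names are hypothetical; the two chocolate bars are chosen to share 4 of their 5 steps (80%), so assembling the second one only triggers a single new generation.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentLibrary:
    """Maps a normalized step key to an already-generated clip (hypothetical schema)."""
    clips: dict[str, str] = field(default_factory=dict)
    generated: int = 0  # number of clips that had to be generated

    def clip_for(self, step: str) -> str:
        if step not in self.clips:
            # Placeholder for a real generation call; only cache misses reach this line
            self.generated += 1
            self.clips[step] = f"clip-{self.generated:03d}"
        return self.clips[step]

    def assemble(self, steps: list[str]) -> list[str]:
        """Return the ordered clip ids that make up a full product video."""
        return [self.clip_for(step) for step in steps]


milk = ["cocoa harvest", "fermentation", "roasting", "grinding", "milk blending"]
dark = ["cocoa harvest", "fermentation", "roasting", "grinding", "cocoa butter blending"]

library = SegmentLibrary()
library.assemble(milk)   # all 5 steps are new
library.assemble(dark)   # 4 steps reused, 1 new
print(library.generated)  # 6
```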
I keep audio, text, and charts separate from the main visuals. This makes each clip more reusable: a translation only needs new narration and captions over the same footage, and real-time, data-driven charts can be overlaid on top, making the learning experience richer than a static, fully generated video.
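A minimal sketch of the layered data model, assuming each segment stores its footage, per-locale narration, per-locale captions, and chart data sources as separate fields. `SegmentSpec`, `render_plan`, the `fetch_chart` callback, and the English fallback are all hypothetical; the point is that only the audio, text, and chart layers vary per viewer while the expensive visual clip stays shared.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SegmentSpec:
    """One reusable segment, with each layer stored separately (hypothetical schema)."""
    visual_clip: str                 # language-neutral footage, shared by every locale
    narration: dict[str, str]        # locale -> text-to-speech audio id
    captions: dict[str, str]         # locale -> on-screen text
    chart_sources: list[str] = field(default_factory=list)  # data feeds resolved at render time


def render_plan(seg: SegmentSpec, locale: str,
                fetch_chart: Callable[[str], str], fallback: str = "en") -> dict:
    """Compose the layers for one viewer; assumes the fallback locale always exists."""
    loc = locale if locale in seg.narration else fallback
    return {
        "video": seg.visual_clip,
        "audio": seg.narration[loc],
        "caption": seg.captions.get(loc, seg.captions[fallback]),
        "charts": [fetch_chart(src) for src in seg.chart_sources],
    }


harvest = SegmentSpec(
    visual_clip="clip-042",
    narration={"en": "tts-en-042", "fr": "tts-fr-042"},
    captions={"en": "Cocoa pods are harvested by hand.",
              "fr": "Les cabosses sont récoltées à la main."},
    chart_sources=["cocoa_price_usd"],
)
plan = render_plan(harvest, "fr", fetch_chart=lambda src: f"{src}@latest")
```

Adding a language then means generating one narration track and one caption string per segment, which is far cheaper than regenerating the footage.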
Even with segmentation, generating every clip as video is costly. To start, I use image sequences plus text-to-speech to produce videos at a fraction of the cost. Over time, I track which products and segments are most popular and selectively replace those with higher-quality generated video.
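One way to pick which image-based segments to upgrade first, sketched under two assumptions: every view of a product counts toward each distinct segment it contains, and upgrading a segment has a flat cost. `segment_views`, `pick_upgrades`, and `cost_per_segment` are hypothetical names, not part of the existing pipeline.

```python
from collections import Counter


def segment_views(product_views: dict[str, int],
                  product_steps: dict[str, list[str]]) -> Counter:
    """Credit every view of a product to each distinct segment it contains."""
    views: Counter = Counter()
    for product, n in product_views.items():
        for step in dict.fromkeys(product_steps.get(product, [])):  # dedupe, keep order
            views[step] += n
    return views


def pick_upgrades(product_views: dict[str, int], product_steps: dict[str, list[str]],
                  already_upgraded: set[str], budget: float,
                  cost_per_segment: float) -> list[str]:
    """Choose the most-watched segments, not yet upgraded, that the budget can replace."""
    views = segment_views(product_views, product_steps)
    ranked = [step for step, _ in views.most_common() if step not in already_upgraded]
    return ranked[: int(budget // cost_per_segment)]
```

Crediting shared segments is what makes this pay off: upgrading a common step such as a cocoa harvest improves every product that reuses it, so it naturally ranks above steps that appear in only one product.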
I've already built a working pipeline:
It is powered by Llama 4 running on Groq for ultra-fast inference, and the result is highly cost-effective: $0.10 for a first-time generation and $0 for replays or indexed reuse.
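Because replays and indexed reuse are free, spend scales with the number of first-time generations rather than with views. A quick worked check using purely hypothetical volumes: 500 first-time generations at $0.10 each come to $50, which spread over 10,000 views is $0.005 per view. `generation_spend` is an illustrative helper, not part of the pipeline.

```python
def generation_spend(first_time_generations: int, total_views: int,
                     cost_per_generation: float = 0.10) -> tuple[float, float]:
    """Return (total spend, spend per view) when only first-time generations are billed."""
    total = first_time_generations * cost_per_generation  # replays and reuse add $0
    per_view = total / total_views if total_views else 0.0
    return total, per_view


total, per_view = generation_spend(first_time_generations=500, total_views=10_000)
print(f"${total:.2f} total, ${per_view:.4f} per view")
```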
Over time, the platform will evolve from an image-based MVP into a fully cinematic, dynamically assembled video learning system without losing its cost efficiency.