Gemini Omni: Google’s Video AI Gets Too Good

Tuesday’s I/O conference wasn’t about small tweaks.
Google dropped Gemini Omni.

It’s different from the Veo tool they released earlier.
Veo turns text into video. Fine. But Omni? Omni takes anything. Text, images, existing clips. It eats it all.

The underlying architecture is still Gemini, but the application is distinct. A true multimodal system. In, out, repeat. At launch you get video outputs. Image and text generation will come later. Wait for the update.

AI slop fills our feeds. Meanwhile, the labs are building better simulators.

That’s the tension right now. The feeds are rotting. The technology is improving.
Google calls it a step toward “world models.” Not just guessing. Reasoning. Physics matters now. If you drop a glass in an Omni video, it shatters like glass, not like confused pixels. It grounds the output in the reality we live in.

The scary part is the edit button

You make a video.
You hate it.
You prompt a fix.

Omni accepts the original clip as input. You can swap elements. Change backgrounds. Alter the scene entirely.
That has never really been possible before.

It’s impressive. It’s also terrifying.
Deepfakes get easier by the minute. With this much power, changing how someone appears or acts is trivial.
Did Google think of this? Yes. Guardrails exist.
SynthID watermarking. Every output carries a digital signature. It’s not foolproof. It’s a start. A tiny digital tag on a potentially massive lie.

Where do you play with it?

The redesigned Gemini app gets the treatment. One-click templates for your camera roll.
Make an avatar of yourself. Custom voice, custom face. Put it in videos. Weird, right?
Paid subscribers get early access. Google Flow. YouTube Shorts.
Developers get the APIs. Weeks from now.
Enterprise folks? Same timeline. Custom integrations wait in the queue.

Two flavors for now

Split models are standard fare for Gemini.
Omni Flash lands first. Good for quick, lightweight tasks.
Omni Pro? Still cooking. More powerful, more expensive, coming later.

We are watching the shift from generation to simulation.
From making something cool to mimicking reality perfectly.

Where does the line get drawn?