AI Learning Lab

AI's 'NanoBanana' Moment: Video Generation Reaches New Heights!

SY8IrZJDkVs
Video | 2026-05-26 | 5:05 | 145 views

Description

Join us LIVE three nights a week for the AI Learning Lab, where Kyle explores breaking news, demos AI tools, and hosts a live Q&A. It's all happening in the AI Salon at 9:30 PM ET. RSVP HERE: https://aisalon.mn.co/posts/101413098?utm_source=manual #AI #Gemini #Multimodal #DeepMind #TechInnovation

Transcript

0:00 The question was
0:02 um [clears throat]
0:03 I mentioned the Omni model as being
0:05 under-hyped. The person that
0:08 um
0:10 that announced
0:12 the new
0:15 Gemini Omni model which is this
0:19 it outputs video
0:22 um was Demis Hassabis. And so Demis
0:25 Hassabis is the creator of DeepMind
0:28 which was acquired by Google. He won the
0:30 Nobel Prize for chemistry um for
0:33 AlphaFold for the protein folding um
0:36 work that they've done at DeepMind. And
0:38 so DeepMind is up to some
0:40 incredible stuff. Demis Hassabis has
0:42 been
0:43 thinking about AI and AGI and world
0:46 models since he was a little boy. Like
0:49 literally since he was a little boy.
0:50 This is kind of his life's work.
0:52 Um
0:56 How people are kind of characterizing
0:58 the new video output of the Gemini model
1:02 is as uh it's kind of like
1:06 Nano Banana,
1:08 it's the Nano Banana moment for video.
1:10 So if you haven't used Nano Banana,
1:14 it's the
1:15 image generation tool within Gemini and
1:18 it does text really well and
1:20 it's just really good, right?
1:22 And then the new image model inside
1:24 ChatGPT is also really good. And
1:27 this new video model from Gemini um
1:32 does things really well. It does text
1:35 really well. It does concepts really
1:37 well. The whole model is both a large
1:39 language model. It's a true
1:41 multimodal model. It's the first one
1:44 that I know of
1:45 and I could stand corrected, but I don't
1:48 think so. This is the first model that's
1:51 anything in anything out. So you could
1:53 put in a PDF and say turn this PDF into
1:56 a video. You could put in a
1:59 um video and say, "Make this video
2:01 underwater." And it will do
2:03 that. The reason that I think it's
2:05 under-hyped is that we now have a model
2:08 that understands all of what's in
2:11 the large language model, right? But it
2:13 also seems to understand how the world
2:16 works, how things interact with one
2:19 another, and how those things relate to
2:21 the concepts that sit underneath them.
2:24 So, it understands physics, and it
2:26 understands math, and it understands
2:28 history, and it understands art and
2:30 culture.
2:32 It understands lighting and shadows. It
2:34 understands how the world works.
2:36 And so, if you think about all of the
2:37 things that you can do with a large
2:39 language model, where if you give it
2:40 more context, if you tell it about a
2:42 company that you're working with,
2:45 you could now say,
2:48 "Why don't we make a video
2:50 of
2:55 Let's say some worker has an idea for a
2:57 new way of manufacturing something.
3:00 Let's make a video of what that might
3:02 look like on the factory floor."
3:06 And make like an explainer video of that
3:09 concept. And then you could say, "Okay,
3:12 now make a spreadsheet that
3:15 projects the costs and profit
3:19 if we took that approach."
3:21 And, you know, here's
3:23 all the raw data. And it'll create
3:24 that. Now make a video of, you know,
3:28 someone talking about
3:30 the financials as if it were in a
3:33 quarterly meeting.
3:35 Right? You can create
3:38 videos of
3:40 architecture concepts that [laughter]
3:43 are derived from blueprints.
3:46 Um and have them be accurate, right? Um
3:52 So, one of my go-to
3:55 um
3:56 examples that I've always done with
3:57 image models was
3:59 a '70s muscle car in an abandoned
4:02 factory, some version of that.
4:04 And whenever I've done the videos, the
4:06 videos are just not good. The wheels
4:07 don't roll right, the wheels turn the
4:09 wrong direction, the car goes backwards
4:11 instead of forwards, things like that.
4:13 So, this [clears throat] was just a very
4:14 simple prompt about a '69 GTO Judge um
4:19 that revs its motor, it's driven by
4:21 someone mysterious, and it leaves the
4:22 factory. And so, this is what it
4:24 created. So, pretty good.
4:26 Um here's some little documentary stuff.
4:28 Deep in the lush rainforests of the
4:30 Lesser Antilles lives the incredibly
4:32 rare pickle bird.
4:34 Its remarkable camouflage makes it
4:36 nearly impossible to find.
4:38 >> Yeah, his little buddy.
4:39 >> more elusive is the Skittle's
4:41 salamander. Its skin is adorned with a
4:43 dazzling array of bright spots. They
4:45 serve as a brilliant and effective
4:47 against the colorful backdrop of the
4:49 jungle. So, just like that, it did all
4:51 the audio, it invented the creatures,
4:54 it did the editing. So, it just seems to
4:57 understand things quite well.
4:59 Watch the full replay at
5:01 community.thesalon.ai.