
AI Learning Lab
AI's 'NanoBanana' Moment: Video Generation Reaches New Heights!

Video | 2026-05-26 | 5:05 | 145 views
Description
Join us LIVE three nights a week for the AI Learning Lab, where Kyle explores breaking news, demos AI tools, and takes live Q&A.
It's all happening in the AI Salon at 9:30 PM ET.
RSVP HERE: https://aisalon.mn.co/posts/101413098?utm_source=manual
#AI #Gemini #Multimodal #DeepMind #TechInnovation
Transcript
0:00 The question was: I mentioned the Omni model as being under-hyped. The person that announced the new Gemini Omni model, which outputs video, was Demis Hassabis. Demis Hassabis is the creator of DeepMind, which was acquired by Google. He won the Nobel Prize in Chemistry for AlphaFold, for the protein folding work that they've done at DeepMind. And so DeepMind is up to some incredible stuff. Demis Hassabis has been thinking about AI and AGI and world models since he was a little boy. Like, literally since he was a little boy. This is kind of his life's work.

0:56 How people are characterizing the new video output of the Gemini model is that it's the Nano Banana moment for video. If you haven't used Nano Banana, it's the image generation tool within Gemini, and it does text really well. It's just really good, right? And then the new image model inside ChatGPT is also really good. And this new video model from Gemini does things really well. It does text really well. It does concepts really well.

1:37 The whole model is both a large language model and a true multimodal model. It's the first one that I know of, and I could stand corrected, but I don't think so. This is the first model that's anything in, anything out. So you could put in a PDF and say, "Turn this PDF into a video." You could put in a video and say, "Make this video underwater." And it will do that.

2:03 The reason that I think it's under-hyped is that we now have a model that understands all of what's in the large language model, right?
2:11 But it also seems to understand how the world works, how things interact with one another, and how those things relate to the concepts that sit underneath them. So, it understands physics, and it understands math, and it understands history, and it understands art and culture. It understands lighting and shadows. It understands how the world works.

2:36 And so, if you think about all of the things that you can do with a large language model, where you give it more context, where you tell it about a company that you're working with, you could now say, "Let's say some worker has an idea for a new way of manufacturing something. Let's make a video of what that might look like on the factory floor," and make an explainer video of that concept. And then you could say, "Okay, now make a spreadsheet that projects the costs and profit if we took that approach. Here's all the raw data." And it'll create that. "Now make a video of someone talking about the financials as if it were in a quarterly meeting." Right?

3:35 You can create videos of architecture concepts [laughter] that are derived from blueprints, and have them be accurate, right?

3:52 So, one of my go-to examples that I've always done with image models was a '70s muscle car in an abandoned factory, some version of that. And whenever I've done the videos, the videos are just not good. The wheels don't roll right, the wheels turn the wrong direction, the car goes backwards instead of forwards, things like that.
4:13 So, this was just a very simple prompt about a '69 GTO Judge that revs its motor, it's driven by someone mysterious, and it leaves the factory. And so, this is what it created. So, pretty good.

4:26 Here's some little documentary stuff.

4:28 Deep in the lush rainforests of the Lesser Antilles lives the incredibly rare pickle bird. Its remarkable camouflage makes it nearly impossible to find.

4:38 >> Yeah, his little buddy.

4:39 >> More elusive is the Skittle's salamander. Its skin is adorned with a dazzling array of bright spots. They serve as a brilliant and effective [inaudible] against the colorful backdrop of the jungle.

4:49 So, it did all the audio, it invented the creatures, it did the editing. So it just seems to understand things quite well.

4:59 Watch the full replay at community.thesalon.ai.