
AI Learning Lab
Feb 15, 2024 - (1 of 2) Sora: OpenAI's Mind-Blowing Text-to-Video Model

Video, 2024-03-10, 44:07, 19 views
Description
In this captivating live presentation, Kyle Shannon dives into the groundbreaking advancements of OpenAI's new text-to-video model, Sora, which promises to revolutionize the way we create and experience visual storytelling. With a focus on the model's ability to generate realistic and imaginative scenes from simple text prompts, Kyle discusses its potential to transform creative industries, highlighting its impressive coherence in character movement and scene composition. He emphasizes the model's deep understanding of language and real-world physics, while also acknowledging its current limitations, such as occasional inaccuracies in physical interactions. As he showcases various stunning video examples, Kyle invites viewers to consider the implications of such technology on filmmaking and content creation, sparking a dialogue about the future of AI in the arts.
For more insights and updates, check out the AI Learning Lab on TikTok: [AI Learning Lab](https://tiktok.com/@aiLearningLab).
#AI #OpenAI #TextToVideo #Sora #Filmmaking #CreativeTechnology #Innovation #artificialintelligenceconference
Chapters:
00:00:00 OpenAI Sora Launch
00:07:30 Sora
00:13:40 Sora Red Teaming
00:14:50 Sora and AGI
00:26:45 Importance of Storytelling
00:35:15 Sora's
00:37:45 Sora Safety Features
00:41:00 Sora's Diffusion Model
00:42:00 Sora and the Real World
Transcript
1:10 all right good people of the tickety 1:13 talk was 1:16 happening um okay so here's what we're 1:20 doing this special live presentation of 1:23 the AI Learning Lab brought to you by 1:25 OpenAI 1:26 Sam Altman's jealous of Kyle Shannon 1:29 but Kyle still hawks his wares anyway 1:33 all right let's 1:36 see where is my mouth where's my mouse 1:39 people listen I got to get rid of the 1:42 hey Kyle is this your TikTok to work 1:45 today yes it it's exactly what it is uh 1:50 all right let's see am I the only one 1:53 screaming inside right now holy crap 1:56 this this is a big deal 1:59 so I I I have made this statement now a 2:02 number of times 2023 the last normal 2:05 year um and then the month of January 2:08 went by without a 2:11 substantive um 2:12 announcement and then we're halfway 2:15 through February and we haven't seen one 2:16 that I think is a really big deal well 2:19 this just showed up so um this is a 2:22 really big deal um how many people we 2:24 have in here 77 so do me a favor did you 2:27 get access no they're just going into 2:29 red teaming right now what site is this 2:30 so this is 2:32 openai.com/sora s o r a it's their new 2:37 text to video model it looks like you're 2:39 going to be able to do up to 60-second videos 2:43 right now they're only showing 20 second 2:45 videos but when I show you the examples 2:48 here holy 2:50 [ __ ] holy [ __ ] is it mind-blowing so do 2:53 me a favor share this 2:55 live uh share it with anyone you think 2:58 might be into this or even people who 3:01 are not into 3:02 this and then I'm just going to like I 3:05 haven't read the whole thing yet but 3:06 this is in looking at the 3:08 examples I'm like I'm like I gotta go 3:12 live I got to go live and just talk 3:13 about it this is this is a a big [ __ ] 3:16 deal this is like Runway ML and Pika at 3:20 this point I was looking at them as 3:24 toys with potential right I mean you 3:27 could do you could absolutely do 3:29 storytelling
with ML like if you look at 3:31 some of the work that was done for the 3:32 Gen 48 video contest even in the AI 3:35 Salon like that stuff was good but this 3:39 is um this is different this is 3:41 different what's the link again 3:45 open.com a can I put text on screen here 3:51 anywhere flip camera quick pull draw and 3:57 guess 3:59 comment 4:00 what's that do oh I just am making a 4:05 comment uh oh how do I get out of this 4:08 oh there we go okay um I was like why is 4:11 he live now oh [ __ ] something is 4:12 happening yep something's happening okay 4:16 all right oh pin a chat 4:20 um I don't know how to pin stuff I I 4:23 usually have mods for that well if any 4:25 mods are in here um do me a favor and 4:30 pin a chat uh that is 4:33 open ai.com 4:38 how you don't get access to it right now 4:41 as far as I know okay so creating video 4:44 from text Sora is an AI model that can 4:47 create realistic and imaginative scenes 4:49 from text instructions all videos on 4:51 this page were generated directly by 4:53 Sora without any 4:56 modification um comment and pin pin it 4:59 yourself all right let's see 5:03 comment 5:05 uh 5:07 open 5:08 ai.com 5:12 SLS 5:15 a and 5:17 then pin all right there you go is that 5:20 pinned do you see it pinned I don't see 5:23 it 5:25 pinned yeah oh my God I know I 5:28 know 5:31 pin this 5:32 comment couldn't pin the comment you can 5:35 try 5:36 again I can't pin it well if if someone 5:40 else can pin it great anyway uh it's in 5:43 there I'm looking at the same time how 5:45 cool yeah feel free to obviously go to 5:47 the site but I'll I'll I'm going to go 5:49 through this whole page here because 5:50 there's a 5:51 lot okay we're teaching AI to understand 5:54 and simulate um real world motion the 5:57 goal of the model is to help solve 6:00 problems our text to video model Sora 6:02 can generate videos up to a minute long 6:05 while maintaining visual quality and 6:07 adherence to The users's Prompt holy 6:10 [ __ ] 
sometimes you can't pin URLs 6:14 oh oh and I but it wasn't even a URL oh 6:17 there you go oh open AI do so let me do 6:19 okay uh hang on so I'll do um 6:23 comment 6:25 open Ai and then I'll do parentheses d o 6:32 t c o m SLS o r 6:38 a boom and then let me see if I can pin 6:42 that there we go okay so there's the 6:46 pin all right that's where I am what's 6:49 going on so hey Danielle um what's going 6:52 on is why are we live um open AI 6:57 just announced it's not live yet but 7:00 they just announced their text to video 7:03 model and it looks Bonkers good so I'm 7:07 just going to go through the 7:09 announcement hey Danielle um I'm just 7:11 going to go through the announcement and 7:12 we're going to look at all the video 7:14 examples because they're just 7:16 crazy um and there's a lot of them and 7:20 then we're going to read about it so 7:21 right now apparently this is going into 7:23 red teaming which is where they're doing 7:25 internal testing for for safety it's 7:28 gorgeous so this is okay prompt a 7:31 stylish woman walks down a Tokyo Street 7:33 filled with warm glowing neon animated 7:36 City signage she's wearing a black 7:37 leather jacket a long red dress and 7:40 black boots 7:42 more and carries a black purse she wears 7:44 sunglasses and red lipsticks she walks 7:47 confidently and casually the street is 7:49 damp and reflective creating a mirror 7:51 effect of the col colorful lights many 7:54 pedestrians walk about notice her legs 7:57 don't look you know how walking on a lot 8:00 of these models just looks weird like 8:03 she actually looks like she's got legs 8:06 and they're moving normally it looks 8:08 like the bottom of her shoes get thicker 8:11 occasionally 8:13 but the prompt coherence here is kind of 8:16 insane I mean the the the visual visual 8:20 coherence this is one clip this is a 59 8:24 second 8:26 video this was generated from a prompt 8:30 with no edits with no modifications 8:36 apparently um again I I don't 
know if 8:40 any of this shit's true but this is you 8:42 know 8:45 open.com okay that's one here's the next 8:48 one let me go to this prompt several 8:51 giant woolly 8:53 mammoths these look a little odd it's a 8:57 10-second clip 9:00 the physics on 9:02 their legs is a little odd but it's not 9:07 bad like I feel like we've come to 9:10 forgive just absolute 9:13 weirdness and this is not I like I don't 9:15 immediately go to weird I I immediately 9:17 go to is that a puppet but I don't go to 9:20 that's AI this one was crazy I saw this 9:22 one before this one is a movie trailer 9:25 featuring The Adventures of a 9:26 30-year-old Spaceman wearing a red wool 9:29 knitted motorcycle helmet which he has 9:33 blue sky salt desert cinematic style 9:36 shot on 35 M millimeter film Vivid 9:39 colors 17 seconds 9:44 long do you get to determine the length 9:47 of the video I don't know minor 9:49 jankiness yeah this is this is minor 9:52 jankiness 9:54 but this is really flipping close 9:58 people this is really flipping 10:05 close and I I gotta tell you you know 10:08 when they when they start rolling in 10:10 thank you Roberto when you start rolling 10:12 in things like music generation and you 10:15 know script generation and voice 10:17 generation right you're these things we 10:20 are not we are probably by the end of 10:23 the year we're going to be able to 10:24 generate these things you know with 10:28 other multimedia elements in them all 10:30 right like I and by the way I haven't 10:31 even scr scratched the surface this one 10:34 is staggering like watch the lighthouse 10:36 here it it um normally when you see 10:40 stuff like 10:41 this something like the lighthouse is 10:44 just going to go away like like it'll 10:46 it'll morph and get weird but it's 10:48 staying very very consistent as are the 10:51 people right 10:53 there as are the 10:56 waves like there's no weird morphing 11:00 like you normally see with water scenes 11:03 and look how coherent 
the cliff is 11:05 staying [ __ ] 11:09 Bonkers I 11:12 mean I looked at Runway stuff and Pika 11:15 stuff and I'm like if I were a filmmaker 11:17 I'd look at that stuff and I'd be like 11:19 yeah I'm not worried if I were if I were 11:22 a drone cinematographer and looked at 11:25 this I'd be like oh [ __ ] guess that's 11:28 over 11:30 [ __ ] crazy animated scene of a 11:33 closeup of a short fluffy monster 11:35 kneeling beside a melted red candle the 11:37 art style is 3D and realistic focus on 11:42 lighting the flame doesn't seem real but 11:47 it's a it's an 11:48 animated thing so I think you could 11:51 forgive 11:52 that a gorgeously rendered papercraft 11:56 world of a coral reef rif with colorful 12:00 fish and sea 12:02 creatures 20 12:06 seconds Amelio's wife shared the live 12:09 thank you there's only 41 people in here 12:11 be like ao's wife share this 12:15 live this is crazy crazy crazy 12:19 crazy cio's wife share this 12:23 live this is crazy crazy crazy 12:28 crazy 12:38 I may need to take that I may need to 12:39 come back in uh uh 12:42 [Music] 12:43 uh photo realistic closeup video of two 12:46 pirate ships battling each other as they 12:49 sail inside a cup of 12:51 coffee wow 12:54 wow there's a little jankiness on the 12:56 right 12:58 ship 13:02 right there it sort of pops out of the 13:06 water but 13:10 damn I love that one very nice yeah that 13:13 one's 13:14 cool a young 13:16 man at his 20s sitting on a piece of 13:20 cloud in the sky reading a book there's 13:22 some jankiness with the pages but look 13:25 his hand looks good no flickering on the 13:28 face it almost looks like he said 13:30 something there he's definitely sitting 13:32 on a 13:34 cloud all right so that's the first set 13:37 of 13:39 examples today Sora is becoming 13:42 available to Red teamers so red teamers 13:45 are people who test for critical areas 13:48 for harm or risk we are also granting 13:51 access to a number of visual artists 13:53 designers and 
filmmakers to gain 13:56 feedback on how to advance this model to 13:58 be most helpful for creative 14:00 professionals that's actually a smart 14:02 move right getting the creative cuz cuz 14:05 this is absolutely a threat to that 14:08 category so getting them involved is 14:10 smart I don't know that they're going to 14:13 be into it but what whatever we're 14:15 sharing our research progress early to 14:17 start working with and getting feedback 14:20 from people outside of OpenAI and give 14:23 the public a sense of what AI 14:25 capabilities are on the horizon remember 14:28 I've been talking about for the last two 14:31 weeks Sam 14:33 Altman talks about ChatGPT-4 being best 14:37 understood as as a preview and progress 14:41 here is not linear meaning when this 14:44 thing gets way better it's going to get 14:46 way better on a lot of 14:48 fronts um this is going to cost a lot 14:51 more it may um historical footage of 14:54 California during the Gold Rush that's 14:57 pretty amazing 15:01 look at the coherence of like the the 15:04 people are not morphing into other like 15:08 look at that horse right there or that 15:12 person walking look at these horses here 15:15 like they're not morphing into other 15:19 things this is pretty crazy pretty crazy 15:27 good um Amelio's wife Storyvine what's 15:31 I don't know what you mean by that 15:32 that's my company close-up view of a 15:35 glass sphere that has a zen garden 15:37 within it hang on I got to check 15:39 something let me see why Monique was 15:41 calling 15:47 me let me just ask her hang 15:56 on and then let me grab 16:00 this hold please hold 16:28 please 16:35 all right um zen garden there's a small 16:38 dwarf in the sphere let's see wonder how 16:41 oh wait uh I hope that includes 16:45 you oh I hope that you and our 16:48 Storyvine are included in the red team 16:50 release I don't know we've established 16:52 that that Sam Altman is jealous of me so 16:54 I would assume I'm not going
to have 16:55 access it would be nice if we had access 16:58 what's the hot news the hot news is 17:02 openai.com/sora s o r a and I'm going 17:06 through all the all of this release it's 17:09 not live yet but it's being released to 17:12 red teamers and it's being released to 17:14 creative professionals to get input and 17:17 they're they're they're releasing this 17:19 early to give us a sense of what's 17:22 coming so if I were Runway ML and if I 17:24 were Pika Labs right now I'd be pooping my 17:27 pants and if they've got something 17:29 that's close to this that they haven't 17:30 released yet this will probably inspire 17:33 them to release it early all right so 17:35 there's another video here's another 17:37 one wow holy [ __ ] extreme closeup of a 17:42 24-year-old woman's eye 17:44 blinking standing in Marrakech during magic 17:48 hour oh my god look the there's Marrakech 17:51 in the reflection of her eye cinematic 17:54 film shot in 70 millimeter depth of field 17:57 vivid colors cinematic 17:59 15 seconds of a closeup of an eye with 18:01 the reflection of a specific location at 18:03 hey cam Katen looks so 18:06 real 18:08 um cartoon kangaroo of disco 18:13 dances um do me a favor if there's a mod 18:17 in here um put in the comment open a 18:21 open and don't put the dot like write 18:24 the dot in parenthesis do in 18:27 parentheses.
Sora and pin that so that 18:30 the URL's pinned for people to check out 18:33 a cartoon kangaroo disco 18:38 dances thank you very much I appreciate 18:40 that I'm late what are we looking at 18:42 we're looking at OpenAI's new text to 18:45 video model that 18:47 looks 18:49 stunning like you know me if it looks 18:51 like [ __ ] I'll tell you it looks like 18:52 [ __ ] 18:54 this this is just the the teensiest of 18:57 jankiness now 18:59 we don't get to play with this yet it's 19:01 just it's just in 19:02 testing but there was a little janky 19:05 there I saw some foxes earlier that sort 19:08 of meld into one another so it's still 19:10 going to be janky but like compared to 19:12 Runway and Pika and the and the Stable 19:15 Video stuff this is crazy you can do up 19:19 to 59 seconds of video with a single 19:23 prompt and it's doing cuts it looks like 19:26 it's doing cuts within a single video 19:28 like edits within a single video this is 19:31 a beautiful homemade video showing 19:34 people of Lagos Nigeria in the year 19:37 2056 shot with a mobile phone 19:42 camera crazy thank you ao's 19:46 wife um all right a petri dish with 19:50 a bamboo 19:51 forest and tiny red pandas running 19:54 around yeah this is janky and weird 19:55 they're just they're materializing 19:58 they're materializing out of nowhere 20:00 from behind the behind the uh thing but 20:04 that's it's still that's pretty 20:07 decent in a petri 20:11 dish 20:14 huh and they all only have two 20:17 legs that's got some jank in 20:21 it the plastic chair is 20:25 epic all right let's see 20:29 the camera rotates around a large stack 20:31 of vintage televisions showing 20:33 different programs 1950s sci-fi movies 20:37 horror movies news static a 1970s 20:42 sitcom inside a New York City museum 20:47 gallery so the camera's panning but it's 20:49 it's panning really 20:52 slow but you got 20:57 horror there's horror there's 21:02 horror there's maybe a 21:05 sitcom
that's crazy this is crazy people 21:10 good Jank something for us to feel 21:12 nostalgic about later exactly I'm 21:15 telling you the reason I'm going live is 21:17 this is the first thing I've seen in 21:19 2024 that starts to feel like 2024 is 21:22 going to get really 21:24 weird 21:26 because this is now getting I mean I I 21:29 would I would consider that janky enough 21:32 that it definitely looks AI but a lot of 21:34 this other stuff doesn't Sora is able to 21:37 generate complex scenes with multiple 21:39 characters specific types of motion and 21:42 accurate details of the subject and 21:44 background the model understands not 21:46 only what the user has asked for in The 21:49 Prompt but also how those things exist 21:51 in the real world all right don't know 21:54 how they're doing 21:56 that look at this the the dust looks a 22:00 little 22:01 fake but like you'd have to be looking 22:04 hard at 22:06 it but the terrain looks real there's a 22:11 little bit of physics weirdness in the 22:14 the the car sort of going around the 22:16 corner 22:18 weird but that's pretty [ __ ] 22:22 crazy this is Reflections in the window 22:25 of a train traveling through Tokyo sub 22:28 look at the reflection of the woman when 22:30 they go 22:31 past a building watch this right here 22:35 right there holy 22:41 [ __ ] that looks like real footage to 22:46 me that you could absolutely pass off 22:48 for real 22:51 footage that reflection right there you 22:54 could absolutely pass that off pass that 22:56 off as real footage it makes me wonder 22:58 if they picked up Google Maps content 23:00 it's pretty good yeah I I honestly don't 23:02 know I'm assuming they're using some 23:04 sort of diffusion technology which if 23:06 they are it's not they're not 3D mapping 23:09 stuff I don't think but I I have no 23:12 [ __ ] idea 23:13 people good 23:16 Lord a drone camera circles around a 23:19 beautiful historic church built on a 23:21 rocky outcropping on the Amalfi 
23:24 Coast the view showcases historical and 23:27 magnificent oh wait there's 23:30 more architecture details and tiered 23:33 pathways and patios waves are seen 23:35 crashing against the 23:39 shore several distant people are seen 23:42 walking and enjoying vistas on 23:45 patios I mean just as it's panning all 23:47 of these people are staying coherent 23:49 they're not morphing into one another 23:51 they're not turning into weirdness that 23:53 thing isn't turning into some weird 23:55 penis shape well it already is but 24:00 not more explicitly makes me wonder oh 24:04 yeah if they picked up the Google thing 24:05 I don't think so reference libraries I 24:09 yeah who 24:10 knows I I mean it looks like so this is 24:13 a large red octopus is seen resting on 24:15 the bottom of the ocean floor blending 24:18 into the sandy rocky terrain its 24:21 tentacles are so let's just look at the 24:25 weirdness 24:27 here the crab's got a little octopus 24:30 flowiness to 24:32 him but like let's look at the octopus 24:34 the the legs seem pretty 24:37 coherent they're oh yeah they're not 24:39 quite anatomically they have like 24:42 suckers on both sides of the leg rather 24:44 than just on the bottom and the the 24:46 suckers right here kind of animate along 24:48 the leg right wait where is it right 24:51 there they did 24:54 it right there see how they just zipped 24:57 around the leg 24:58 but holy crap holy crap this is good 25:03 like this is 25:07 close a flock of paper airplanes flutters 25:10 through a dense jungle weaving around 25:12 trees as if they were migrating 25:17 birds 20 seconds of 25:20 this it's happening it is happening 25:23 Danielle wow that looks real yeah this 25:25 is okay so if you're new here there's 25:27 not a ton of people here not a ton of 25:28 new people but what we're looking at is 25:31 OpenAI there you go Danielle got it 25:34 openai.com/sora is this 25:38 site 25:40 um let me check something here cuz I am 25:44 at work could you
please share a sample 25:46 of the higher level output yep 25:49 okay got it got 25:55 that okay that's something I got to pay 25:57 attention to all 26:04 right love the way you share new 26:06 technology thank you very much I 26:07 appreciate it no this is exciting we're 26:09 going to need a larger hard drive we 26:11 certainly 26:12 are I mean 26:18 listen with the current state of video 26:24 technology you actually have to work 26:26 pretty hard to get it to be not janky 26:28 enough to be able to tell a 26:31 story 26:34 this but like like like this Step Up in 26:38 quality means that way more people are 26:41 going to be able to tell video stories 26:43 than could 26:45 before cuz right now you still have to 26:47 be a decently accomplished video 26:51 Storyteller to understand how to edit 26:53 around shitty shitty content okay that's 26:57 weird and janky oh yeah her her arm just 27:01 turned into a 27:04 pillow look at her head just goes and 27:07 then her arm turns into a pillow all 27:09 right that's Jank ified the the cat has 27:13 a single paw turns into a double paw 27:18 but if you if you take out the weirdness 27:21 like you 27:22 could right okay after that right here 27:26 you could probably cut to that see there 27:28 the extra arm came in all right that one 27:30 that one I would redo I don't know why 27:31 they use that one as an example that 27:33 feels like a better version of 27:38 Runway but look at this look at the 27:40 coherence of these birds this is 27:43 crazy and look at the coherence of the 27:46 water normally in Runway the water's 27:48 doing some weird ass thing where it's 27:50 like morphing into the log or something 27:52 like that this must be what they're 27:54 talking about where it understands the 27:55 3D relationships to the objects 28:00 this is absolutely amazing this is 28:01 absolutely amazing that's that that is a 28:04 yes who said that fly life campaign yes 28:08 that is the perfect description this is 28:10 
absolutely 28:12 amazing and and like the coherence in 28:15 this dragon at this parade and the 28:17 coherence of the 28:21 people yeah this is not like anything 28:24 else we've got right now by the way this 28:27 is OpenAI thanks thanks Danielle for 28:28 putting that up hey 28:31 everybody all right that's that one all 28:34 right the model has deep 28:37 understanding of language enabling it to 28:40 accurately interpret prompts and 28:42 generate compelling characters that 28:44 express vibrant emotions Sora can also 28:48 create multiple shots within a single 28:50 generated video so they're doing editing 28:53 within the video or it'll appear that it 28:56 would be you're editing together 28:57 multiple 28:59 shots within a single video that 29:02 accurately persist characters and visual 29:06 style that line right there that 29:09 actually is it free we don't know yet I 29:10 well I assume it won't be free I assume 29:13 this will be part of Chat 29:16 GPT like you're going to be able to 29:18 generate this kind of video I'm assuming 29:21 just like you create DALL·E images right 29:23 now inside ChatGPT you'll be able to do 29:26 this within it is my guess I'm 29:28 speculating welcome to chat 29:31 TMZ I mean if I'm filming a Purina 29:37 commercial other than the fact that the 29:39 snow's a little weird on their 29:43 nose you could get away with that if you 29:46 if you had 29:47 to if you had to you know cut cut a shot 29:51 in we want puppies in the snow because 29:53 we're in 29:56 Colorado 30:00 I suspect this will make puppy and red 30:02 panda videos for fun I like it that's 30:05 not bad all right let me go look at the 30:08 other ones 30:10 here this one's pretty amazing tour of a 30:13 gallery with many beautiful look 30:16 it just cut to a different room that's 30:18 within the same 30:24 video wow though holy cow Dena I know 30:27 Master path to freedom I know crazy 30:29 right all right let's look at this one 30:32
there's there's one I think it's in here 30:33 that I saw this is remarkable notice 30:36 that their legs when they're walking 30:37 actually look like legs they don't look 30:39 like they're floating in a weird way you 30:42 cut that into a movie you you tell me if 30:44 someone would suspect that's 30:47 AI I don't think 30:50 so beautiful snowy Tokyo is bustling the 30:54 camera moves through the bustling city 30:56 street following several people enjoying 30:58 the beautiful snowy weather shopping at 31:02 nearby stalls gorgeous sakura petals are 31:05 flying through the wind along with the 31:08 snowflakes so there's the sakura 31:14 petals some snow 31:16 falling you could totally get away with 31:18 that in a movie or in a 31:21 video Sam Altman asked for 7 billion no 31:25 retr punk Sam Altman is asking for seven 31:27 trillion he's trying to raise between 5 31:30 and 7 trillion 31:32 dollars to to out compete 31:37 Nvidia 31:39 trillion pricing and access model have 31:41 not been specifically mentioned no they 31:43 haven't uh oh let's keep going because 31:46 the next thing just said something about 31:47 the model's weakness that that'll be 31:49 interesting to 31:51 hear stop motion animation of a 31:54 flower growing out of a window sill of a 31:56 suburban house let's see see what gets 31:58 weird here if 32:03 anything weird camera motion but like I 32:07 think you can get away like that doesn't 32:09 look real that looks geometrically weird 32:12 and the it looks like plastic rather 32:15 than pottery but 32:18 damn story of a robot's life in a 32:21 cyberpunk setting oh look so the prompt 32:24 here the story of a of a robot's 32:28 life in a cyberpunk setting so this is 32:30 19 seconds there's the third 32:33 scene fourth 32:40 scene so 32:44 one 32:52 two that's a weird camera move 32:56 three 32:58 so four cuts so it understands that 33:01 you're doing a narrative and in film 33:03 it's cool but we'll wait to see how it 33:06 looks outside
of the curated results 33:08 yeah yeah absolutely Pate totally agree 33:11 and and there's even some things in here 33:12 Pate as you can imagine that are um why 33:16 would they have included 33:18 that so I guess I guess maybe they 33:21 they're like why don't we put in four 33:22 things that look shitty does it talk it 33:25 doesn't talk yet but we're not we're not 33:26 that far from that I would I would think 33:28 that by the end of the year we'll have 33:30 videos that have either audio behind 33:32 them or even scripts written about them 33:35 this is an extreme closeup of a 33:37 gray-haired man with a beard in his 33:41 60s in deep thought pondering the 33:44 history of the universe as he sits in a 33:46 cafe in Paris his eyes focus on 33:49 people off screen as they walk he sits 33:53 mostly motionless he's dressed in a wool 33:56 coat yep but button- down shirt he wears 33:59 a Brown Beret yep glasses and has a very 34:02 professional appearance yep in the end 34:05 offers a subtle closed mouth Smile as if 34:09 he found the answer to the meaning of 34:10 life let's see if he does 34:15 that nope no weird smile but 34:20 that's depth of field cinematic that's 34:24 that's pretty [ __ ] 34:26 good 34:28 realberg all right that's that 34:31 one that looked 34:33 fake beautiful silhouette animation okay 34:36 it's an animation so you could argue 34:37 it's stylized there one wolf just turned 34:40 into 34:45 two there it's walking weird that's got 34:47 a little Jank to 34:50 it this has got weird physics but it's 34:52 still pretty [ __ ] 34:56 cool 34:59 so like all of these things are in water 35:01 and then you've got the surface of the 35:03 water down here so like there's just 35:05 weird like there's a surrealism to this 35:07 one which if you want it surreal this 35:10 works there's the puppies all right the 35:14 current model has weaknesses it may 35:16 struggle with accuracy of simulating the 35:18 physics that's what we've noticed of a 35:20 
complex scene it may not understand 35:23 specific instances of cause and effect 35:26 for example a person might take a bite 35:28 out of a cookie but afterwards the 35:30 cookie might not have the bite mark so 35:33 so um continuity errors we get those in 35:35 real films 35:44 [Music] 35:47 anyway um the model may also confuse 35:51 spacial details of a prompt for example 35:53 mixing up left and right May struggle 35:55 with precise descriptions of a 35:58 events like following a specific camera 36:02 trajectory all right let's go look at 36:07 these scene of a person running 36:10 cinematic film oh weakness Sora 36:13 sometimes creates physically impossible 36:19 motion well it's actually that one's 36:21 actually kind of fun cuz it looks like 36:24 he's he's on a a car that runs by you 36:26 running on 36:27 [Laughter] 36:29 it this is this is cool that they're 36:31 showing the weaknesses ah okay prompt 36:34 five the weakness animals or people can 36:36 spontaneously appear notice that they're 36:38 they're all emerging out of the other 36:40 puppies all 36:44 right an example of inaccurate physical 36:49 modeling basketball through hoop that 36:54 explodes yeah oh yeah that the ball just 36:56 went right through the like that's 36:58 pretty good up to right there actually 37:02 up to there you're fine there it's 37:05 bad still pretty [ __ ] good good Lord 37:09 okay that looks weird Sora fails to 37:12 model a chair as a rigid object there 37:14 you go inaccurate physical interactions 37:18 yep oh yeah look at the chair flying 37:21 around it's possessed catch 37:26 it just a flesh wound simulating complex 37:30 interactions between objects and 37:32 multiple characters is often 37:35 challenging she she missed blowing out 37:37 the candles I've seen that in real life 37:40 before I'm like how can you not blow out 37:45 candles all right 37:47 safety this is interesting we'll be 37:50 taking several important safety steps 37:52 ahead of making Sora 
available in 37:55 OpenAI's products so that means on the API 37:59 side hopefully that means in ChatGPT 38:01 we're working with red teamers 38:04 domain experts in areas like 38:06 misinformation hateful content and bias 38:09 who will be adversarially testing the 38:11 model we're also building tools that 38:13 detect misleading 38:15 content uh such as a detection classifier so 38:18 that we can tell the video is generated 38:20 by Sora we plan to include C2PA metadata 38:23 in the 38:26 future um in addition to us developing 38:28 new techniques we're leveraging our 38:31 existing safety models for example once 38:33 in an OpenAI product our text 38:35 classifier will check and reject text 38:39 input prompts that are in violation of 38:41 our usage they're already doing that 38:44 we'll be engaging with policymakers 38:46 and 38:49 educators and then here's that one's got 38:51 some weird physics going on and some 38:53 weird 38:55 biology does he walk around 38:58 that yeah he just kind of walked through 39:01 that shutter all right that one's 39:06 weird that one is actually really 39:12 good cute the water looks a little 39:22 AI-y that's not how they move their feet 39:24 but still cute that's cute 39:27 dog taking a 39:31 selfie okay the seagull was 39:36 weird okay that's weird uncanny 39:40 valley 39:43 that's that's not bad but it's got weird 39:48 contrast that's 39:50 weird that's 39:55 weird 39:58 that's a little 40:02 weird that's pretty 40:06 good huh wow the scales are funny reality 40:10 bending yeah crazy crazy all right 40:13 research techniques it's a diffusion 40:15 model which generates video by starting 40:17 off with one that looks like static 40:19 noise and gradually transform so this is 40:21 the same tech of all the image 40:23 generation tools we play 40:25 with capable of generating entire videos at 40:28 once or extending a generated video to 40:30 make them 40:33 longer we represent videos and images as 40:36 collections of
smaller units of data 40:38 called 40:39 patches each of which is akin to a token 40:42 in 40:43 GPT by unifying how we represent data we 40:46 can train diffusion 40:48 transformers on a wider range of visual 40:51 data than was possible before spanning 40:54 different durations resolutions and 40:56 aspect ratios so it sounds like they're 40:58 breaking elements up within the image 41:01 and kind of rendering them independently 41:04 somehow together that's weird and 41:07 cool Sora builds on past research in DALL·E 41:11 and GPT 41:13 models it uses the recaptioning 41:16 technique from DALL·E 3 which involves 41:18 generating highly descriptive captions 41:21 okay so they're rewriting your prompt 41:23 for you as a result the model will be 41:26 able to follow the user's text 41:27 instructions better in addition to being 41:29 able to generate solely let's see it 41:33 serves as a 41:36 foundation that can understand and 41:38 simulate a real world the hell was 41:42 that we believe will be important 41:44 milestone for achieving AGI that's a 41:46 really interesting statement Sora serves 41:48 as a foundation for models that can 41:51 understand and simulate the real 41:55 world this is not just making video what 41:57 they're doing here Sam's Valentine Ann 42:00 anniversary gift to you I guess that's 42:02 it Brandon right Sora serves as a 42:05 foundation for models that can 42:07 understand and simulate the real world a 42:09 capability we believe will be an 42:11 important milestone for achieving 42:16 AGI yeah so imagine the model being able 42:19 to understand not just the 42:22 map but the the world it represents holy 42:33 [ __ ] 42:37 yowza that's [ __ ] bonkers it's bonkers 42:41 people it's just 42:42 bonkers just bonkers all right I'm gonna 42:45 go thank you for sharing you're 42:47 welcome um peace out yeah holy crap 42:50 there's a lot of holy crap in there um 42:52 yeah that's it openai.com you can't play 42:54 with it yet unless you get access as a
42:57 red teamer or some 42:59 specialized visual creative 43:03 artist um all right I'll keep paying 43:06 attention to this there was something 43:07 else oh um I don't know if you saw it 43:10 but um Google released um Gemini Pro 1.5 43:18 today or or announced it I don't know if 43:20 they announced it or released it I've 43:22 been busy um that looks like it's better 43:26 than Gemini Advanced which is their 43:27 Ultra 1.0 like their 43:30 middle they released the 43:33 middle 43:35 model not the high-end model but the 1.5 43:39 version of it it looks quite amazing so 43:41 um so I think the Google shit's going to 43:43 get better quickly too 43:46 so the the the dominoes are starting to 43:49 fall 2024 is going to get really weird 43:51 people it's going to get really weird 43:52 really fast so hang on it's coming go 43:57 back to work or go back to your sipping 43:59 hot cocoa whatever the hell you were 44:00 doing peace follow my channel damn it 44:05 bye
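
Editor's sketch: around 40:33 the announcement describes representing videos as collections of smaller units called patches, each akin to a token in GPT. The toy Python below shows one way a video could be cut into such patches. It is only an illustration under assumptions: the function name `patchify`, the patch sizes, and the plain nested-list video are all made up here, and the announcement read in the stream does not say how Sora actually lays out or compresses its patches.

```python
def patchify(video, pt, ph, pw):
    """Split a video into non-overlapping spacetime patches.

    video is a nested list indexed as video[frame][row][column].
    pt, ph, pw are the patch sizes in frames, rows, and columns
    (hypothetical values chosen for illustration, not Sora's).
    Each patch is flattened into a 1-D list, one "token" per patch.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Gather every pixel inside this time/space block.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# Toy video: 4 frames of 4x6 pixels; each value encodes its own position.
video = [[[t * 100 + y * 10 + x for x in range(6)] for y in range(4)]
         for t in range(4)]

tokens = patchify(video, pt=2, ph=2, pw=3)
print(len(tokens), len(tokens[0]))  # prints "8 12"
```

Because every patch has the same length no matter how long or large the clip is, a longer video or a different aspect ratio simply yields more patches, which is one plausible reading of the "different durations, resolutions and aspect ratios" remark in the announcement.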