AI Learning Lab

Feb 15, 2024 - (1 of 2) Sora: OpenAI's Mind-Blowing Text-to-Video Model

eWGh3Eu_oGQ
Video | 2024-03-10 | 44:07 | 19 views

Description

In this captivating live presentation, Kyle Shannon dives into the groundbreaking advancements of OpenAI's new text-to-video model, Sora, which promises to revolutionize the way we create and experience visual storytelling. With a focus on the model's ability to generate realistic and imaginative scenes from simple text prompts, Kyle discusses its potential to transform creative industries, highlighting its impressive coherence in character movement and scene composition. He emphasizes the model's deep understanding of language and real-world physics, while also acknowledging its current limitations, such as occasional inaccuracies in physical interactions. As he showcases various stunning video examples, Kyle invites viewers to consider the implications of such technology on filmmaking and content creation, sparking a dialogue about the future of AI in the arts. For more insights and updates, check out the AI Learning Lab on TikTok: [AI Learning Lab](https://tiktok.com/@aiLearningLab).

#AI #OpenAI #TextToVideo #Sora #Filmmaking #CreativeTechnology #Innovation #artificialintelligenceconference

Chapters:
00:00:00 OpenAI Sora Launch
00:07:30 Sora
00:13:40 Sora Red Teaming
00:14:50 Sora and AGI
00:26:45 Importance of Storytelling
00:35:15 Sora's
00:37:45 Sora Safety Features
00:41:00 Sora's Diffusion Model
00:42:00 Sora and the Real World

Transcript

1:10 all right good people of the Tickety
1:13 Tock what's
1:16 happening um okay so here's what we're
1:20 doing this special live presentation of
1:23 the AI Learning Lab brought to you by
1:25 OpenAI
1:26 Sam Altman's jealous of Kyle Shannon
1:29 but Kyle still hawks his wares anyway
1:33 all right let's
1:36 see where is my mouth where's my mouse
1:39 people listen I got to get rid of the
1:42 hey Kyle is this your TikTok to work
1:45 today yes it it's exactly what it is uh
1:50 all right let's see am I the only one
1:53 screaming inside right now holy crap
1:56 this this is a big deal
1:59 so I I I have made this statement now a
2:02 number of times 2023 the last normal
2:05 year um and then the month of January
2:08 went by without a
2:11 substantive um
2:12 announcement and then we're halfway
2:15 through February and we haven't seen one
2:16 that I think is a really big deal well
2:19 this just showed up so um this is a
2:22 really big deal um how many people we
2:24 have in here 77 so do me a favor did you
2:27 get access no they're just going into
2:29 red teaming right now what site is this
2:30 so this is
2:32 openai.com/sora S-O-R-A it's their new
2:37 text-to-video model it looks like you're
2:39 going to be able to do up to 60-second videos
2:43 right now they're only showing 20 second
2:45 videos but when I show you the examples
2:48 here holy
2:50 [ __ ] holy [ __ ] is it mind-blowing so do
2:53 me a favor share this
2:55 live uh share it with anyone you think
2:58 might be into this or even people who
3:01 are not into
3:02 this and then I'm just going to like I
3:05 haven't read the whole thing yet but
3:06 this is in looking at the
3:08 examples I'm like I'm like I gotta go
3:12 live I got to go live and just talk
3:13 about it this is this is a a big [ __ ]
3:16 deal this is like Runway ML and Pika at
3:20 this point I was looking at them as
3:24 toys with potential right I mean you
3:27 could do you could absolutely do
3:29 storytelling with ML like if you look at
3:31 some of the work that was done for the
3:32 Gen:48 video contest even in the AI
3:35 Salon like that stuff was good but this
3:39 is um this is different this is
3:41 different what's the link again
3:45 openai.com/sora can I put text on screen here
3:51 anywhere flip camera quick pull draw and
3:57 guess
3:59 comment
4:00 what's that do oh I just am making a
4:05 comment uh oh how do I get out of this
4:08 oh there we go okay um I was like why is
4:11 he live now oh [ __ ] something is
4:12 happening yep something's happening okay
4:16 all right oh pin a chat
4:20 um I don't know how to pin stuff I I
4:23 usually have mods for that well if any
4:25 mods are in here um do me a favor and
4:30 pin a chat uh that is
4:33 openai.com
4:38 how you don't get access to it right now
4:41 as far as I know okay so creating video
4:44 from text Sora is an AI model that can
4:47 create realistic and imaginative scenes
4:49 from text instructions all videos on
4:51 this page were generated directly by
4:53 Sora without any
4:56 modification um comment and pin pin it
4:59 yourself all right let's see
5:03 comment
5:05 uh
5:07 open
5:08 ai.com
5:12 slash
5:15 Sora and
5:17 then pin all right there you go is that
5:20 pinned do you see it pinned I don't see
5:23 it
5:25 pinned yeah oh my God I know I
5:28 know
5:31 pin this
5:32 comment couldn't pin the comment you can
5:35 try
5:36 again I can't pin it well if if someone
5:40 else can pin it great anyway uh it's in
5:43 there I'm looking at the same time how
5:45 cool yeah feel free to obviously go to
5:47 the site but I'll I'll I'm going to go
5:49 through this whole page here because
5:50 there's a
5:51 lot okay we're teaching AI to understand
5:54 and simulate um real world motion the
5:57 goal of the model is to help solve
6:00 problems our text to video model Sora
6:02 can generate videos up to a minute long
6:05 while maintaining visual quality and
6:07 adherence to the user's prompt holy
6:10 [ __ ] sometimes you can't pin URLs
6:14 oh oh and I but it wasn't even a URL oh
6:17 there you go oh OpenAI dot so let me do
6:19 okay uh hang on so I'll do um
6:23 comment
6:25 OpenAI and then I'll do parentheses d o
6:32 t c o m slash S o r
6:38 a boom and then let me see if I can pin
6:42 that there we go okay so there's the
6:46 pin all right that's where I am what's
6:49 going on so hey Danielle um what's going
6:52 on is why are we live um OpenAI
6:57 just announced it's not live yet but
7:00 they just announced their text to video
7:03 model and it looks Bonkers good so I'm
7:07 just going to go through the
7:09 announcement hey Danielle um I'm just
7:11 going to go through the announcement and
7:12 we're going to look at all the video
7:14 examples because they're just
7:16 crazy um and there's a lot of them and
7:20 then we're going to read about it so
7:21 right now apparently this is going into
7:23 red teaming which is where they're doing
7:25 internal testing for for safety it's
7:28 gorgeous so this is okay prompt a
7:31 stylish woman walks down a Tokyo Street
7:33 filled with warm glowing neon animated
7:36 City signage she's wearing a black
7:37 leather jacket a long red dress and
7:40 black boots
7:42 more and carries a black purse she wears
7:44 sunglasses and red lipstick she walks
7:47 confidently and casually the street is
7:49 damp and reflective creating a mirror
7:51 effect of the colorful lights many
7:54 pedestrians walk about notice her legs
7:57 don't look you know how walking on a lot
8:00 of these models just looks weird like
8:03 she actually looks like she's got legs
8:06 and they're moving normally it looks
8:08 like the bottom of her shoes get thicker
8:11 occasionally
8:13 but the prompt coherence here is kind of
8:16 insane I mean the the the visual visual
8:20 coherence this is one clip this is a 59
8:24 second
8:26 video this was generated from a prompt
8:30 with no edits with no modifications
8:36 apparently um again I I don't know if
8:40 any of this shit's true but this is you
8:42 know
8:45 openai.com okay that's one here's the next
8:48 one let me go to this prompt several
8:51 giant woolly
8:53 mammoths these look a little odd it's a
8:57 10-second clip
9:00 the physics on
9:02 their legs is a little odd but it's not
9:07 bad like I feel like we've come to
9:10 forgive just absolute
9:13 weirdness and this is not I like I don't
9:15 immediately go to weird I I immediately
9:17 go to is that a puppet but I don't go to
9:20 that's AI this one was crazy I saw this
9:22 one before this one is a movie trailer
9:25 featuring The Adventures of a
9:26 30-year-old Spaceman wearing a red wool
9:29 knitted motorcycle helmet which he has
9:33 blue sky salt desert cinematic style
9:36 shot on 35 millimeter film vivid
9:39 colors 17 seconds
9:44 long do you get to determine the length
9:47 of the video I don't know minor
9:49 jankiness yeah this is this is minor
9:52 jankiness
9:54 but this is really flipping close
9:58 people this is really flipping
10:05 close and I I gotta tell you you know
10:08 when they when they start rolling in
10:10 thank you Roberto when you start rolling
10:12 in things like music generation and you
10:15 know script generation and voice
10:17 generation right you're these things we
10:20 are not we are probably by the end of
10:23 the year we're going to be able to
10:24 generate these things you know with
10:28 other multimedia elements in them all
10:30 right like I and by the way I haven't
10:31 even scr scratched the surface this one
10:34 is staggering like watch the lighthouse
10:36 here it it um normally when you see
10:40 stuff like
10:41 this something like the lighthouse is
10:44 just going to go away like like it'll
10:46 it'll morph and get weird but it's
10:48 staying very very consistent as are the
10:51 people right
10:53 there as are the
10:56 waves like there's no weird morphing
11:00 like you normally see with water scenes
11:03 and look how coherent the cliff is
11:05 staying [ __ ]
11:09 Bonkers I
11:12 mean I looked at Runway stuff and Pika
11:15 stuff and I'm like if I were a filmmaker
11:17 I'd look at that stuff and I'd be like
11:19 yeah I'm not worried if I were if I were
11:22 a drone cinematographer and looked at
11:25 this I'd be like oh [ __ ] guess that's
11:28 over
11:30 [ __ ] crazy animated scene of a
11:33 closeup of a short fluffy monster
11:35 kneeling beside a melted red candle the
11:37 art style is 3D and realistic focus on
11:42 lighting the flame doesn't seem real but
11:47 it's a it's an
11:48 animated thing so I think you could
11:51 forgive
11:52 that a gorgeously rendered papercraft
11:56 world of a coral reef rife with colorful
12:00 fish and sea
12:02 creatures 20
12:06 seconds Amelio's wife shared the live
12:09 thank you there's only 41 people in here
12:11 be like Amelio's wife share this
12:15 live this is crazy crazy crazy
12:19 crazy Amelio's wife share this
12:23 live this is crazy crazy crazy
12:28 crazy
12:38 I may need to take that I may need to
12:39 come back in uh uh
12:42 [Music]
12:43 uh photo realistic closeup video of two
12:46 pirate ships battling each other as they
12:49 sail inside a cup of
12:51 coffee wow
12:54 wow there's a little jankiness on the
12:56 right
12:58 ship
13:02 right there it sort of pops out of the
13:06 water but
13:10 damn I love that one very nice yeah that
13:13 one's
13:14 cool a young
13:16 man at his 20s sitting on a piece of
13:20 cloud in the sky reading a book there's
13:22 some jankiness with the pages but look
13:25 his hand looks good no flickering on the
13:28 face it almost looks like he said
13:30 something there he's definitely sitting
13:32 on a
13:34 cloud all right so that's the first set
13:37 of
13:39 examples today Sora is becoming
13:42 available to Red teamers so red teamers
13:45 are people who test for critical areas
13:48 for harm or risk we are also granting
13:51 access to a number of visual artists
13:53 designers and filmmakers to gain
13:56 feedback on how to advance this model to
13:58 be most helpful for Creative
14:00 professionals that's actually a smart
14:02 move right getting the creative cuz cuz
14:05 this is absolutely a threat to that
14:08 category so getting them involved is
14:10 smart I don't know that they're going to
14:13 be into it but what whatever we're
14:15 sharing our research project early to
14:17 start working with and getting feedback
14:20 from people outside of open Ai and give
14:23 the public a sense of what AI
14:25 capabilities are on the horizon remember
14:28 I've been talking about for the last two
14:31 weeks Sam
14:33 Altman talks about ChatGPT-4 being best
14:37 understood as as a preview and progress
14:41 here is not linear meaning when this
14:44 thing gets way better it's going to get
14:46 way better on a lot of
14:48 fronts um this is going to cost a lot
14:51 more it may um historical footage of
14:54 California during the Gold Rush that's
14:57 pretty amazing
15:01 look at the coherence of like the the
15:04 people are not morphing into other like
15:08 look at that horse right there or that
15:12 person walking look at these horses here
15:15 like they're not morphing into other
15:19 things this is pretty crazy pretty crazy
15:27 good um Amelio's wife Storyvine what's
15:31 I don't know what you mean by that
15:32 that's my company close-up view of a
15:35 glass sphere that has a Zen Garden
15:37 within it hang on I got to check
15:39 something let me see why Monique was
15:41 calling
15:47 me let me just ask her hang
15:56 on and then let me grab
16:00 gra this hold please hold
16:28 please
16:35 all right um zen garden there's a small
16:38 dwarf in the sphere let's see wonder how
16:41 oh wait uh I hope that includes
16:45 you oh I hope that you and our Storyvine
16:48 are included in the red team
16:50 release I don't know we've established
16:52 that that Sam Altman is jealous of me so
16:54 I would assume I'm not going to have
16:55 access it would be nice if we had access
16:58 what's the hot news the hot news is
17:02 openai.com/sora S-O-R-A and I'm going
17:06 through all the all of this release it's
17:09 not live yet but it's being released to
17:12 red teamers and it's being released to
17:14 creative professionals to get input and
17:17 they're they're they're releasing this
17:19 early to give us a sense of what's
17:22 coming so if I were Runway ML and if I
17:24 were Pika Labs right now I'd be pooping my
17:27 pants and if they've got something
17:29 that's close to this that they haven't
17:30 released yet this will probably Inspire
17:33 them to release it early all right so
17:35 there's another video here's another
17:37 one wow holy [ __ ] extreme closeup of a
17:42 24-year-old woman's eye
17:44 blinking standing in Marrakech during magic
17:48 hour oh my god look the there's Marrakech
17:51 in the reflection of her eye cinematic
17:54 film shot in 70 millimeter depth of field
17:57 Vivid colors cinematic
17:59 15 seconds of a closeup of an eye with
18:01 the reflection of a specific location at
18:03 hey cam Katen looks so
18:06 real
18:08 um cartoon kangaroo disco
18:13 dances um do me a favor if there's a mod
18:17 in here um put in the comment OpenAI
18:21 OpenAI and don't put the dot like write
18:24 the dot in parentheses dot in
18:27 parentheses slash Sora and pin that so that
18:30 the url's pinned for people to check out
18:33 a cartoon kangaroo disco
18:38 dances thank you very much I appreciate
18:40 that I'm late what are we looking at
18:42 we're looking at OpenAI's new text to
18:45 video model that
18:47 looks
18:49 stunning like you know me if it looks
18:51 like [ __ ] I'll tell you it looks like
18:52 [ __ ]
18:54 this this is just the the teensiest of
18:57 jankiness now
18:59 we don't get to play with this yet it's
19:01 just it's just in
19:02 testing but there was a little janky
19:05 there I saw some foxes earlier that sort
19:08 of meld into one another so it's still
19:10 going to be janky but like compared to
19:12 Runway and Pika and the and the stable
19:15 video stuff this is crazy you can do up
19:19 to 59 seconds of video with a single
19:23 prompt and it's doing cuts it looks like
19:26 it's doing Cuts within a single video
19:28 like edits within a single video this is
19:31 a beautiful homemade video showing
19:34 people of Lagos Nigeria in the year
19:37 2056 shot with a mobile phone
19:42 camera crazy thank you Amelio's
19:46 wife um all right a petri dish with
19:50 a bamboo
19:51 forest and tiny red pandas running
19:54 around yeah this is janky and weird
19:55 they're just they're materializing
19:58 they're materializing out of nowhere
20:00 from behind the behind the uh thing but
20:04 that's it's still that's pretty
20:07 decent in a Petri
20:11 dish
20:14 huh and they all only have two
20:17 legs that's got some Jank in
20:21 it the plastic chair is
20:25 epic all right let's see
20:29 the camera rotates around a large stack
20:31 of vintage televisions showing
20:33 different programs 1950s sci-fi movies
20:37 horror movies news static a 1970s
20:42 sitcom inside a New York City Museum
20:47 Gallery so the camera's panning but it's
20:49 s it's panning really
20:52 slow but you got
20:57 horror there's horror there's
21:02 horror there's maybe a
21:05 sitcom that's crazy this is crazy people
21:10 good Jank something for us to feel
21:12 nostalgic about later exactly I'm
21:15 telling you the reason I'm going live is
21:17 this is the first thing I've seen in
21:19 2024 that starts to feel like 2024 is
21:22 going to get really
21:24 weird
21:26 because this is now getting I mean I I
21:29 would I would consider that janky enough
21:32 that it definitely looks AI but a lot of
21:34 this other stuff doesn't Sora is able to
21:37 generate complex scenes with multiple
21:39 characters specific types of motion and
21:42 accurate details of the subject and
21:44 background the model understands not
21:46 only what the user has asked for in the
21:49 prompt but also how those things exist
21:51 in the real world all right don't know
21:54 how they're doing
21:56 that look at this the the dust looks a
22:00 little
22:01 fake but like you'd have to be looking
22:04 hard at
22:06 it but the terrain looks real there's a
22:11 little bit of physics weirdness in the
22:14 the the car sort of going around the
22:16 corner
22:18 weird but that's pretty [ __ ]
22:22 crazy this is Reflections in the window
22:25 of a train traveling through Tokyo sub
22:28 look at the reflection of the woman when
22:30 they go
22:31 past a building watch this right here
22:35 right there holy
22:41 [ __ ] that looks like real footage to
22:46 me that you could absolutely pass off
22:48 for real
22:51 footage that reflection right there you
22:54 could absolutely pass that off pass that
22:56 off as real footage it makes me wonder
22:58 if they picked up Google Maps content
23:00 it's pretty good yeah I I honestly don't
23:02 know I'm assuming they're using some
23:04 sort of diffusion technology which if
23:06 they are it's not they're not 3D mapping
23:09 stuff I don't think but I I have no
23:12 [ __ ] idea
23:13 people good
23:16 Lord a drone camera circles around a
23:19 beautiful historic church built on a
23:21 rocky outcropping on the Amalfi
23:24 Coast The View showcases historical and
23:27 magnificent oh wait there's
23:30 more architecture details and tiered
23:33 Pathways and patios waves are seen
23:35 crashing against the
23:39 shore several distant people are seen
23:42 walking and enjoying Vistas on
23:45 patios I mean just as it's panning all
23:47 of these people are staying coherent
23:49 they're not morphing into one another
23:51 they're not turning into weirdness that
23:53 thing isn't turning into some weird
23:55 penis shape well it already is but
24:00 not more explicitly makes me wonder oh
24:04 yeah if they picked up the Google thing
24:05 I don't think so reference libraries I
24:09 yeah who
24:10 knows I I mean it looks like so this is
24:13 a large red octopus is seen resting on
24:15 the bottom of the ocean floor blending
24:18 into the Sandy Rocky terrain its
24:21 tentacles are so let's just look at the
24:25 weirdness
24:27 here the crab's got a little octopus
24:30 flowiness to
24:32 him but like let's look at the octopus
24:34 the the legs seem pretty
24:37 coherent they're oh yeah they're not
24:39 quite an anatomically they have like
24:42 suckers on both sides of the leg rather
24:44 than just on the bottom and the the
24:46 suckers right here kind of animate along
24:48 the leg right wait where is it right
24:51 there they did
24:54 it right there see how they just zipped
24:57 around the leg
24:58 but holy crap holy crap this is good
25:03 like this is
25:07 close a flock of paper air Flames fluts
25:10 through a dense jungle weaving around
25:12 trees as if they were migrating
25:17 birds 20 seconds of
25:20 this it's happening it is happening
25:23 Danielle wow that looks real yeah this
25:25 is okay so if you're new here there's
25:27 not a ton of people here not a ton of
25:28 new people but what we're looking at is
25:31 OpenAI there you go Danielle got it
25:34 openai.com/sora is this
25:38 site
25:40 um let me check something here CU I am
25:44 at work could you please share a sample
25:46 of the higher level output yep
25:49 okay got it got
25:55 that okay that's something I got to pay
25:57 attention to all
26:04 right love the way you share new
26:06 technology thank you very much I
26:07 appreciate it no this is exciting we're
26:09 going to need a larger hard drive we
26:11 certainly
26:12 are I mean
26:18 listen with the current state of video
26:24 technology you actually have to work
26:26 pretty hard to get it to be not janky
26:28 enough to be able to tell a
26:31 story
26:34 this but like like like this Step Up in
26:38 quality means that way more people are
26:41 going to be able to tell video stories
26:43 than could
26:45 before cuz right now you still have to
26:47 be a decently accomplished video
26:51 Storyteller to understand how to edit
26:53 around shitty shitty content okay that's
26:57 weird and janky oh yeah her her arm just
27:01 turned into a
27:04 pillow look at her head just goes and
27:07 then her arm turns into a pillow all
27:09 right that's jankified the the cat has
27:13 a single paw turns into a double paw
27:18 but if you if you take out the weirdness
27:21 like you
27:22 could right okay after that right here
27:26 you could probably cut to that see there
27:28 the extra arm came in all right that one
27:30 that one I would redo I don't know why
27:31 they use that one as an example that
27:33 feels like a better version of
27:38 Runway but look at this look at the
27:40 coherence of these birds this is
27:43 crazy and look at the coherence of the
27:46 water normally in Runway the water's
27:48 doing some weird ass thing where it's
27:50 like morphing into the log or something
27:52 like that this must be what they're
27:54 talking about where it understands the
27:55 3D relationships to the objects
28:00 this is absolutely amazing this is
28:01 absolutely amazing that's that that is a
28:04 yes who said that fly life campaign yes
28:08 that is the perfect description this is
28:10 absolutely
28:12 amazing and and like the coherence in
28:15 this Dragon at this parade and the
28:17 coherence of the
28:21 people yeah this is not like anything
28:24 else we've got right now by the way this
28:27 is OpenAI thanks thanks Danielle for
28:28 putting that up hey
28:31 everybody all right that's that one all
28:34 right the model has a deep
28:37 understanding of language enabling it to
28:40 accurately interpret prompts and
28:42 generate compelling characters that
28:44 Express vibrant emotions Sora can also
28:48 create multiple shots within a single
28:50 generated video so they're doing editing
28:53 within the video or it'll appear that it
28:56 would be you're editing together
28:57 multiple
28:59 shots within a single video that
29:02 accurately persist characters and visual
29:06 style that line right there that
29:09 actually is it free we don't know yet I
29:10 well I assume it won't be free I assume
29:13 this will be part of
29:16 ChatGPT like you're going to be able to
29:18 generate this kind of video I'm assuming
29:21 just like you create DALL-E images right
29:23 now inside ChatGPT you'll be able to do
29:26 this within it is my guess I'm
29:28 speculating welcome to chat
29:31 TMZ I mean if I'm filming a Purina
29:37 commercial other than the fact that the
29:39 snow's a little weird on their
29:43 nose you could get away with that if you
29:46 if you had
29:47 to if you had to you know cut cut a shot
29:51 in we want puppies in the snow because
29:53 we're in
29:56 Colorado
30:00 I suspect this will make puppy and red
30:02 panda videos for fun I like it that's
30:05 not bad all right let me go look at the
30:08 other ones
30:10 here this one's pretty amazing tour of a
30:13 gallery with many beautiful look
30:16 it just cut to a different room that's
30:18 within the same
30:24 video wow though holy cow Dena I know
30:27 Master path to freedom I know crazy
30:29 right all right let's look at this one
30:32 there's there's one I think it's in here
30:33 that I saw this is remarkable notice
30:36 that their legs when they're walking
30:37 actually look like legs they don't look
30:39 like they're floating in a weird way you
30:42 cut that into a movie you you tell me if
30:44 someone would suspect that's
30:47 AI I don't think
30:50 so beautiful snowy Tokyo is bustling the
30:54 camera moves through the bustling City
30:56 street following several people enjoying
30:58 the beautiful snowy weather shopping at
31:02 nearby stalls gorgeous sakura petals are
31:05 flying through the wind along with the
31:08 snowflakes so there's the sakura
31:14 petals some snow
31:16 falling you could totally get away with
31:18 that in a movie or in a
31:21 video Sam Altman asked for 7 billion no
31:25 retr punk Sam Altman is asking for seven
31:27 trillion he's trying to raise between 5
31:30 and 7 trillion
31:32 dollars to outcompete
31:37 Nvidia
31:39 trillion pricing and access model have
31:41 not been specifically mentioned no they
31:43 haven't uh oh let's keep going because
31:46 the next thing just said something about
31:47 the model's weakness that that'll be
31:49 interesting to
31:51 hear stop motion animation of a
31:54 flower growing out of a window sill of a
31:56 Suburban House let's see see what gets
31:58 weird here if
32:03 anything weird camera motion but like I
32:07 think you can get away like that doesn't
32:09 look real that looks geometrically weird
32:12 and the it looks like plastic rather
32:15 than Pottery but
32:18 damn story of a robot's life in a
32:21 cyberpunk setting oh look so the prompt
32:24 here the story of a of a robot's
32:28 life in a cyberpunk setting so this is
32:30 19 seconds there's the third
32:33 scene fourth
32:40 scene so
32:44 one
32:52 two that's a weird camera move
32:56 three
32:58 so four Cuts so it understands that
33:01 you're doing a narrative and in film
33:03 it's cool but we'll wait to see how it
33:06 looks outside of the curated results
33:08 yeah yeah absolutely Pate totally agree
33:11 and and there's even some things in here
33:12 Pate as you can imagine that are um why
33:16 would they have included
33:18 that so I guess I guess maybe they
33:21 they're like why don't we put in four
33:22 things that look shitty does it talk it
33:25 doesn't talk yet but we're not we're not
33:26 that far from that I would I would think
33:28 that by the end of the year we'll have
33:30 videos that have either audio behind
33:32 them or even scripts written about them
33:35 this is an extreme closeup of a
33:37 gray-haired man with a beard in his
33:41 60s in deep thought pondering the
33:44 history of the universe as he sits in a
33:46 cafe in Paris his eyes focus on
33:49 people off screen as they walk he sits
33:53 mostly motionless he's dressed in a wool
33:56 coat yep button-down shirt he wears
33:59 a Brown Beret yep glasses and has a very
34:02 professional appearance yep in the end
34:05 offers a subtle closed mouth Smile as if
34:09 he found the answer to the meaning of
34:10 life let's see if he does
34:15 that nope no weird smile but
34:20 that's depth of field cinematic that's
34:24 that's pretty [ __ ]
34:26 good
34:28 realberg all right that's that
34:31 one that looked
34:33 fake beautiful silhouette animation okay
34:36 it's an animation so you could argue
34:37 it's stylized there one wolf just turned
34:40 into
34:45 two there it's walking weird that's got
34:47 a little Jank to
34:50 it this has got weird physics but it's
34:52 still pretty [ __ ]
34:56 cool
34:59 so like all of these things are in water
35:01 and then you've got the surface of the
35:03 water down here so like there's just
35:05 weird like there's a surrealism to this
35:07 one which if you want it surreal this
35:10 works there's the puppies all right the
35:14 current model has weaknesses it may
35:16 struggle with accuracy of simulating the
35:18 physics that's what we've noticed of a
35:20 complex scene it may not understand
35:23 specific instances of cause and effect
35:26 for example a person might take a bite
35:28 out of a cookie but afterwards the
35:30 cookie might not have the bite mark so
35:33 so um continuity errors we get those in
35:35 real films
35:44 [Music]
35:47 anyway um the model may also confuse
35:51 spatial details of a prompt for example
35:53 mixing up left and right may struggle
35:55 with precise descriptions of
35:58 events like following a specific camera
36:02 trajectory all right let's go look at
36:07 these scene of a person running
36:10 cinematic film oh weakness Sora
36:13 sometimes creates physically impossible
36:19 motion well it's actually that one's
36:21 actually kind of fun cuz it looks like
36:24 he's he's on a a car that runs by you
36:26 running on
36:27 [Laughter]
36:29 it this is this is cool that they're
36:31 showing the weaknesses ah okay prompt
36:34 five the weakness animals or people can
36:36 spontaneously appear notice that they're
36:38 they're all emerging out of the other
36:40 puppies all
36:44 right an example of inaccurate physical
36:49 modeling basketball through hoop that
36:54 explodes yeah oh yeah that the ball just
36:56 went right through the like that's
36:58 pretty good up to right there actually
37:02 up to there you're fine there it's
37:05 bad still pretty [ __ ] good good Lord
37:09 okay that looks weird Sora fails to
37:12 model a chair as a rigid object there
37:14 you go inaccurate physical interactions
37:18 yep oh yeah look at the chair flying
37:21 around it's possessed catch
37:26 it just a flesh wound simulating complex
37:30 interactions between objects and
37:32 multiple characters is often
37:35 challenging she she missed blowing out
37:37 the candles I've seen that in real life
37:40 before I'm like how can you not blow out
37:45 candles all right
37:47 safety this is interesting we'll be
37:50 taking several important safety steps
37:52 ahead of making Sora available in
37:55 OpenAI's products so that means on the API
37:59 side hopefully that means in ChatGPT
38:01 we're working with red teamers
38:04 domain experts in areas like
38:06 misinformation hateful content and bias
38:09 who will be adversarially testing the
38:11 model we're also building tools that
38:13 detect misleading
38:15 content uh such as a detection classifier so
38:18 that we can tell the video is generated
38:20 by Sora we plan to include C2PA metadata
38:23 in the
38:26 future um in addition to us developing
38:28 new techniques we're leveraging our
38:31 existing safety models for example once
38:33 in an OpenAI product our text
38:35 classifier will check and reject text
38:39 input prompts that are in violation of
38:41 our usage they're already doing that
38:44 we'll be engaging with policy makers and
38:46 makers and
38:49 Educators and then here's that one's got
38:51 some weird physics going on and some
38:53 weird
38:55 biology does he walk around
38:58 that yeah he just kind of walked through
39:01 that shutter all right that one's
39:06 weird that one is actually really
39:12 good cute the water looks a little
39:22 AI-y that's not how they move their feet
39:24 but still cute that's cute
39:27 dog taking a
39:31 selfie okay the seagull was
39:36 weird okay that's weird uncanny
39:40 valley
39:43 that's that's not bad but it's got weird
39:48 contrast that's
39:50 weird that's
39:55 weird
39:58 that's a little
40:02 weird that's pretty
40:06 good H wow the scales are funny reality
40:10 bending yeah crazy crazy all right
40:13 research techniques it's a diffusion
40:15 model which generates video by starting
40:17 off with one that looks like static
40:19 noise and gradually transforms it so this is
40:21 the same tech as all the image
40:23 generation tools we play
40:25 with capable of generating entire videos at
40:28 once or extending a generated video to
40:30 make them
40:33 longer we represent videos and images as
40:36 collections of smaller units of data
40:38 called
40:39 patches Each of which is akin to a token
40:42 in
40:43 GPT by unifying how we represent data we
40:46 can train diffusion
40:48 Transformers on a wider range of visual
40:51 data than was possible before spanning
40:54 different durations resolutions and
40:56 aspect ratios so it sounds like they're
40:58 breaking elements up within the image
41:01 and kind of rendering them independently
41:04 somehow together that's weird and
41:07 cool Sora builds on past research in DALL-E
41:11 and GPT
41:13 models it uses the recaptioning
41:16 technique from DALL-E 3 which involves
41:18 generating highly descriptive captions
41:21 okay so they're rewriting your prompt
41:23 for you as a result the model will be
41:26 able to follow the user's text
41:27 instructions better in addition to be
41:29 able to generate solely let's see it
41:33 serves as a
41:36 foundation that can understand and
41:38 simulate a real world the hell was
41:42 that we believe will be important
41:44 milestone for achieving AGI that's a
41:46 really interesting statement Sora serves
41:48 as a foundation for models that can
41:51 understand and simulate the real
41:55 world this is not just making video what
41:57 they're doing here Sam's Valentine Ann
42:00 anniversary gift to you I guess that's
42:02 it Brandon right Sora serves as a
42:05 foundation for models that can
42:07 understand and simulate the real world a
42:09 capability We Believe will be an
42:11 important milestone for achieving
42:16 AGI yeah so imagine the model being able
42:19 to understand not just the
42:22 map but the the world it represents holy
42:33 [ __ ]
42:37 yowza that's [ __ ] bonkers it's bonkers
42:41 people it's just
42:42 Bonkers just Bonkers all right I'm gonna
42:45 go thank you for sharing you're
42:47 welcome um peace out yeah holy crap
42:50 there's a lot of holy crap in there um
42:52 yeah that's it openai.com you can't play
42:54 with it yet unless you get access as a
42:57 red teamer or some
42:59 specialized visual creative
43:03 artist um all right I'll keep paying
43:06 attention to this there was something
43:07 else oh um I don't know if you saw it
43:10 but um Google released um Gemini Pro 1.5
43:18 today or or announced it I don't know if
43:20 they announced it or released it I've
43:22 been busy um that looks like it's better
43:26 than Gemini Advanced which is their
43:27 Ultra 1.0 like their
43:30 middle they released the
43:33 middle
43:35 model not the high-end model but the 1.5
43:39 version of it it looks quite amazing so
43:41 um so I think the Google shit's going to
43:43 get better quickly too
43:46 so the the the dominoes are starting to
43:49 fall 2024 is going to get really weird
43:51 people it's going to get really weird
43:52 really fast so hang on it's coming go
43:57 back to work or go back to your sipping
43:59 hot cocoa whatever the hell you were
44:00 doing peace follow my channel damn it
44:05 bye
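
Note on the "patches" passage (around 40:33): the announcement read out in the stream says videos and images are represented as collections of smaller units called patches, each akin to a token in GPT, which lets one model train on clips of different durations, resolutions and aspect ratios. The sketch below is only a toy illustration of that idea, not OpenAI's implementation. The function names (make_video, to_patches), the nested-list video format and the patch sizes are all hypothetical choices made for the example; how Sora actually builds its patches is not described in the stream.

```python
from itertools import product


def make_video(frames, height, width):
    """Build a toy grayscale video as nested lists: video[t][y][x].

    Hypothetical helper for illustration only; pixel values just encode
    their own position so the patches are easy to inspect.
    """
    return [[[t * 100 + y * 10 + x for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


def to_patches(video, pt, ph, pw):
    """Cut a video into non-overlapping spacetime patches.

    Each patch covers pt frames x ph rows x pw columns and is flattened
    into one list, playing the role a token plays in a text model.
    Assumes every dimension is a multiple of its patch size.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be multiples of the patch size")
    patches = []
    for t0, y0, x0 in product(range(0, frames, pt),
                              range(0, height, ph),
                              range(0, width, pw)):
        patch = [video[t][y][x]
                 for t in range(t0, t0 + pt)
                 for y in range(y0, y0 + ph)
                 for x in range(x0, x0 + pw)]
        patches.append(patch)
    return patches


# Two clips with different durations and aspect ratios become sequences
# of different lengths made of identically sized patches.
short_clip = to_patches(make_video(4, 4, 4), pt=2, ph=2, pw=2)
long_wide_clip = to_patches(make_video(8, 4, 8), pt=2, ph=2, pw=2)
print(len(short_clip), len(short_clip[0]))
print(len(long_wide_clip), len(long_wide_clip[0]))
```

The point of the toy is that the short square clip and the longer, wider clip end up as the same kind of object, a list of equal-sized patches, just with different lengths. That is one plausible reading of "unifying how we represent data" in the announcement.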