Time and time again, OpenAI's models, with ChatGPT hailed as the company's crowning achievement, have far surpassed their expected capabilities. From passing the bar exam to attempts to put the technology to work representing individuals in court, OpenAI's tools carry far-reaching consequences for both the legal profession and the broader workforce. Now, only a few weeks shy of the Oscars, OpenAI is set to unveil a new application called Sora, a photorealistic video creation tool that could allow anyone to practice the art of cinema without ever attending film school.
What is Sora?
In the ever-evolving landscape of artificial intelligence and generative AI tools, Sora, OpenAI's latest creation, emerges as a groundbreaking advance in text-to-video technology. While OpenAI remains adamant that the tool is still in its research phase, Sora has already garnered significant attention for its range of capabilities, offering a glimpse into the future of digital content creation. With its ability to translate text prompts into immersive video sequences, Sora represents a timely combination of AI and human creativity, one that promises to revolutionize visual storytelling across media platforms.
Building on OpenAI's existing large language model (LLM) work, Sora pairs a diffusion model with a transformer-based architecture to process complex user-submitted prompts and generate lifelike visuals. Rather than rendering footage frame by frame with traditional graphics pipelines, the model starts from visual noise and gradually refines it over many steps. In turn, this enables Sora to analyze the input text and produce a detailed video clip for consumer use.
"Powered by a version of the diffusion model used by OpenAI's Dalle-3 image generator as well as the transformer-based engine of GPT-4, Sora does not merely churn out videos that fulfill the demands of the prompts but does so in a way that shows an emergent grasp of cinematic grammar," explains Tim Brooks, a research scientist currently working on the project, as originally reported by Wired.
According to OpenAI, what sets Sora apart from its handful of predecessors is its ability to give its generated footage both narrative depth and cinematic quality. By analyzing text prompts and interpreting them through a cinematic lens, Sora attempts to go beyond basic replication to tell a visual story. In one early clip shared on social media, when prompted to depict a coral reef teeming with life, Sora produced visually stunning imagery while also working in dynamic camera angles and shot changes that enhanced the piece's overall narrative fluency.
"There's actually multiple shot changes. These are not stitched together but generated by the model in one go. We didn't tell it to do that, it just automatically did it,” stated Bill Peebles, another researcher on the project.
Deepfakes and Legal Ramifications
OpenAI hopes to further extend Sora's already startling capabilities, including the ability to generate videos from a single still image. While this feature holds immense potential, OpenAI has warned of its capacity for misuse, particularly in the realm of deepfakes and misinformation, both of which have gained traction recently.
Deepfakes and misinformation aside, Sora, like most other generative AI products available right now, is trained on a continuously growing catalog of online data, leaving it susceptible to reproducing the opinions and biases embedded in that material. Lawyers and other legal experts warn that this open-ended approach to data collection invites legal action, and the threat it poses to creatives such as voice actors and other multimedia professionals could have far-reaching legal ramifications if left unchecked.
The average citizen is also at risk. Dr. Andrew Newell, chief scientific officer for identity verification firm iProov, told CBS MoneyWatch that "Sora will make it even easier for malicious actors to generate high-quality video deepfakes, and give them greater flexibility to create videos that could be used for offensive purposes." People who rely on everyday tools and organizations, such as social media sites and banks, could become bigger targets for deepfake scams. Celebrities, too, could find their likenesses used without consent in AI-created imagery and videos.
Guardrails are necessary to ensure that this type of AI technology is used responsibly. In addition to moderating video prompts and the content they produce, OpenAI plans to implement a "detection classifier" that can quickly identify when a video has been generated by the app, according to an online statement released by the company last week. OpenAI also said it would attach C2PA metadata, a widely recognized provenance tag that essentially functions as a digital watermark.
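Neither the detection classifier nor the exact tagging scheme has been published, so any concrete example can only be a hypothetical sketch. The snippet below imagines how a platform might combine the two signals OpenAI describes, an embedded provenance tag and a classifier score; the tag string, helper functions and scoring logic are invented placeholders rather than any real OpenAI or C2PA API.

```python
from dataclasses import dataclass

# Hypothetical sketch only: OpenAI has not published its detection classifier
# or the exact provenance tag, so the checks below are illustrative stand-ins.

PROVENANCE_TAG = b"provenance-tag"  # assumed placeholder, not the real marker


@dataclass
class ScreeningResult:
    has_provenance_tag: bool  # was an embedded provenance/watermark tag found?
    classifier_score: float   # stand-in score from an "AI-generated?" detector


def find_provenance_tag(clip: bytes) -> bool:
    """Placeholder: a real check would parse and verify signed metadata."""
    return PROVENANCE_TAG in clip


def detection_score(clip: bytes) -> float:
    """Placeholder: a real classifier would be a trained model, not a byte count."""
    return min(1.0, len(clip) / 1_000_000)


def screen_video(clip: bytes) -> ScreeningResult:
    """Combine the two signals a platform might consult before publishing."""
    return ScreeningResult(
        has_provenance_tag=find_provenance_tag(clip),
        classifier_score=detection_score(clip),
    )


if __name__ == "__main__":
    print(screen_video(b"...fake video bytes with a provenance-tag inside..."))
```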
Despite the transformative potential of Sora and similar technologies, text-to-video AI tools won't be replacing traditional filmmaking methods anytime soon. Sora does, however, promise to significantly alter, or even "democratize," as OpenAI put it, content creation on social media platforms, empowering users to produce high-quality videos on their own. As Sora continues to evolve, it highlights the ongoing intersection of AI and creativity, presenting opportunities, challenges and continuing legal hurdles for the future of digital storytelling.