Front page layout
Site theme
Sign up or login to join the discussions!

In a tweet posted this morning, artificial intelligence company Runway teased a new feature of its AI-powered web-based video editor that can edit video from written descriptions, often called “prompts.” A promotional video appears to show very early steps toward commercial video editing or generation, echoing the hype over recent text-to-image synthesis models like Stable Diffusion but with some optimistic framing to cover up current limitations.
Runway’s “Text to Video” demonstration reel shows a text input box that allows editing commands such as “import city street” (suggesting the video clip already existed) or “make it look more cinematic” (applying an effect). It depicts someone typing “remove object” and selecting a streetlight with a drawing tool that then disappears (from our testing, Runway can already perform a similar effect using its “inpainting” tool, with mixed results). The promotional video also showcases what looks like still-image text-to-image generation similar to Stable Diffusion (note that the video does not depict any of these generated scenes in motion) and demonstrates text overlay, character masking (using its “Green Screen” feature, also already present in Runway), and more.
Make any idea real. Just write it.

Text to video, coming soon to Runway.

Sign up for early access:
Video generation promises aside, what seems most novel about Runway’s Text to Video announcement is the text-based command interface. Whether video editors will want to work with natural language prompts in the future remains to be seen, but the demonstration shows that people in the video production industry are actively working toward a future in which synthesizing or editing video is as easy as writing a command.
Raw AI-based video generation (sometimes called “text2video”) is in a primitive state due to its high computational demands and the lack of a large open-video training set with metadata that can train video-generation models equivalent to LAION-5B for still images. One of the most promising public text2video models, called CogVideo, can generate simple videos in low resolution with choppy frame rates. But considering the primitive state of text-to-image models just one year ago versus today, it seems reasonable to expect the quality of synthetic video generation to increase by leaps and bounds over the next few years.
Runway is available as a web-based commercial product that runs in the Google Chrome browser for a monthly fee, which includes cloud storage for about $35 per month. But the Text to Video feature is in closed “Early Access” testing, and you can sign up for the waitlist on Runway’s website.
You must to comment.
Join the Ars Orbital Transmission mailing list to get weekly updates delivered to your inbox.
CNMN Collection
WIRED Media Group
© 2022 Condé Nast. All rights reserved. Use of and/or registration on any portion of this site constitutes acceptance of our User Agreement (updated 1/1/20) and Privacy Policy and Cookie Statement (updated 1/1/20) and Ars Technica Addendum (effective 8/21/2018). Ars may earn compensation on sales from links on this site. Read our affiliate link policy.
Your California Privacy Rights | Do Not Sell My Personal Information
The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of Condé Nast.
Ad Choices