Asst. Prof. Rana Hanocka Receives NSF Grant to Develop New AI-Driven 3D Modeling Tools
The recent surge in AI-driven technologies that turn text prompts into images has opened up new possibilities for the creation of bespoke graphics by non-experts. But the output of models such as DALL-E and Stable Diffusion is uniformly two-dimensional.
With a new grant from the National Science Foundation, assistant professor Rana Hanocka seeks to add depth to this new approach to generating and manipulating graphics. The funding, her first from the NSF, will fuel the development of 3DStylus, a suite of tools for editing, transforming, and creating 3D content using only natural language instructions.
The project builds upon Hanocka’s previous work developing AI tools that make it easier for users to work with 3D graphics. With Text2Mesh, Hanocka and her team enabled style editing for 3D objects; users can type in phrases such as “colorful crochet candle” or “astronaut horse” to change the texture and color of a 3D model. With 3DHighlighter, users can select specific parts of a 3D object, such as the wheels of a vehicle or the feet of a human or animal, using only a text command.
3DStylus will expand upon this work with three broader tools: 3D Editor, 3D Morpher, and 3D Creator. The 3D Editor will let users modify a pre-existing 3D model without changing its base shape, for example recoloring or repatterning the cushions on a chair. The 3D Morpher will allow significant changes to an object’s geometry, such as reshaping a chair’s legs or back. Finally, the 3D Creator will produce novel 3D objects from text prompts, much as the recent wave of generative AI art models does in 2D.
“3DStylus has the potential to revolutionize 3D content creation just as DALL-E has already done in 2D – and we aim to go a step further by giving users far greater control over what is generated,” Hanocka said. “The tools in 3DStylus can synergize to enable fine-grained and intuitive control over 3D content creation. An entire 3D model life-cycle can be realized through exchanges between the tools to facilitate new, previously unimagined 3D content. As an example, starting with generating a 3D model through text using the 3D Creator, we could then add text-driven textures via the 3D Editor and update the shape geometry using the 3D Morpher, and continue to iterate and refine the 3D model through this enhanced creative process.”
Each of these tools requires an additional layer of technical work compared to today’s text-to-image models. While the AI underlying DALL-E and its peers can be trained on the massive amount of two-dimensional image data available on the internet, 3D data is far scarcer. To meet this challenge, Hanocka has developed new approaches that use 2D images as a supervision signal for creating and manipulating 3D objects.
“Creating 3D training data is hard,” Hanocka said, “but with the novel approaches we are proposing, we can make significant advancements in the 3D space without needing lots of 3D data.”
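To make the idea concrete, here is a minimal sketch of one common form of this 2D-as-signal approach: render a 3D model from several viewpoints, score the renders against the user’s text prompt with a pretrained image-text model such as OpenAI’s CLIP, and backpropagate through a differentiable renderer to update the model’s 3D attributes. This is an illustration in the spirit of Text2Mesh, not Hanocka’s actual code; the load_mesh and render_views helpers are hypothetical stand-ins for a mesh loader and a differentiable renderer (in practice supplied by a library such as PyTorch3D or nvdiffrast), and the real system predicts style with a small neural network rather than free per-vertex parameters.

```python
# A condensed sketch of CLIP-guided mesh stylization, loosely in the spirit
# of Text2Mesh. `load_mesh` and `render_views` are hypothetical placeholders;
# a differentiable renderer (e.g., from PyTorch3D or nvdiffrast) would
# supply the rendering step in practice.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # pretrained 2D vision-language model

# Encode the user's natural-language instruction once.
prompt = "a colorful crochet candle"
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

mesh = load_mesh("candle.obj")  # hypothetical mesh loader
# Optimizable per-vertex colors: the 3D attributes being edited.
colors = torch.rand(mesh.num_vertices, 3, device=device, requires_grad=True)
optimizer = torch.optim.Adam([colors], lr=5e-3)

for step in range(1000):
    # Hypothetical differentiable renderer: returns a (n_views, 3, 224, 224)
    # batch of CLIP-ready images of the mesh from random camera angles.
    images = render_views(mesh, colors, n_views=4)
    image_feat = model.encode_image(images)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)

    # Push the 2D renders toward the text prompt; gradients flow back
    # through the renderer to the 3D vertex colors, so 2D supervision
    # drives a 3D edit with no 3D training data involved.
    loss = 1 - (image_feat @ text_feat.T).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the only learned component is a model pretrained on abundant 2D images, the optimization consumes no 3D training data at all, which is exactly the advantage Hanocka describes.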
By solving this data problem and packaging the solutions into accessible tools, 3DStylus can fulfill Hanocka’s mission of democratizing 3D graphics: reducing technical barriers so that experts and non-experts alike can more easily create 3D art, animations, and models for industry and engineering. Read more about Hanocka’s work at the website for her research group, 3DL.