Tavus CEO Hassaan Raza on building the AI avatar developer platform

Tavus is a video personalization platform designed to improve content creation and engagement, particularly for sales and marketing teams. It simplifies video production by allowing users to record a single template, which is then used to generate personalized AI-driven videos.

Source: SPEEDA Edge research

The following interview was conducted by Sacra in June 2024.

Background

Hassaan Raza is the CEO and co-founder of Tavus. Sacra talked with Hassaan about the major use cases driving adoption of AI avatar videos, the choice to use custom models rather than foundation models, and the research challenges in generating highly realistic digital humans.

Questions

What’s the big use case that you see driving adoption of AI replicas or talking head videos?

Being able to have your videos instantly translated into dozens of languages has become a key selling point for AI avatar platforms. Can you talk about how big an impact translation will have as a feature? Which use case do you see it being most useful for: marketing, sales, learning & development, or something else?

How do you think about the gross margins for a company like Tavus? We've already seen GPT-4 fall 85% in price after roughly a year. Do you see compute costs coming down for you in a similar way?

What do you make of the comparison people have drawn between where AI infrastructure companies are today and where CDNs were during the early web? Is there a parallel there in terms of how the prices of all models are going down?

With AI avatars, the once-human actor has become a configurable part of the video player. How is Tavus approaching the redesign of the traditional video player and creation interface to support granular control over avatar appearance, branding, and settings?

How do you expect AI talking heads, text-to-video, and text-based video editing to affect the overall volume and frequency of video content creation by businesses?

Today, people might use Synthesia to generate a talking head, generate B-roll via Stable Diffusion, edit it in Descript, and turn it into clips for social media with Opus Clip. What do these platforms have in common and how are they different? Do you anticipate AI talking head platforms expanding into some of these use cases that are adjacent to video creation?

Can you talk about the “face” vs. the “body” in AI talking heads and text-to-video? How much more difficult is it to generate a believable face that avoids the uncanny valley, and what’s the current state of the art here?

What role do you see text-to-video models playing in Tavus' product roadmap, and how might they intersect with your AI avatar technology to enable end-to-end video generation?

Tavus uses its own models for talking head generation and for lip-syncing and dubbing. Can you talk about the advantages there over using a foundation model? In what way does using your own models allow you to improve the results over time and make Tavus models the best for avatar generation?

Say everything goes right for Tavus over the next 5 years. What does Tavus look like and how is the world different as a result?
