Why we built the ClipZap workflow

Published: January 8, 2025

Something old.

In 2019, a few of my ByteDance colleagues and I built Mega Creative (cc.oceanengine.com), a large-scale creative engine for Chinese agencies and brands. That year became a turning point in the advertising space, with thousands of agencies scrambling to use it to create huge volumes of video. Video advertising officially went from individual creation to batch generation. We call it the year of programmatic advertising.

But the biggest downside of programmatic advertising is duplicate content: we have a limited number of content sources (original content) available, but an effectively unlimited number of videos is needed for placement. So until generative AI technology is available, all programmatic ad users face a huge risk of content duplication.

This also means that out of 100,000 videos we generate, only 1% may bring in 99% of the return.

Does that mean we only need to produce that 1% of videos? The answer is no. The other 99% of the videos bring in test data that hedges against the recommendation algorithm, and without that 99% there is no chance of finding the winning 1%.

This continued until 2021.

Then a brand new approach appeared in the programmatic advertising industry. In China it is called the empty lens strategy; in the global market more people call it B-roll video. The principle is simple to explain: although the number of video sources is fixed, supplementary footage with a similar look can be cut in to make each video appear different. For example, a game advertisement can use funny videos or historical footage to cover certain clips and so reduce the video repetition rate.

Looks good, right? But it still doesn't solve the problem of repetition in programmatic advertising. Without solving that problem, companies can't keep growing at scale.

The Beginning of the Business

When we first saw GPT-3 inside ByteDance, we were still hesitant. A video contains not only copy; the largest share of it is still video footage. But if video frames could be generated at scale by AI from prompts, then even with a usability rate below 50% it would still be a direction worth betting on. We knew that video creation accounts for at least 25% of the tens of billions of dollars spent on advertising each year. A product that wants to sustain stable large-scale ad buying must have more than 1,000 videos in the ad backend every day. Some key video-consuming industries, such as gaming and e-commerce, consume more than 10,000 videos per day.

While we were hesitating, opportunity knocked on the door early. We received a request from a client who was more than happy to be among the first to use AI technology for video scriptwriting. Their output of 500 pieces of video copy per day came from a team of more than 10 creatives, who could barely sustain it after brainstorming from 8:00 am to 11:00 pm every day.

The feedback on the AI copywriting technology was beyond expectations, and even more stunning ideas burst out while they were using it. “This is so much fun!” was feedback we heard often from clients. Providing AI technology services also didn't require us to grow the number of Customer Success Managers sharply, even though we had prepared for that well in advance; the highly personalized service of the AI technology made those preparations seem completely unnecessary.

So we immediately left ByteDance and started building a new programmatic advertising product.

Initially, we also wanted to cut our teeth on AI copywriting technology, but we realized that established copywriting products like jasper.ai and copy.ai were already serving our clients. So after thinking about it for a long time, we decided to start with video content production, where we have the most experience.

Video content production faces two big challenges:

  1. Is generated video content highly original?
  2. How can globalized videos face the challenge of multilingual localization?

Beyond these, there was another dilemma: which of problem 1 and problem 2 should be the highest priority? In 2023 we did not have a clear picture of how realistic generative AI content could be, especially in video generation. For example, although OpenAI released a demo video of Sora, the frequent clipping (objects passing through each other) and physics-defying motion in the footage still left us slightly disappointed. After all, we need to provide high-quality video for commercial use, not produce tons of junk content.

So our choice became clear: we started with video globalization and quickly gained support from cold-start customers in mainland China, Hong Kong and Singapore. These customers' businesses are global, especially in Southeast Asia and East Asia, where the language gap between countries is very large. One of the biggest headaches is that the same sentence has a different length in each language, for example:

In English: This is a nice looking dress

In Japanese: これは素敵なドレスだ

In Indonesian: Ini adalah gaun yang terlihat bagus

In Arabic: هذا الفستان جميل المظهر

We can see that even for one short sentence, the length varies greatly. For this reason, we fine-tuned a language model specifically for translation, optimizing it on a large corpus of advertising copy.
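
As a rough illustration of the gap, the sketch below counts the characters in the four example sentences and compares each with the English version. Character count is only a proxy chosen here for simplicity (it says nothing exact about spoken duration or subtitle width), and the name `length_report` is hypothetical, not part of ClipZap.

```python
# The four example sentences from this article.
TRANSLATIONS = {
    "English": "This is a nice looking dress",
    "Japanese": "これは素敵なドレスだ",
    "Indonesian": "Ini adalah gaun yang terlihat bagus",
    "Arabic": "هذا الفستان جميل المظهر",
}


def length_report(translations, baseline="English"):
    """Map each language to (character count, ratio to the baseline language)."""
    base = len(translations[baseline])
    return {
        lang: (len(text), round(len(text) / base, 2))
        for lang, text in translations.items()
    }


report = length_report(TRANSLATIONS)
for lang, (chars, ratio) in report.items():
    print(f"{lang:<11}{chars:>3} characters  {ratio:.2f}x English")
```

A translation step that is aware of these ratios can aim for a target length per language instead of translating literally.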

Once the video translation model was in place, we got more feedback from customers about localized faces. Based on data our long-term clients have accumulated on advertising video material, videos with localized actors convert about 5 to 8 percentage points (pp) better than videos with non-localized actors. If the placement volume is large enough, a 5pp-8pp difference in conversion rate will most likely be worth tens or hundreds of thousands of dollars in revenue.
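
To make the scale concrete, here is a small worked example. Only the 5pp-8pp lift comes from the client data mentioned above; the click volume and the revenue per conversion are hypothetical numbers picked purely for illustration.

```python
def extra_revenue(clicks: int, lift_pp: float, revenue_per_conversion: float) -> float:
    """Added revenue when the conversion rate rises by lift_pp percentage points."""
    extra_conversions = clicks * lift_pp / 100
    return extra_conversions * revenue_per_conversion


# Hypothetical campaign: 100,000 ad clicks and $10 of revenue per conversion.
CLICKS = 100_000
REVENUE_PER_CONVERSION = 10.0

low = extra_revenue(CLICKS, 5, REVENUE_PER_CONVERSION)
high = extra_revenue(CLICKS, 8, REVENUE_PER_CONVERSION)
print(f"${low:,.0f} to ${high:,.0f}")  # prints "$50,000 to $80,000"
```

Under these assumed inputs the lift is already worth tens of thousands of dollars, and it grows linearly with traffic.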

So we quickly started trying to solve the localized faces problem. There are two strategies: avatar generation or video face swapping. The advantage of avatar generation is its realistic effect, but the disadvantage is also obvious: brand new avatars are very expensive. We estimate that a video clip generated from a brand new avatar is 8-10 times more expensive than one from a video face-swapping model, because we first have to train a model of the new avatar's appearance. The video face-swapping model, however, has its own obvious disadvantage: a higher failure rate on profile shots and partially visible faces.

But the cost of using this model is low, and although the success rate is lower, clients are willing to run it several times and pick the segments that work best. (Of course, we are continuously improving the face swap model to achieve the best results.)
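
The trade-off can be sketched with a little arithmetic. The sketch assumes that attempts are independent, that every attempt costs the same, and that an avatar clip is usable on the first try; none of these assumptions is a measured ClipZap figure, and the function names are hypothetical. Under them, the expected number of attempts per usable clip is 1 / success rate.

```python
def expected_cost_per_usable_clip(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until one attempt succeeds (independent attempts assumed)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(avatar_cost_ratio: float) -> float:
    """Face-swap success rate at which both strategies cost the same per usable clip."""
    return 1 / avatar_cost_ratio


# Take one face-swap attempt as the unit of cost; an avatar clip costs 8-10 units.
for ratio in (8, 10):
    threshold = break_even_success_rate(ratio)
    print(f"avatar at {ratio}x: face swap is cheaper above a {threshold:.1%} success rate")
```

Even a face-swap model that succeeds only about half the time costs roughly 2 units per usable clip, well below the 8-10 units of a new avatar.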

Now, the video creation workflow

Let's not forget our starting point: enabling our clients to use AIGC technology for large-scale programmatic advertising video production. We're still on the path to get there.

We built an AIGC video editor for the programmatic advertising space, based on the ComfyUI concept. Customers can wire up their production process from the various AIGC cards we provide, which are powered by models from OpenAI and Anthropic as well as from well-known generative video vendors such as Kling, Luma, MiniMax, Pika Labs, Haiper and so on. We've finished integrating most of these APIs, so customers don't have to jump across multiple platforms; they can use them directly in ClipZap's workflow. We also offer a favorable points program, which saves more time and labor cost than calling these APIs directly.
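
The card-and-connection idea can be sketched as a small dependency graph. This is not ClipZap's actual implementation: the `Workflow` class and the four stub cards are hypothetical stand-ins for real model calls, and the only library used is Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Sequence


class Workflow:
    """A tiny card graph: each card is a function fed by the cards it depends on."""

    def __init__(self) -> None:
        self.cards: Dict[str, Callable[..., str]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_card(self, name: str, fn: Callable[..., str], inputs: Sequence[str] = ()) -> None:
        self.cards[name] = fn
        self.inputs[name] = list(inputs)

    def run(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        # static_order() yields every card after all of the cards it depends on.
        for name in TopologicalSorter(self.inputs).static_order():
            if name not in self.cards:
                raise KeyError(f"unknown card: {name}")
            args = [results[dep] for dep in self.inputs[name]]
            results[name] = self.cards[name](*args)
        return results


# Stub cards; in a real editor each one would call a model API.
def write_copy() -> str:
    return "This is a nice looking dress"

def translate(copy: str) -> str:
    return f"[id] {copy}"

def generate_clip(copy: str) -> str:
    return f"clip({copy})"

def mix(clip: str, subtitle: str) -> str:
    return f"{clip} + subtitle({subtitle})"


wf = Workflow()
wf.add_card("copy", write_copy)
wf.add_card("translate", translate, inputs=["copy"])
wf.add_card("clip", generate_clip, inputs=["copy"])
wf.add_card("mix", mix, inputs=["clip", "translate"])
results = wf.run()
print(results["mix"])
```

Because the execution order is derived from the connections, cards can be added in any order, and swapping one vendor's card for another does not change the rest of the graph.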

Even before the launch of ClipZap 1.0, we had customers in China, Singapore, Canada, the United States and other regions, and we will gradually share their stories later. We are rapidly moving more product features into the ClipZap workflow editor to make it even more powerful. Currently, in order to collect more feedback, the editor is completely free to use: you can try any feature in it without limitations.

Started with a special customer interview

As we suggested above, it was one of our longtime business clients who pressed the start button. Their main business is episodic production, including TV series, but a large part of their business also involves short-form streaming series, short videos, and so on. Their team of writers and directors (a group of about 10 people, each dedicated to the creation of one short episode) produces more than 500 pieces of video copy and script references per day for the film crew or unit director to select from and review. As you know, these kinds of episodes need a constant stream of new material to keep succeeding.

But human attention span and creative power are finite assets. So behind the mandatory daily content output there is actually a lot of plagiarism, lifting, and light rewriting going on, and such content obviously doesn't pass muster with the writers and directors.

So the emergence of GPT was a turning point for them.

But what we want to talk about is not how to serve the short-form industry well (which is, of course, one of our sources of clients, haha). What we want to talk about is a massive change in the content production model: from one-off boutique content creation to personalized creation at scale.

This shift applies to the advertising industry, the short-form industry, the e-commerce industry, and the SEO marketing services industry. More personalized, professional and large-scale content production helps clients reach more high-quality users. The emergence of AIGC has greatly improved the readability and personalization of content. We no longer need to splice content together in bulk or hire more writers for blog posts; we only need to train an AI bot that fits the generation needs to complete 80% of the content work.

How are we changing the content production game?

First of all, we would like to list a few similar products to help you understand ClipZap's core competencies.

01 Zapier | Automate Your Work

Core Product Philosophy: Zapier's no-code product philosophy is to make automation a simple application

Key features include:

  • No-code or low-code automation: Users can create automated workflows without having to write programming code
  • Wide integration with apps: Over 6,000 app integrations supported
  • Customizable workflows: Users can set up complex automated processes that fit their needs

Competitive Advantages

  • User-friendly interface: The system is designed with the average user in mind
  • Breadth of integration: Connects with a wide range of software, covering most popular apps
  • Good example of SEO strategy: Captures relevant searches through focused landing pages
  • Capital efficiency: Earned large revenue with limited venture capital
  • Flexible operations: A remote work model boosts efficiency

Market Share

  • Although Zapier does not share exact market share details, it remains one of the best-rated automation tools.
  • ARR (Annual Recurring Revenue) as of March 2021: $140M
  • Valuation: $5 billion
  • Paying customers: 125,000 (Integromat: 5,000 in early 2020)

02 Palantir | Get AI Into Operations

Core Product Philosophy

Palantir's core product strategy is a highly customized data fusion and analytics solution that is tailored for complex use cases.

Main products include:

  • Gotham: Built with a focus on complex data pattern recognition for use in intelligence and defense sectors
  • Foundry: A data integration and analysis platform for business users
  • Comprehensive data analysis and automation solution using large language models

Competitive Advantages

  • Powerful customization capabilities: Provides solutions for very intricate scenarios
  • Strong data integration skills: Able to import and merge complex data from different heterogeneous sources
  • Data leader in AI technology: Known for integrating generative AI and traditional data insights
  • Focus on high-value industries: Serves the defense, government, and energy sectors
  • Security and reliability: Data at rest is always encrypted, so it is used in industries with very high data security requirements, such as military and finance

Doesn't that sound surprising? The high market share of automation software products represents strong profit potential. The underlying technology behind these automation products is the workflow, which can be described as a modular aggregator of API microservices.

We differ from them by using video content production as our entry point: demand for video content is exploding, and we see more and more customers in this space struggling with video production. (Thank you TikTok, thank you Instagram, thank you X; you are the ones who made video the new medium of communication.)

Pirate's Treasure

If you ask me what a pirate's greatest treasure is, I believe it would be the Black Pearl. The gems and gold coins in front of you may briefly fatten your wallet, but only the Black Pearl can lead you to endless treasure.

ClipZap is your Black Pearl. Under ideal conditions, you can equip your Black Pearl with as many heavy artillery pieces as you like, such as the powerful AI Video Generator and AI Auto Mixer. Or bring along the best helmsman and first officer, such as the AI Video Effects Generator or the AI Copy Generator. Once you've done that, you can sail the Black Pearl to faraway places, such as your personalized landing page, your e-commerce storefront, or an advertising platform full of opportunities and challenges.

Choosing the right combination matters a great deal. We not only give every user unlimited possibilities for combining these pieces, but also provide a highly secure data and content protection system. Every step of data and content generated in the workflow is traceable, and viewing history will be supported in the future. We also rely on strong anti-attack protections and trusted cloud service vendors.

With these superpowers, we're confident that the Black Pearl will take you to places full of treasure.
