2 Blogging Workflows For Creators Who Want To Work Across The Text-Audio-Video Divide

Two suggested workflows for using text, audio, and video in your blogging strategy

I’ve written previously about the divide between text, video, and audio.

We’re living in an exciting era for content marketing.

The “gaps” between the three formats are rapidly closing. Tools like voice synthesis and AI video generators are chipping away at the divisions, making it easier than ever for content creators to embrace a future in which content creation is truly format agnostic.

My predictions about whether text or video is going to dominate for content marketers tomorrow can be summarized as follows (at least for the time being):

  • What I call ‘format agnosticism’ is soon going to become entrenched as an expectation among readers. Our readers are going to expect that we’re also working across other formats and that, at a minimum, we’ve invested in automated (AI-backed) solutions to bridge the divide. If we don’t provide an easy means for them to, say, consume our blog posts in podcast format, they’re going to quickly lose interest in what we have to say.
  • Today, content creators can take advantage of the fact that jumping from text to audio to video (let’s call it the TAV divide) is relatively cheap and easy. This is an easy win that just about everybody should be taking liberal advantage of. Today’s market offers a mixture of manual and AI-backed approaches. To take text to audio as a simple example, you can record a podcast of your recent blog post (that’s the manual method) or you can use a TTS engine like Play (speech synth) to do that for you. In the latter case, you’re serving as the human overseer of a robot. Both approaches are valid.

For much more in-depth thinking about that, please check out this post (naturally, it has an accompanying video).

If you want to get started and you blog, then here are two workflows.

Text To Audio To Video (TAV)

As most of my content creation is textual (e.g. the post you’re reading), let me start from this perspective. It’s where my natural affinity lies.

We start with a blog post.

If you’re blogging, then you’re likely already familiar with how to do that.

You have a CMS. You have people on your team who blog. And, if you’re a content marketer, you’ve got some kind of strategy in place that looks at things like keywords and marketing mission to distribute the type of content you know is going to be most advantageous to your business.

How do we get to audio? If we want to begin leveraging audio to distribute our textual content as a podcast then we can use two approaches:

  • Firstly, we can use our own voice to record a podcast version to accompany our blog post.
  • Alternatively, we can use text-to-speech (TTS) and have an AI bot “narrate” the podcast on our behalf.

Clearly each approach has its advantages.

The first method is much more personal. Listeners get to deepen their connection with the author by hearing the nuances in his or her actual voice.

The second requires less effort to produce.

What should you do in any case?

Honor the format. That’s the guiding principle here.

You don’t want to just copy and paste a ream of text into a TTS generator — or read it off. To do so risks coming across as lazy and insincere — you’re just trying to blast the same content out across every channel.

Instead, you want to optimize the content for the format it’s being delivered in.

That might involve:

  • Editing the blog post in order to simplify language and streamline the flow of text. When speechwriting, writers are encouraged to ‘write for the ear’ as their central guiding principle. This is a good approach to take when writing for podcasts too. Edit the transcript before you either run it through a generator or sit down to record your own version.

For making the jump to video, similar principles apply.

If you’re going to create a video blog version of a post that was originally published as text, for instance, you’re going to want to honor the format there too:

  • Video is more immersive than audio. It’s difficult to watch a video while riding a bicycle. Therefore, aim for brevity. Consider whittling your text post down to just its bare essentials.
  • Video provides you with the ability to leverage … well, a visual medium. You can do more than just record yourself sitting at a desk running over the same thoughts you blogged about. Can you think of ways to make the video more immersive? A large budget isn’t even a prerequisite. You can use a video stock library, for instance, to affordably add some B-roll to break up the narration.

Video To Audio To Text (VAT)

Let’s say that you decide to record a video blog.

How can you work in that direction?

Let’s move a little faster through this explanation.

Honor the format. The same principle applies.

We don’t need any fancy technology to extract audio out of video. It’s already in there.

But we can and should honor the format:

  • Give the audio track from your video its own postproduction workflow if you’re shooting for a podcast or other audio product as the final output. Edit out pauses. Remaster the track. Consider skipping sections entirely. Record a personalized intro and outro.

No AI is needed here because … well, if you started with a video blog then you recorded the original. So we’ve got a leg up over the TTS version we might have used in the previous workflow here.
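
For readers who prefer to script it, pulling the audio track out of a video file might look something like the sketch below. It assumes the ffmpeg command-line tool is installed and on your PATH; the function names and the file names in the usage note are hypothetical, and this is just one possible approach rather than the method used for this post.

```python
import shutil
import subprocess


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that keeps only the audio of a video file."""
    # -n  : never overwrite an existing output file (exit instead of prompting)
    # -i  : the input video file
    # -vn : drop the video stream, so only audio is written
    # ffmpeg chooses an audio encoder based on the output file extension.
    return ["ffmpeg", "-n", "-i", video_path, "-vn", audio_path]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg to write the audio track of video_path to audio_path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

Calling extract_audio("vlog.mp4", "episode.wav") would give you a raw audio file to take into your audio editor. The editing itself (cutting pauses, remastering, recording an intro and outro) is still a job for you.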

Next, to break things down to text:

  • If the video blog was uploaded to YouTube, then we can download the automatically generated captions file and get a strong head start on the job of creating a text version of the video blog.

We can use a subtitle editor to take out the timestamps and just work with the text that YouTube automatically created from our video. Then we can attempt to mold that into a blog post.

A subtitle file automatically generated by YouTube being inspected on a computer. Photo: author.
After using a subtitle editor to extract a plain text version and remove the timestamp tags
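
If you would rather script this step than use a subtitle editor, a rough sketch in Python could look like the following. It assumes the captions were exported in SRT or WebVTT format; the function name captions_to_text is hypothetical, and the cleanup rules are simple guesses about what an auto-generated file contains, so check the output by eye.

```python
import re

# Cue timing lines, e.g. "00:00:01,000 --> 00:00:04,000" (SRT)
# or "00:00:01.000 --> 00:00:04.000" (WebVTT). The hours part is optional.
TIMING_LINE = re.compile(
    r"^(\d{1,2}:)?\d{2}:\d{2}[.,]\d{3}\s*-->\s*(\d{1,2}:)?\d{2}:\d{2}[.,]\d{3}"
)
# Inline markup such as <c> or <00:00:01.500> inside caption text.
INLINE_TAG = re.compile(r"<[^>]+>")


def captions_to_text(captions: str) -> str:
    """Return the spoken text of a caption file as one plain string."""
    kept = []
    for raw in captions.splitlines():
        line = INLINE_TAG.sub("", raw).strip()
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # SRT cue counter (this also drops a caption that is only a number)
        if TIMING_LINE.match(line):
            continue  # timestamp line
        if line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue  # WebVTT header lines
        if kept and kept[-1] == line:
            continue  # auto-generated captions often repeat a line across cues
        kept.append(line)
    return " ".join(kept)
```

The result is a single unpunctuated block of text, which is raw material to edit rather than a finished post.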

Next we need to … once again, honor the format.

We’ll need to spend a bit of time — potentially quite a bit of time — taking our thoughts as we conveyed them over video and editing them into a format that’s going to look good in text.

We might need to rewrite entire sections, skip over others, and add headers. The final product will be a blog post that is designed to be read.

It’s quite a bit of work. But if you want to maximize distribution options and make readers want to read the text version, then you’re going to have to offer them something more attractive than the raw output of the automatically generated YouTube file.

The Final Step: Bring It All Together

The final step to take in order to make your content as format-agnostic as possible: bringing it all together.

If your content was originally published to YouTube as a video blog:

  • Link to the blog version of the video that you put together using the above methodology.
  • Link to the podcast version that you edited too.

If your blog post was originally published in writing (say here on Medium):

  • Link to the audio version from the blog post.
  • Embed the video.

If you blog on owned channels, like your own blog, consider using share icons to make this more visually attractive.

Daytime: tech-focused MarCom. Night-time: somewhat regular musings here. Or the other way round. Likes: Linux, tech, beer. https://www.danielrosehill.com