South African start-up using AI to make video content more accessible

Nick Argyros, chief operating officer, and Francois Schreuder, CEO, Phonetik AI

Phonetik AI, a South African start-up building large language models that create subtitles, closed captions and narration, is helping visually and hearing impaired people get more information from the video content they consume.

Speaking to TechCentral at the AWS Summit in Johannesburg last week, Phonetik AI founder and CEO Francois Schreuder said making the creation of accessibility tools cheaper and quicker can help content producers incorporate the technology into their workflows instead of viewing it as a grudge purchase.

“Around 24% of the global population is – to varying degrees – either visually or hearing impaired. This means they can’t [necessarily] access entertainment, educational and training material that could help them better their lives,” said Schreuder.

“If you’re talking about a piece of content, say an episode, it could take two days to manually make the subtitles and descriptors that would make the content accessible. With our tools, we can cut that down to 30 minutes.”

According to Schreuder, the use of automated tools also reduces costs by around two-thirds. The main driver for broadcasters and content creators is to reach a wider audience.

Phonetik chief operating officer Nick Argyros said the addressable market for the company’s tools is substantial – and it has global ambitions.

Another driver making it imperative for broadcasters to improve the accessibility of their content is the European Accessibility Act, introduced in June. The act has made the inclusion of accessibility tools mandatory for all newly placed products and services. It covers advertising content on social media platforms including YouTube, broadcast content on television and even training material for employees in a factory.

Schreuder and Argyros said similar legislation is in the works in the US, and it’s only a matter of time before South Africa develops laws of its own.

Language support

“Just like South Africa did with GDPR and Popia, we believe South Africa will have its own accessibility legislation,” said Argyros. “In South Africa’s context, some people can’t see very well but they cannot afford to have their eyes checked, and this will help them access content better.”

Producing accurate subtitles is perhaps the easier of the tasks Phonetik AI has taken on. Supported languages include English, Portuguese and Hindi, and others will be added as development progresses, especially in the South African context. The availability of training data has guided which languages Phonetik AI has chosen first.

The task becomes more complex when “contextual descriptors” need to be produced. This textual or auditory information describes what is happening on screen when it cannot be picked up visually by those who can see but can’t hear, or from the dialogue by those who can hear but can’t see.

“If a gunshot goes off on screen, there is no need to add descriptors for it for those who can see. But if a bomb goes off and all that is visible is shock on a character’s face, the person who cannot hear it must be told it is happening,” said Schreuder.

An important aspect of creating accurate textual descriptors for the hearing impaired is the development of a database of sounds that the AI models can use as reference points for identifying similar sounds in video content. To this end, Phonetik AI is working with Audio Militia – a company specialising in creating sound effects – to name and categorise different sounds.

“These are the guys who would make the sound of crickets or footsteps in those old-school radio stories, so they have thousands of sounds in their database, but now we are doing it in reverse,” said Argyros.

To create the auditory descriptors, Phonetik AI is training LLMs to speak using base models from other companies. However, one of Phonetik AI’s goals is to give the visually impaired the option to listen to subtitles and contextual audio descriptors in an accent closer to their own. This involves creating and training its own base models using local voices.

“We’ve been working on creating base model voices where I try and clone my own voice. It has this undertone of being American or from the UK, because the base models are from there. Creating our own base models will allow us to create voices with South African accents,” said Argyros. – © 2025 NewsCentral Media
