Beeld en Geluid

HOSAN 2

Speech recognition, the technology that automatically converts spoken words into text, is becoming increasingly important for applications such as subtitling, audiovisual archiving, and voice-driven systems. However, current systems often perform worse for people with regional accents, dialect speakers, and those who switch between Dutch and other languages. This creates practical and societal problems: subtitling becomes less reliable, archives contain less representative transcripts, and users become frustrated when technology fails to understand their natural speech.

HOSAN 2 builds on the earlier HOSAN project (High-quality Speech Recognition for All Dutch), which explored how speech recognition could become more inclusive. That first phase showed that current evaluation methods mainly measure overall accuracy and do not explain why systems fail for specific speakers or in particular contexts.

This follow-up project develops new diagnostic evaluation methods to better understand the causes of these failures. The project examines not only the audio signal of speech, but also transcripts and contextual factors such as background noise and conversational dynamics. It also redefines what “good performance” means in different use cases: for example, live subtitling requires readability, while archiving prioritizes preserving meaning.

Together with national, regional, and local public broadcasters, the project will develop pilot tests and a dataset containing recordings, transcripts, and contextual information. These will make it possible to identify when and why speech recognition systems fail, allowing researchers and companies to improve them more effectively.

By developing context-aware and transparent evaluation methods, HOSAN 2 contributes to more inclusive speech technology that better reflects the diversity of spoken Dutch — from television subtitling to voice interfaces.

A PPP program grant of €151,800 will be used for this project.