DeepSpeechUnity

Try

Source code on GitHub

Binaries on GitHub (Win64)

Description

DeepSpeechUnity is an application that transcribes audio into SubRip subtitle files (.srt) using machine learning, through Mozilla's DeepSpeech speech-to-text software.

The application accepts a path to an audio file, a video file, or a folder containing any combination of both. It extracts the audio from each file with ffmpeg, and worker threads then process that audio with DeepSpeech. Once the transcription is available, it is formatted as a SubRip file and saved either next to the source file under the same name or in a location selected by the user.
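The extraction and formatting steps can be sketched as follows. This is a minimal illustration in Python, not the project's own Unity code: the function names are hypothetical, and the 16 kHz mono 16-bit WAV target is an assumption based on what the released English DeepSpeech models expect. The SubRip layout itself (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from __future__ import annotations

from pathlib import Path


def ffmpeg_extract_command(source: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV.

    Assumption: the speech model expects 16 kHz mono input.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output if it already exists
        "-i", source,            # input audio or video file
        "-vn",                   # ignore any video stream
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        wav_path,
    ]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as SubRip text, numbering from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)


def default_srt_path(source: str) -> Path:
    """Same folder and base name as the input, with a .srt extension."""
    return Path(source).with_suffix(".srt")
```

The command list can be handed to `subprocess.run` once ffmpeg is on the PATH; keeping it as a list avoids shell quoting problems with paths that contain spaces.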

For audio input, the application can also create a video from the audio, so that the resulting file can act as a container for the subtitles when viewed later.
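The source does not say how that video is produced. One plausible approach, shown here only as an assumption, is to pair the audio with a plain black picture generated by ffmpeg's `lavfi` colour source. The function name, frame size, and codec choices below are illustrative.

```python
def audio_to_video_command(audio_path, video_path, size="1280x720", fps=25):
    """Build an ffmpeg command that pairs the audio with a black picture."""
    return [
        "ffmpeg", "-y",
        # Synthetic, endless black video source (assumed approach).
        "-f", "lavfi", "-i", f"color=c=black:s={size}:r={fps}",
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",  # stop encoding when the audio stream ends
        video_path,
    ]
```

A player can then load the generated .srt file alongside the video, provided both share the same base name.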

Keywords: Use of machine learning, Multithreading, Use of external libraries and other programming languages, Automatic creation of subtitle files

Developed: June 2021 - September 2021 (Unity 2021.3.12f1, Windows 11).

Skills employed for the development:

Parallel processing.
Using a machine learning tool.
Responsive interface.
Use of Yasirkula's File Explorer.
Saving and automatic loading of settings via file system.
Complex unit tests.
Use of coroutines.
Use of various FFmpeg features.
Recognition and selection of language models.
Integration of a console through log redirection and a highlightable but non-editable input field.
Personalized icon.
Particular attention to UX.
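
As an illustration of the parallel-processing item, the sketch below fans a batch of files out to a thread pool. It is written in Python rather than the project's Unity environment, and `transcribe_all` and the `transcribe` callback are hypothetical names standing in for whatever runs DeepSpeech on one file.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_all(paths, transcribe, max_workers=4):
    """Run transcribe(path) for every path on a thread pool.

    Returns a dict mapping each path to its result, or to the exception
    that was raised, so one bad file does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the files are processed concurrently.
        futures = {path: pool.submit(transcribe, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as error:
                results[path] = error
    return results
```

Collecting failures per file instead of re-raising keeps a folder run going when a single input is unreadable.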

Possible improvements

Add a language manager. One could use it to download models from various places on the Internet just by choosing the language. This would require another screen with a list of servers providing all of these languages.

A more beautiful user interface.

Allow the user to get the result in plain-text format (instead of a subtitle file).
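
A plain-text export could be derived from the subtitle output by dropping the cue numbers and timing lines. The following is a minimal sketch of that idea, assuming well-formed SubRip input; `srt_to_text` is a hypothetical name and not part of the application.

```python
def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timing lines from SubRip text."""
    lines = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in srt.replace("\r\n", "\n").strip().split("\n\n"):
        parts = block.split("\n")
        # parts[0] is the cue number, parts[1] the timing line, the rest is text.
        if len(parts) >= 3 and "-->" in parts[1]:
            lines.append(" ".join(part.strip() for part in parts[2:]))
    return "\n".join(lines)
```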