DeepSpeechUnity

Description

An application that transcribes audio to text in SubRip subtitle format (.srt) using machine learning, built on Mozilla's DeepSpeech speech-to-text engine.

The application accepts a path to an audio file, a video file, or a folder containing any combination of both. It extracts the audio from each file using ffmpeg and then processes that audio with DeepSpeech on several threads. Once the transcription is available, it is formatted as a SubRip file and saved either next to the input file under the same name or in a location selected by the user.
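The formatting and saving step described above can be sketched as follows. This is a minimal illustration in Python, not the project's actual code: the `Segment` type and the function names `srt_timestamp`, `to_srt`, and `default_srt_path` are hypothetical, and it assumes the transcription is already available as phrases with start and end times in seconds.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Segment:
    """A transcribed phrase with start/end times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


def default_srt_path(media_path: str) -> Path:
    """Same folder and base name as the input file, with a .srt extension."""
    return Path(media_path).with_suffix(".srt")
```

For an input such as `clips/talk.mp4`, `default_srt_path` yields `clips/talk.srt`, which matches the "same name in the same folder" behaviour; a user-selected location would simply replace that path.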

For audio input, the application can also create a video from the audio, which serves as a container for the subtitles for later viewing.


Keywords: Use of machine learning, Multithreading, Use of external libraries and other programming languages, Automatic creation of subtitle files

Developed: June 2021 - September 2021 (Unity 2021.3.12f1, Windows 11).

Skills employed for the development:

Possible improvements

Add a language manager. One could use it to download models from various places on the Internet just by choosing the language. This would require another screen with a list of servers providing all of these languages.

A more beautiful user interface.

Allow the user to get the result as plain text instead of a subtitle file.
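One possible way to approach the plain-text output is to strip the cue numbers and timing lines from the generated subtitles. The sketch below is only an illustration in Python: `srt_to_text` is a hypothetical name, and it assumes well-formed SubRip input with `\n` line endings and exactly one blank line between cues.

```python
def srt_to_text(srt: str) -> str:
    """Return only the spoken text of a SubRip document, one line per subtitle line."""
    lines = []
    for block in srt.strip().split("\n\n"):
        parts = block.splitlines()
        # parts[0] is the cue number, parts[1] the timing line;
        # everything after that is subtitle text.
        lines.extend(parts[2:])
    return "\n".join(lines)
```

A simpler alternative, if the transcription is still in memory, would be to write out the recognized text directly and skip the subtitle formatting altogether.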