This model searches for a given set of keywords within speech in an audio file, and if they are found, returns the timestamps at which they occur. This capability can be especially useful for identifying sections and entities of interest within audio and video recordings.
7.5% Word Error Rate
This model achieves a 7.5% word error rate on the clean test set of the LibriSpeech corpus. LibriSpeech as a whole consists of around 1,000 hours of English speech derived from read audiobooks.
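Word Error Rate measures the performance of speech recognition or machine translation at the word level and is derived from the Levenshtein distance. It is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference.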
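To make the metric concrete, the following is a minimal sketch of a word-level Levenshtein computation. It is an illustration only, not the evaluation code used for this model; the function name word_error_rate and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which real evaluation pipelines may refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" gives one substitution over six reference words, a rate of about 16.7%.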
This model uses DeepSpeech, Mozilla's open-source speech-to-text engine, which is based on the Deep Speech algorithm. It uses a recurrent neural network architecture with 5 hidden layers, each containing 2,048 neurons.
This model was trained on the combined Fisher, LibriSpeech, Switchboard, and Common Voice English datasets, in addition to approximately 1,700 hours of transcribed WAMU (NPR) radio shows. It was trained for 75 epochs using a learning rate of 0.0001 and a batch size of 128. After training, the weights with the best validation loss were selected. This model was trained using Quadro RTX 6000 GPUs.
This model was validated on the clean test set of the LibriSpeech corpus. LibriSpeech as a whole consists of approximately 1,000 hours of 16 kHz English speech derived from read audiobooks from the LibriVox project.
The input(s) to this model must adhere to the following specifications:
The “word.txt” file should contain the words to be spotted in the audio, one word per line; matching is case-insensitive. The “input.wav” file should contain the audio to be searched for occurrences of the words specified in “word.txt”.
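As a sketch of how the keyword file described above might be prepared, the snippet below writes one word per line. The helper names normalize_keywords and write_keyword_file are hypothetical, and the lowercasing and de-duplication are optional conveniences assumed for the example, since matching is already case-insensitive.

```python
from pathlib import Path


def normalize_keywords(keywords):
    """Strip whitespace, lowercase, and drop blanks and duplicates.

    Hypothetical helper: the order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for keyword in keywords:
        word = keyword.strip().lower()
        if not word or word in seen:
            continue
        seen.add(word)
        cleaned.append(word)
    return cleaned


def write_keyword_file(keywords, path="word.txt"):
    """Write the cleaned keywords to `path`, one word per line."""
    words = normalize_keywords(keywords)
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words
```

Calling write_keyword_file(["Budget", "senate", "budget"]) would produce a file with two lines, "budget" and "senate".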
This model will output the following:
The “results.json” file will contain the detected word occurrences and corresponding timestamps in the following JSON format: [{"word": "keyword1", "start_time": startTime, "duration": duration}, {"word": "keyword2", "start_time": startTime, "duration": duration}]
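A minimal sketch of how the output might be consumed: it parses the JSON text and groups occurrences by keyword as (start, end) pairs. The function name group_occurrences is hypothetical, and the units of start_time and duration are not stated in this description, so the example simply adds them and assumes they share a unit. Key names are stripped of surrounding whitespace as a defensive measure.

```python
import json
from collections import defaultdict


def group_occurrences(results_json: str):
    """Group detected occurrences by keyword as sorted (start, end) pairs.

    Hypothetical helper; assumes start_time and duration share one unit.
    """
    records = json.loads(results_json)
    grouped = defaultdict(list)
    for record in records:
        # Tolerate stray whitespace around key names.
        fields = {key.strip(): value for key, value in record.items()}
        start = float(fields["start_time"])
        end = start + float(fields["duration"])
        grouped[fields["word"].lower()].append((start, end))
    for spans in grouped.values():
        spans.sort()
    return dict(grouped)
```

With the contents of “results.json” read into a string, group_occurrences(text) returns a dictionary keyed by lowercase keyword, which makes it straightforward to jump to each occurrence in the recording.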