This model converts Arabic speech in 16 kHz audio into text. It accepts audio files in WAV, MP3, PCM, and other popular formats, and outputs transcripts as JSON, XML, plain text, or SRT. The output includes punctuation, capitalization, timecodes, word-level confidence scores, and speaker diarization. The model can be used to transcribe meetings, interviews, and other content recorded at 16 kHz.
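To illustrate how a timecoded, diarized transcript maps onto the SRT output format, here is a minimal Python sketch that converts a JSON transcript into SRT captions. The segment field names (`start`, `end`, `speaker`, `text`) are assumptions for illustration; the model's actual JSON schema may differ.

```python
import json

def to_srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def json_to_srt(transcript_json):
    """Convert a list of transcript segments into SRT caption text.

    Field names here ('start', 'end', 'speaker', 'text') are illustrative
    assumptions, not the model's documented schema.
    """
    lines = []
    for i, seg in enumerate(json.loads(transcript_json), start=1):
        lines.append(str(i))
        lines.append(f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}")
        lines.append(f"[{seg['speaker']}] {seg['text']}")
        lines.append("")  # blank line separates SRT entries
    return "\n".join(lines)

example = json.dumps([
    {"start": 0.0, "end": 2.5, "speaker": "spk0", "text": "مرحبا بكم"},
    {"start": 2.5, "end": 5.0, "speaker": "spk1", "text": "شكرا جزيلا"},
])
print(json_to_srt(example))
```

The speaker label on each caption line shows how diarization output can be carried through to the SRT rendering.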
Create a Modzy account to get started →
AppTek’s ASR models achieve approximately the same accuracy on real-world data as the top cloud service providers. When we build ASR systems for academic tasks, following the same training and evaluation conditions as other ASR teams in the community, we achieve state-of-the-art results on popular US English benchmarks such as LibriSpeech (5.5% WER on “test-other”) and Switchboard (11.7% WER on the Hub5 2000 evaluation set).
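The benchmark figures above are word error rates (WER), the standard ASR accuracy metric: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch of the computation using word-level Levenshtein distance:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# 1 substitution ("sit" for "sat") + 1 deletion ("the"): 2 / 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))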
AppTek’s acoustic models are based on bidirectional recurrent neural networks with LSTM units. The models are trained using the RETURNN toolkit, a software package for neural sequence-to-sequence models developed jointly by RWTH Aachen University, Germany, and AppTek. The toolkit is built on the TensorFlow backend and allows flexible and efficient specification, training, and deployment of different neural models.
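To make the architecture concrete, the sketch below implements the bidirectional LSTM recurrence such acoustic models are built on: one LSTM pass runs forward over the feature frames, another runs backward, and their hidden states are concatenated per frame. This is a minimal NumPy illustration, not RETURNN or AppTek code, and the dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A single LSTM cell with the standard gate equations."""
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # one weight set per gate: input, forget, output, candidate
        self.W = rng.normal(0, 0.1, (4, hidden_dim, input_dim))
        self.U = rng.normal(0, 0.1, (4, hidden_dim, hidden_dim))
        self.b = np.zeros((4, hidden_dim))

    def step(self, x, h, c):
        i = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])   # input gate
        f = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])   # forget gate
        o = sigmoid(self.W[2] @ x + self.U[2] @ h + self.b[2])   # output gate
        g = np.tanh(self.W[3] @ x + self.U[3] @ h + self.b[3])   # candidate cell
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

def run(cell, xs):
    """Run the cell over a sequence of frames, returning all hidden states."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    out = []
    for x in xs:
        h, c = cell.step(x, h, c)
        out.append(h)
    return np.array(out)

def bilstm(xs, input_dim, hidden_dim):
    """Bidirectional layer: concatenate forward and backward hidden states."""
    fwd = run(LSTMCell(input_dim, hidden_dim), xs)
    bwd = run(LSTMCell(input_dim, hidden_dim), xs[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

# e.g. 10 frames of 40-dim acoustic features -> one 32-dim output per frame
features = rng.normal(size=(10, 40))
outputs = bilstm(features, input_dim=40, hidden_dim=16)
print(outputs.shape)
```

Because each frame's output sees both past and future context, bidirectional models like this are well suited to offline transcription of recorded content, as opposed to low-latency streaming.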
AppTek trains all ASR models on very large collections of annotated audio data. We compile the training data from a wide variety of sources in order to achieve a high level of generalization.
Get a video demo of Modzy, the ModelOps and MLOps software platform that businesses use to deploy, integrate, run, and monitor AI—anywhere.