Urdu to English Translation

Model by Modzy

This model translates text from Urdu to English. It uses a Transformer-based neural translation model to produce high-quality English translations and, unlike many widely used translation tools, can translate large bodies of text.

    Product Description

    PERFORMANCE METRICS:

    This model achieves a BLEU score of 0.1584 on a combined dataset drawn from the Bible, GlobalVoices, GNOME, JW300, OpenSubtitles, QED, Tanzil, Tatoeba and Ubuntu parallel corpora.

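    BLEU (bilingual evaluation understudy) is a metric for assessing the quality of text that has been machine-translated from one language to another. It measures the n-gram overlap between the machine translation and reference human translations, on a scale from 0 to 1. The closer the machine translation is to expert human translation, the higher the score.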
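
    As a rough illustration of how such a score is computed, the following is a minimal sketch of sentence-level BLEU against a single reference, with uniform weights over 1- to 4-grams and the standard brevity penalty. It assumes whitespace tokenization and no smoothing, the function names are hypothetical, and it is not the evaluation script behind the score reported above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate and one reference, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision gives a zero score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

    Scores reported for a whole test set are normally computed at corpus level, pooling n-gram counts over all sentences, rather than by averaging per-sentence values as this sketch would suggest.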

    Further information here.

    OVERVIEW:

    This model uses the Transformer architecture, which is currently the basis for many state-of-the-art translation models. At its core, the Transformer is an encoder-decoder architecture built around an attention mechanism. Multiple encoders are stacked, each consisting of a self-attention layer and a feed-forward neural network. Word embeddings are fed through the encoding layers and then passed to the decoding layers, which generate the translated output sequences. The Transformer used by this model is further described here. This implementation uses the OpenNMT framework.
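
    To make the self-attention layer more concrete, here is a minimal sketch of scaled dot-product attention on plain Python lists. It is a toy under stated assumptions: a single head, no learned projection matrices, and hypothetical function names. It does not reproduce the OpenNMT implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Attend from each query vector to all key/value vectors.

    All arguments are lists of float vectors; queries and keys share one
    dimension, and keys and values have the same number of rows.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention: the same embeddings act as queries, keys and values.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = scaled_dot_product_attention(embeddings, embeddings, embeddings)
```

    Each row of `contextual` is a mixture of all input embeddings, which is how every position in a sentence can take the rest of the sentence into account.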

    TRAINING:

    This model was trained on the Bible, GlobalVoices, GNOME, JW300, OpenSubtitles, QED, Tanzil, Tatoeba and Ubuntu parallel corpora. Together these total 1,182,111 lines of parallel text and can be found here. The model was trained for 200,000 steps on 4 GPUs.

    VALIDATION:

    This model was validated on 2,500 parallel sentences and achieves a BLEU score of 0.1584.

    INPUT SPECIFICATION:

    The input(s) to this model must adhere to the following specifications:

    Filename     Maximum Size    Accepted Format(s)
    input.txt    1M              .txt
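
    A file can be checked against these limits before submission. The sketch below is illustrative only: it assumes "1M" means 1 MiB (1,048,576 bytes) and that the text is expected to be UTF-8, neither of which is stated above, and `check_input_file` is a hypothetical helper rather than part of any Modzy SDK.

```python
import os
import tempfile

MAX_INPUT_BYTES = 1024 * 1024  # assumption: "1M" read as 1 MiB


def check_input_file(path, max_bytes=MAX_INPUT_BYTES):
    """Return a list of problems found with an input file (empty if none)."""
    problems = []
    if not path.lower().endswith(".txt"):
        problems.append("file must have a .txt extension")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    if os.path.getsize(path) > max_bytes:
        problems.append("file exceeds the maximum size")
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
    except UnicodeDecodeError:
        # Assumption: the model expects UTF-8 encoded text.
        problems.append("file is not valid UTF-8 text")
    return problems


# Usage: write a small Urdu sample, then validate it twice.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("یہ ایک مثال ہے۔")
    sample_problems = check_input_file(sample)
    # An artificially small limit shows the size check firing.
    oversize_problems = check_input_file(sample, max_bytes=4)
```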

    OUTPUT DETAILS:

    This model will output the following:

    Filename        Maximum Size    Format
    results.json    1M              .json

    The “results.json” file will contain the translated text in the following format: {"text": "text translated from Urdu"}
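
    Reading the translation back out of this file takes only a few lines. The following is a minimal sketch using Python's standard `json` module; `extract_translation` is a hypothetical helper name, and the payload shown is the example format from above, not real model output.

```python
import json


def extract_translation(raw_json):
    """Return the translated text from a results.json payload."""
    payload = json.loads(raw_json)
    # Guard against payloads that do not match the documented shape.
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError('expected a JSON object with a string "text" field')
    return payload["text"]


example = '{"text": "text translated from Urdu"}'
translation = extract_translation(example)
```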