40.0 BLEU Score
This model is trained on several parallel corpora from WMT-19. It achieves a BLEU score of 40.0, which was state of the art in 2019.
BLEU (Bilingual Evaluation Understudy) is a commonly used metric for assessing the quality of machine-translated text. It compares model-generated text against a gold-standard human reference translation by counting overlapping n-grams (runs of one or more consecutive words). The closer the machine translation is to an expert human translation, the higher the score.
Further information about BLEU can be found here.
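The n-gram comparison behind BLEU can be sketched in a few lines of Python. This is a simplified, single-reference, sentence-level version with clipped n-gram precisions for n = 1 to 4, uniform weights, a brevity penalty, and no smoothing. The function names are illustrative. Published results such as the 40.0 above are computed at corpus level over the whole test set with standard tooling and tokenization, so scores from this sketch are not directly comparable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on a 0-100 scale.

    Uses clipped n-gram precisions for n = 1..max_n with uniform weights
    and a brevity penalty. No smoothing: returns 0.0 if any precision is zero.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(log_precision_sum)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 100.0
print(sentence_bleu("the cat sat on a mat", "the cat sat on the mat"))    # about 53.7
```

A single substituted word lowers every n-gram precision that spans it, so the score drops well below 100 even though five of the six words match.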
This model is built with Facebook's Fairseq sequence modeling toolkit and is based on the big Transformer architecture as implemented in Fairseq.
In a standard Transformer model, the input text is first converted to embeddings. These are fed into a stack of encoder layers, whose output is passed to a stack of decoder layers. Each layer contains attention sublayers, along with a feed-forward sublayer. Attention lets the model draw on other parts of the input and output sentences while translating a word, rather than looking at that word in isolation. Further information about the Transformer model can be found here.
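The attention described above is scaled dot-product attention. Each output is a weighted average of value vectors, and the weights are a softmax over query-key dot products divided by the square root of the key dimension. The pure-Python sketch below covers one head only and omits masking, the learned query/key/value projections, and the multi-head structure that Fairseq's real implementation uses on tensors. The function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention, no masking.

    queries: one vector per output position.
    keys, values: one vector per input position.
    Each output is the average of the value vectors, weighted by
    softmax(q . k / sqrt(d_k)) over all keys.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


keys = values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0]], keys, values))  # almost all weight on the first position
print(attention([[0.0, 0.0]], keys, values))   # [[0.5, 0.5]]: equal weights
```

A query that closely matches one key concentrates its weight on that position. A query that matches no key more than another spreads its weight evenly.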
The authors found that increasing the size of the feed-forward network gave a reasonable improvement in performance while keeping the overall network size manageable. Further information about their modifications can be found here.
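The cost of a wider feed-forward network can be estimated by counting its parameters. Each Transformer layer's position-wise feed-forward sublayer computes FFN(x) = max(0, xW1 + b1)W2 + b2, so its size grows linearly with the inner width d_ff. The sketch below assumes the big Transformer defaults of d_model = 1024 and d_ff = 4096, and uses 8192 only as an illustrative doubled width. The exact sizes used for this model are not stated here, and the helper names are hypothetical.

```python
def ffn_params(d_model, d_ff):
    """Parameters in one feed-forward sublayer:
    Linear(d_model -> d_ff) plus Linear(d_ff -> d_model), both with biases."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def ffn_forward(x, w1, b1, w2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2 for one position vector x.

    w1 has shape d_model x d_ff and w2 has shape d_ff x d_model (lists of rows).
    """
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return [sum(h * w2[j][k] for j, h in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]


# Assumed sizes: 1024/4096 are the big-Transformer defaults; 8192 is illustrative.
D_MODEL = 1024
for d_ff in (4096, 8192):
    print(f"d_ff={d_ff}: {ffn_params(D_MODEL, d_ff):,} parameters per FFN sublayer")
```

Doubling d_ff roughly doubles each FFN sublayer, from 8,393,728 to 16,786,432 parameters under these assumptions. The attention sublayers and embeddings stay the same size, so the total network grows by much less than a factor of two.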
This model is trained on several corpora that are part of WMT-19: the parallel corpora in the Paracrawl, Common Crawl, news-commentary, Yandex, Wiki-titles, and UN datasets. More information about these datasets can be found here. Preprocessing included language identification and large-scale back-translation. The model was trained first on lower-quality datasets and then fine-tuned on higher-quality ones. Ensembling and re-ranking were also used to produce the final translations.
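Re-ranking rescores an n-best list of candidate translations with additional models and keeps the best one. The source does not specify the exact scheme. The sketch below assumes one common form, noisy-channel re-ranking. It combines the forward translation score log P(y | x) with a weighted reverse-direction score log P(x | y) and a weighted target-language-model score log P(y). The `Candidate` fields, weights, and score values are hypothetical. In practice the weights would be tuned on held-out data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    forward_logprob: float  # log P(y | x) from the forward translation model
    channel_logprob: float  # log P(x | y) from a reverse-direction model
    lm_logprob: float       # log P(y) from a target-side language model


def rerank(candidates, channel_weight=1.0, lm_weight=1.0):
    """Sort an n-best list by a weighted sum of log-probabilities, best first."""
    def score(c):
        return (c.forward_logprob
                + channel_weight * c.channel_logprob
                + lm_weight * c.lm_logprob)
    return sorted(candidates, key=score, reverse=True)


# Hypothetical n-best list: "a" is preferred by the forward model alone,
# but "b" wins once the channel and language-model scores are added.
n_best = [Candidate("a", -1.0, -5.0, -4.0), Candidate("b", -1.5, -2.0, -3.0)]
print([c.text for c in rerank(n_best)])                                       # ['b', 'a']
print([c.text for c in rerank(n_best, channel_weight=0.0, lm_weight=0.0)])    # ['a', 'b']
```

Setting both weights to zero recovers plain beam-search ranking, which makes the contribution of the extra models easy to isolate.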
This model was evaluated on the WMT-19 test set, where it achieves a BLEU score of 40.0. This test set was created from headlines published between September and November 2018. The WMT-19 competition also included a human evaluation of the submitted systems.
The input(s) to this model must adhere to the following specifications:
This model will output the following: