Figure 1: Transcription Word Error Rate (WER) from Google and Speechmatics on the CoVoST2 speech translation test set. Lower scores are better.
Figure 2: BiLingual Evaluation Understudy (BLEU) scores from Google and Speechmatics on the CoVoST2 speech translation test set. Higher scores are better.
Figure 3: COMET scores from Google and Speechmatics on the CoVoST2 speech translation test set. Higher scores are better.
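For readers who want to compute the same metrics on their own data, the sketch below shows how WER, BLEU and COMET scores are commonly obtained with open-source tooling (jiwer, sacrebleu and Unbabel's unbabel-comet package, version 2.x assumed). It is an illustrative example with placeholder sentences, not the evaluation pipeline used to produce the figures above.

```python
# Minimal sketch of WER / BLEU / COMET scoring with common open-source tools.
# The sentences below are illustrative placeholders, not CoVoST2 data.
import jiwer
import sacrebleu
from comet import download_model, load_from_checkpoint  # unbabel-comet >= 2.0

sources = ["Das ist ein Test."]            # source-language utterances
transcripts_ref = ["das ist ein test"]     # reference transcriptions
transcripts_hyp = ["das ist ein fest"]     # ASR output
translations_ref = ["This is a test."]     # reference translations
translations_hyp = ["This is a fest."]     # speech translation output

# Word Error Rate (Figure 1): lower is better.
wer = jiwer.wer(transcripts_ref, transcripts_hyp)
print(f"WER: {100 * wer:.1f}%")

# Corpus-level BLEU (Figure 2): higher is better.
bleu = sacrebleu.corpus_bleu(translations_hyp, [translations_ref])
print(f"BLEU: {bleu.score:.1f}")

# COMET (Figure 3): higher is better. Uses the WMT20 model described in [3].
model = load_from_checkpoint(download_model("Unbabel/wmt20-comet-da"))
data = [
    {"src": s, "mt": h, "ref": r}
    for s, h, r in zip(sources, translations_hyp, translations_ref)
]
comet_out = model.predict(data, batch_size=8, gpus=0)
print(f"COMET: {comet_out.system_score:.3f}")
```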
References
[1] Wang, C., et al. "CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus." arXiv:2007.10310 (2020).
[2] Papineni, K., et al. "Bleu: a Method for Automatic Evaluation of Machine Translation." In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics (2002).
[3] Rei, R., et al. "Unbabel's Participation in the WMT20 Metrics Shared Task." In Proceedings of the Fifth Conference on Machine Translation, pages 911–920, Online. Association for Computational Linguistics (2020).
[4] Conneau, A., et al. "Unsupervised Cross-lingual Representation Learning at Scale." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics (2020).
Author
Caroline Dockes

Acknowledgements
Ana Olssen, Andrew Innes, Benedetta Cevoli, Chris Waple, Dominik Jochec, Dumitru Gutu, Georgina Robertson, James Gilmore, John Hughes, Markus Hennerbichler, Nelson Kondia, Nick Gerig, Owais Aamir Thungalwadi, Owen O'Loan, Stuart Wood, Tom Young, Tomasz Swider, Tudor Evans, Venkatesh Chandran, Vignesh Umapathy and Yahia Abaza.