Natural Language Inference in Tamil: Dataset and Evaluation

K. Ezhilarasi, L. Jayasree

Abstract

Natural Language Inference (NLI) is widely believed to test a model's language understanding capability. Recent models such as Multilingual BERT and XLM-RoBERTa have generated significant interest in zero-shot cross-lingual NLI within the Natural Language Processing (NLP) community. We observe that the current Cross-Lingual Natural Language Inference (XNLI) dataset does not include any language from the Dravidian family. In this work, we therefore create a new NLI dataset for Tamil by translating the XNLI test set, using both human and machine translation. We also provide baseline results on our dataset. We expect this dataset to help advance NLP for Tamil, especially given ongoing research in cross-lingual learning.
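The sketch below shows what a translated test item and a baseline score might look like. It is a minimal Python sketch and not the authors' code. It assumes three things: each item is a premise/hypothesis pair carrying one of the three XNLI labels, baselines are scored by accuracy (the usual XNLI metric), and model predictions arrive as label strings. The names `NLIExample`, `accuracy`, and `label_distribution`, along with the English toy sentences, are hypothetical placeholders rather than data from the Tamil dataset.

```python
from collections import Counter
from dataclasses import dataclass

# Three-way label set used by XNLI; assumed to carry over to the Tamil test set.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """Hypothetical record for one test item: a sentence pair plus its gold label."""
    premise: str
    hypothesis: str
    label: str  # gold label, expected to be one of LABELS


def accuracy(examples, predictions):
    """Fraction of examples whose predicted label matches the gold label."""
    if len(examples) != len(predictions):
        raise ValueError("need exactly one prediction per example")
    if not examples:
        raise ValueError("empty evaluation set")
    for pred in predictions:
        if pred not in LABELS:
            raise ValueError(f"unknown label: {pred!r}")
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


def label_distribution(examples):
    """Count gold labels, e.g. to check that a translated set stays balanced."""
    return Counter(ex.label for ex in examples)


# Toy usage with placeholder English pairs (not taken from the dataset).
toy = [
    NLIExample("A man is playing a guitar.", "A person is making music.", "entailment"),
    NLIExample("A man is playing a guitar.", "The man is asleep.", "contradiction"),
    NLIExample("A man is playing a guitar.", "The man is a professional.", "neutral"),
]
preds = ["entailment", "contradiction", "entailment"]  # e.g. output of a zero-shot model
print(accuracy(toy, preds))  # 2 of 3 correct
print(label_distribution(toy))
```

In a zero-shot setup, `preds` would come from a multilingual model fine-tuned on English NLI data and applied directly to the translated pairs. The accuracy computation itself stays the same whichever model produces the predictions.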