XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages

Tushar Abhishek, Shivprasad Sagare, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta∗ and Vasudeva Varma
Information Retrieval and Extraction Lab, IIIT Hyderabad, India
{tushar.abhishek,shivprasad.sagare,bhavyajeet.singh,anubhav.sharma}@research.iiit.ac.in, {manish.gupta,vv}@iiit.ac.in

Abstract

Multiple critical scenarios (such as generating Wikipedia text from English infoboxes) need automated generation of descriptive text in low-resource (LR) languages from English fact triples. Previous work has focused on English fact-to-text (F2T) generation. To the best of our knowledge, there has been no previous attempt at cross-lingual alignment or generation for LR languages. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on the XAlign dataset.