Abstract
Discourse parsing, which involves understanding the structure and information flow of a text and modeling its coherence, is an important task in natural language processing. It forms the basis of several natural language processing tasks such as question answering, text summarization, and sentiment analysis. Discourse unit segmentation is one of the fundamental tasks in discourse parsing and refers to identifying the elementary units of text that combine to form a coherent text. In this paper, we present a transformer-based approach to the automated identification of discourse unit segments and connectives. Early approaches to segmentation relied on rule-based systems that used POS tags and other syntactic information to identify discourse segments. More recently, transformer-based neural systems have shown promising results in this domain. Our system, SegFormers, employs this transformer-based approach to perform multilingual discourse segmentation and connective identification across 16 datasets encompassing 11 languages and 3 annotation frameworks. We evaluate the system using F1 scores for both tasks, with the best system reporting the highest F1 score of 97.02% on the treebanked English RST-DT dataset.