Abstract
Maintaining a unified ontology across languages is expected to yield an effective and consistent organization of Wikipedia entities. Such organization of the Wikipedia knowledge base (KB) will in turn improve the effectiveness of various KB-oriented multilingual downstream tasks such as entity linking, question answering, and fact checking. As a first step toward a unified ontology, it is important to classify Wikipedia entities into consistent fine-grained categories across 30 languages. While there is existing work on fine-grained entity categorization for resource-rich languages, there is hardly any work on consistent classification across multiple low-resource languages. Variations in Wikipedia page formats, imbalance in the amount of content per page, and category imbalance across languages make the problem challenging. We model this problem as a document classification task and propose a novel architecture, RNN_GNN_XLM-R, which leverages the strengths of several popular deep learning architectures. Among the ten participating teams at the NTCIR-15 Shinra 2020-ML Classification Task, our proposed model ranked second in the overall evaluation.