Abstract
Stack Overflow is a Community Question Answering service that attracts millions of users seeking answers to their questions. Maintaining high-quality content is necessary for relevant question retrieval, question recommendation, and a good user experience. Manually removing low-quality content from the platform is time-consuming and challenging for site moderators. Thus, it is imperative to assess content quality by automatically detecting and ‘closing’ low-quality questions. Previous works have explored lexical, community-based, vote-based, and style-based features to detect low-quality questions. These approaches rely on writing style, textual, and handcrafted features, which fall short in capturing the semantics of a question and the implicit relationships between tags and questions. To address this, we propose LQuaD (Low-Quality Question Detection), a multi-tier hybrid framework that (a) uses transformers to incorporate the semantic information of the question in each post, and (b) uses a graph convolutional network to learn from question and tag information. LQuaD outperforms state-of-the-art methods with a 21% higher F1-score on a dataset of 2.8 million questions. Furthermore, we apply survival analysis as a proactive intervention that can reduce the number of closed questions by informing users to take appropriate action. We find that the time from a question’s creation until it is ‘closed’ varies significantly across tags and across ‘closing’ reasons.

Index Terms: Stack Overflow, Community Question Answering, Low-quality questions
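
To make the survival-analysis step concrete, the following is a minimal sketch of how the probability that a question remains open over time could be estimated. The abstract does not state which estimator is used, so the Kaplan-Meier estimator here is an assumption, and the function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, closed):
    """Kaplan-Meier estimate of the probability that a question is still open.

    durations: time from question creation to closure or end of observation.
    closed:    True if the question was closed, False if it was still open
               when observation ended (right-censored).
    Returns a list of (time, survival_probability) pairs, one per distinct
    time at which at least one closure occurred.
    """
    closures = Counter(t for t, c in zip(durations, closed) if c)
    exits = Counter(durations)      # closures and censored cases leaving at t
    at_risk = len(durations)        # questions still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = closures.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (not from the paper): hours until closure or censoring.
hours = [2, 5, 5, 8, 12, 12]
was_closed = [True, True, False, True, False, True]
curve = kaplan_meier(hours, was_closed)
for t, s in curve:
    print(f"t={t:>2}h  P(still open)={s:.3f}")
```

Computing one such curve per tag or per ‘closing’ reason would give a way to compare how quickly different groups of questions are closed, which is the kind of comparison the abstract reports.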