Abstract
The task in the Document Understanding Conference (DUC) 2006 is to generate a fixed-length, user-oriented, multi-document summary; the task is the same as in DUC 2005. We use two features to score sentences, and sentences are selected for the summary based on the computed score. The first feature is a query-dependent sentence score that improves on the HAL feature. The second feature is motivated by the observation that query-independent sentence importance is not captured by current approaches; we explore the use of the web to score sentences in a query-independent manner. Experiments show a performance gain of 6-7% over the HAL feature from the inclusion of the two new features. In DUC 2006, our summarization system was ranked 1st in all automatic evaluations, with a significant margin over the second-best system, 5th in responsiveness, and 9th in linguistic quality. The relatively lower performance in linguistic quality can be attributed to sentences being stripped at the end of the summary when the summary length exceeds 250 words.
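The overall score-and-select pipeline described above can be sketched as follows. This is a minimal illustration only: the two scoring functions and the combination weights (`w1`, `w2`) are hypothetical placeholders, not the paper's actual HAL-based or web-based features.

```python
def query_score(sentence, query):
    # Placeholder for the query-dependent feature (the paper uses an
    # improved HAL-based score); here, plain term overlap with the query.
    q = set(query.lower().split())
    s = set(sentence.lower().split())
    return len(q & s) / (len(q) or 1)

def importance_score(sentence):
    # Placeholder for the query-independent feature (the paper uses
    # web-based evidence); here, a trivial length-based proxy.
    return min(len(sentence.split()) / 25.0, 1.0)

def summarize(sentences, query, max_words=250, w1=0.7, w2=0.3):
    # Rank sentences by a weighted combination of the two features,
    # then greedily fill the 250-word budget.
    scored = sorted(
        sentences,
        key=lambda s: w1 * query_score(s, query) + w2 * importance_score(s),
        reverse=True,
    )
    summary, used = [], 0
    for s in scored:
        n = len(s.split())
        if used + n > max_words:
            continue  # skip sentences that would overflow the budget
        summary.append(s)
        used += n
    return summary
```

Skipping a sentence that would overflow the budget, rather than truncating it mid-sentence, is one way to avoid the stripped-sentence artifact the abstract blames for the lower linguistic-quality score.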