Abstract
The number of individuals having identical names on the internet is increasing. Thus making the task of searching for a specific individual tedious. The user must vet through many profiles with identical names to get to the actual individual of interest. The online presence of an individual forms the profile of the individual. We need a solution that helps users by consolidating the profiles of such individuals by retrieving factual information available on the web and providing the same as a single result. We present a novel solution that retrieves web profiles belonging to those bearing identical Full Names through an end-to-end pipeline. Our solution involves information retrieval from the web (extraction), LLM-driven Named Entity Extraction (retrieval), and standardization of facts using Wikipedia, which returns profiles with fourteen multi-valued attributes. After that, profiles that correspond to the same real-world individuals are determined. We accomplish this by identifying similarities among profiles based on the extracted facts using a Prefix Tree inspired data structure (validation) and utilizing ChatGPT's contextual comprehension (revalidation). The system offers varied levels of strictness while consolidating these profiles, namely strict, relaxed, and loose matching. The novelty of our solution lies in the innovative use of GPT -- a highly powerful yet an unpredictable tool, for such a nuanced task. A study involving twenty participants, along with other results, found that one could effectively retrieve information for a specific individual.