Abstract
Pattern mining is an important task of data mining and involves the extraction of interesting associations from large databases. In the literature, pattern mining approaches have been investigated to extract the different kinds of patterns such as frequent patterns, sequential patterns, periodic patterns, utility patterns and coverage patterns from transactional databases. The pattern mining approaches have been widely employed to improve the performance of applications in the areas of recommendation systems, e-commerce, customer relation management, internet advertising, decision support systems, social media and so on. One of the key issues of developing pattern mining algorithms is as follows: a pattern mining algorithm has to process a huge volume of transactional dataset. Research efforts are going on
to develop pattern mining algorithms to operate on large datasets by exploiting different parallel processing paradigms, namely shared memory, shared disk, shared-nothing, GPUs, incremental, MapReduce and Grid.
Typically, a given transactional database D gets updated due to the addition and/or deletion of transactions. Consequently, some of the previously discovered patterns may become invalid, while some
new patterns may emerge. To extract the new knowledge we should mine the database every time it
gets updated. One of the main issues is to mine huge Dynamic Databases for every time instant from
scratch is computationally costly. The concept of incremental mining is evolved to solve the issue of
mining patterns for every instant efficiently for Dynamic Databases and has become an active research
area. When the database is updated, with Incremental mining, there is an opportunity to improve the
performance by limiting the computation to updated parts of the database and with the required additional processing. In the literature, Incremental mining approaches have been investigated to extract
different kinds of patterns such as frequent patterns, sequential patterns, periodic patterns and utility
patterns from transactional databases. In this thesis, we propose an improved approach for extracting
Coverage Patterns for Dynamic Databases under the incremental paradigm.
Given a transactional database, the coverage pattern mining involves the extraction of multiple sets of
items, where each set (covers a certain percentage of transactions) satisfies relative frequency, coverage
support and the overlap ratio constraints. The applicability of coverage patterns has been demonstrated
in improving the performance of search engine advertising, banner advertising and visibility mining. In
the literature, a level-wise iterative coverage pattern mining (CPM) and a pattern growth approach have
been proposed.In this thesis, we propose a Comprehensive Coverage Pattern Mining (CCPM) approach, for efficiently extracting Coverage Patterns under the incremental paradigm, when a set of transactions are
added only, set of transactions are deleted only and both set of transactions are added and deleted from
the original database. For each case, we have identified a set of rules by identifying the potential opportunities to avoid unnecessary processing as compared to the rudimentary option of the processing from the scratch. Furthermore, it has been observed that, when a set of transactions are added and/or
deleted from the original database, if the order of items in the frequency list (Flist) of the database is
changed, additional patterns have to be validated and require additional processing. Based on the rules
and by considering the issue of change in the order of items in the Flist, we have proposed improved approaches to extract coverage patterns, by considering the case of (i) addition of transactions (ii) deletion of transactions and (iii) addition and deletion of transactions. We have also performed extensive experiments on two real click-stream datasets and one synthetic dataset and demonstrated the overall
effectiveness of our proposed approach.