ApproxMAP: Approximate Sequential Pattern Mining via Multiple Alignment

Sequential pattern mining is an important data mining task with broad applications. Conventional methods have inherent difficulty mining databases that contain long sequences and noise: they may generate a huge number of short, trivial patterns yet fail to find the interesting underlying ones. To address these problems, this project proposes approximate sequential pattern mining, roughly defined as identifying patterns that are approximately shared by many sequences. We present an efficient and effective algorithm, ApproxMAP (APPROXimate Multiple Alignment Pattern mining), that mines approximate consensus sequential patterns from large databases. The method works in three steps. First, sequences are clustered by similarity. Second, each cluster is compressed into a weighted sequence through multiple alignment. Third, the longest underlying pattern that best fits each cluster is generated from its weighted sequence. Our extensive experiments on synthetic and real data show that ApproxMAP is very robust to noise and effectively maps high-dimensional noisy data to approximate sequential patterns. See the wiki for a tutorial on the method.
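The three steps can be illustrated with a small Python sketch. It is not the project's implementation: the text above does not specify the distance measure, the clustering procedure, or the cutoff used to read off a pattern, so the sketch assumes an edit distance over itemsets with a normalized set-difference replacement cost, swaps in a simple greedy threshold clustering, and keeps items that occur in at least a given fraction of a cluster's sequences. All function names and parameters (`set_cost`, `align`, `cluster`, `compress`, `consensus`, `approx_map`, `max_dist`, `min_strength`) are hypothetical, and itemsets are assumed to be non-empty.

```python
from collections import Counter

def set_cost(x, y):
    """Assumed replacement cost: normalized set difference (0 = same, 1 = disjoint)."""
    x, y = set(x), set(y)
    return len(x ^ y) / (len(x) + len(y))

def align(a, b):
    """Edit-distance alignment of two itemset sequences.
    Returns (distance, pairs), where pairs holds (i, j) indices and None marks a gap."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + set_cost(a[i - 1], b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the aligned positions.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + set_cost(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, j - 1)); j -= 1
    pairs.reverse()
    return d[n][m], pairs

def cluster(seqs, max_dist=0.5):
    """Step 1 (stand-in): greedy clustering against each cluster's first member."""
    clusters = []
    for s in seqs:
        for c in clusters:
            dist, _ = align(c[0], s)
            if dist / max(len(c[0]), len(s)) <= max_dist:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def compress(members):
    """Step 2: fold a cluster into a weighted sequence (one item Counter per position)."""
    weighted = [Counter(itemset) for itemset in members[0]]
    for seq in members[1:]:
        _, pairs = align(weighted, seq)
        merged = []
        for i, j in pairs:
            col = Counter(weighted[i]) if i is not None else Counter()
            if j is not None:
                col.update(seq[j])
            merged.append(col)
        weighted = merged
    return weighted

def consensus(weighted, n_seqs, min_strength=0.5):
    """Step 3: keep items present in at least min_strength of the cluster's sequences."""
    pattern = []
    for col in weighted:
        items = sorted(item for item, w in col.items() if w >= min_strength * n_seqs)
        if items:
            pattern.append(items)
    return pattern

def approx_map(seqs, max_dist=0.5, min_strength=0.5):
    """Run the three steps and return one consensus pattern per cluster."""
    return [consensus(compress(c), len(c), min_strength) for c in cluster(seqs, max_dist)]

# Toy database: three noisy variants of one pattern plus an unrelated sequence.
toy = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"b"}, {"d"}],
    [{"a"}, {"x"}, {"b", "c"}, {"d"}],
    [{"p"}, {"q"}],
]
print(approx_map(toy))
```

In this toy run the noise item `x` appears in only one of the three sequences in its cluster, so it falls below the assumed `min_strength` cutoff and is left out of that cluster's pattern. Raising or lowering `min_strength` trades pattern length against noise tolerance.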