ApproxMAP: Approximate Sequential Pattern Mining via Multiple Alignment

Sequential pattern mining is an important data mining task with broad applications. Conventional methods have inherent difficulty mining databases that contain long sequences and noise: they may generate a huge number of short, trivial patterns yet fail to find the interesting underlying ones. To address these problems, this project proposes approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences.

We present an efficient and effective algorithm, ApproxMAP (APPROXimate Multiple Alignment Pattern mining), to mine approximate consensus sequential patterns from large databases. The method works in three steps. First, sequences are clustered by similarity. Second, each cluster is compressed into a weighted sequence through multiple alignment. Third, the longest underlying pattern that best fits each cluster is generated from its weighted sequence. Our extensive experiments on synthetic and real data show that ApproxMAP is very robust to noise and performs well at mapping high-dimensional noisy data into approximate sequential patterns.
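The three steps can be illustrated with a toy sketch. This is not the project's implementation. The description above does not specify the distance measure, the clustering method, the alignment scoring, or the consensus cutoff, so everything here is an assumption made for illustration: sequences are plain strings of single items, clustering is greedy on normalized edit distance, and the names `cluster`, `align_into`, `consensus`, `approx_patterns`, `max_dist`, and `min_strength` are hypothetical.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between two sequences of items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def cluster(seqs, max_dist):
    """Step 1 (toy): a sequence joins the first cluster whose seed
    (first member) is within max_dist normalized edit distance."""
    clusters = []
    for s in seqs:
        for c in clusters:
            if edit_distance(s, c[0]) / max(len(s), len(c[0])) <= max_dist:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def align_into(columns, n, seq):
    """Step 2 (toy): align seq against a weighted sequence, represented as
    a list of Counters (item -> count) built from n sequences so far.
    Matching an item to a column costs 1 - count/n; a gap costs 1."""
    m, k = len(columns), len(seq)
    cost = [[0.0] * (k + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
    for j in range(1, k + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            sub = 1 - columns[i - 1][seq[j - 1]] / n
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the end, rebuilding the columns with seq merged in.
    out, i, j = [], m, k
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j] ==
                cost[i - 1][j - 1] + (1 - columns[i - 1][seq[j - 1]] / n)):
            col = columns[i - 1].copy()
            col[seq[j - 1]] += 1          # item joins an existing column
            out.append(col)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out.append(columns[i - 1].copy())  # seq has a gap here
            i -= 1
        else:
            out.append(Counter([seq[j - 1]]))  # item opens a new column
            j -= 1
    out.reverse()
    return out

def consensus(columns, n, min_strength):
    """Step 3 (toy): keep each column's most frequent item if it occurs
    in at least min_strength of the n sequences."""
    pattern = []
    for col in columns:
        item, count = col.most_common(1)[0]
        if count / n >= min_strength:
            pattern.append(item)
    return pattern

def approx_patterns(seqs, max_dist=0.5, min_strength=0.5):
    """Run the three toy steps and return one consensus pattern per cluster."""
    patterns = []
    for group in cluster(seqs, max_dist):
        columns = [Counter([x]) for x in group[0]]
        for n, s in enumerate(group[1:], 1):
            columns = align_into(columns, n, s)
        patterns.append(consensus(columns, len(group), min_strength))
    return patterns

print(approx_patterns(["ABCD", "ABXCD", "ACD", "PQRS", "PQS"], min_strength=0.6))
# prints [['A', 'B', 'C', 'D'], ['P', 'Q', 'S']]
```

In this example the noise item `X` appears in only one of three sequences in its cluster, so it falls below the cutoff and is dropped from the consensus pattern, while no single input sequence needs to contain the pattern exactly.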

See the wiki for a tutorial on the method.
