Mining Temporal Sequential Patterns Based on Multi-granularities

Naiqian Li, Xinhui Yao, Dongpin Tian

Abstract


Sequential pattern mining is an important data mining problem: it extracts frequent subsequences from a database of sequences. However, the times between successive items in a sequence are typically used only as user-specified constraints, either to pre-process the input data or to prune the pattern search space. In either case, the times cannot be used to identify the item intervals of sequential patterns. In this paper, we introduce multi-granularity sequential patterns: sequential patterns in which each transition time is annotated with a multi-granularity boundary interval and an average time derived from the source data, rather than with a user-predetermined time interval or only a single typical time. We then present MG-PrefixSpan, a novel algorithm based on PrefixSpan that discovers all such patterns. Empirical evaluation shows that MG-PrefixSpan scales linearly with the size of the database and scales well with respect to sequence length and transaction size.
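To make the notion of an annotated transition concrete, the following is a minimal sketch, not the authors' MG-PrefixSpan. It assumes each sequence is a list of (item, timestamp-in-seconds) pairs, that a pattern is matched by its earliest occurrence, and that a granularity is a fixed unit length in seconds. The names `GRANULARITIES`, `match_gaps` and `annotate_transitions` are hypothetical and do not come from the paper. For one given pattern, it derives the boundary interval per granularity and the average time of every transition from the data.

```python
from statistics import mean

# Hypothetical granularities (assumption): name -> unit length in seconds.
GRANULARITIES = {"hour": 3600, "day": 86400}


def match_gaps(sequence, pattern):
    """Return the gaps between successive pattern items for the earliest
    (greedy) match of `pattern` in `sequence`, or None if there is no match."""
    times = []
    pos = 0
    for item, timestamp in sequence:
        if pos < len(pattern) and item == pattern[pos]:
            times.append(timestamp)
            pos += 1
    if pos < len(pattern):
        return None
    return [later - earlier for earlier, later in zip(times, times[1:])]


def annotate_transitions(sequences, pattern, granularities=GRANULARITIES):
    """Return (support, annotations) for `pattern`.

    Each annotation describes one transition: the average gap in seconds and,
    per granularity, the (min, max) gap expressed in whole units.
    """
    all_gaps = [match_gaps(seq, pattern) for seq in sequences]
    supporting = [gaps for gaps in all_gaps if gaps is not None]
    if not supporting:
        return 0, []

    annotations = []
    for i in range(len(pattern) - 1):
        gaps = [g[i] for g in supporting]
        bounds = {
            name: (min(gaps) // unit, max(gaps) // unit)
            for name, unit in granularities.items()
        }
        annotations.append({"avg_seconds": mean(gaps), "bounds": bounds})
    return len(supporting), annotations


if __name__ == "__main__":
    data = [
        [("a", 0), ("b", 7200), ("c", 93600)],
        [("a", 0), ("b", 3600), ("c", 176400)],
        [("a", 0), ("c", 100)],
    ]
    print(annotate_transitions(data, ("a", "b", "c")))
```

A full miner would enumerate the frequent patterns itself (for example by prefix projection, as PrefixSpan does) and filter them by a minimum support threshold; this sketch only shows how the time annotations for one pattern can be derived from the source data rather than supplied by the user.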


Keywords


Data Mining Algorithm, Sequential Pattern Mining, Sequential Data, Time Granularity, Temporal Patterns


References


Constantinescu, Z, Marinoiu, C, Vladoiu, M, "Driving Style Analysis Using Data Mining Techniques," International Journal Of Computers Communications and Control, Vol.5, No.5, pp. 654-663, Dec 2010.

Andonie, R, "Extreme Data Mining: Inference from Small Datasets," International Journal Of Computers Communications and Control, Vol.5, No.3, pp. 280-291, Sep 2010.

R. Agrawal and R. Srikant, "Mining sequential patterns," Proc. of the 7th International Conference on Data Engineering (ICDE'95), pp. 3-14, March, 1995.
http://dx.doi.org/10.1109/ICDE.1995.380415

R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements," Proc. of the 5th International Conference on Extending Database Technology, pp. 3-17, March, 1996.

M. Zaki, "An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, Vol. 40, pp. 31-60, 2000.

J. Pei, J. Han and H. Pinto et al, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach," IEEE Transactions on Knowledge and Data Engineering, Vol.16, No.11, pp. 1424-1440, 2004.
http://dx.doi.org/10.1109/TKDE.2004.77

H. Mannila, H. Toivonen, and A.I. Verkamo, "Discovery of frequent episodes in event sequences," Data Mining and Knowledge Discovery, Vol. 1, No. 3, pp. 256-289, 1997.

M.N. Garofalakis, R. Rastogi and K. Shim, "SPIRIT: Sequential Pattern Mining with Regular Expression Constraints," Proc. of the 25th International Conference on Very Large Data Bases (VLDB'99), pp. 223-234, September, 1999.

J. Pei, J. Han and W. Wang, "Constraint-based Sequential Pattern Mining: The Pattern-growth Methods," Journal of Intelligent Information Systems, Volume 28, Issue 2, pp. 133-160, April, 2007.
http://dx.doi.org/10.1007/s10844-006-0006-z

M. Yoshida et al., "Mining sequential patterns including time intervals," Proc. of SPIE Conf.- DMKD, pp. 213-220, April, 2000.

Y.-L. Chen, M.-C. Chiang and M.-T. Ko, "Discovering time-interval sequential patterns in sequence databases," Expert Systems with Applications, Volume 25, Issue 3, pp. 343-354, October, 2003.
http://dx.doi.org/10.1016/S0957-4174(03)00075-7

R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," Proc. of the 20th International Conference on Very Large Data Bases (VLDB'94), pp. 487-499, September 1994.

Y. Hirate, H. Yamana, "Generalized Pattern Mining with Item Intervals," Journal of Computers, Vol.1, No.3, pp. 51-60, June, 2006.
http://dx.doi.org/10.4304/jcp.1.3.51-60

F. Giannotti, M. Nanni, and D. Pedreschi. "Efficient Mining of Temporally Annotated Sequences," Proc. of the 6th SIAM International Conference on Data Mining, pp. 346-357, April, 2006.

C. Bettini, X.S. Wang and S. Jajodia et al, "Discovering Temporal Relationships with Multiple Granularities in Time Sequences," IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2), pp. 222-237, 1998.
http://dx.doi.org/10.1109/69.683754




DOI: https://doi.org/10.15837/ijccc.2012.3.1390



Copyright (c) 2017 Naiqian Li, Xinhui Yao, Dongpin Tian

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
