Data Dimensionality Reduction for Data Mining: A Combined Filter-Wrapper Framework

Authors

  • Mirela Danubianu, "Stefan cel Mare" University of Suceava
  • Stefan Gheorghe Pentiuc, "Stefan cel Mare" University of Suceava
  • Dragos Mircea Danubianu, "Stefan cel Mare" University of Suceava

Keywords

data mining, feature selection, filters, wrappers

Abstract

Knowledge Discovery in Databases aims to extract new, interesting and potentially useful patterns from large amounts of data. It is a complex process whose central step is data mining, which effectively builds models from data. Data type, quality and dimensionality are some of the factors that affect the performance of a data mining task. Since the high dimensionality of data can cause problems such as data overload, a possible solution is to reduce it. Sampling and filtering reduce the number of cases in a dataset, whereas the number of features can be reduced by feature selection. This paper presents a combined method for feature selection: a correlation-based filter is first applied to the whole feature set to find the relevant features, and a wrapper is then applied to these features in order to find the best feature subset for a specified predictor. A case study is also presented for a data set provided by TERAPERS, a personalized speech therapy system.
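The filter-then-wrapper idea from the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' WEKA-based pipeline: the filter stage ranks features by absolute Pearson correlation with the class (a stand-in for Hall's correlation-based feature selection), and the wrapper stage greedily adds filtered features as long as the holdout accuracy of a simple nearest-centroid predictor improves. All function names and the choice of predictor are illustrative assumptions.

```python
import numpy as np

def correlation_filter(X, y, keep=5):
    """Filter stage (simplified): keep the `keep` features whose absolute
    Pearson correlation with the class labels y is highest."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    ranked = np.argsort(corrs)[::-1]          # indices, most correlated first
    return sorted(ranked[:keep].tolist())

def holdout_accuracy(X, y, features, split=0.7):
    """Accuracy of a nearest-centroid predictor (the 'specified predictor'
    of the wrapper) on a simple holdout split, using only `features`."""
    cut = int(len(y) * split)
    Xtr, ytr = X[:cut, features], y[:cut]
    Xte, yte = X[cut:, features], y[cut:]
    centroids = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    preds = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
             for x in Xte]
    return float(np.mean(np.array(preds) == yte))

def wrapper_select(X, y, candidates):
    """Wrapper stage: greedy forward selection over the filtered candidates,
    adding a feature only while the predictor's holdout accuracy improves."""
    selected, best, improved = [], 0.0, True
    while improved:
        improved = False
        for f in candidates:
            if f in selected:
                continue
            acc = holdout_accuracy(X, y, selected + [f])
            if acc > best:
                best, selected, improved = acc, selected + [f], True
    return selected, best
```

Because the wrapper only searches the candidates that survived the filter, the expensive predictor is retrained on far fewer subsets than a wrapper run over the full feature set would require, which is the point of combining the two approaches.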

Author Biography

Mirela Danubianu, "Stefan cel Mare" University of Suceava

Department of Mathematics and Computer Science

References

Danubianu M., Pentiuc S.G., Tobolcea I., Schipor O.A., Advanced Information Technology - Support of Improved Personalized Therapy of Speech Disorders, INT J COMPUT COMMUN, ISSN 1841-9836, 5(5): 684-692, 2010.

Kohavi R., John G., Wrappers for feature subset selection, Artificial Intelligence, Special issue on relevance, 97(1-2):273-324, 1997.

Hall, M., Correlation-based feature selection for discrete and numeric class machine learning, Proc. of International Conference on Machine Learning, 359-365, Morgan Kaufmann, 2000.

Douik A., Abdellaoui M., Cereal Grain Classification by Optimal Features and Intelligent Classifiers, INT J COMPUT COMMUN, ISSN 1841-9836, 5(4):506-516, 2010.

Peng H., Long F., Ding C., Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance and Min-Redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226-1238, 2005. http://dx.doi.org/10.1109/TPAMI.2005.159

John G.H., Kohavi R., Pfleger K., Irrelevant features and the subset selection problem, Machine Learning: Proceedings of the Eleventh International Conference, 121-129, Morgan Kaufmann, 1994.

Gennari J.H., Langley P., Fisher D., Models of incremental concept formation, Artificial Intelligence, (40):11-16, 1989. http://dx.doi.org/10.1016/0004-3702(89)90046-5

Quinlan J.R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

Yu L., Liu H., Efficient Feature Selection via Analysis of Relevance and Redundancy, Journal of Machine Learning Research, 5:1205-1224, 2004.

Danubianu M., Pentiuc St. Gh., Socaciu T., Towards the Optimized Personalized Therapy of Speech Disorders by Data Mining Techniques, The Fourth International Multi-Conference on Computing in the Global Information Technology ICCGI 2009, 23-29 August, Cannes - La Bocca, France, 2009.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I., The WEKA Data Mining Software: An Update, SIGKDD Explorations, 11(1):10-18, 2009. http://dx.doi.org/10.1145/1656274.1656278

Published

2014-09-13
