Lossless Compression of Data Tables in Mobile Devices by Using Co-clustering

Authors

  • Bo Han Wuhan University
  • Bolang Li International School of Software, Wuhan University 37 Luoyu Road, Wuhan, Hubei, China, 430079

Keywords:

data tables, lossless compression, co-clustering, redundancy

Abstract

Data tables have been widely used for storage of a collection of related records in a structured format in many mobile applications. The lossless compression of data tables not only brings benefits for storage, but also reduces network transmission latencies and energy costs in batteries. In this paper, we propose a novel lossless compression approach by combining co-clustering and information coding theory. It reorders table columns and rows simultaneously for shaping homogeneous blocks and further optimizes alignment within a block to expose redundancy, such that standard lossless encoders can significantly improve compression ratios. We tested the approach on a synthetic dataset and ten UCI real-life datasets by using a standard compressor 7Z. The extensive experimental results suggest that compared with the direct table compression without co-clustering and within-block alignment, our approach can boost compression rates at least 21% and up to 68%. The results also show that the compression time cost of the co-clustering approach is linearly proportional to a data table size. In addition, since the inverse transform of co-clustering is just exchange of rows and columns according to recorded indexes, the decompression procedure runs very fast and the decompression time cost is similar to the counterpart without using co-clustering. Thereby, our approach is suitable for lossless compression of data tables in mobile devices with constrained resources.

Author Biography

Bo Han, Wuhan University

Associate Professor

International School of Software

References

Rao S., Bhat P. (2015); Evaluation of lossless compression techniques, IEEE International Conference on Communications and Signal Processing (ICCSP), 1655-1659. http://dx.doi.org/10.1109/iccsp.2015.7322799

Kontoyiannis I., Verdu S. (2014); Optimal lossless data compression: Non-asymptotics and asymptotics, IEEE Transactions on Information Theory, 60(2): 777-795. http://dx.doi.org/10.1109/TIT.2013.2291007

D. Salomon (2007); Data Compression: The Complete Reference, 4th ed. New York: Springer, 2007.

K. Sayood (2000); Introduction to Data Compression, 2nd ed. San Francisco, CA: Morgan Kaufmann, 2000.

Mielikainen J., Huang B.(2012); Lossless compression of hyperspectral images using clustered linear prediction with adaptive prediction length, IEEE Geoscience and Remote Sensing Letters, 9(6): 1118-1121. http://dx.doi.org/10.1109/LGRS.2012.2191531

Venugopal D., Mohan S., Raja S.(2016); An efficient block based lossless compression of medical images, Optik-International Journal for Light and Electron Optics, 127(2): 754-758.

Patauner C. et al. (2011); A lossless data compression system for a real-time application in HEP data acquisition, IEEE Transactions on Nuclear Science, 58(4): 1738-1744. http://dx.doi.org/10.1109/TNS.2011.2142193

Kolo J.G., et al. (2012); An adaptive lossless data compression scheme for wireless sensor networks, Journal of Sensors, vol. 2012, Article ID 539638, 1-20.

Kolo J.G. et al. (2015); Fast and efficient lossless adaptive compression scheme for wireless sensor networks, Computers & Electrical Engineering, 41: 275-287. http://dx.doi.org/10.1016/j.compeleceng.2014.06.008

A.L. Buchsbaum et al. (2000); Engineering the compression of massive tables: an experimental approach, Proceedings of the ACM-SIAM Annual Symposium on Discrete Algorithms, San Francisco, CA, 213-222.

A.L. Buchsbaum, G.S. Fowler, R. Giancarlo (2002); Improving table compression with combinatorial optimization, Proceedings of the ACM-SIAMAnnual Symposium on Discrete Algorithms, San Francisco, CA, 175-184.

Q. Yang, S. Lonardi (2005); A compression-boosting transform for 2D data, Data Compression Conference (DCC'05), 492-492.

H. Shan, A. Banerjee (2008); Bayesian co-clustering, Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), 530-539.

F. Pan, X. Zhang, W. Wang (2008); CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'08), 173-184.

A. Banerjee et al. (2007); A generalized maximum entropy approach to Bregman coclustering and matrix approximation, Journal of Machine Learning Research, 1919-1986.

I.S. Dhillon, S. Mallela, D.S. Modha (2003); Information-theoretic co-clustering, Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03), 89-98.

R.G. Pensa, J.F. Boulicaut, F. Cordero, M. Atzori (2010); Co-clustering numerical data under user-defined constraints, Statistical Analysis and Data Mining,3(1): 38-55.

R.G. Pensa, J.F. Boulicaut (2008); Constrained co-clustering of gene expression data, Proceedings in Applied Mathematics 130, 8th SIAM International Conference on Data Mining 2008, 1:25-36.

Wu X., Zurita-Milla R., Kraak M.J. (2015); Co-clustering geo-referenced time series: exploring spatio-temporal patterns in Dutch temperature data, International Journal of Geographical Information Science, 29(4): 624-642. http://dx.doi.org/10.1080/13658816.2014.994520

http://www.7-zip.org

http://en.wikipedia.org/wiki/7z

F. Coenen, The LUCS-KDD Discretised/normalized ARM and CARM Data Library, In http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS-KDD-DN/DataSets/dataSets.html

Published

2016-10-17

Most read articles by the same author(s)

Obs.: This plugin requires at least one statistics/report plugin to be enabled. If your statistics plugins provide more than one metric then please also select a main metric on the admin's site settings page and/or on the journal manager's settings pages.