Efficient Historical Query in HBase for Spatio-Temporal Decision Support

Authors

  • Xiaoying Chen, National University of Defense Technology
  • Chong Zhang, National University of Defense Technology
  • Bin Ge, National University of Defense Technology
  • Weidong Xiao, National University of Defense Technology

Keywords:

spatio-temporal query, HBase, range query, kNN query, GNN query, load balancing

Abstract

Compared with the last decade, technologies for gathering spatio-temporal data are more developed and easier to use and deploy. As a result, tens of billions, even trillions, of sensed records have accumulated, which poses a challenge to spatio-temporal Decision Support Systems (stDSS). Traditional databases can hardly support such volumes and tend to become a performance bottleneck for the analysis platform. In this paper, we therefore argue for using the NoSQL database HBase to replace the traditional back-end storage system. In this context, the well-studied spatio-temporal querying techniques of traditional databases should be migrated to the HBase system in parallel. However, this problem has not been solved well in HBase: many previous works tackle it only through schema design, i.e., designing the row key and column key format for HBase, which we do not believe is an effective solution. We instead address the problem at the native level of HBase and propose an index structure as a built-in component of HBase. STEHIX (Spatio-TEmporal Hbase IndeX) is adapted to the two-level architecture of HBase and is suited to processing spatio-temporal queries in HBase. It is composed of an index in the meta table (the first level) and a region index (the second level) that indexes the inner structure of HBase regions. Based on this structure, we propose algorithms for three queries: range query, kNN query, and GNN query. Two optimizations are also presented to achieve load balancing and scalable kNN queries. We implement STEHIX and conduct experiments on a real dataset, and the results show that our design outperforms a previous work in many aspects.
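To make the idea of a two-level index concrete, the following is a minimal in-memory sketch in Python. It is not STEHIX and does not use HBase: the class name `TwoLevelIndex`, the choice to partition by time at the first level and by grid cell at the second, and the parameters `period` and `cell` are all assumptions made for illustration, since the abstract does not specify these details. It shows only how a spatio-temporal range query can prune by the first level before scanning candidates in the second.

```python
from collections import defaultdict


class TwoLevelIndex:
    """Toy two-level index (hypothetical, not the STEHIX design).

    Level 1 maps a time bucket to a partition; level 2 maps a grid cell
    inside that partition to the records stored there.
    """

    def __init__(self, period, cell):
        self.period = period  # length of one time bucket (assumed unit)
        self.cell = cell      # side length of one square grid cell
        # level 1: time bucket -> level 2: (cx, cy) -> list of records
        self.buckets = defaultdict(lambda: defaultdict(list))

    def insert(self, x, y, t, payload):
        bucket = int(t // self.period)
        key = (int(x // self.cell), int(y // self.cell))
        self.buckets[bucket][key].append((x, y, t, payload))

    def range_query(self, xmin, xmax, ymin, ymax, tmin, tmax):
        """Return records inside the closed spatial box and time interval."""
        results = []
        first_bucket = int(tmin // self.period)
        last_bucket = int(tmax // self.period)
        for bucket in range(first_bucket, last_bucket + 1):
            cells = self.buckets.get(bucket)  # level-1 pruning by time
            if cells is None:
                continue
            for cx in range(int(xmin // self.cell), int(xmax // self.cell) + 1):
                for cy in range(int(ymin // self.cell), int(ymax // self.cell) + 1):
                    # level-2 lookup, then exact filtering of candidates
                    for rec in cells.get((cx, cy), ()):
                        x, y, t, _ = rec
                        if xmin <= x <= xmax and ymin <= y <= ymax and tmin <= t <= tmax:
                            results.append(rec)
        return results


idx = TwoLevelIndex(period=10, cell=1.0)
idx.insert(0.5, 0.5, 3, "a")
idx.insert(2.5, 2.5, 12, "b")
idx.insert(0.7, 0.2, 25, "c")
hits = idx.range_query(0, 1, 0, 1, 0, 20)
print([rec[3] for rec in hits])
```

In this sketch only the time buckets and grid cells that overlap the query are visited, and every candidate is then checked exactly. A real HBase deployment would hold the first level in the meta table and the second level inside each region, as the abstract describes.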

Published

2016-08-31
