Efficient Historical Query in HBase for Spatio-Temporal Decision Support
Keywords: spatio-temporal query, HBase, range query, kNN query, GNN query, load balancing
Compared with the last decade, technologies for gathering spatio-temporal data have become more mature and easier to use and deploy. As a result, tens of billions, even trillions, of sensed data records have accumulated, which poses a challenge to spatio-temporal Decision Support Systems (stDSS). Traditional databases can hardly support such volumes and tend to become a performance bottleneck for the analysis platform. In this paper, we therefore argue for using the NoSQL database HBase to replace the traditional back-end storage system. In this context, the well-studied spatio-temporal querying techniques of traditional databases should be carried over to the HBase system in parallel. However, this problem has not been solved well in HBase: many previous works tackle it only through schema design, i.e., by designing the row key and column key format for HBase, which we do not believe is an effective solution. In this paper, we address the problem at the native level of HBase and propose an index structure as a built-in component of HBase. STEHIX (Spatio-TEmporal Hbase IndeX) is adapted to the two-level architecture of HBase and is suited to processing spatio-temporal queries in HBase. It is composed of an index in the meta table (the first level) and a region index (the second level) that indexes the inner structure of HBase regions. Based on this structure, we propose algorithms for three queries: range query, kNN query, and GNN query. Two optimizations are also presented, one for load balancing and one for scalable kNN queries. We implement STEHIX and conduct experiments on a real dataset; the results show that our design outperforms a previous work in many aspects.
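To make the three query types named above concrete, the following is a minimal brute-force sketch of their semantics over timestamped points. It is not the STEHIX algorithm and does not use HBase; it only shows what each query is expected to return. The `Record` type, the function names, the use of Euclidean distance, and the reading of GNN as "group nearest neighbor" (minimizing the summed distance to a set of query points) are assumptions made for illustration, since the abstract does not define them.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Record:
    """Hypothetical spatio-temporal record: an id, a 2-D location, a timestamp."""
    rid: str
    x: float
    y: float
    t: int


def _in_time(r: Record, t_start: int, t_end: int) -> bool:
    # Historical queries restrict results to a closed time interval.
    return t_start <= r.t <= t_end


def range_query(records: Sequence[Record], x_min: float, y_min: float,
                x_max: float, y_max: float,
                t_start: int, t_end: int) -> List[Record]:
    """Return records inside the spatial rectangle and the time interval."""
    return [r for r in records
            if x_min <= r.x <= x_max and y_min <= r.y <= y_max
            and _in_time(r, t_start, t_end)]


def knn_query(records: Sequence[Record], q: Point, k: int,
              t_start: int, t_end: int) -> List[Record]:
    """Return the k records closest to q within the time interval."""
    candidates = [r for r in records if _in_time(r, t_start, t_end)]
    return sorted(candidates, key=lambda r: hypot(r.x - q[0], r.y - q[1]))[:k]


def gnn_query(records: Sequence[Record], group: Sequence[Point], k: int,
              t_start: int, t_end: int) -> List[Record]:
    """Return the k records with the smallest summed distance to all group points.

    Assumption: GNN is read here as group nearest neighbor with a sum aggregate.
    """
    def total_distance(r: Record) -> float:
        return sum(hypot(r.x - gx, r.y - gy) for gx, gy in group)

    candidates = [r for r in records if _in_time(r, t_start, t_end)]
    return sorted(candidates, key=total_distance)[:k]
```

Each function scans every record, so the cost grows linearly with the data size. An index such as the one proposed in the paper exists precisely to avoid this full scan by pruning regions that cannot contain results.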
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.