Asymptotically Unbiased Estimation of A Nonsymmetric Dependence Measure Applied to Sensor Data Analytics and Financial Time Series
Keywords: machine learning, sensor data analytics, financial time series, statistical inference, information energy, nonsymmetric dependence measure, big data analytics
A fundamental task in statistical machine learning is detecting dependencies between random variables whose distributions are unknown and must be inferred from data samples. In previous work, we introduced a nonparametric unilateral dependence measure based on Onicescu’s information energy, together with a kNN method for estimating this measure from an available sample set of discrete or continuous variables. This paper provides formal proofs that the estimator is asymptotically unbiased and that its variance tends to zero as the sample size increases. Together, these two properties imply that the estimator is consistent. We investigate the performance of the estimator in two application areas: sensor data analysis and financial time series.
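To make the quantities named in the abstract concrete, here is a minimal sketch for discrete samples. It is not the kNN estimator studied in the paper: it uses a simple plug-in (relative frequency) estimate. It also assumes that the unilateral dependence measure is the difference between the conditional and the unconditional information energy, o(X, Y) = IE(X|Y) - IE(X), where IE(X) is the sum of squared probabilities. The function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def information_energy(xs):
    """Plug-in estimate of Onicescu's information energy: sum of squared relative frequencies."""
    n = len(xs)
    return sum((count / n) ** 2 for count in Counter(xs).values())


def conditional_information_energy(xs, ys):
    """Plug-in estimate of IE(X|Y) = sum over y of p(y) * sum over x of p(x|y)^2."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(group) / n * information_energy(group) for group in groups.values())


def unilateral_dependence(xs, ys):
    """Assumed definition: o(X, Y) = IE(X|Y) - IE(X)."""
    return conditional_information_energy(xs, ys) - information_energy(xs)


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [0, 1, 2, 3]  # y determines x, but x only partly determines y
    print(unilateral_dependence(x, y))  # dependence of X on Y
    print(unilateral_dependence(y, x))  # dependence of Y on X
```

In this toy sample the two directions give different values, which is the sense in which the measure is nonsymmetric. With samples in which one variable carries no information about the other, the plug-in value is zero.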
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.