Weighted Random Search for Hyperparameter Optimization

  • Adrian-Catalin Florea Department of Electronics and Computers Transilvania University of Brasov, Romania
  • Razvan Andonie Central Washington University

Abstract

We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms. Unlike the standard RS, which generates for each trial new values for all hyperparameters, we generate new values for each hyperparameter with a probability of change. The intuition behind our approach is that a value that already triggered a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than the standard RS. Our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective function for this class of problems (the Grievank function) and for the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem dened on a discrete domain.

References

[1] Albelwi, S.; Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks. Entropy, 19, 6 (2017).

[2] Bergstra, J.; Bardenet, R.; Bengio, Y.; Kegl, B. (2011). Algorithms for hyper-parameter optimization. In NIPS (2011), J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, Eds., 2546-2554, 2011.

[3] Bergstra, J.; and Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305, 2012.

[4] Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; and Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science and Discovery, 8(1), 014008, 2015.
https://doi.org/10.1088/1749-4699/8/1/014008

[5] Chang, C.-C.; Lin, C.-J. (2011). LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2, 2(3), 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
https://doi.org/10.1145/1961189.1961199

[6] Chollet, F., et al. (2015). Keras. https://keras.io, 2015.

[7] Claesen, M.; Simm, J.; Popovic, D.; Moreau, Y.; Moor, B. D. (2014). Easy Hyperparameter Search Using Optunity, CoRR abs/1412.1114 , 2014.

[8] Domhan, T.; Springenberg, J. T.; Hutter, F. (2015). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, AAAI Press, 3460- 3468, 2015.

[9] Florea, A. C.; Andonie, R. (2018). A Dynamic Early Stopping Criterion for Random Search in SVM Hyperparameter Optimization, In 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI) (Rhodes, Greece, May 2018), L. Iliadis, I. Maglogiannis, and V. Plagianakos, Eds., vol. AICT-519 of Artificial Intelligence Applications and Innovations, Springer International Publishing, Part 3: Support Vector Machines, 168-180, 2018.
https://doi.org/10.1007/978-3-319-92007-8_15

[10] Griewank, A. (1981). Generalized decent for global optimization, Journal of Optimization Theory and Applications, 34, 11-39, 1981.
https://doi.org/10.1007/BF00933356

[11] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. (2009). The WEKA data mining software: An update, SIGKDD Explor. Newsl., 11(1), 10-18, 2009.
https://doi.org/10.1145/1656274.1656278

[12] Hansen, N.; Müller, S. D.; Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es), Evol. Comput., 11(1), 1-18, 2003.
https://doi.org/10.1162/106365603321828970

[13] He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016.
https://doi.org/10.1109/CVPR.2016.90

[14] Hinton, G. E. (2012). A practical guide to training Restricted Boltzmann Machines, In Neural Networks: Tricks of the Trade (2nd ed.), G. Montavon, G. B. Orr, and K.-R. MAzller, Eds., vol. 7700 of Lecture Notes in Computer Science. Springer, 599-619, 2012.

[15] Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014). An efficient approach for assessing hyperparameter importance, In Proceedings of International Conference on Machine Learning 2014 (ICML 2014), 754-762, 2014.

[16] Kennedy, J.; Eberhart, R. C. (1995). Particle swarm optimization, In Proceedings of the IEEE International Conference on Neural Networks, 1942-1948, 1995.
https://doi.org/10.1109/ICNN.1995.488968

[17] Kirkpatrick, S. (1984). Optimization by simulated annealing: Quantitative studies, Journal of Statistical Physics, 34(5), 975-986, 1984.
https://doi.org/10.1007/BF01009452

[18] Kotthoff, L.; Thornton, C.; Hoos, H. H.; Hutter, F.; Leyton-Brown, K. (2017). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25), 1-5, 2017.

[19] Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks, In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (USA, 2012), NIPS'12, Curran Associates Inc., 1097-1105, 2012.

[20] LeCun, Y.; Bottou, L.; Orr, G.; MAzller, K. (2012). Efficient Backprop, vol. 7700 LECTURE NO of Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, 9-48, 2012.

[21] Lemley, J.: Jagodzinski, F.; Andonie, R. (2016). Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data, In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 1, 563-571, 2016.

[22] Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. (2016). Efficient hyperparameter optimization and infinitely many armed bandits. CoRR abs/1603.06560 , 2016.

[23] Nair, V.; Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines, In Proceedings of the 27th International Conference on International Conference on Machine Learning (USA, 2010), ICML'10, Omnipress, 807-814, 2010.

[24] Nelder, J. A.; Mead, R. (1965). A Simplex Method for Function Minimization, Computer Journal, 7, 308-313, 1965.
https://doi.org/10.1093/comjnl/7.4.308

[25] Patterson, J.; Gibson, A. (2017). Deep Learning: A Practitioner's Approach, 1st ed. O'Reilly Media, Inc., 2017.

[26] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. (2011). Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830, 2011.

[27] Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. (2016). Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104, 148-175, 2016.
https://doi.org/10.1109/JPROC.2015.2494218

[28] Smusz, S.; Czarnecki, W. M.; Warszycki, D.; Bojarski, A. J. (2015). Exploiting uncertainty measures in compounds activity prediction using support vector machines, Bioorganic & medicinal chemistry letters, 25(1), 100-105, 2015.
https://doi.org/10.1016/j.bmcl.2014.11.005

[29] Snoek, J.; Larochelle, H.; Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms, In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2951-2959, 2012.

[30] Sobol, I. (1976). Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics, 16(5), 236 - 242, 1976.
https://doi.org/10.1016/0041-5553(76)90154-3

[31] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. (2015). Going deeper with convolutions, In Computer Vision and Pattern Recognition (CVPR), 2015.

[32] Thornton, C.; Hutter, F.; Hoos, H. H.; Leyton-Brown, K. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

[33] Zeiler, M. D.; Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (Cham, 2014), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Springer International Publishing, 818-833, 2014.
https://doi.org/10.1007/978-3-319-10590-1_53

[34] Zoph, B.; Le, Q. V. (2016). Neural architecture search with reinforcement learning, CoRR abs/1611.01578 (2016).

[35] Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q. V. (2017). Learning transferable architectures for scalable image recognition. CoRR abs/1707.07012 , 2017.

[36] [Online]. Bigml; BigML, Inc. https://bigml.com/ Accessed: 2019-01-10.

[37] [Online]. Cifar 10; Krizhevsky, A., Nair, V., and Hinton, G. https://www.cs.toronto.edu/~kriz/cifar.html Accessed: 2019-01-10.

[38] [Online]. Google HyperTune, Google. https://cloud.google.com/ml-engine/docs/ tensorflow/using-hyperparameter-tuning Accessed: 2019-01-10.

[39] [Online]. Optunity, http://optunity.readthedocs.io/en/latest/. Accessed: 2019-01-10.

[40] [Online]. SigOpt, SigOpt. https://sigopt.com/ Sigopt. Accessed: 2019-01-10.
Published
2019-04-14
How to Cite
FLOREA, Adrian-Catalin; ANDONIE, Razvan. Weighted Random Search for Hyperparameter Optimization. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, [S.l.], v. 14, n. 2, p. 154-169, apr. 2019. ISSN 1841-9844. Available at: <http://univagora.ro/jour/index.php/ijccc/article/view/3514>. Date accessed: 04 july 2020. doi: https://doi.org/10.15837/ijccc.2019.2.3514.

Keywords

hyperparameter optimization, random search, deep learning, neural networks