Weighted Random Search for Hyperparameter Optimization

Authors

  • Adrian-Catalin Florea Department of Electronics and Computers Transilvania University of Brasov, Romania
  • Razvan Andonie Central Washington University

Keywords:

hyperparameter optimization, random search, deep learning, neural networks

Abstract

We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms. Unlike the standard RS, which generates for each trial new values for all hyperparameters, we generate new values for each hyperparameter with a probability of change. The intuition behind our approach is that a value that already triggered a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than the standard RS. Our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective function for this class of problems (the Grievank function) and for the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem dened on a discrete domain.

References

Albelwi, S.; Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks. Entropy, 19, 6 (2017).

Bergstra, J.; Bardenet, R.; Bengio, Y.; Kegl, B. (2011). Algorithms for hyper-parameter optimization. In NIPS (2011), J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, Eds., 2546-2554, 2011.

Bergstra, J.; and Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305, 2012.

Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; and Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science and Discovery, 8(1), 014008, 2015. https://doi.org/10.1088/1749-4699/8/1/014008

Chang, C.-C.; Lin, C.-J. (2011). LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2, 2(3), 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. https://doi.org/10.1145/1961189.1961199

Chollet, F., et al. (2015). Keras. https://keras.io, 2015.

Claesen, M.; Simm, J.; Popovic, D.; Moreau, Y.; Moor, B. D. (2014). Easy Hyperparameter Search Using Optunity, CoRR abs/1412.1114 , 2014.

Domhan, T.; Springenberg, J. T.; Hutter, F. (2015). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, AAAI Press, 3460- 3468, 2015.

Florea, A. C.; Andonie, R. (2018). A Dynamic Early Stopping Criterion for Random Search in SVM Hyperparameter Optimization, In 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI) (Rhodes, Greece, May 2018), L. Iliadis, I. Maglogiannis, and V. Plagianakos, Eds., vol. AICT-519 of Artificial Intelligence Applications and Innovations, Springer International Publishing, Part 3: Support Vector Machines, 168-180, 2018. https://doi.org/10.1007/978-3-319-92007-8_15

Griewank, A. (1981). Generalized decent for global optimization, Journal of Optimization Theory and Applications, 34, 11-39, 1981. https://doi.org/10.1007/BF00933356

Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. (2009). The WEKA data mining software: An update, SIGKDD Explor. Newsl., 11(1), 10-18, 2009. https://doi.org/10.1145/1656274.1656278

Hansen, N.; Müller, S. D.; Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es), Evol. Comput., 11(1), 1-18, 2003. https://doi.org/10.1162/106365603321828970

He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90

Hinton, G. E. (2012). A practical guide to training Restricted Boltzmann Machines, In Neural Networks: Tricks of the Trade (2nd ed.), G. Montavon, G. B. Orr, and K.-R. MAzller, Eds., vol. 7700 of Lecture Notes in Computer Science. Springer, 599-619, 2012.

Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014). An efficient approach for assessing hyperparameter importance, In Proceedings of International Conference on Machine Learning 2014 (ICML 2014), 754-762, 2014.

Kennedy, J.; Eberhart, R. C. (1995). Particle swarm optimization, In Proceedings of the IEEE International Conference on Neural Networks, 1942-1948, 1995. https://doi.org/10.1109/ICNN.1995.488968

Kirkpatrick, S. (1984). Optimization by simulated annealing: Quantitative studies, Journal of Statistical Physics, 34(5), 975-986, 1984. https://doi.org/10.1007/BF01009452

Kotthoff, L.; Thornton, C.; Hoos, H. H.; Hutter, F.; Leyton-Brown, K. (2017). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25), 1-5, 2017.

Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks, In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (USA, 2012), NIPS'12, Curran Associates Inc., 1097-1105, 2012.

LeCun, Y.; Bottou, L.; Orr, G.; MAzller, K. (2012). Efficient Backprop, vol. 7700 LECTURE NO of Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, 9-48, 2012.

Lemley, J.: Jagodzinski, F.; Andonie, R. (2016). Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data, In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 1, 563-571, 2016.

Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. (2016). Efficient hyperparameter optimization and infinitely many armed bandits. CoRR abs/1603.06560 , 2016.

Nair, V.; Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines, In Proceedings of the 27th International Conference on International Conference on Machine Learning (USA, 2010), ICML'10, Omnipress, 807-814, 2010.

Nelder, J. A.; Mead, R. (1965). A Simplex Method for Function Minimization, Computer Journal, 7, 308-313, 1965. https://doi.org/10.1093/comjnl/7.4.308

Patterson, J.; Gibson, A. (2017). Deep Learning: A Practitioner's Approach, 1st ed. O'Reilly Media, Inc., 2017.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. (2011). Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830, 2011.

Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. (2016). Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104, 148-175, 2016. https://doi.org/10.1109/JPROC.2015.2494218

Smusz, S.; Czarnecki, W. M.; Warszycki, D.; Bojarski, A. J. (2015). Exploiting uncertainty measures in compounds activity prediction using support vector machines, Bioorganic & medicinal chemistry letters, 25(1), 100-105, 2015. https://doi.org/10.1016/j.bmcl.2014.11.005

Snoek, J.; Larochelle, H.; Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms, In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2951-2959, 2012.

Sobol, I. (1976). Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics, 16(5), 236 - 242, 1976. https://doi.org/10.1016/0041-5553(76)90154-3

Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. (2015). Going deeper with convolutions, In Computer Vision and Pattern Recognition (CVPR), 2015.

Thornton, C.; Hutter, F.; Hoos, H. H.; Leyton-Brown, K. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Zeiler, M. D.; Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (Cham, 2014), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Springer International Publishing, 818-833, 2014. https://doi.org/10.1007/978-3-319-10590-1_53

Zoph, B.; Le, Q. V. (2016). Neural architecture search with reinforcement learning, CoRR abs/1611.01578 (2016).

Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q. V. (2017). Learning transferable architectures for scalable image recognition. CoRR abs/1707.07012 , 2017.

[Online]. Bigml; BigML, Inc. https://bigml.com/ Accessed: 2019-01-10.

[Online]. Cifar 10; Krizhevsky, A., Nair, V., and Hinton, G. https://www.cs.toronto.edu/~kriz/cifar.html Accessed: 2019-01-10.

[Online]. Google HyperTune, Google. https://cloud.google.com/ml-engine/docs/ tensorflow/using-hyperparameter-tuning Accessed: 2019-01-10.

[Online]. Optunity, http://optunity.readthedocs.io/en/latest/. Accessed: 2019-01-10.

[Online]. SigOpt, SigOpt. https://sigopt.com/ Sigopt. Accessed: 2019-01-10.

Published

2019-04-14

Most read articles by the same author(s)

Obs.: This plugin requires at least one statistics/report plugin to be enabled. If your statistics plugins provide more than one metric then please also select a main metric on the admin's site settings page and/or on the journal manager's settings pages.