Weighted Random Search for Hyperparameter Optimization
Keywords:
hyperparameter optimization, random search, deep learning, neural networks
Abstract
We introduce an improved version of Random Search (RS), used here for the hyperparameter optimization of machine learning algorithms. Unlike standard RS, which generates new values for all hyperparameters at each trial, we generate new values for each hyperparameter with a given probability of change. The intuition behind our approach is that a value that has already produced a good result is a good candidate for the next step and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than standard RS; our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective functions for this class of problems (the Griewank function) and on the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem defined on a discrete domain.
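The method admits a compact illustration. The sketch below is a minimal reading of the approach as described in this abstract, not the authors' implementation: the fixed per-hyperparameter probabilities of change and the rule of reusing values from the best configuration found so far are assumptions, and the plain Griewank objective stands in for the paper's actual test variant.

```python
import math
import random

def griewank(x):
    """Griewank test function (Griewank, 1981); global minimum f(0) = 0."""
    quadratic = sum(xi ** 2 for xi in x) / 4000.0
    oscillatory = math.prod(math.cos(xi / math.sqrt(i))
                            for i, xi in enumerate(x, start=1))
    return quadratic - oscillatory + 1.0

def weighted_random_search(objective, domains, change_probs, n_trials, seed=42):
    """Unlike standard RS, which resamples every hyperparameter at each
    trial, resample hyperparameter i only with probability change_probs[i];
    otherwise keep its value from the best configuration found so far
    (an assumption based on the abstract). Minimization over discrete domains."""
    rng = random.Random(seed)
    best = [rng.choice(domain) for domain in domains]  # initial random trial
    best_value = objective(best)
    for _ in range(n_trials - 1):
        candidate = [rng.choice(domain) if rng.random() < p else kept
                     for domain, p, kept in zip(domains, change_probs, best)]
        value = objective(candidate)
        if value < best_value:
            best, best_value = candidate, value
    return best, best_value

# Toy usage: Griewank restricted to a discrete grid, since the method
# targets optimization problems defined on discrete domains.
grid = [x / 10.0 for x in range(-6000, 6001)]  # step 0.1 on [-600, 600]
best, value = weighted_random_search(griewank, [grid] * 3, [0.5] * 3, 2000)
print(best, value)
```

Setting every probability of change to 1 recovers standard RS, which makes the comparison within the same computational budget direct: both methods evaluate the objective exactly n_trials times.

References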
Albelwi, S.; Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks, Entropy, 19(6), 2017.
Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. (2011). Algorithms for hyper-parameter optimization, In Advances in Neural Information Processing Systems 24 (NIPS 2011), J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, Eds., 2546-2554, 2011.
Bergstra, J.; Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization, Journal of Machine Learning Research, 13, 281-305, 2012.
Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization, Computational Science and Discovery, 8(1), 014008, 2015. https://doi.org/10.1088/1749-4699/8/1/014008
Chang, C.-C.; Lin, C.-J. (2011). LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. https://doi.org/10.1145/1961189.1961199
Chollet, F., et al. (2015). Keras. https://keras.io, 2015.
Claesen, M.; Simm, J.; Popovic, D.; Moreau, Y.; Moor, B. D. (2014). Easy Hyperparameter Search Using Optunity, CoRR abs/1412.1114, 2014.
Domhan, T.; Springenberg, J. T.; Hutter, F. (2015). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, AAAI Press, 3460-3468, 2015.
Florea, A. C.; Andonie, R. (2018). A Dynamic Early Stopping Criterion for Random Search in SVM Hyperparameter Optimization, In 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI) (Rhodes, Greece, May 2018), L. Iliadis, I. Maglogiannis, and V. Plagianakos, Eds., vol. AICT-519 of Artificial Intelligence Applications and Innovations, Springer International Publishing, Part 3: Support Vector Machines, 168-180, 2018. https://doi.org/10.1007/978-3-319-92007-8_15
Griewank, A. (1981). Generalized descent for global optimization, Journal of Optimization Theory and Applications, 34, 11-39, 1981. https://doi.org/10.1007/BF00933356
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. (2009). The WEKA data mining software: An update, SIGKDD Explor. Newsl., 11(1), 10-18, 2009. https://doi.org/10.1145/1656274.1656278
Hansen, N.; Müller, S. D.; Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES), Evol. Comput., 11(1), 1-18, 2003. https://doi.org/10.1162/106365603321828970
He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016. https://doi.org/10.1109/CVPR.2016.90
Hinton, G. E. (2012). A practical guide to training Restricted Boltzmann Machines, In Neural Networks: Tricks of the Trade (2nd ed.), G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, Springer, 599-619, 2012.
Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014). An efficient approach for assessing hyperparameter importance, In Proceedings of International Conference on Machine Learning 2014 (ICML 2014), 754-762, 2014.
Kennedy, J.; Eberhart, R. C. (1995). Particle swarm optimization, In Proceedings of the IEEE International Conference on Neural Networks, 1942-1948, 1995. https://doi.org/10.1109/ICNN.1995.488968
Kirkpatrick, S. (1984). Optimization by simulated annealing: Quantitative studies, Journal of Statistical Physics, 34(5), 975-986, 1984. https://doi.org/10.1007/BF01009452
Kotthoff, L.; Thornton, C.; Hoos, H. H.; Hutter, F.; Leyton-Brown, K. (2017). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25), 1-5, 2017.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks, In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (USA, 2012), NIPS'12, Curran Associates Inc., 1097-1105, 2012.
LeCun, Y.; Bottou, L.; Orr, G.; Müller, K. (2012). Efficient Backprop, vol. 7700 of Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, 9-48, 2012.
Lemley, J.; Jagodzinski, F.; Andonie, R. (2016). Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data, In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 1, 563-571, 2016.
Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. (2016). Efficient hyperparameter optimization and infinitely many armed bandits, CoRR abs/1603.06560, 2016.
Nair, V.; Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines, In Proceedings of the 27th International Conference on International Conference on Machine Learning (USA, 2010), ICML'10, Omnipress, 807-814, 2010.
Nelder, J. A.; Mead, R. (1965). A Simplex Method for Function Minimization, Computer Journal, 7, 308-313, 1965. https://doi.org/10.1093/comjnl/7.4.308
Patterson, J.; Gibson, A. (2017). Deep Learning: A Practitioner's Approach, 1st ed. O'Reilly Media, Inc., 2017.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. (2011). Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830, 2011.
Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. (2016). Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104, 148-175, 2016. https://doi.org/10.1109/JPROC.2015.2494218
Smusz, S.; Czarnecki, W. M.; Warszycki, D.; Bojarski, A. J. (2015). Exploiting uncertainty measures in compounds activity prediction using support vector machines, Bioorganic & medicinal chemistry letters, 25(1), 100-105, 2015. https://doi.org/10.1016/j.bmcl.2014.11.005
Snoek, J.; Larochelle, H.; Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms, In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2951-2959, 2012.
Sobol, I. (1976). Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics, 16(5), 236-242, 1976. https://doi.org/10.1016/0041-5553(76)90154-3
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. (2015). Going deeper with convolutions, In Computer Vision and Pattern Recognition (CVPR), 2015.
Thornton, C.; Hutter, F.; Hoos, H. H.; Leyton-Brown, K. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2013), 847-855, 2013.
Zeiler, M. D.; Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (Cham, 2014), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Springer International Publishing, 818-833, 2014. https://doi.org/10.1007/978-3-319-10590-1_53
Zoph, B.; Le, Q. V. (2016). Neural architecture search with reinforcement learning, CoRR abs/1611.01578, 2016.
Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q. V. (2017). Learning transferable architectures for scalable image recognition, CoRR abs/1707.07012, 2017.
[Online]. BigML; BigML, Inc. https://bigml.com/ Accessed: 2019-01-10.
[Online]. CIFAR-10; Krizhevsky, A., Nair, V., and Hinton, G. https://www.cs.toronto.edu/~kriz/cifar.html Accessed: 2019-01-10.
[Online]. Google HyperTune, Google. https://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning Accessed: 2019-01-10.
[Online]. Optunity, http://optunity.readthedocs.io/en/latest/. Accessed: 2019-01-10.
[Online]. SigOpt, SigOpt. https://sigopt.com/ Accessed: 2019-01-10.
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free, under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
- Share: copy and redistribute the material in any medium or format;
- Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.