Weighted Random Search for Hyperparameter Optimization

Adrian-Catalin Florea, Razvan Andonie

Abstract


We introduce an improved version of Random Search (RS), used here for the hyperparameter optimization of machine learning algorithms. Unlike standard RS, which generates new values for all hyperparameters at each trial, we generate new values for each hyperparameter with a given probability of change. The intuition behind our approach is that a value which has already produced a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than standard RS, and our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective functions for this class of problems (the Griewank function) and on the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem defined on a discrete domain.
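As a rough illustration of the idea described above (this is a simplified sketch, not the paper's exact algorithm: the function name, the fixed per-hyperparameter change probabilities, and the greedy keep-the-best rule are assumptions made here for brevity), the difference from standard random search can be expressed as follows:

```python
import random

def weighted_random_search(objective, domains, p_change, n_trials, seed=0):
    """Minimize `objective` over discrete hyperparameter domains.

    Standard random search resamples every hyperparameter at each trial.
    Here, hyperparameter i is resampled only with probability p_change[i];
    otherwise the best value found so far is kept and retested in new
    combinations with the resampled hyperparameters.
    """
    rng = random.Random(seed)
    best = [rng.choice(d) for d in domains]   # initial random configuration
    best_score = objective(best)
    for _ in range(n_trials - 1):
        candidate = [
            rng.choice(d) if rng.random() < p else b
            for d, p, b in zip(domains, p_change, best)
        ]
        score = objective(candidate)
        if score < best_score:                # keep the better configuration
            best, best_score = candidate, score
    return best, best_score

# Toy example: minimize a separable quadratic over a small discrete grid.
domains = [list(range(-5, 6))] * 3
obj = lambda x: sum(v * v for v in x)
config, score = weighted_random_search(obj, domains, [0.5, 0.5, 0.5], 500)
```

Setting all change probabilities to 1.0 recovers standard random search; probabilities below 1.0 bias the search toward reusing hyperparameter values that already performed well.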

Keywords


hyperparameter optimization, random search, deep learning, neural networks


References


Albelwi, S.; Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks. Entropy, 19, 6 (2017).

Bergstra, J.; Bardenet, R.; Bengio, Y.; Kegl, B. (2011). Algorithms for hyper-parameter optimization. In NIPS (2011), J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, Eds., 2546-2554, 2011.

Bergstra, J.; and Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305, 2012.

Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; and Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science and Discovery, 8(1), 014008, 2015.
https://doi.org/10.1088/1749-4699/8/1/014008

Chang, C.-C.; Lin, C.-J. (2011). LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
https://doi.org/10.1145/1961189.1961199

Chollet, F., et al. (2015). Keras. https://keras.io, 2015.

Claesen, M.; Simm, J.; Popovic, D.; Moreau, Y.; Moor, B. D. (2014). Easy Hyperparameter Search Using Optunity, CoRR abs/1412.1114, 2014.

Domhan, T.; Springenberg, J. T.; Hutter, F. (2015). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, AAAI Press, 3460-3468, 2015.

Florea, A. C.; Andonie, R. (2018). A Dynamic Early Stopping Criterion for Random Search in SVM Hyperparameter Optimization, In 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI) (Rhodes, Greece, May 2018), L. Iliadis, I. Maglogiannis, and V. Plagianakos, Eds., vol. AICT-519 of Artificial Intelligence Applications and Innovations, Springer International Publishing, Part 3: Support Vector Machines, 168-180, 2018.
https://doi.org/10.1007/978-3-319-92007-8_15

Griewank, A. (1981). Generalized descent for global optimization, Journal of Optimization Theory and Applications, 34, 11-39, 1981.
https://doi.org/10.1007/BF00933356

Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. (2009). The WEKA data mining software: An update, SIGKDD Explor. Newsl., 11(1), 10-18, 2009.
https://doi.org/10.1145/1656274.1656278

Hansen, N.; Müller, S. D.; Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es), Evol. Comput., 11(1), 1-18, 2003.
https://doi.org/10.1162/106365603321828970

He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016.
https://doi.org/10.1109/CVPR.2016.90

Hinton, G. E. (2012). A practical guide to training Restricted Boltzmann Machines, In Neural Networks: Tricks of the Trade (2nd ed.), G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, Springer, 599-619, 2012.

Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014). An efficient approach for assessing hyperparameter importance, In Proceedings of International Conference on Machine Learning 2014 (ICML 2014), 754-762, 2014.

Kennedy, J.; Eberhart, R. C. (1995). Particle swarm optimization, In Proceedings of the IEEE International Conference on Neural Networks, 1942-1948, 1995.
https://doi.org/10.1109/ICNN.1995.488968

Kirkpatrick, S. (1984). Optimization by simulated annealing: Quantitative studies, Journal of Statistical Physics, 34(5), 975-986, 1984.
https://doi.org/10.1007/BF01009452

Kotthoff, L.; Thornton, C.; Hoos, H. H.; Hutter, F.; Leyton-Brown, K. (2017). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25), 1-5, 2017.

Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks, In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (USA, 2012), NIPS'12, Curran Associates Inc., 1097-1105, 2012.

LeCun, Y.; Bottou, L.; Orr, G.; Müller, K.-R. (2012). Efficient BackProp, In Neural Networks: Tricks of the Trade (2nd ed.), vol. 7700 of Lecture Notes in Computer Science, Springer, 9-48, 2012.

Lemley, J.; Jagodzinski, F.; Andonie, R. (2016). Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data, In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 1, 563-571, 2016.

Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. (2016). Efficient hyperparameter optimization and infinitely many armed bandits, CoRR abs/1603.06560, 2016.

Nair, V.; Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines, In Proceedings of the 27th International Conference on International Conference on Machine Learning (USA, 2010), ICML'10, Omnipress, 807-814, 2010.

Nelder, J. A.; Mead, R. (1965). A Simplex Method for Function Minimization, Computer Journal, 7, 308-313, 1965.
https://doi.org/10.1093/comjnl/7.4.308

Patterson, J.; Gibson, A. (2017). Deep Learning: A Practitioner's Approach, 1st ed. O'Reilly Media, Inc., 2017.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. (2011). Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830, 2011.

Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. (2016). Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104, 148-175, 2016.
https://doi.org/10.1109/JPROC.2015.2494218

Smusz, S.; Czarnecki, W. M.; Warszycki, D.; Bojarski, A. J. (2015). Exploiting uncertainty measures in compounds activity prediction using support vector machines, Bioorganic & medicinal chemistry letters, 25(1), 100-105, 2015.
https://doi.org/10.1016/j.bmcl.2014.11.005

Snoek, J.; Larochelle, H.; Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms, In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2951-2959, 2012.

Sobol, I. (1976). Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics, 16(5), 236-242, 1976.
https://doi.org/10.1016/0041-5553(76)90154-3

Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. (2015). Going deeper with convolutions, In Computer Vision and Pattern Recognition (CVPR), 2015.

Thornton, C.; Hutter, F.; Hoos, H. H.; Leyton-Brown, K. (2013). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, ACM, 847-855, 2013.

Zeiler, M. D.; Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (Cham, 2014), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Springer International Publishing, 818-833, 2014.
https://doi.org/10.1007/978-3-319-10590-1_53

Zoph, B.; Le, Q. V. (2016). Neural architecture search with reinforcement learning, CoRR abs/1611.01578 (2016).

Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q. V. (2017). Learning transferable architectures for scalable image recognition, CoRR abs/1707.07012, 2017.

[Online]. BigML; BigML, Inc. https://bigml.com/ Accessed: 2019-01-10.

[Online]. CIFAR-10; Krizhevsky, A.; Nair, V.; Hinton, G. https://www.cs.toronto.edu/~kriz/cifar.html Accessed: 2019-01-10.

[Online]. Google HyperTune, Google. https://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning Accessed: 2019-01-10.

[Online]. Optunity, http://optunity.readthedocs.io/en/latest/. Accessed: 2019-01-10.

[Online]. SigOpt, SigOpt. https://sigopt.com/ Accessed: 2019-01-10.




DOI: https://doi.org/10.15837/ijccc.2019.2.3514



Copyright (c) 2019 Adrian-Catalin Florea, Razvan Andonie

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC-BY-NC  License for Website User

Articles published in IJCCC user license are protected by copyright.

Users may access, download, copy, and translate IJCCC articles for non-commercial purposes, but may not redistribute, display, or adapt them, provided that users:

  • Cite the article using an appropriate bibliographic citation: author(s), article title, journal, volume, issue, page numbers, year of publication, DOI, and the link to the definitive published version on IJCCC website;
  • Maintain the integrity of the IJCCC article;
  • Retain the copyright notices and links to these terms and conditions so it is clear to other users what can and what cannot be done with the article;
  • Ensure that, for any content in the IJCCC article that is identified as belonging to a third party, any re-use complies with the copyright policies of that third party;
  • Any translations must prominently display the statement: "This is an unofficial translation of an article that appeared in IJCCC. Agora University has not endorsed this translation."

This is a non-commercial license; the use of published articles for commercial purposes is forbidden.

Commercial purposes include: 

  • Copying or downloading IJCCC articles, or linking to such postings, for further redistribution, sale or licensing, for a fee;
  • Copying, downloading or posting by a site or service that incorporates advertising with such content;
  • The inclusion or incorporation of article content in other works or services (other than normal quotations with an appropriate citation) that is then available for sale or licensing, for a fee;
  • Use of IJCCC articles or article content (other than normal quotations with appropriate citation) by for-profit organizations for promotional purposes, whether for a fee or otherwise;
  • Use for the purposes of monetary reward by means of sale, resale, license, loan, transfer or other form of commercial exploitation;

    The licensor cannot revoke these freedoms as long as you follow the license terms.

[End of CC-BY-NC  License for Website User]


INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C), ISSN 1841-9836.

IJCCC was founded in 2006 at Agora University by Ioan DZITAC (Editor-in-Chief), Florin Gheorghe FILIP (Editor-in-Chief), and Misu-Jan MANOLESCU (Managing Editor).

Ethics: This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE).

Ioan DZITAC (Editor-in-Chief) at COPE European Seminar, Bruxelles, 2015.

IJCCC is covered/indexed/abstracted in Science Citation Index Expanded (since vol. 1(S), 2006); JCR2018: IF=1.585.

IJCCC has been indexed in Scopus since 2008 (CiteScore2018 = 1.56).

IJCCC was nominated by Elsevier for the Journal Excellence Award "Scopus Awards Romania 2015" (SNIP2014 = 1.029).


IJCCC ranks in the top 3 of the 157 Romanian journals indexed by Scopus (all fields) and No. 1 in the Computer Science field (Elsevier/Scopus).

 

Impact Factor in JCR2018 (Clarivate Analytics / SCI Expanded / ISI Web of Science): IF=1.585 (Q3). Scopus: CiteScore2018=1.56 (Q2). Editors-in-Chief: Ioan DZITAC & Florin Gheorghe FILIP.