An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users’ Attitudes


  • Bokang Jia New York University Abu Dhabi
  • Domnica Dzitac New York University Abu Dhabi
  • Samridha Shrestha New York University Abu Dhabi
  • Komiljon Turdaliev New York University Abu Dhabi
  • Nurgazy Seidaliev New York University Abu Dhabi


COVID-19, Coronavirus, Machine Learning, Natural Language Processing, Automatic Hate-Speech Detection, Racism


The COVID-19 outbreak is widely thought to have fuelled racism and discrimination, especially towards Asian individuals[10]. To test this hypothesis, we build upon existing work to classify racist tweets posted before and after COVID-19 was declared a global pandemic. To overcome the linguistic difficulty and class imbalance of the classification task, we combine an ensemble of machine learning techniques, including Linear Support Vector Classifiers, Logistic Regression models, and Deep Neural Networks. We fill a gap in the existing literature by (1) using a combined Machine Learning approach to understand the effect of COVID-19 on Twitter users’ attitudes and (2) improving on the performance of automatic racism detectors. We show that there has not been a sharp increase in racism towards Asian people on Twitter, and that users who posted racist tweets before the pandemic tend to post approximately the same amount during the outbreak. Previous research on racism and other virus outbreaks suggests that racism towards communities associated with a virus's region of origin cannot be attributed exclusively to the outbreak; rather, it is a continuing symptom of deep-rooted biases towards minorities[13]. Our research supports these findings. We conclude that the COVID-19 outbreak is an additional outlet for discrimination against Asian people rather than its main cause.
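The abstract does not say how the individual models' predictions are combined. As an illustration only, the sketch below assumes simple majority voting over binary labels (1 = racist, 0 = not racist). The three `*_stub` functions and their `FLAG_*` marker tokens are hypothetical placeholders standing in for a trained Linear Support Vector Classifier, Logistic Regression model, and Deep Neural Network; they are not the authors' models.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier is any function mapping a tweet to a label (1 = racist, 0 = not).
Classifier = Callable[[str], int]


def majority_vote(classifiers: Sequence[Classifier], tweet: str) -> int:
    """Return the label chosen by most classifiers; ties go to the smaller label."""
    votes = Counter(clf(tweet) for clf in classifiers)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


# Hypothetical stand-ins for trained models (assumption: real models would
# expose a predict step; here each stub just looks for a marker token).
def svc_stub(tweet: str) -> int:
    return int("FLAG_A" in tweet)


def logreg_stub(tweet: str) -> int:
    return int("FLAG_B" in tweet)


def dnn_stub(tweet: str) -> int:
    return int("FLAG_C" in tweet)


ENSEMBLE: List[Classifier] = [svc_stub, logreg_stub, dnn_stub]


def classify_all(tweets: Sequence[str]) -> List[int]:
    """Label each tweet with the ensemble's majority decision."""
    return [majority_vote(ENSEMBLE, t) for t in tweets]
```

With an odd number of models and binary labels, a tie cannot occur; the tie-breaking rule only matters if the ensemble size is even. Weighted or probability-averaged voting would be equally plausible alternatives.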


[1] Aken, B., Risch, J., Löser, A. (2018). Challenges for Toxic Comment Classification: An In-Depth Error Analysis CoRR, DOI: 10.18653/v1/W18-5105

[2] Atkeson, A. (2020). What Will Be the Economic Impact of COVID-19 in the US? Rough Estimates of Disease Scenarios National Bureau of Economic Research, DOI: 10.3386/w26867

[3] Chen, E., Lerman, K., Ferrara, E. (2020). COVID-19: The First Public Coronavirus Twitter Dataset JMIR Public Health Surveill, DOI: 10.2196/19273

[4] Davidson T., Warmsley D., Macy M., Weber I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language, AAAI Publications, Eleventh International AAAI Conference on Web and Social Media

[5] Devakumar, D., Shannon, G., Bhopal, S. S., Abubakar, I. (2020). Racism and discrimination in COVID-19 responses The Lancet

[6] Godin, F., Vandersmissen, B., De Neve, W., Van de Walle, R. (2015). Named Entity Recognition for Twitter Microposts using Distributed Word Representations Proceedings of the Workshop on Noisy User-generated Text, DOI: 10.18653/v1/W15-4322

[7] Hanasoge, S., Horiuchi, N., Huang, C., Jia, H., Kim, N. Y., Murao, M., Seo, M., Tan, R., Wilkinson, J. (2020). Visibility challenges for Asian scientists Nature Reviews Physics

[8] Kwok, I., Wang, Y. (2013). Locate the Hate: Detecting Tweets against Blacks, AAAI Publications, Twenty-Seventh AAAI Conference on Artificial Intelligence, DOI: 10.5555/2891460.2891697

[9] Li J., Guo K., Viedma E. H., Lee H., Liu J., Zhong N., Gomes L. F. A. M., Filip F.G., Fang SC., Özdemir M.S., Liu X., Lu G., Shi Y. (2020). Culture versus Policy: More Global Collaboration to Effectively Combat COVID-19 The Innovation, Volume 1, Issue 2

[10] Nature (2020). Stop the coronavirus stigma now Nature 580, 165

[11] Pitsilis, G., Ramampiaro, H., Langseth, H. (2018). Effective hate-speech detection in Twitter data using recurrent neural networks Appl Intell 48

[12] Ali, S. H., Keil, R. (2006). Global Cities and the Spread of Infectious Disease: The Case of Severe Acute Respiratory Syndrome (SARS) in Toronto, Canada Urban Studies

[13] Siu, J. Y. (2015). Influence of social experiences in shaping perceptions of the Ebola virus among African residents of Hong Kong during the 2014 outbreak: a qualitative study International Journal for Equity in Health

[14] Dong E, Du H, Gardner L. (2020). An interactive web-based dashboard to track COVID-19 in real time Lancet Inf Dis. 20(5):533-534, DOI: 10.1016/S1473-3099(20)30120-1

[15] Wang, W., Chen, L., Thirunarayan, K., Sheth, A. P. (2014). Cursing in English on Twitter Association for Computing Machinery

[16] Waseem, Z., Hovy, D. (2016). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter Proceedings of the NAACL Student Research Workshop, DOI: 10.18653/v1/N16-2013

[17] World Health Organization (2020). Coronavirus disease 2019 (COVID-19): situation report, 52 World Health Organization

[18] Zimmerman, S., Kruschwitz, U., Fox, C. (2018). Improving Hate Speech Detection with Deep Learning Ensembles Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

[19] Zubiaga, A., Voss, A., Procter, R., Liakata, M., Wang, B., Tsakalidis, A. (2016). Towards Real-Time, Country-Level Location Classification of Worldwide Tweets IEEE Transactions on Knowledge and Data Engineering, Volume: 29, Issue: 9, Sept. 1 2017, DOI: 10.1109/TKDE.2017.2698463
