Text Classification Research with Attention-based Recurrent Neural Networks

Changshun Du, Lei Huang


Text classification is one of the principal tasks of machine learning. It aims to design algorithms that enable computers to extract features from texts and classify them automatically. Previous approaches have mainly relied either on keyword-based classification or on classification by neural semantic composition. The former emphasizes the role of keywords, while the latter focuses on how words combine with one another. The method proposed in this paper combines the advantages of both. It uses an attention mechanism to learn a weight for each word: keywords receive higher weights and common words receive lower weights. The text representation therefore takes all words into account while paying more attention to keywords. This feature vector is then fed to a softmax classifier. Finally, we conduct experiments on two news classification datasets, published by NLPCC2014 and Reuters respectively. The proposed model achieves F-values of 88.5% and 51.8% on the two datasets, and the experimental results show that it outperforms all the traditional baseline systems.
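The word-weighting idea described above can be illustrated with a minimal sketch in plain Python. It assumes that each word is already represented by a vector (standing in for a recurrent network's hidden state; the toy numbers below are made up, not learned) and uses an additive, Bahdanau-style scoring function with a context vector. The abstract does not specify the exact scoring function, so `W_att`, `b_att`, `u_ctx`, `W_out`, and `b_out` are hypothetical parameters chosen for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_pool(H, W_att, b_att, u_ctx):
    """Weight each word vector h_t by an attention score and sum them.

    Hypothetical additive scoring: u_t = tanh(W_att h_t + b_att),
    e_t = u_t . u_ctx, alpha = softmax(e).
    """
    scores = []
    for h in H:
        u = [math.tanh(z + b) for z, b in zip(matvec(W_att, h), b_att)]
        scores.append(dot(u, u_ctx))
    alpha = softmax(scores)  # one weight per word, summing to 1
    dim = len(H[0])
    # Document vector: attention-weighted sum over all word vectors.
    doc = [sum(a * h[i] for a, h in zip(alpha, H)) for i in range(dim)]
    return alpha, doc

def classify(doc, W_out, b_out):
    """Softmax classifier over the pooled document vector."""
    logits = [z + b for z, b in zip(matvec(W_out, doc), b_out)]
    return softmax(logits)

# Toy input: three words, the second with a strong state (a "keyword").
H = [[0.1, 0.0], [2.0, 1.5], [0.0, 0.1]]
alpha, doc = attention_pool(H, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0])
probs = classify(doc, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])
print("word weights:", alpha)
print("class probabilities:", probs)
```

Because the weights sum to one, the document vector still draws on every word, but a word whose vector scores highly against the context vector (here the second one) dominates the sum. This is the keyword-emphasis effect the abstract describes.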


Keywords: machine learning, text classification, attention mechanism, bidirectional RNN, word vector

DOI: http://dx.doi.org/10.15837/ijccc.2018.1.3142

Copyright (c) 2018 Changshun Du, Lei Huang

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC-BY-NC License for Website User

Articles published in IJCCC under this user license are protected by copyright.

Users may access, download, copy, and translate IJCCC articles for non-commercial purposes (but may not redistribute, display, or adapt them), provided that they observe the following conditions:

  • Cite the article using an appropriate bibliographic citation: author(s), article title, journal, volume, issue, page numbers, year of publication, DOI, and the link to the definitive published version on IJCCC website;
  • Maintain the integrity of the IJCCC article;
  • Retain the copyright notices and links to these terms and conditions so it is clear to other users what can and what cannot be done with the article;
  • Ensure that, for any content in the IJCCC article that is identified as belonging to a third party, any re-use complies with the copyright policies of that third party;
  • Any translations must prominently display the statement: "This is an unofficial translation of an article that appeared in IJCCC. Agora University has not endorsed this translation."

This is a non-commercial license: use of published articles for commercial purposes is forbidden.

Commercial purposes include: 

  • Copying or downloading IJCCC articles, or linking to such postings, for further redistribution, sale or licensing, for a fee;
  • Copying, downloading or posting by a site or service that incorporates advertising with such content;
  • The inclusion or incorporation of article content in other works or services (other than normal quotations with an appropriate citation) that is then available for sale or licensing, for a fee;
  • Use of IJCCC articles or article content (other than normal quotations with appropriate citation) by for-profit organizations for promotional purposes, whether for a fee or otherwise;
  • Use for the purposes of monetary reward by means of sale, resale, license, loan, transfer, or any other form of commercial exploitation.

The licensor cannot revoke these freedoms as long as you follow the license terms.

[End of CC-BY-NC License for Website User]

INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C), ISSN 1841-9836.

IJCCC was founded in 2006 at Agora University by Ioan DZITAC (Editor-in-Chief), Florin Gheorghe FILIP (Editor-in-Chief), and Misu-Jan MANOLESCU (Managing Editor).

Ethics: This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE).

IJCCC is covered/indexed/abstracted in Science Citation Index Expanded (since vol. 1(S), 2006); JCR2016: IF = 1.374.

IJCCC has been indexed in Scopus since 2008 (SNIP2016 = 0.701, SJR2016 = 0.319).

IJCCC was nominated by Elsevier for the Journal Excellence Award "Scopus Awards Romania 2015" (SNIP2014 = 1.029).

IJCCC is in the top 3 of 157 Romanian journals indexed by Scopus (in all fields) and No. 1 in the Computer Science field, according to Elsevier/Scopus.