Classification using various machine learning methods and combinations of key-phrases and visual features

Yaakov Hacohen-Kerner, Asaf Sabag, Dimitris Liparas, Anastasia Moumtzidou, Stefanos Vrochidis, Ioannis Kompatsiaris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

In this paper, we present a comparative study of news document classification using various supervised machine learning methods and different combinations of key-phrases (word N-grams extracted from text) and visual features (extracted from a representative image from each document). The application domain is news documents written in English that belong to four categories: Health, Lifestyle-Leisure, Nature-Environment and Politics. Using the N-gram textual feature set alone led to an accuracy of 81.0%, which is much better than the accuracy (58.4%) obtained using the visual feature set alone. Comparing three classification methods, together with a feature selection method and parameter tuning, led to an improved accuracy of 86.7%, achieved by the Random Forests method.
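As a rough illustration of what "word N-grams extracted from text" can mean, the sketch below counts word unigrams and bigrams in a short string. It is not the software tool used by the authors: the tokenization rule (lowercase alphabetic tokens) and the names `word_ngrams` and `ngram_counts` are assumptions made for this example only.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams in `text` as a list of tuples."""
    # Assumption: tokens are lowercase alphabetic runs; the tokenizer
    # used in the paper is not described in this record.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count all word n-grams for n = 1 .. max_n (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


if __name__ == "__main__":
    doc = "Climate policy shapes climate policy debates"
    # Print the three most frequent n-grams with their frequencies.
    for gram, freq in ngram_counts(doc).most_common(3):
        print(" ".join(gram), freq)
```

Counts like these could serve as a textual feature vector for a supervised classifier; how the features were weighted and selected in the study is not specified here.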

Original language: English
Title of host publication: Semantic Keyword-Based Search on Structured Data Sources First COST Action IC1302 – International KEYSTONE Conference, IKC 2015, Revised Selected Papers
Editors: Yannis Velegrakis, Jorge Cardoso, Alexandre Miguel Pinto, Francesco Guerra, Geert-Jan Houben
Publisher: Springer Verlag
Pages: 64-75
Number of pages: 12
ISBN (Print): 9783319279312
State: Published - 2015
Externally published: Yes
Event: 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015 - Coimbra, Portugal
Duration: 8 Sep 2015 to 9 Sep 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9398
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015
Country/Territory: Portugal
City: Coimbra
Period: 8/09/15 to 9/09/15

Bibliographical note

Funding Information:
This work was supported by the MULTISENSOR project, partially funded by the European Commission under contract number FP7-610411. The authors would also like to thank Avi Rosenfeld, Maor Tzidkani and Daniel Nissim Cohen from the Jerusalem College of Technology, Lev Academic Center, for providing the software tool used to generate the textual features in this research. The authors would also like to acknowledge the networking support of the COST Action IC1302: semantic KEYword-based Search on sTructured data sOurcEs (KEYSTONE) and the COST Action IC1307: The European Network on Integrating Vision and Language (iV&L Net).

Publisher Copyright:
© Springer International Publishing Switzerland 2015.

Keywords

  • Document classification
  • Feature selection
  • Key-phrases
  • N-gram features
  • Supervised learning
  • Visual features
