Abstract
This research investigates how to effectively cluster unlabeled personal blog posts written in English by gender. Given a gender-labeled blog corpus and a blog corpus without gender labels, we extracted from the labeled corpus distinguishable unigrams for both males and females. We then defined two general features that represent the relative frequencies of the distinguishable male unigrams and female unigrams (the males' frequency and the females' frequency). The most distinguishing feature was found to be the males' frequency feature with a ratio factor of at least 1.4 times that of females. This feature leads to an accuracy rate of 83.7% for gender clustering of the unlabeled blog corpus. To the best of our knowledge, this study presents two novelties: (1) it is the first study to cluster blog posts by gender, and (2) it clusters an unlabeled corpus using distinguishable features extracted from a labeled corpus.
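The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the function names, the reading of the ratio factor as a per-unigram frequency ratio, and the median-split clustering rule are all assumptions made for illustration only.

```python
from collections import Counter


def relative_freqs(posts):
    """Map each unigram to its share of all tokens in `posts`."""
    # Assumption: lowercase whitespace tokenization stands in for the real tokenizer.
    counts = Counter(tok for post in posts for tok in post.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def distinguishable_unigrams(own_posts, other_posts, ratio=1.4):
    """Unigrams whose relative frequency in `own_posts` is at least
    `ratio` times their relative frequency in `other_posts`.

    Assumption: unigrams absent from `other_posts` count as frequency 0
    there, so they always qualify.
    """
    own = relative_freqs(own_posts)
    other = relative_freqs(other_posts)
    return {tok for tok, f in own.items() if f >= ratio * other.get(tok, 0.0)}


def feature(post, vocab):
    """Share of a post's tokens that belong to `vocab` (0.0 for an empty post)."""
    tokens = post.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in vocab for tok in tokens) / len(tokens)


def cluster_by_feature(posts, vocab):
    """Split posts into two groups around the median feature value.

    Hypothetical clustering rule: label 1 if the score reaches the median,
    otherwise 0. The abstract does not state the actual rule.
    """
    scores = [feature(p, vocab) for p in posts]
    cut = sorted(scores)[len(scores) // 2]
    return [1 if s >= cut else 0 for s in scores]


if __name__ == "__main__":
    # Toy data invented for this sketch; not taken from the study.
    male_posts = ["game code game server", "code game"]
    female_posts = ["love family love home", "family home code"]
    male_vocab = distinguishable_unigrams(male_posts, female_posts, ratio=1.4)
    unlabeled = ["game code", "love home", "game server", "family love"]
    print(cluster_by_feature(unlabeled, male_vocab))
```

Here the males' frequency feature is computed per post against a vocabulary learned from the labeled corpus, and the unlabeled posts are then grouped by that single score. The toy corpora are far too small to say anything about the reported accuracy.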
Original language | English |
---|---|
Title of host publication | KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval |
Editors | Ana Fred, Jan Dietz, David Aveiro, Kecheng Liu, Jorge Bernardino, Joaquim Filipe |
Publisher | SciTePress |
Pages | 384-391 |
Number of pages | 8 |
ISBN (Electronic) | 9789897582035 |
State | Published - 2016 |
Externally published | Yes |
Event | 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2016 - Porto, Portugal. Duration: 9 Nov 2016 → 11 Nov 2016 |
Publication series
Name | IC3K 2016 - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management |
---|---|
Volume | 1 |
Conference
Conference | 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2016 |
---|---|
Country/Territory | Portugal |
City | Porto |
Period | 9/11/16 → 11/11/16 |
Bibliographical note
Funding Information: The authors would like to acknowledge the "laser and additive manufacturing unit and Advanced Melting Unit" at Central Metallurgical Research and Development Institute, for supporting this work. They also would like to thank Prof. Khalid Abdelhany for his valuable assistance in this work.
Funding
Funders | Funder number |
---|---|
Central Metallurgical Research and Development Institute | |
Keywords
- Blog Posts
- Distinguishable Features
- Gender Clustering