TY - JOUR
T1 - Categorical relevance judgment
AU - Zhitomirsky-Geffet, Maayan
AU - Bar-Ilan, Judit
AU - Levene, Mark
N1 - Publisher Copyright: © 2018 ASIS&T
PY - 2018/9
Y1 - 2018/9
AB - In this study we explore users' behavior when assessing the relevance of search results, based on the hypothesis of categorical thinking. To investigate how users categorize search engine results, we performed several experiments in which users were asked to group a list of 20 search results into categories and to attach a relevance judgment to each category they formed. To determine how users change their minds over time, each experiment was repeated three times under the same conditions, with a gap of one month between rounds. The results show that, on average, users form 4–5 categories. Within each round, the size of a category decreases with its relevance. To measure the agreement between the search engine's ranking and the users' relevance judgments, we define two novel similarity measures: the average concordance and the MinMax swap ratio. Similarity is highest in the third round, as users' opinions stabilize. Qualitative analysis revealed that users tended to categorize results by the type and reliability of their source; in particular, they found commercial sites less trustworthy and attached high relevance to Wikipedia when their prior domain knowledge was limited.
UR - http://www.scopus.com/inward/record.url?scp=85052526464&partnerID=8YFLogxK
U2 - 10.1002/asi.24035
DO - 10.1002/asi.24035
M3 - Article
AN - SCOPUS:85052526464
SN - 2330-1635
VL - 69
SP - 1084
EP - 1094
JO - Journal of the Association for Information Science and Technology
JF - Journal of the Association for Information Science and Technology
IS - 9
ER -