Abstract
This paper proposes a set of measures for evaluating search engine functionality over time. The evaluation criteria used for traditional information retrieval systems (precision, recall, etc.) are not sufficient for evaluating the performance of Web search engines. Web search engines operate in a highly dynamic, distributed environment; therefore, search engine performance must be assessed not just at a single point in time, but over a whole period. The size of a search engine's database is limited, and even if it grows, it grows more slowly than the Web. Thus the search engine has to decide whether, and to what extent, to include new pages in place of pages that were previously listed in the database. Ideally, all new pages would be listed and no old ones removed, but this is usually unachievable. The proposed metrics for evaluating search engine functionality in the presence of dynamic changes include the percentage of newly added pages and the percentage of removed pages that still exist on the Web. The percentage of non-existent pages (404 errors, nonexistent servers, etc.) in the set of retrieved pages indicates the timeliness of the search engine. The ideas in this paper elaborate on some of the measures introduced in a recently published paper (Bar-Ilan, 2002). I would like to take this opportunity to discuss the problem of search engine evaluation in dynamic environments with the participants of the Web Dynamics Workshop.
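The three percentages named above can be illustrated with a minimal sketch that compares two snapshots of the URLs a search engine returns for the same query. The function `dynamics_metrics`, its parameter names, and the choice of denominators (the later result set for newly added and non-existent pages, the set of removed pages for the "still exists" measure) are hypothetical assumptions for illustration, not the exact definitions from Bar-Ilan (2002). The `live` argument is assumed to hold the URLs independently verified to still exist on the Web at the time of the later snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicsMetrics:
    pct_new: float                 # share of later results absent from the earlier snapshot
    pct_removed_still_live: float  # share of dropped URLs that still exist on the Web
    pct_dead: float                # share of later results that no longer exist (timeliness)


def _pct(part: int, whole: int) -> float:
    # Return 0.0 for an empty denominator instead of dividing by zero.
    return 100.0 * part / whole if whole else 0.0


def dynamics_metrics(earlier: set[str], later: set[str], live: set[str]) -> DynamicsMetrics:
    """Hypothetical helper: compare two result snapshots for the same query.

    Assumption: `live` is the set of URLs confirmed to exist at the later time.
    """
    added = later - earlier            # pages newly listed by the engine
    removed = earlier - later          # pages the engine dropped
    removed_live = removed & live      # dropped even though they still exist
    dead = later - live                # retrieved but non-existent (404, no server, ...)
    return DynamicsMetrics(
        pct_new=_pct(len(added), len(later)),
        pct_removed_still_live=_pct(len(removed_live), len(removed)),
        pct_dead=_pct(len(dead), len(later)),
    )


if __name__ == "__main__":
    earlier = {"a", "b", "c", "d"}
    later = {"c", "d", "e", "f"}
    live = {"a", "c", "d", "e"}
    # Prints: DynamicsMetrics(pct_new=50.0, pct_removed_still_live=50.0, pct_dead=25.0)
    print(dynamics_metrics(earlier, later, live))
```

In this toy example the engine dropped "a" and "b", but "a" is still on the Web, so half of the removals are pages that arguably should have been kept; "f" is retrieved but no longer exists, which lowers timeliness.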
Original language | English |
---|---|
Pages (from-to) | 70-77 |
Number of pages | 8 |
Journal | CEUR Workshop Proceedings |
Volume | 702 |
State | Published - 2002 |
Externally published | Yes |
Event | 2nd International Workshop on Web Dynamics, WebDyn 2002, in Conjunction with the 11th International World Wide Web Conference - Honolulu, HI, United States Duration: 7 May 2002 → 7 May 2002 |