TY - JOUR
T1 - Identifying join candidates in the Cairo Genizah
AU - Wolf, Lior
AU - Littman, Rotem
AU - Mayer, Naama
AU - German, Tanya
AU - Dershowitz, Nachum
AU - Shweka, Roni
AU - Choueka, Yaacov
PY - 2011/8
Y1 - 2011/8
AB - A join is a set of manuscript fragments that are known to originate from the same original work. The Cairo Genizah is a collection of approximately 350,000 fragments of mainly Jewish texts, discovered in the late 19th century. Today the fragments are dispersed among libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. Finding joins is currently done manually by experts, and presumably only a small fraction of the existing joins have been discovered. In this work, we study the problem of automatically finding candidate joins in order to streamline this task. The proposed method combines local descriptors with learning techniques. To evaluate the performance of various join-finding methods without relying on the availability of human experts, we construct a benchmark dataset modeled on the Labeled Faces in the Wild benchmark for face recognition. Using this benchmark, we evaluate several alternative image representations and learning techniques. Finally, a set of newly discovered join candidates has been identified using our method and validated by a human expert.
KW - Cairo Genizah
KW - Document analysis
KW - Similarity learning
UR - http://www.scopus.com/inward/record.url?scp=79958217040&partnerID=8YFLogxK
U2 - 10.1007/s11263-010-0389-8
DO - 10.1007/s11263-010-0389-8
M3 - Article
AN - SCOPUS:79958217040
SN - 0920-5691
VL - 94
SP - 118
EP - 135
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 1
ER -