TY - GEN
T1 - Active clustering of document fragments using information derived from both images and catalogs
AU - Wolf, Lior
AU - Litwak, Lior
AU - Dershowitz, Nachum
AU - Shweka, Roni
AU - Choueka, Yaacov
PY - 2011
Y1 - 2011
N2 - Many significant historical corpora contain leaves that are mixed up and no longer bound in their original state as multi-page documents. The reconstruction of old manuscripts from a mix of disjoint leaves can therefore be of paramount importance to historians and literary scholars. Previously, it was shown that visual similarity provides meaningful pair-wise similarities between handwritten leaves. Here, we go a step further and suggest a semiautomatic clustering tool that helps reconstruct the original documents. The proposed solution is based on a graphical model that makes inferences based on catalog information provided for each leaf as well as on the pairwise similarities of handwriting. Several novel active clustering techniques are explored, and the solution is applied to a significant part of the Cairo Genizah, where the problem of joining leaves remains unsolved even after a century of extensive study by hundreds of scholars.
AB - Many significant historical corpora contain leaves that are mixed up and no longer bound in their original state as multi-page documents. The reconstruction of old manuscripts from a mix of disjoint leaves can therefore be of paramount importance to historians and literary scholars. Previously, it was shown that visual similarity provides meaningful pair-wise similarities between handwritten leaves. Here, we go a step further and suggest a semiautomatic clustering tool that helps reconstruct the original documents. The proposed solution is based on a graphical model that makes inferences based on catalog information provided for each leaf as well as on the pairwise similarities of handwriting. Several novel active clustering techniques are explored, and the solution is applied to a significant part of the Cairo Genizah, where the problem of joining leaves remains unsolved even after a century of extensive study by hundreds of scholars.
UR - http://www.scopus.com/inward/record.url?scp=84856688035&partnerID=8YFLogxK
U2 - 10.1109/iccv.2011.6126428
DO - 10.1109/iccv.2011.6126428
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:84856688035
SN - 9781457711015
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1661
EP - 1667
BT - 2011 International Conference on Computer Vision, ICCV 2011
T2 - 2011 IEEE International Conference on Computer Vision, ICCV 2011
Y2 - 6 November 2011 through 13 November 2011
ER -