Active clustering of document fragments using information derived from both images and catalogs

Lior Wolf, Lior Litwak, Nachum Dershowitz, Roni Shweka, Yaacov Choueka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Scopus citations

Abstract

Many significant historical corpora contain leaves that are mixed up and no longer bound in their original state as multi-page documents. The reconstruction of old manuscripts from a mix of disjoint leaves can therefore be of paramount importance to historians and literary scholars. Previously, it was shown that visual similarity provides meaningful pair-wise similarities between handwritten leaves. Here, we go a step further and suggest a semiautomatic clustering tool that helps reconstruct the original documents. The proposed solution is based on a graphical model that makes inferences based on catalog information provided for each leaf as well as on the pairwise similarities of handwriting. Several novel active clustering techniques are explored, and the solution is applied to a significant part of the Cairo Genizah, where the problem of joining leaves remains unsolved even after a century of extensive study by hundreds of scholars.

Original language: English
Title of host publication: 2011 International Conference on Computer Vision, ICCV 2011
Pages: 1661-1667
Number of pages: 7
State: Published - 2011
Externally published: Yes
Event: 2011 IEEE International Conference on Computer Vision, ICCV 2011 - Barcelona, Spain
Duration: 6 Nov 2011 → 13 Nov 2011

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision

Conference

Conference: 2011 IEEE International Conference on Computer Vision, ICCV 2011
Country/Territory: Spain
City: Barcelona
Period: 6/11/11 → 13/11/11
