In this article, we applied large-scale statistical analysis and data visualization techniques to the world's largest collection of Hebrew manuscript metadata records in order to develop a new methodology for the quantitative investigation of the palaeographic, geographic, and temporal characteristics of historical manuscripts. The study explores whether, and to what extent, the script type of a manuscript and its changes over time can be used to automatically predict and complete missing geospatial data. To this end, various ontological entities were used as features to train supervised machine-learning algorithms to predict the places of writing of manuscripts, which were often absent from the catalogue records. The results show that while script type alone might not be sufficient to predict where a manuscript was written, combining it with the manuscript's temporal data yielded about 80% accuracy. Ultimately, our system was able to complete the missing places of writing for over 60% of the manuscripts in the corpus. Moreover, we found that the typical and marginal script types in different regions, and their changes over time, make it possible to draw a migration map of the Jewish communities over the centuries. This reinforces the findings of historical research on Jewish migration patterns and communal formation. For example, waves of migration from Western Europe are clearly visible from the second half of the 13th century; they continued until the 17th century and greatly enlarged the Eastern European Jewish community.
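To make the idea of completing a missing place of writing from script type and date concrete, here is a minimal sketch. It is not the supervised models used in the study: it is a simple majority-vote baseline, and the record layout, the script labels, the region names, and the `min_share` threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (script_type, century, place_of_writing or None).
# These values are illustrative only and are not taken from the corpus.
records = [
    ("Ashkenazic", 13, "Western Europe"),
    ("Ashkenazic", 13, "Western Europe"),
    ("Ashkenazic", 13, "Eastern Europe"),
    ("Ashkenazic", 16, "Eastern Europe"),
    ("Ashkenazic", 16, "Eastern Europe"),
    ("Sephardic", 14, "Iberia"),
    ("Sephardic", 14, "Iberia"),
    ("Ashkenazic", 16, None),   # place missing in the catalogue
    ("Sephardic", 14, None),    # place missing in the catalogue
    ("Yemenite", 15, None),     # no labeled example with this script and century
]

def fit(rows):
    """Count known places of writing for each (script type, century) pair."""
    counts = defaultdict(Counter)
    for script, century, place in rows:
        if place is not None:
            counts[(script, century)][place] += 1
    return counts

def predict(counts, script, century, min_share=0.6):
    """Return the most frequent place for the pair, or None if unseen or too ambiguous."""
    seen = counts.get((script, century))
    if not seen:
        return None
    place, n = seen.most_common(1)[0]
    return place if n / sum(seen.values()) >= min_share else None

def complete(rows, min_share=0.6):
    """Fill in missing places where the baseline is confident; keep known places as-is."""
    counts = fit(rows)
    return [
        (s, c, p if p is not None else predict(counts, s, c, min_share))
        for s, c, p in rows
    ]

completed = complete(records)
```

Records whose script and century never occur with a known place stay unfilled, which mirrors the fact that only part of a corpus can be completed this way.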
Bibliographical note
Publisher Copyright:
© The Author(s) 2019.