TY - JOUR
T1 - CRF with deep class embedding for large scale classification
AU - Goldman, Eran
AU - Goldberger, Jacob
N1 - Publisher Copyright: © 2019 Elsevier Inc.
PY - 2020/2
Y1 - 2020/2
N2 - This paper presents a novel deep learning architecture for classifying structured objects in ultrafine-grained datasets, where classes may not be clearly distinguishable by their appearance but rather by their context. We model sequences of images as linear-chain CRFs, and jointly learn the parameters from both local-visual features and neighboring class information. The visual features are learned by convolutional layers, whereas class-structure information is reparametrized by factorizing the CRF pairwise potential matrix. This forms a context-based semantic similarity space, learned alongside the visual similarities, and dramatically increases the learning capacity of contextual information. This new parametrization, however, forms a highly nonlinear objective function which is challenging to optimize. To overcome this, we develop a novel surrogate likelihood which allows for a local likelihood approximation of the original CRF with integrated batch-normalization. This model overcomes the difficulties of existing CRF methods to learn the contextual relationships thoroughly when there is a large number of classes and the data is sparse. The performance of the proposed method is illustrated on a huge dataset that contains images of retail-store product displays, and shows significantly improved results compared to linear CRF parametrization, unnormalized likelihood optimization, and RNN modeling. We also show improved results on a standard OCR dataset.
KW - Batch normalization
KW - CRF
KW - Class embedding
KW - Matrix factorization
KW - Surrogate likelihood
UR - https://www.scopus.com/pages/publications/85075356557
U2 - 10.1016/j.cviu.2019.102865
DO - 10.1016/j.cviu.2019.102865
M3 - Article
AN - SCOPUS:85075356557
SN - 1077-3142
VL - 191
JO - Computer Vision and Image Understanding
JF - Computer Vision and Image Understanding
M1 - 102865
ER -