Scaling in Deep and Shallow Learning Architectures

Ella Koresh, Tal Halevi, Yuval Meir, Dolev Dilmoney, Tamar Dror, Ronit Gross, Ofek Tevet, Shiri Hodassman, Ido Kanter

Research output: Contribution to journal › Article › peer-review


Abstract

The realization of classification tasks using deep learning is a primary goal of artificial intelligence; however, its possible universal behavior remains unexplored. Herein, we demonstrate a scaling behavior for the test error, ϵ, as a function of the number of classified labels, K. For the deepest trained architectures on CIFAR-100, ϵ(K) ∝ K^ρ with ρ ∼ 1; for progressively reduced deep architectures, ρ decreases continuously until a crossover to ϵ(K) ∝ log(K) is observed for shallow architectures. A similar crossover is observed for shallow architectures whose number of filters in the convolutional layers is proportionally increased. This unifies the scaling behavior of deep and shallow architectures and yields a reduced-latency method. The dependence of Δϵ/ΔK on the trained architecture is expected to be crucial in learning scenarios involving a dynamic number of labels.
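
The abstract's central relation is the power law ϵ(K) ∝ K^ρ and its crossover to ϵ(K) ∝ log(K). As a minimal illustration, not taken from the paper, the sketch below estimates such an exponent ρ from hypothetical (K, test-error) pairs and compares the power-law fit against a logarithmic fit; all numerical values are illustrative assumptions.

import numpy as np

# Hypothetical measurements: number of classified labels K and test error eps(K).
# These values are illustrative only and do not come from the paper.
K = np.array([5, 10, 20, 40, 80, 100])
eps = np.array([0.03, 0.06, 0.12, 0.24, 0.47, 0.58])

# Power-law fit: log(eps) = rho * log(K) + c, so the slope is the exponent rho.
rho, c_pow = np.polyfit(np.log(K), np.log(eps), 1)

# Logarithmic fit: eps = a * log(K) + b, the form reported for shallow architectures.
a, b = np.polyfit(np.log(K), eps, 1)

# Compare both fits in the same (eps) space via their summed squared residuals.
pred_pow = np.exp(c_pow) * K ** rho
pred_log = a * np.log(K) + b
res_pow = np.sum((eps - pred_pow) ** 2)
res_log = np.sum((eps - pred_log) ** 2)

print(f"estimated power-law exponent rho ~ {rho:.2f}")
print(f"sum of squared residuals: power law {res_pow:.3e}, log fit {res_log:.3e}")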

Original language: English
Article number: 129909
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 646
DOIs
State: Published - 15 Jul 2024

Bibliographical note

Publisher Copyright:
© 2024 Elsevier B.V.

Keywords

  • Deep learning
  • Machine learning
  • Shallow learning
