TY - JOUR
T1 - Scaling in Deep and Shallow Learning Architectures
AU - Koresh, Ella
AU - Halevi, Tal
AU - Meir, Yuval
AU - Dilmoney, Dolev
AU - Dror, Tamar
AU - Gross, Ronit
AU - Tevet, Ofek
AU - Hodassman, Shiri
AU - Kanter, Ido
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/7/15
Y1 - 2024/7/15
AB - The realization of classification tasks using deep learning is a primary goal of artificial intelligence; however, its possible universal behavior remains unexplored. Herein, we demonstrate a scaling behavior for the test error, ϵ, as a function of the number of classified labels, K. For the deepest trained architectures on CIFAR-100, ϵ(K) ∝ K^ρ with ρ ∼ 1; for architectures of reduced depth, ρ decreases continuously until a crossover to ϵ(K) ∝ log(K) is observed for shallow architectures. A similar crossover is observed for shallow architectures in which the number of filters in the convolutional layers is increased proportionally. This unifies the scaling behavior of deep and shallow architectures and yields a reduced-latency method. The dependence of Δϵ/ΔK on the trained architecture is expected to be crucial in learning scenarios involving a dynamic number of labels.
KW - Deep learning
KW - Machine learning
KW - Shallow learning
UR - http://www.scopus.com/inward/record.url?scp=85196829092&partnerID=8YFLogxK
DO - 10.1016/j.physa.2024.129909
M3 - Article
AN - SCOPUS:85196829092
SN - 0378-4371
VL - 646
JO - Physica A: Statistical Mechanics and its Applications
JF - Physica A: Statistical Mechanics and its Applications
M1 - 129909
ER -