TY - GEN
T1 - Cache Replacement Policies for Multicore Processors
AU - Hassidim, A.
N1 - Place of conference:China
PY - 2010
Y1 - 2010
N2 - Almost all modern computers use multiple cores, and the number of cores is
expected to increase as hardware prices go down and Moore's law fails to hold. Most of
the theoretical algorithmic work so far has focused on the setting where multiple cores
are performing the same task. Indeed, one is tempted to assume that when the cores are
independent then the current design performs well.
This work refutes this assumption by showing that even when the cores run completely
independent tasks, there exist dependencies arising from running on the same
chip, and using the same cache. These dependencies cause the standard caching algorithms
to underperform. To address the new challenge, we revisit some aspects of the
classical caching design.
More specifically, we focus on the page replacement policy of the first cache shared
between all the cores (usually the L2 cache). We make the simplifying assumption
that since the cores are running independent tasks, they are accessing disjoint memory
locations (in particular this means that maintaining coherency is not an issue). We
show that even under this simplifying assumption, the multicore case is fundamentally
different from the single-core case. In particular:
1. LRU performs poorly, even with resource augmentation.
2. The offline version of the caching problem is NP-complete.
Any attempt to design an efficient cache for a multicore machine in which the cores
may access the same memory has to perform well also in this simpler setting. We provide
some intuition for what an efficient solution could look like by:
1. Partly characterizing the offline solution, showing that it is determined by the part
of the cache which is devoted to each core at every timestep.
2. Presenting a PTAS for the offline problem, for some range of the parameters.
In recent years, multicore caching has been the subject of extensive experimental
research. Some of these works conclude that LRU is inefficient in practice.
The heuristics they propose to replace it are based on dividing the cache between
cores and handling each part independently. Our work can be seen as a theoretical
explanation for the results of these experiments.
UR - https://scholar.google.co.il/scholar?q=Cache+Replacement+Policies+for+Multicore+Processors&btnG=&hl=en&as_sdt=0%2C5
M3 - Conference contribution
BT - ICS
ER -