TY - GEN
T1 - A fully integrated multi-CPU, GPU and memory controller 32nm processor
AU - Yuffe, Marcelo
AU - Knoll, Ernest
AU - Mehalel, Moty
AU - Shor, Joseph
AU - Kurts, Tsvika
PY - 2011
Y1 - 2011
N2 - This paper describes the 32nm Sandy Bridge processor that integrates up to 4 high performance Intel Architecture (IA) cores, a power/performance optimized graphic processing unit (GPU) and memory and PCIe controllers in the same die. The Sandy Bridge architecture block diagram is shown in Fig. 15.1.1 and the floorplan of a four IA-core version is shown in Fig. 15.1.2. The Sandy Bridge IA core implements an improved branch prediction algorithm, a micro-operation (Uop) cache, a floating point Advanced Vector Extension (AVX), a second load port in the L1 cache and bigger register files in the out-of-order part of the machine; all these architecture improvements boost the IA core performance without increasing the thermal power dissipation envelope or the average power consumption (to preserve battery life in mobile systems). The CPUs and GPU share the same 8MB level-3 cache memory. The data flow is optimized by a high performance on die interconnect fabric (called "ring") that connects between the CPUs, the GPU, the L3 cache and the system agent (SA) unit that houses a 1600MT/s, dual channel DDR3 memory controller, a 20-lane PCIe gen2 controller, a two parallel pipe display engine, the power management control unit and the testability logic. An on die EPROM is used for configurability and yield optimization.
AB - This paper describes the 32nm Sandy Bridge processor that integrates up to 4 high performance Intel Architecture (IA) cores, a power/performance optimized graphic processing unit (GPU) and memory and PCIe controllers in the same die. The Sandy Bridge architecture block diagram is shown in Fig. 15.1.1 and the floorplan of a four IA-core version is shown in Fig. 15.1.2. The Sandy Bridge IA core implements an improved branch prediction algorithm, a micro-operation (Uop) cache, a floating point Advanced Vector Extension (AVX), a second load port in the L1 cache and bigger register files in the out-of-order part of the machine; all these architecture improvements boost the IA core performance without increasing the thermal power dissipation envelope or the average power consumption (to preserve battery life in mobile systems). The CPUs and GPU share the same 8MB level-3 cache memory. The data flow is optimized by a high performance on die interconnect fabric (called "ring") that connects between the CPUs, the GPU, the L3 cache and the system agent (SA) unit that houses a 1600MT/s, dual channel DDR3 memory controller, a 20-lane PCIe gen2 controller, a two parallel pipe display engine, the power management control unit and the testability logic. An on die EPROM is used for configurability and yield optimization.
UR - http://www.scopus.com/inward/record.url?scp=79955708930&partnerID=8YFLogxK
U2 - 10.1109/ISSCC.2011.5746311
DO - 10.1109/ISSCC.2011.5746311
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:79955708930
SN - 9781612843001
T3 - Digest of Technical Papers - IEEE International Solid-State Circuits Conference
SP - 264
EP - 265
BT - 2011 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, ISSCC 2011
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2011 IEEE International Solid-State Circuits Conference, ISSCC 2011
Y2 - 20 February 2011 through 24 February 2011
ER -