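Abstract: To address the capacity-miss problem faced by private Cache architectures in multi-core processors, an inter-core capacity sharing mechanism based on fine-grained pseudo-partitioning is proposed. Each Cache Bank is equipped with an array of weighted saturating counters that tracks and predicts, at a fine granularity, how memory demand differs among threads. These counters set, for each processor core, the ratio between its private region and its shared region in every Cache Set. This partitioning in turn guides each core's victim-block replacement, spilling, and receiving decisions. The resulting intelligent inter-core capacity borrowing mechanism balances the differences in memory demand among processors and mitigates the capacity misses of multi-core private Caches. Experimental results on an architecture-level full-system simulator show that the mechanism effectively alleviates the capacity-miss problem of multi-core private Caches and reduces the average memory access latency of multithreaded applications.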
Key words: microprocessor; pseudo-partitioning; spilling; replacement; capacity sharing; pressure balance
CLC number: TP302  Document code: A
A Capacity Sharing Mechanism Based on Fine-grained Pseudo-partitioning between Private Caches for Chip Multiprocessors
HUANG An-wen, ZHANG Cheng-yi, SONG Chao, GUO Wei, LI Peng, ZHANG Min-xuan
(College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China)
Abstract: A cache capacity sharing mechanism based on fine-grained pseudo-partitioning (CSFP) is proposed to address the capacity misses suffered by private caches in Chip Multiprocessors (CMP). Each cache bank is equipped with a weighted saturating counter array that collects and predicts, at a fine granularity, the differences in memory demand among threads. The private and shared regions of each cache set are adjusted adaptively, and the partitioning decision guides not only victim-block replacement but also the dynamic coordination of spilling and receiving. An intelligent capacity sharing mechanism balances the memory demand across cores, effectively mitigating capacity misses in CMP private caches. Experimental results on a cycle-accurate architecture simulator show that CSFP significantly reduces the capacity misses of CMP private caches and thus reduces the average memory access latency of the evaluated programs to some extent.
Key words: microprocessor chips; pseudo-partitioning; spilling; replacement; capacity sharing; pressure balance
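The abstract describes the counter-driven partitioning only at a high level. As a rough illustration of how a per-set weighted saturating counter could drive both the private/shared split and the choice of spill target, the Python sketch below makes several assumptions that do not come from this paper: a 4-bit counter (`COUNTER_MAX = 15`), a weight of +2 per local miss and -1 per local hit, a private region proportional to the counter value, a spill threshold at half the counter range, and a receiver chosen as the peer bank whose same-index set has the most shared ways. All names (`BankPressure`, `record_access`, `choose_spill_target`) are hypothetical; this is not the CSFP implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical parameters; the paper's actual widths and weights are not given here.
COUNTER_MAX = 15   # 4-bit saturating counter
MISS_WEIGHT = 2    # a local miss raises the set's capacity pressure
HIT_WEIGHT = -1    # a local hit lowers it


@dataclass
class BankPressure:
    """Hypothetical per-bank counter array: one weighted saturating counter
    per cache set, estimating how much capacity the local core needs there."""
    num_sets: int
    num_ways: int
    counters: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.counters = [0] * self.num_sets

    def record_access(self, set_idx: int, hit: bool) -> None:
        weight = HIT_WEIGHT if hit else MISS_WEIGHT
        # Saturate at 0 and COUNTER_MAX instead of wrapping around.
        self.counters[set_idx] = max(0, min(COUNTER_MAX, self.counters[set_idx] + weight))

    def private_ways(self, set_idx: int) -> int:
        """Ways kept for the local core (private region), proportional to
        pressure; at least one way always stays private."""
        share = self.counters[set_idx] / COUNTER_MAX
        return max(1, round(share * self.num_ways))

    def shared_ways(self, set_idx: int) -> int:
        """Remaining ways form the shared region that may receive spilled blocks."""
        return self.num_ways - self.private_ways(set_idx)


def choose_spill_target(banks: List[BankPressure], src: int, set_idx: int,
                        spill_threshold: int = COUNTER_MAX // 2) -> Optional[int]:
    """Return the peer core that should receive src's victim block, or None
    if the victim should simply be evicted from the chip."""
    if banks[src].counters[set_idx] < spill_threshold:
        return None  # low local pressure: spilling is not worth it
    best, best_shared = None, 0
    for core, bank in enumerate(banks):
        if core == src:
            continue
        shared = bank.shared_ways(set_idx)
        if shared > best_shared:  # prefer the peer with the most spare ways
            best, best_shared = core, shared
    return best


if __name__ == "__main__":
    banks = [BankPressure(num_sets=4, num_ways=8) for _ in range(2)]
    for _ in range(8):
        banks[0].record_access(0, hit=False)  # core 0 keeps missing in set 0
        banks[1].record_access(0, hit=True)   # core 1 keeps hitting in set 0
    print(banks[0].private_ways(0), banks[1].private_ways(0))  # 8 1
    print(choose_spill_target(banks, src=0, set_idx=0))        # 1
```

In this toy run the high-pressure core keeps its whole set private and spills its victims to the lightly loaded core, which lends 7 of its 8 ways as a shared region. The one-way floor in `private_ways` prevents a lightly loaded core from lending its entire set; whether CSFP applies such a floor is not stated in this section.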
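A multi-core processor can keep replicas of data in each core's local private Cache, which offers two performance advantages. First, the data is physically placed close to the requesting core, so the hit latency is low. Second, the memory accesses of programs running on different cores do not interfere with each other, which makes performance isolation easy to achieve.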
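However, compared with a multi-core shared Cache, the private Cache scheme has several drawbacks. First, memory demand is often imbalanced across multiple programs, or across multiple threads of the same program [1]. As a result, some cores suffer capacity misses because their private L2 space cannot meet their demand, while the L2 Caches of other cores still have unused space [2]. Second, multiple cores each build their own local replica of shared data they access concurrently, which further lowers the overall effective utilization of on-chip Cache space, aggravates capacity misses, and incurs off-chip memory access overhead that cannot be ignored. Third, as application working sets keep growing, the capacity of a local private Cache Bank alone often cannot satisfy the on-chip storage demand. Mitigating the capacity misses of private Caches in multi-core processors is therefore essential to improving memory system performance.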
1 Problem Background and Related Work
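Private Caches in a multi-core processor can share capacity across cores through the cooperative spilling mechanism Cooperative Caching (CC) [3]. Its main idea is to let a local Cache store the blocks it evicts in the Cache space of other processors, so that a later access to such a block can obtain the data from a remote L2 Cache and avoid some off-chip memory accesses. The shortcomings of the traditional spilling mechanism are as follows [3]. First, spilling is performed somewhat blindly: it does not consider the differences in capacity demand among the threads running on the cores. Second, it relies on a centralized coherence engine to control spilling, which limits its scalability in large-scale multi-core systems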