I am trying to figure out which event to use with the perf stat command to count L3 cache accesses on an AMD Zen 2 processor. As per the PPR (http://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf), section 2.1.13.4.1, page 168, the event is 0x01 and the umask is 0x80 for “[L3 Cache Accesses] (L3RequestG1)”. From what I understand, the event to use in the perf stat command would thus be r8001. But the following command always reports a count of zero, no matter what load I run:
perf stat -a -e r8001 -- sleep 10
Performance counter stats for 'system wide':
0 r8001
10.001105322 seconds time elapsed
Am I misinterpreting the PPR or does [L3 Cache Accesses] (L3RequestG1) mean something else?
Also, is there a way to specify which slice of the L3 cache to monitor for events in perf, given that most of the newer architectures with high core counts have multiple L3 slices?
Answer
The L3 cache events can only be counted on the L3 PMU, as clearly specified in both the physical mnemonic (L3PMCx01) and the logical mnemonic (Core::X86::Pmc::L3::L3RequestG1) of the event you want to measure. The L3 PMU is formally called L3PMC. This is similar to the cbox PMUs on Intel processors.
The default PMU in perf for raw events is cpu, which is the name the perf_events subsystem gives to the core PMU. An event specified using a raw event code without an explicit PMU, such as r8001, is equivalent to cpu/r8001/. The core event 0x001 represents the event Core::X86::Pmc::Core::FpSchedEmpty, and the umask 0x80 is undefined for this event (see Section 2.1.15.4.1). So you’re counting an undefined event. In this case, if the event happened to be implemented but not documented, then the event count may not be zero depending on whether it occurs during the execution of the program being profiled. Otherwise, the event count would be zero. perf_events doesn’t stop you from counting undefined events.
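To make the encoding concrete, here is a small sketch of how a two-byte raw code such as r8001 splits into an event select and a unit mask. The helper name decode_raw is made up for illustration. It assumes the simple layout used in the question (low byte is the event, next byte is the umask) and does not handle core events whose event select is wider than 8 bits.

```shell
# Split a perf raw event code (e.g. r8001) into event select and unit mask.
# Assumption: bits 0-7 are the event select and bits 8-15 are the unit mask.
decode_raw() {
    code=$(( 0x${1#r} ))                      # strip the leading "r", parse as hex
    printf 'event=0x%02x,umask=0x%02x\n' \
        $(( code & 0xff )) $(( (code >> 8) & 0xff ))
}

decode_raw r8001    # prints: event=0x01,umask=0x80
```

The decoded fields are the same whichever PMU you pass them to. What changes is which hardware event they select, which is why the same code means FpSchedEmpty on cpu and L3RequestG1 on the L3 PMU.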
Starting with upstream kernel version v5.4-rc1, the L3PMC is supported in perf_events under the name amd_l3. To determine whether you’re using a kernel that supports this PMU, check whether amd_l3 shows up in the output of the command ls /sys/devices/*/format. If it is not supported, then you can’t measure the L3 events on that kernel through perf.
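The check can also be scripted. This is only a sketch: pmu_supported is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists purely so the function can be tried against a scratch directory. It assumes the PMU, when present, has a format directory under /sys/devices, as in the ls command above.

```shell
# Succeed if the named PMU exposes a format directory under the sysfs root.
# $1: PMU name, $2: sysfs devices root (hypothetical parameter, defaults to /sys/devices)
pmu_supported() {
    pmu=$1
    root=${2:-/sys/devices}
    [ -d "$root/$pmu/format" ]
}

if pmu_supported amd_l3; then
    echo "amd_l3 PMU found; available format fields:"
    ls /sys/devices/amd_l3/format
else
    echo "amd_l3 PMU not found on this kernel"
fi
```

The files listed under format are the field names that can be used inside the PMU/.../ event syntax.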
If amd_l3 is supported, you have to explicitly specify the PMU, as in amd_l3/r8001/ or amd_l3/event=0x01,umask=0x80/, to have the event counted on the right PMU. Or you can just use the perf event name l3_request_g1.caching_l3_cache_accesses.
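Applied to the command from the question, that would look roughly like the sketch below. The helper l3_event is made up for illustration. The command is only printed rather than executed, because running it needs a machine with the amd_l3 PMU and usually elevated privileges.

```shell
# Build a PMU-qualified event string from an event select and a unit mask.
l3_event() {
    printf 'amd_l3/event=%s,umask=%s/' "$1" "$2"
}

# Dry run: print the command. Remove "echo" to actually run it on a
# system where the amd_l3 PMU is available.
echo perf stat -a -e "$(l3_event 0x01 0x80)" -- sleep 10
# prints: perf stat -a -e amd_l3/event=0x01,umask=0x80/ -- sleep 10
```

The only change from the original command is the amd_l3/.../ wrapper around the event. The rest of the perf stat invocation stays as it was.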
Do you know what the event L3RequestG1 represents? The documentation only describes it as “Caching: L3 cache accesses,” which isn’t very meaningful. It seems to me that the types of transactions it counts are a subset of those covered by the event L3LookupState. Table 19 in Section 2.1.15.2 says that L3 accesses and misses should be counted using rFF04 (L3LookupState) and r0106 (L3CombClstrState), respectively. Don’t blindly expect that any of these events actually count whatever you want to measure.
The PPR you linked is not for any Zen 2 processor; it’s for some Zen and Zen+ processors (specifically, models 00h-0Fh). You need to know the processor family and model to locate the right PPR.