Tag: perf

Counting L3 cache access events on AMD Zen 2 processors

I am trying to figure out which event to use with the perf stat command to count L3 cache accesses on an AMD Zen 2 processor. As per the PPR (http://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf), section 2.1.13.4.1, page 168, the event is 0x01 and the umask is 0x80 for “…

Is Linux perf accurate for measuring cache misses in a multithreaded C program?

linux multithreading perf

Can Linux perf measure cache misses for a multithreaded program, or can it only report the result for the master thread? I used it on a C program using pthreads, and the cache miss count seemed lower than expected. Answer: Yes, perf stat is an accurate total across all threads. (Unless your CPU has an er…

Weird Backtrace in Perf

call-graph linux perf performancecounter trace

I used the following command to extract backtraces leading to user-level L3 misses in a simple evince benchmark: As is clear, the sampling period is quite large (10000 events between consecutive samples). For this experiment, the output of perf script had some samples similar to this one: At the bottom of …

Definition of the Linux perf cache-misses event?

cpu linux perf performancecounter profiling

I am trying to use Linux perf to profile cache performance. perf list shows there is a cache-misses event. However, what is the definition of this “cache-misses” event? Is it one of L1D/L1I cache, L2 cache or L3 cache? Thanks! Answer: The cache-misses event corresponds to the misses in the las…

How to get rid of the “unknown” section in perf

c++ linux perf

What I did is: Then I get a small “unknown” section, like It looks like this is due to the libc ‘malloc’ call. Then I wrote a program on the same machine to test it. Then I did the same thing as above, and there is no “unknown” section. How to explain/fix this? Answer: The [unknown] block in…

Linux perf_events annotation frame pointer confusion

disassembly linux perf x86-64

I ran sudo perf record -F 99 find / followed by sudo perf report and selected “Annotate fdopendir” and here are the first seven instructions: push %rbp; push %rbx; mov %edi,%esi; mov %edi,%ebx; mov $0x1,%edi; sub $0xa8,%rsp; mov %rsp,%rbp. The first instruction appears to be saving the caller’s bas…

How does perf associate events to functions?

linux linux-kernel perf performance

More precisely, how does the perf tool associate PMU events with functions? I already realized that when the kernel perf subsystem records the event counters, it also records the program counter (PC) so it can associate the count with a function. However, to really get fine-grained results, you need to sample the counte…
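
As a rough illustration of the association step this excerpt describes, a sampled program counter can be mapped to a function by searching a sorted table of symbol start addresses. The sketch below is a minimal Python model of that idea, not perf's actual implementation; the symbol table, the addresses, and the `resolve` and `histogram` helper names are all hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1200, "parse_input"),
    (0x1800, "compute"),
]
END_OF_TEXT = 0x2000  # assumed end address of the last symbol

STARTS = [start for start, _ in SYMBOLS]


def resolve(pc):
    """Return the name of the function containing pc, or "[unknown]"."""
    if pc < STARTS[0] or pc >= END_OF_TEXT:
        return "[unknown]"
    # Pick the rightmost symbol whose start address is <= pc.
    index = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1]


def histogram(sampled_pcs):
    """Count samples per function, as a flat profile would."""
    return Counter(resolve(pc) for pc in sampled_pcs)


# Made-up sampled program counters, one per recorded sample.
samples = [0x1004, 0x1810, 0x1FFC, 0x1210, 0x1820, 0x9000]
profile = histogram(samples)
for name, count in profile.most_common():
    print(f"{name}: {count}")
```

Addresses that fall outside every known symbol end up in an "[unknown]" bucket in this model, which is one plausible reason such a bucket can appear when symbol information is missing.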

Which perf events can use PEBS?

intel linux perf performance performancecounter

I want to understand which events can have the precise modifier on my CPU (Sandy Bridge). Intel Software Developer’s Manual (Table 18-32. PEBS Performance Events for Intel Microarchitecture Code Name Sandy Bridge) contains only the following events: INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_R…

How do I use a newer perf tool front end with a record from an older perf version?

c++ linux linux-kernel perf

I am running perf record on an older version of the kernel on an ARM board. The kernel version is 3.18.21-rt19. The perf version on the board is similarly perf version 3.18.21. While I can record and use the report feature on this perf, the TUI for report on this version is quite awful/non-existent. Instead of…

Use linux perf utility to report counters every second like vmstat

linux perf performancecounter

There is a perf command-line utility in Linux to access hardware performance-monitoring counters; it works using the perf_events kernel subsystem. perf itself has basically two modes: perf record/perf top to record a sampling profile (the sample is, for example, every 100000th CPU clock cycle or executed instruction), an…
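
One possible way to get vmstat-style periodic output, assuming a perf build whose perf stat supports the `-I` (interval in milliseconds) and `-x` (field separator) options, is to run perf stat in interval mode and parse each line. The Python sketch below builds such a command line and parses CSV lines. The helper names are hypothetical, the sample lines are illustrative rather than captured output, and the assumed field order (timestamp, value, unit, event, ...) may differ between perf versions.

```python
def build_command(events, interval_ms=1000):
    """Build a system-wide perf stat command that prints every interval_ms."""
    return ["perf", "stat", "-a", "-I", str(interval_ms),
            "-x", ",", "-e", ",".join(events)]


def parse_line(line):
    """Parse one CSV line into (timestamp, event, count), or None if unusable."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(",")
    if len(fields) < 4:
        return None
    timestamp, value, _unit, event = fields[:4]
    try:
        seconds = float(timestamp)
    except ValueError:
        return None
    try:
        count = int(value)
    except ValueError:
        # perf prints markers such as "<not counted>" instead of a number.
        count = None
    return seconds, event, count


# Illustrative lines in the assumed layout, not real measurements.
sample_output = [
    "1.000512345,1234567,,cycles,1000123456,100.00,,",
    "1.000512345,<not counted>,,instructions,0,100.00,,",
]
for raw in sample_output:
    print(parse_line(raw))
```

perf stat writes its counters to stderr by default, so a wrapper that launches the command from `build_command` would need to read that stream rather than stdout.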