I wrote three different code variants to copy data from one 4GB buffer to another 4GB buffer. I measured their bandwidth and cache misses with perf stat. The code is shown below: I compile it with gcc memcpy-test.c -o memcpy-test. The first one uses memcpy to copy memcpy_sz bytes at a time. I tested this with 8B, 64B, 4KB, 512KB,
Tag: cpu
Understand sysstat sar memory output
I’m preparing for more traffic in the days to come, and I want to be sure the server can handle it. Running sar -q, a load of “3.5” doesn’t seem like much on a 32-CPU machine: However, I’m not sure about the memory. Running sar -r shows 98.5% for %memused and only 13.60 for %commit: Running htop seems OK too: 14.9G/126G.
Definition of Linux perf cache-misses event?
I am trying to use Linux perf to profile cache performance. perf list shows there is a cache-misses event. However, what’s the definition of this “cache-misses” event? Does it refer to the L1D/L1I cache, the L2 cache, or the L3 cache? Thanks! Answer: The cache-misses event corresponds to the misses in the last level cache (LLC). Note that this is an architectural performance
Estimate core capacity required based on load?
I have a quad-core Ubuntu system. Say I see a load average of 60 over the last 15 minutes during peak time. The load average goes up to 150 as well. This load generally happens only during peak time. Basically, I want to know if there is any standard formula to derive the number of cores ideally required to handle the given
Detect if running on a device with heterogeneous CPU architecture
I’m very specific on this one. I need to know if the device has a CPU with heterogeneous cores, like ARM’s big.LITTLE technology: for instance, a set of 4 ARM Cortex-A53 cores plus another set of 4 more powerful ARM Cortex-A72 cores, totaling 8 cores, basically 2 processors on the same chip. The processor model does not really matter. What I’m
Is it safe to use all 4 cores of your PC to train a machine learning model
I am training an ML model on Ubuntu 16.04. I wrote code that utilizes all 4 cores of my PC. I wonder whether this could lead to some sort of crash. Running htop in the terminal shows 100% usage of all 4 cores, along with a lot of other information, including /usr/lib/xorg/Xorg -core:0 …….. no listen Answer: It
Get hardware information from /proc filesystem in Linux
I use execv to run the lshw command to get CPU, disk, and memory information in C code. But I would like to find another way to get this information from /proc or any other existing data source. Any suggestions? Here is my code: Linux command: $ sudo lshw -short -c disk -c processor -c memory I have two questions: Where
Track down high CPU load average
Trying to understand what’s going on with my server. It’s a 2-CPU server, so: While the load average queue is showing ~8: So you can assume the load is really high and things are piling up; there is some load on the system and it’s not just a spike. However, looking at the top CPU consumers: Results of the free command:
Ada program works in Linux but not in GPS Windows 10
Thanks in advance for any help. I am currently doing some beginner work on Ada programming and I have installed GNAT Programming Studio (GPS) from http://libre.adacore.com/download/configurations# on 64-bit Windows 10. I was given the following code at school: I opened the file in GPS, built it (no errors) and ran it, but it doesn’t show any printed output. I
Time spends in CPU faster than in reality
I am wondering why my entire application runs in less than 8 seconds while the time obtained from clock_gettime is 19.3468 seconds, which is more than twice the actual run time. Where does the problem come from? Update: I am not using any OpenMP explicitly. Answer: CLOCK_MONOTONIC should be used if you want to measure total elapsed