Tag: cpu

Why is memcpy slower than copying data at byte granularity?

I wrote three different programs to copy data from one 4 GB buffer to another 4 GB buffer. I measured their bandwidth and cache misses with perf stat. The code is shown below: I compile it with gcc memcpy-test.c -o memcpy-test. The first one uses memcpy to copy memcpy_sz bytes of data at a time. I tested this with 8B, 64B, 4KB, 512KB,

Understanding sysstat sar memory output

cpu linux memory performance sar

I’m preparing for more traffic in the days to come, and I want to be sure the server can handle it. Running sar -q, the load of “3.5” doesn’t seem like much on a 32-CPU architecture: However, I’m not sure about the memory. Running sar -r shows 98.5% for %memused and only 13.60 for %commit: Running htop seems OK too: 14.9G/126G.

Definition of the Linux perf cache-misses event?

cpu linux perf performancecounter profiling

I am trying to use Linux perf to profile cache performance. perf list shows there is a cache-misses event. However, what’s the definition of this “cache-misses” event? Is it one of L1D/L1I cache, L2 cache or L3 cache? Thanks! Answer: The cache-misses event corresponds to misses in the last-level cache (LLC). Note that this is an architectural performance

Estimate core capacity required based on load?

cpu cpu-architecture cpu-usage linux performance

I have a quad-core Ubuntu system. Say I see a load average of 60 over the last 15 minutes during peak time. The load average goes up to 150 as well. This load generally occurs only during peak time. Basically, I want to know if there is any standard formula to derive the number of cores ideally required to handle the given
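A rough way to reason about the numbers in this question is to normalize the load average by the core count. The sketch below is only a first-order rule of thumb, not a standard formula: on Linux the load average counts tasks that are runnable or in uninterruptible sleep, so it can overstate CPU demand when tasks are blocked on I/O. The helper names `load_per_core` and `cores_for_load`, and the default target of 1.0 load per core, are illustrative assumptions.

```python
import math
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average normalized by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def cores_for_load(load_avg: float, target_per_core: float = 1.0) -> int:
    """Rough estimate of the cores needed to keep load per core at or below
    target_per_core. Assumes the load is CPU-bound, which may not hold."""
    if target_per_core <= 0:
        raise ValueError("target_per_core must be positive")
    return max(1, math.ceil(load_avg / target_per_core))


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"15-minute load per core: {load_per_core(fifteen, cores):.2f}")
    print(f"cores for a load of 60: {cores_for_load(60)}")
```

With the figures from the question, a load of 60 on 4 cores is 15 per core. Whether more cores would actually help depends on whether the waiting tasks are CPU-bound or stuck on disk or network I/O.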

Detect if running on a device with heterogeneous CPU architecture

android cpu linux linux-kernel root

I’m very specific on this one. I need to know if the device has a CPU with heterogeneous cores, like ARM’s big.LITTLE technology: for instance, a set of 4 ARM Cortex-A53 cores plus another set of 4 more powerful ARM Cortex-A72 cores, totaling 8 cores, basically 2 processors on the same chip. The processor model does not really matter. What I’m
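One possible heuristic, sketched here under the assumption that the kernel exposes the cpufreq sysfs files at `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq` and that they are readable, is to group cores by their maximum frequency. More than one distinct value suggests, but does not prove, a heterogeneous design: some homogeneous chips report per-core differences, and some heterogeneous ones may not expose cpufreq at all. The function names are hypothetical.

```python
import glob
import os
import re
from collections import defaultdict


def read_max_freqs(base="/sys/devices/system/cpu"):
    """Map CPU index -> cpuinfo_max_freq (kHz), skipping CPUs without cpufreq."""
    freqs = {}
    for path in glob.glob(os.path.join(base, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        if not match:
            continue
        freq_file = os.path.join(path, "cpufreq", "cpuinfo_max_freq")
        try:
            with open(freq_file) as f:
                freqs[int(match.group(1))] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a number
    return freqs


def group_by_max_freq(freqs):
    """Group CPU indices by their maximum frequency."""
    clusters = defaultdict(list)
    for cpu, khz in freqs.items():
        clusters[khz].append(cpu)
    return {khz: sorted(cpus) for khz, cpus in clusters.items()}


def looks_heterogeneous(freqs):
    """Heuristic only: True if cores report more than one maximum frequency."""
    return len(set(freqs.values())) > 1


if __name__ == "__main__":
    freqs = read_max_freqs()
    print(group_by_max_freq(freqs))
    print("heterogeneous (heuristic):", looks_heterogeneous(freqs))
```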

Is it safe to use all 4 cores of your PC to train a machine learning model?

cpu cpu-usage linux machine-learning processor

I am training an ML model on my Ubuntu 16.04 machine. I wrote code that utilizes all 4 cores of my PC. I wonder whether this could lead to some sort of crash. Using the htop command in a terminal shows 100% usage of all 4 cores, along with much other information, including /usr/lib/xorg/Xorg -core:0 …….. no listen Answer: It

Get hardware information from /proc filesystem in Linux

c++ cpu disk linux memory

I use execv to run the lshw command to get CPU, disk, and memory information in C code. But I would like to find another solution that gets this information from /proc or any other existing data. Any suggestions? Here is my code: Linux command: $ sudo lshw -short -c disk -c processor -c memory I have two questions: Where
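As one alternative to spawning lshw, the text files /proc/meminfo and /proc/cpuinfo can be parsed directly without root. The sketch below uses Python for brevity, although the question is about C; the same files can be read line by line in C with fopen and fgets. The function names are hypothetical, and the `model name` field is what x86 kernels typically print, so other architectures may need a different key.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value}.
    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def cpu_model_names(text):
    """Return the 'model name' value of each processor block in
    /proc/cpuinfo text (field name assumed; typical on x86)."""
    names = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            names.append(value.strip())
    return names


def read_text(path):
    """Read a small text file such as those under /proc."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    mem = parse_meminfo(read_text("/proc/meminfo"))
    print("MemTotal (kB):", mem.get("MemTotal"))
    print("CPU models:", cpu_model_names(read_text("/proc/cpuinfo")))
```

Disk information is not covered here; /proc/partitions and the entries under /sys/block are the usual places to look for it.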

Track down high CPU load average

amazon-ec2 cpu devops linux unix

Trying to understand what’s going on with my server. It’s a 2-CPU server, so: Meanwhile, the load average queue is showing ~8: So you can assume the load is really high and things are piling up; there is some load on the system and it’s not just a spike. However, looking at the top CPU consumers: Results of the free command:

Ada program works in Linux but not in GPS Windows 10

ada affinity cpu linux windows

Thanks in advance for any help. I am currently doing some beginner work on Ada programming and I have installed GNAT Programming Studio (GPS) from http://libre.adacore.com/download/configurations# on 64-bit Windows 10. I was given the following code at school: I opened the file in GPS, built it (no errors) and ran it, but it doesn’t show any printed output. I

Time spent in CPU is longer than in reality

c++ cpu ctime gcc linux

I am wondering why my entire application runs in less than 8 seconds while the time obtained from clock_gettime is 19.3468 seconds, which is more than twice the real elapsed time. Where does the discrepancy come from? Update: I am not using any OpenMP explicitly. Answer: CLOCK_MONOTONIC should be used if you want to measure total elapsed
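The gap between elapsed time and CPU time can be seen in a small sketch. It uses Python's `time.monotonic()` for wall-clock time and `time.process_time()` for the CPU time of the whole process; these stand in for the C clocks and are not the asker's code. The excerpt does not show which clock ID was passed to clock_gettime, so the idea that a per-process CPU clock was used (which adds up time across threads and can therefore exceed wall time) is an assumption. The helper name `measure` is hypothetical.

```python
import time


def measure(fn):
    """Run fn() and return (wall_seconds, cpu_seconds).

    wall_seconds comes from a monotonic clock (elapsed real time);
    cpu_seconds is user + system CPU time consumed by this process."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    fn()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.monotonic() - wall_start
    return wall_seconds, cpu_seconds


if __name__ == "__main__":
    # Sleeping takes wall-clock time but almost no CPU time.
    wall, cpu = measure(lambda: time.sleep(0.2))
    print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")

    # A busy loop uses CPU for roughly as long as it runs.
    wall, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
    print(f"busy:  wall={wall:.3f}s cpu={cpu:.3f}s")
```

In a program with several busy threads the CPU figure can be larger than the wall figure, which would be consistent with seeing about 19 seconds reported for a run that finishes in under 8.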