Why is an AMD Ryzen 2700x 2x slower than a 3-year-old laptop Intel i7-6820HQ with Python?

Question

I just finished installing a desktop computer based on an AMD Ryzen 2700x and 32GB RAM (running Ubuntu 18.04). At work, I have a 3-year-old laptop workstation with an Intel i7-6820HQ and 16GB RAM (running Windows 10). I installed Anaconda on both platforms and ran custom Python code that relies heavily on basic numpy matrix operations. The code does

Accepted Answer

Intel Skylake has significantly better FMA throughput (2 per clock with 256-bit vectors) than Ryzen (2 per clock with 128-bit vectors, or 1 per clock with 256-bit vectors). See https://agner.org/optimize/ for x86 microarchitecture details, and "FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2" for a summary that includes Ryzen.

With data hot in cache, which a well-optimized matmul can achieve with cache blocking, a good matmul can bottleneck on FMA execution-unit throughput. Or it can bottleneck on L1d SIMD load/store bandwidth, where Skylake has more than 2x Ryzen's bandwidth: Skylake can sustain close to 2x 256-bit loads + 1x 256-bit store per clock, while Ryzen can sustain 2x 128-bit cache accesses per clock, up to one of which can be a store.

So it's entirely reasonable for an Intel core's single-threaded matmul / FMA throughput to be twice that of a Ryzen core.

Are you multi-threading to take advantage of all cores on each machine? The 2700x is an 8-core CPU, while the 6820HQ is a 4-core chip.

If your workload is (or can be) taking advantage of multiple cores, then maybe an L3 cache bandwidth limitation is making the difference, assuming both machines are configured correctly and actually running at 3.6 / 3.7 GHz. Or maybe something is creating a 4x per-core performance difference: with twice as many cores, a 2x slower total means each Ryzen core is getting roughly a quarter of the work done that an Intel core does.
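To make the FMA numbers concrete, here is a minimal Python sketch that turns them into theoretical peak double-precision GFLOP/s and compares that with what NumPy's matmul actually achieves on whatever machine runs it. The helper names (`peak_gflops`, `measured_matmul_gflops`) are hypothetical, the per-cycle figures count only FMA throughput, and the 3.6 / 3.7 GHz clocks are the answer's assumption rather than measured sustained clocks. At those clocks one Skylake core peaks at 57.6 GFLOP/s versus 29.6 for one Ryzen core, but the whole chips come out nearly even (230.4 vs 236.8 GFLOP/s), which would suggest that a 2x gap with all cores busy is not explained by FMA throughput alone.

```python
import os
import time

import numpy as np

# Peak double-precision FLOPs per cycle per core, derived from the FMA
# throughput figures above: FMAs issued per clock * 64-bit lanes per vector
# * 2 FLOPs per FMA (one multiply plus one add). Counts FMA pipes only.
SKYLAKE_DP_FLOPS_PER_CYCLE = 2 * (256 // 64) * 2  # 2 x 256-bit FMA per clock
ZEN1_DP_FLOPS_PER_CYCLE = 2 * (128 // 64) * 2     # 2 x 128-bit FMA per clock


def peak_gflops(flops_per_cycle, ghz, cores=1):
    """Theoretical peak GFLOP/s; ignores clock throttling and memory/cache stalls."""
    return flops_per_cycle * ghz * cores


def measured_matmul_gflops(n=1024, repeats=3):
    """Best-of-`repeats` GFLOP/s for an n x n float64 matmul via NumPy's BLAS."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    a @ b  # warm-up call so one-time setup costs are not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    # A dense n x n matmul costs about 2 * n**3 FLOPs (n**3 multiplies + ~n**3 adds).
    return 2 * n**3 / best / 1e9


if __name__ == "__main__":
    # Assumed clocks (3.6 / 3.7 GHz) taken from the answer, not measured.
    intel_core = peak_gflops(SKYLAKE_DP_FLOPS_PER_CYCLE, 3.6)
    ryzen_core = peak_gflops(ZEN1_DP_FLOPS_PER_CYCLE, 3.7)
    print(f"per-core peak: i7-6820HQ {intel_core:.1f}, 2700x {ryzen_core:.1f} GFLOP/s")
    print(f"whole-chip peak: i7-6820HQ {intel_core * 4:.1f}, 2700x {ryzen_core * 8:.1f} GFLOP/s")
    print(f"logical CPUs visible to Python: {os.cpu_count()}")
    print(f"measured float64 matmul: {measured_matmul_gflops():.1f} GFLOP/s")
    np.show_config()  # reports which BLAS/LAPACK libraries NumPy was built against
```

Running it on both machines and comparing the measured figure against the whole-chip peak, together with the core count and the BLAS library reported by `np.show_config()`, is one way to check whether the workload is actually using all cores.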


Answer