
Differing CPUID usage from high-level languages

I’m attempting to use an x86 ASM function that requires a certain processor feature. I understand that I need to check a specific bit after calling “CPUID standard function 01H”. Below is a C implementation from the CPUID Wikipedia page for calling CPUID:

#include <stdio.h>

int main() {
    int i;
    unsigned int index = 0;
    unsigned int regs[4];
    int sum;
    __asm__ __volatile__(
#if defined(__x86_64__) || defined(_M_AMD64) || defined (_M_X64)
        "pushq %%rbx     \n\t" /* save %rbx */
#else
        "pushl %%ebx     \n\t" /* save %ebx */
#endif
        "cpuid            \n\t"
        "movl %%ebx ,%[ebx]  \n\t" /* write the result into output var */
#if defined(__x86_64__) || defined(_M_AMD64) || defined (_M_X64)
        "popq %%rbx \n\t"
#else
        "popl %%ebx \n\t"
#endif
        : "=a"(regs[0]), [ebx] "=r"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
        : "a"(index));
    for (i=4; i<8; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    for (i=12; i<16; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    for (i=8; i<12; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    printf("\n");
}

However, the Linux kernel uses the function below:

static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
                                unsigned int *ecx, unsigned int *edx)
{
        /* ecx is often an input as well as an output. */
        asm volatile("cpuid"
            : "=a" (*eax),
              "=b" (*ebx),
              "=c" (*ecx),
              "=d" (*edx)
            : "0" (*eax), "2" (*ecx));
}

Which one is better? Or are they essentially equivalent?


Answer

As Jester says, in GNU C the wrappers in cpuid.h (such as __get_cpuid) are probably your best bet.
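A minimal sketch of the cpuid.h route, assuming GCC or clang on x86: __get_cpuid (which returns 0 if the requested leaf is above the CPU's maximum) and the bit_POPCNT mask both come from that header. The helper names leaf1_ecx_has and cpu_has_popcnt are hypothetical, and POPCNT only stands in for whichever bit your ASM function actually needs.

```c
#include <cpuid.h>    /* GCC/clang header: __get_cpuid and bit_* masks */
#include <stdbool.h>

/* Hypothetical helper: true if CPUID.01H:ECX has every bit in `mask` set. */
static bool leaf1_ecx_has(unsigned int mask)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 is not supported by this CPU. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & mask) == mask;
}

/* Example: POPCNT is reported in CPUID.01H:ECX bit 23 (bit_POPCNT). */
bool cpu_has_popcnt(void)
{
    return leaf1_ecx_has(bit_POPCNT);
}
```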


There’s also __builtin_cpu_supports("popcnt") (or "avx", etc.). In normal code it works without any setup; according to the GCC docs, you only need to call __builtin_cpu_init() first if you use it in code that runs before constructors, such as an ifunc resolver. Only the really major feature-bits are supported, though. For example, the docs don’t mention the feature-bit for rdrand, so __builtin_cpu_supports("rdrand") probably doesn’t work.
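A sketch of the usual runtime-dispatch pattern with __builtin_cpu_supports, assuming GCC or clang on x86. The popcount_* names are hypothetical. The target("popcnt") function attribute compiles just that one function with POPCNT enabled while the rest of the file stays baseline, so it must only be called after the feature check. The explicit __builtin_cpu_init() call is harmless here even where it isn’t strictly required.

```c
/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount_sw(unsigned int x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Compiled with POPCNT enabled, so __builtin_popcount can become a
 * single popcnt instruction; only safe on CPUs that have it. */
__attribute__((target("popcnt")))
static int popcount_hw(unsigned int x)
{
    return __builtin_popcount(x);
}

/* Pick an implementation based on what the running CPU reports. */
int popcount_dispatch(unsigned int x)
{
    __builtin_cpu_init();   /* only required before constructors run */
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw(x);
    return popcount_sw(x);
}
```

Either path returns the same count; the check only decides which instructions execute.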


Custom versions:

The implementation from Linux can inline with no wasted instructions, and it looks well-written, so there’s no reason to use anything else. It’s remotely possible that you might get a complaint about not being able to satisfy the "=b" constraint; if so, see below for what clang’s cpuid.h does. (But I think that’s never necessary, and that the concern stems from a documentation mistake.)
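Using the kernel helper for the check described in the question might look like this sketch. native_cpuid is the same as the kernel version quoted above, only spelled __asm__ __volatile__ so it also builds under strict -std=c11; leaf1_ecx_bit is a hypothetical name. Because of the "2" (*ecx) input, ecx is read as well as written, so the caller has to initialize it (0 here) even for a leaf like 1 that ignores the subleaf.

```c
/* Same constraints as the Linux kernel's native_cpuid. */
static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
                                unsigned int *ecx, unsigned int *edx)
{
    /* ecx is often an input as well as an output. */
    __asm__ __volatile__("cpuid"
                         : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                         : "0" (*eax), "2" (*ecx));
}

/* Hypothetical helper: returns bit `bit` (0..31) of CPUID.01H:ECX. */
int leaf1_ecx_bit(unsigned int bit)
{
    unsigned int eax = 1;   /* leaf 1: feature flags */
    unsigned int ecx = 0;   /* subleaf input; must be initialized */
    unsigned int ebx, edx;  /* output-only */
    native_cpuid(&eax, &ebx, &ecx, &edx);
    return (int)((ecx >> bit) & 1u);
}
```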

It doesn’t actually need volatile, though, if you’re using it for the values produced rather than for its serializing effect on the pipeline: running CPUID with the same inputs gives the same result (at least for the feature bits; a few fields, like the initial APIC ID in leaf 1’s EBX, depend on which core executes it), so we can let the optimizer move it around or hoist it out of loops, so it runs fewer times. This probably won’t matter much, though, because normal code doesn’t run CPUID in a loop in the first place.
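A sketch of the non-volatile variant, assuming GCC or clang extended asm; cpuid_pure and repeated_leaf0_eax are hypothetical names. Taking the leaf and subleaf by value makes the inputs explicit. Without volatile, the compiler may (but is not required to) run CPUID only once for the loop below, and it may delete the asm entirely if none of its outputs are used.

```c
/* No volatile: the asm is treated as a pure function of leaf/subleaf. */
static inline void cpuid_pure(unsigned int leaf, unsigned int subleaf,
                              unsigned int *eax, unsigned int *ebx,
                              unsigned int *ecx, unsigned int *edx)
{
    __asm__("cpuid"
            : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
            : "0" (leaf), "2" (subleaf));
}

/* Loop-invariant inputs: the optimizer is free to hoist the CPUID. */
unsigned int repeated_leaf0_eax(int n)
{
    unsigned int total = 0;
    for (int i = 0; i < n; i++) {
        unsigned int a, b, c, d;
        cpuid_pure(0, 0, &a, &b, &c, &d);  /* EAX = highest standard leaf */
        total += a;
    }
    return total;
}
```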


The source for clang’s implementation of cpuid.h does some weird stuff, like preserving %rbx because apparently some x86-64 environments might not be able to satisfy a constraint that uses %rbx as an output operand? The comment is /* x86-64 uses %rbx as the base register, so preserve it. */, but I have no idea what they’re talking about. If anything, x86-32 PIC code in the SysV ABI uses %ebx for a fixed purpose (as a pointer to the GOT), but I don’t know of anything like that for x86-64. Perhaps that code is motivated by a mistake in the ABI documentation? See HJ Lu’s mailing list post about it.
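For reference, the %rbx-preserving technique can be sketched roughly as below. This is modeled on clang’s x86-64 cpuid.h approach but is not copied from it, so treat the details as approximate; cpuid_xchg_rbx and leaf0_ebx are hypothetical names. %rbx is swapped with a compiler-chosen scratch register around CPUID, so %rbx never appears as an operand. %q1 names the 64-bit form of that register so the full %rbx is restored.

```c
#if !defined(__x86_64__)
#error "this sketch uses xchgq and 64-bit registers; x86-64 only"
#endif

static inline void cpuid_xchg_rbx(unsigned int leaf, unsigned int subleaf,
                                  unsigned int *eax, unsigned int *ebx,
                                  unsigned int *ecx, unsigned int *edx)
{
    __asm__("xchgq %%rbx, %q1\n\t"  /* stash caller's rbx in the scratch reg */
            "cpuid\n\t"              /* overwrites eax, ebx, ecx, edx */
            "xchgq %%rbx, %q1"       /* scratch <- cpuid ebx, rbx restored */
            : "=a" (*eax), "=r" (*ebx), "=c" (*ecx), "=d" (*edx)
            : "0" (leaf), "2" (subleaf));
}

/* Hypothetical helper: first 4 bytes of the vendor string (leaf 0 EBX). */
unsigned int leaf0_ebx(void)
{
    unsigned int a, b, c, d;
    cpuid_xchg_rbx(0, 0, &a, &b, &c, &d);
    return b;
}
```

The "=r" operand can’t land in eax or ecx because those are already claimed by other outputs. If the compiler happens to pick rbx itself, both xchg instructions become no-ops and the result is still correct.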


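Most importantly, the first version in the question (inside main()) is broken because it clobbers the red zone with push. On x86-64 SysV, the compiler is allowed to keep data in the 128 bytes below %rsp without adjusting %rsp, and a push inside inline asm silently overwrites that area behind the compiler’s back.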

To fix it, just tell the compiler the result will be in ebx (with "=b"), and let it worry about saving/restoring ebx/rbx at the start/end of the function.
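A minimal corrected version of the question’s vendor-string code, assuming GCC or clang on x86, might look like this; cpu_vendor is a hypothetical name. "=b" replaces the push/movl/pop sequence entirely. The 12-byte vendor string from leaf 0 is assembled in EBX, EDX, ECX order, matching the question’s three printf loops.

```c
#include <string.h>

/* Writes the 12-character CPU vendor string plus a NUL into out. */
void cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;
    /* "=b" tells the compiler CPUID writes EBX; it saves/restores
     * RBX itself if the calling convention requires it. */
    __asm__ __volatile__("cpuid"
                         : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                         : "0" (0u), "2" (0u));
    /* Vendor string order is EBX, EDX, ECX (little-endian bytes). */
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}
```

Calling it as `char v[13]; cpu_vendor(v); puts(v);` prints the same string as the question’s loops, without touching the stack inside the asm.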
