Here is the strcmp function that I found in glibc:
int
STRCMP (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}
This is a pretty simple function: the body of the do-while loop assigns c1 and c2 the values of *s1 and *s2, and the loop continues until either c1 is nul or c1 and c2 differ, then the function returns the difference between c1 and c2.
What I didn't understand is the use of the s1 and s2 variables. I mean, other than the fact that they point to unsigned char, they are also const like the two arguments p1 and p2, so why not just use p1 and p2 inside the body of the loop and cast them? Does using those two extra variables make the function somehow more optimized in this case? I ask because here is the same function for FreeBSD that I found on GitHub:
int
strcmp(const char *s1, const char *s2)
{
	while (*s1 == *s2++)
		if (*s1++ == '\0')
			return (0);
	return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
In their version they didn’t even bother using any extra variables.
Thanks in advance for your answers.
PS: I did search the internet about this specific point before asking here, but I didn't find anything.
I would also like to know if there is any particular reason why glibc used those extra variables instead of casting the parameters p1 and p2 directly inside the loop.
Answer
What I didn't understand is the use of the s1 and s2 variables. I mean, other than the fact that they point to unsigned char, they are also const like the two arguments p1 and p2, so why not just use p1 and p2 inside the body of the loop and cast them?
For readability; to make it easier for us humans to maintain the code.
If you look at the glibc sources, the code tends to favor readability over concise expressions. It seems to be a good policy, because it has kept the library relevant and actively maintained for over 30 years now.
Does using those two extra variables make the function somehow more optimized in this case?
No, not at all.
I would also like to know if there are any particular reason why glibc used those extra variables instead of casting the parameters p1 and p2 directly inside while.
For readability only.
The authors know that the C compiler used should be able to optimize this code just fine. (And it is easy to verify that this is the case, just by looking at the code the compiler generates. With GCC, you can use the -S option, or you can use binutils' objdump -d to examine an object file or a binary executable.)
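As a rough way to check this yourself, here is a minimal sketch with two variants of the loop: one using local pointers as glibc does, and one casting the parameters at each use. The names strcmp_locals and strcmp_casts are made up for this illustration and are not glibc functions. Compiling the file with something like gcc -O2 -S and comparing the two function bodies in the resulting assembly should show whether your compiler treats them differently; I would expect little or no difference, but that depends on the compiler and flags.

```c
/* Variant A: glibc-style, with local unsigned char pointers.
   (Illustrative name; not part of glibc.) */
int strcmp_locals(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        c1 = *s1++;
        c2 = *s2++;
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);

    return c1 - c2;
}

/* Variant B: hypothetical rewrite that casts the parameters at each use.
   Postfix ++ binds tighter than the cast, so each line reads the byte at
   the old pointer value as unsigned char, then advances the pointer. */
int strcmp_casts(const char *p1, const char *p2)
{
    unsigned char c1, c2;

    do {
        c1 = *(const unsigned char *) p1++;
        c2 = *(const unsigned char *) p2++;
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);

    return c1 - c2;
}
```

Both variants behave the same way; the difference is only in how much casting noise ends up inside the loop body.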
Note that the casts to unsigned char are required for the exact same reasons as they are for isspace(), isalpha(), et cetera: the character codes compared must be treated as unsigned char for correct results.
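To illustrate why the signedness matters, here is a small sketch comparing a single byte both ways. The two helper names are hypothetical and exist only for this example; the signed variant is deliberately the wrong approach. It assumes a typical two's complement platform where the byte 0x80 reads as -128 through a signed char.

```c
/* Difference of the first bytes, read as unsigned char.
   This matches how strcmp is specified to compare characters. */
int first_byte_diff_unsigned(const char *a, const char *b)
{
    return *(const unsigned char *) a - *(const unsigned char *) b;
}

/* Hypothetical incorrect variant: the same bytes read as signed char.
   Any byte with the high bit set becomes negative and sorts first. */
int first_byte_diff_signed(const char *a, const char *b)
{
    return *(const signed char *) a - *(const signed char *) b;
}
```

With a = "\x80" and b = "\x01", the unsigned version yields a positive result (128 - 1), so a sorts after b as expected, while the signed version yields a negative result (-128 - 1) and gets the ordering backwards.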