I have this sample code:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("%li\n", sizeof(char));
    char mytext[20];
    read(1, mytext, 3);
    printf("%s", mytext);
    return 0;
}
First run:
koray@koray-VirtualBox:~$ ./a.out
1
pp
pp
koray@koray-VirtualBox:~$
Well, I think this is all expected: 'p' is a 1-byte character defined in ASCII, and I am reading 3 bytes (two 'p's and a line break). In the terminal, I again see the two characters.
Now let’s try with a character that is 2 bytes long:
koray@koray-VirtualBox:~$ ./a.out
1
ğ
ğ
What I do not understand is this: when I send the character 'ğ' to the memory pointed to by the mytext variable, 16 bits are written to that area, because 'ğ' is 11000100 10011111 (0xC4 0x9F) in UTF-8.
My question is: when printing back to standard output, how does C (or should I say the kernel?) know that it should read these 2 bytes and interpret them as 1 character, instead of as two 1-byte characters?
Answer
C doesn't interpret it. Your program reads 2 bytes and writes the same 2 bytes back out, without caring what characters (or anything else) they represent.
Your terminal encodes your input into those UTF-8 bytes, and on output decodes the same bytes back into the one two-byte character.