What parts of this HelloWorld assembly code are essential if I were to write the program in assembly?

I have this short hello world program:

#include <stdio.h>

static const char* msg = "Hello world";

int main(){
    printf("%s\n", msg);
    return 0;
}

I compiled it into the following assembly code with gcc:

    .file   "hello_world.c"
    .section    .rodata
.LC0:
    .string "Hello world"
    .data
    .align 4
    .type   msg, @object
    .size   msg, 4
msg:
    .long   .LC0
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $16, %esp
    movl    msg, %eax
    movl    %eax, (%esp)
    call    puts
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
    .section    .note.GNU-stack,"",@progbits

My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don’t understand. For instance, I don’t know what .cfi* is, and I’m wondering if I would need to include this to write this program in assembly.

Answer

The absolute bare minimum that will work on what appears to be your platform (32-bit x86 Linux) is

        .globl main
main:
        pushl   $.LC0
        call    puts
        addl    $4, %esp
        xorl    %eax, %eax
        ret
.LC0:
        .string "Hello world"

But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is

        .globl  main
        .type   main, @function
main:
        subl    $24, %esp
        pushl   $.LC0
        call    puts
        xorl    %eax, %eax
        addl    $28, %esp
        ret
        .size main, .-main
        .section .rodata
.LC0:
        .string "Hello world"

Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.
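
The ABI requirement that the first listing most plainly breaks is, as far as I can tell, stack alignment: GCC on 32-bit x86 Linux assumes %esp is a multiple of 16 at every call instruction, which would explain why the second listing lowers %esp by 28 bytes in total rather than 4. Here is a small sketch of that arithmetic in C, under that assumption. The function names are made up for illustration and are not part of any toolchain.

```c
/* Model of %esp modulo 16 on i386 Linux. ASSUMPTION: every caller keeps
   %esp 16-byte aligned at each `call`, so once the 4-byte return address
   has been pushed, a function starts with %esp == 12 (mod 16). */
enum { ESP_MOD16_AT_ENTRY = 12 };

/* %esp (mod 16) at the `call puts` instruction, after the function has
   lowered %esp by `bytes_below_entry` bytes in total (subl plus pushl). */
unsigned esp_mod16_at_call(unsigned bytes_below_entry)
{
    /* Reduce first and add 16 so the unsigned subtraction cannot wrap. */
    return (ESP_MOD16_AT_ENTRY + 16u - bytes_below_entry % 16u) % 16u;
}

/* 1 if the call site is 16-byte aligned, 0 otherwise. */
int call_is_aligned(unsigned bytes_below_entry)
{
    return esp_mod16_at_call(bytes_below_entry) == 0;
}
```

call_is_aligned(4) models the first listing (a single pushl) and gives 0; call_is_aligned(28) models the second (subl $24 followed by pushl) and gives 1.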

The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.

The line

    .section    .note.GNU-stack,"",@progbits

is also worth knowing about if you are writing assembly language by hand. It is another optional annotation, but a valuable one: it means “nothing in this object file requires the stack to be executable.” If every object file in a program has this annotation, the kernel won’t make the stack executable, which improves security a little bit.

(To indicate that you do need the stack to be executable, you put "x" instead of "". GCC may do this if you use its “nested function” extension. (Don’t do that.))
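
To see which way the kernel decided for a running process, one option on Linux is to read the permissions of the [stack] mapping in /proc/self/maps. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* 1 if a /proc/<pid>/maps permission field such as "rw-p" or "rwxp"
   grants execute permission, 0 otherwise. */
int perms_allow_exec(const char *perms)
{
    return strlen(perms) >= 3 && perms[2] == 'x';
}

/* Returns 1 if this process's stack is executable, 0 if it is not, and
   -1 if that cannot be determined (for example, there is no /proc). */
int stack_is_executable(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int result = -1;

    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        char perms[8];
        /* Each line reads: start-end perms offset dev inode [name] */
        if (strstr(line, "[stack]") != NULL &&
            sscanf(line, "%*s %7s", perms) == 1) {
            result = perms_allow_exec(perms);
            break;
        }
    }
    fclose(maps);
    return result;
}
```

Calling stack_is_executable() from main and printing the result should report 0 for a program whose object files all carry the annotation, though the outcome also depends on the linker and kernel in use.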

It is probably worth mentioning that in the “AT&T” assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines: A line with a single token on it, ending in a colon, is a label. (I don’t remember the rules for what characters can appear in labels.) A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler. Anything else is an assembly instruction.
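
The three-way rule above can be sketched as a small C classifier. It implements only the simplified rule as stated, so comments and a label sharing a line with an instruction are not handled; the enum and function names are invented for this example.

```c
#include <ctype.h>
#include <string.h>

enum line_kind { LINE_BLANK, LINE_LABEL, LINE_DIRECTIVE, LINE_INSTRUCTION };

/* Classify one line of AT&T-syntax assembly by the simplified rule:
   a single token ending in ':' is a label, a first token starting with
   '.' is a directive, and anything else is an instruction. */
enum line_kind classify_line(const char *line)
{
    /* Skip leading whitespace. */
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return LINE_BLANK;

    /* Length of the first token, then skip any whitespace after it. */
    size_t first_len = strcspn(line, " \t\r\n");
    const char *rest = line + first_len;
    while (isspace((unsigned char)*rest))
        rest++;

    /* Nothing after the first token, and it ends in a colon: a label. */
    if (*rest == '\0' && line[first_len - 1] == ':')
        return LINE_LABEL;
    if (line[0] == '.')
        return LINE_DIRECTIVE;
    return LINE_INSTRUCTION;
}
```

Applied to lines from the listings above, ".LC0:" and "main:" come out as labels, ".globl main" and ".text" as directives, and "call puts" as an instruction.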

User contributions licensed under: CC BY-SA