
Getting only hexadecimal byte values from the disassembly of a compiled library

I used objdump to disassemble all the functions in a compiled library file and wrote the output to a text file. In that file, the output for a function called clear_bit is as follows:

Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:

   0:   80 b5 84 b0     addlt   r11, r4, r0, lsl #11
   4:   01 46 03 90     andls   r4, r3, r1, lsl #12
   8:   03 98 00 22     andhs   r9, r0, #196608
   c:   02 91 11 46     ldrmi   r9, [r1], -r2, lsl #2
  10:   ff f7 fe ff  <unknown>
  14:   01 90 ff e7     ldrb    r9, [pc, r1]!
  18:   01 98 04 b0     andlt   r9, r4, r1, lsl #16
  1c:   80  <unknown>
  1d:   bd  <unknown>

The output for another function, set_bit, is as follows:

Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:

   0:   80 b5 84 b0     addlt   r11, r4, r0, lsl #11
   4:   01 46 03 90     andls   r4, r3, r1, lsl #12
   8:   03 98 01 22     andhs   r9, r1, #196608
   c:   02 91 11 46     ldrmi   r9, [r1], -r2, lsl #2
  10:   ff f7 fe ff  <unknown>
  14:   01 90 ff e7     ldrb    r9, [pc, r1]!
  18:   01 98 04 b0     andlt   r9, r4, r1, lsl #16
  1c:   80  <unknown>
  1d:   bd  <unknown>

Like the two functions above, output.txt contains the disassembly of more than 100 such functions. What I need is to extract only the hex byte values [80,b5,84,b0,01,..,b0,80,bd] belonging to each function, without the assembly instructions, function names, offsets, etc. I want a separate byte sequence for each function, not one single sequence for all of them, because I will use them to develop a machine learning model. Below is the output I expect for just these two functions. (The comments are only for explanation; I don't need them in the expected output.)

 // byte sequence for the first function
 80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff
 e7 01 98 04 b0 80 bd

 // byte sequence for the second function, separated by a blank line
 80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff
 e7 01 98 04 b0 80 bd

I tried the xxd -g 1 command, but it prints the bytes together with the offsets and an extra column (an ASCII rendering) to the right of the byte values. It also seems to contain the contents of all the sections, not only the code in the .text section:

00000000: 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20  !<arch>./       
00000010: 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20          0       
00000020: 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20      0     0     
00000030: 30 20 20 20 20 20 20 20 34 37 33 32 34 30 20 20  0       473240  
00000040: 20 20 60 0a 00 00 1c 8c 00 07 aa ea 00 07 aa ea    `.............
00000050: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000060: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000070: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000080: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000090: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
000000a0: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
000000b0: 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a  ................
000000c0: 00 08 1a 1a 00 08 1a 1a 00 08 3a ee 00 08 3a ee  ..........:...:.

I have tried different tools and gone through other similar Stack Overflow questions, but have not succeeded so far. I don't know whether I am using xxd the wrong way or whether there are other tools that can achieve this. Any help would be highly appreciated. Thank you!


Answer

Would you please try the following:

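#!/bin/bash
# Requires bash: uses [[ =~ ]], BASH_REMATCH and here-strings.

# fold $str, print and clear
flush() {
    if [[ -n $str ]]; then
        fold -w 69 <<< "$str"
        echo
        str=""
    fi
}

header='^Disassembly of section'
# A-F, not A-f: depending on the locale, the range A-f can also match
# characters such as G-Z and _
body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'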
while IFS= read -r line; do
    if [[ $line =~ $header ]]; then
        flush
        echo "// $line"
    elif [[ $line =~ $body ]]; then
        # concatenate the byte sequence on $str
        str+="${BASH_REMATCH[1]}"
    fi
done < output.txt
flush

output.txt (as an input to the script above):

Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:

   0:   80 b5 84 b0     addlt   r11, r4, r0, lsl #11
   4:   01 46 03 90     andls   r4, r3, r1, lsl #12
   8:   03 98 00 22     andhs   r9, r0, #196608
   c:   02 91 11 46     ldrmi   r9, [r1], -r2, lsl #2
  10:   ff f7 fe ff  <unknown>
  14:   01 90 ff e7     ldrb    r9, [pc, r1]!
  18:   01 98 04 b0     andlt   r9, r4, r1, lsl #16
  1c:   80  <unknown>
  1d:   bd  <unknown>

Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:

   0:   80 b5 84 b0     addlt   r11, r4, r0, lsl #11
   4:   01 46 03 90     andls   r4, r3, r1, lsl #12
   8:   03 98 01 22     andhs   r9, r1, #196608
   c:   02 91 11 46     ldrmi   r9, [r1], -r2, lsl #2
  10:   ff f7 fe ff  <unknown>
  14:   01 90 ff e7     ldrb    r9, [pc, r1]!
  18:   01 98 04 b0     andlt   r9, r4, r1, lsl #16
  1c:   80  <unknown>
  1d:   bd  <unknown>

Result:

// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd

// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd

  • It detects the header lines and the body (byte value) lines with regular expressions.
  • When a body line matches, bash stores the captured byte sequence in ${BASH_REMATCH[1]}, and the script appends it to $str.
  • Each time a new header is found, and once more after the last line, flush prints the accumulated sequence, wrapped by fold at 69 columns (23 bytes of "xx " per line), and then clears $str.
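
The regex capture from the second bullet and the fold width from the third can be checked in isolation. Below is a small sketch, assuming bash, that applies the script's body pattern (with the hex range written as A-F) to one line of output.txt. The variable names captured and bytes_per_line are introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Same pattern as $body in the script (hex range written as A-F).
body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'
line='   0:   80 b5 84 b0     addlt   r11, r4, r0, lsl #11'

if [[ $line =~ $body ]]; then
    # Group 1 holds every "xx " pair, including the trailing space;
    # matching stops at the run of blanks before the mnemonic.
    captured="${BASH_REMATCH[1]}"
    printf '[%s]\n' "$captured"   # prints [80 b5 84 b0 ]
fi

# Each byte is printed as "xx " (3 characters), so a fold width of 69
# wraps after 23 bytes.
bytes_per_line=$(( 69 / 3 ))
echo "$bytes_per_line"            # prints 23
```

Because every captured chunk ends with a space, appending chunks to $str yields a single-space-separated sequence without any extra joining logic.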

Hope this is what you want.
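
Since the question says the // comment lines are not needed, a variant that prints each function's bytes on one line, with no header and no wrapping, may be more convenient as model input. This is a sketch that assumes the same objdump layout as the output.txt shown above; the function name extract_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read objdump text on stdin and print one line of
# space-separated hex bytes per "Disassembly of section" block.
extract_bytes() {
    local header='^Disassembly of section'
    local body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'
    local line str=""
    while IFS= read -r line; do
        if [[ $line =~ $header ]]; then
            # A new section starts: emit the previous one (minus its trailing space).
            if [[ -n $str ]]; then
                echo "${str% }"
            fi
            str=""
        elif [[ $line =~ $body ]]; then
            # Append this line's "xx " pairs to the current section.
            str+="${BASH_REMATCH[1]}"
        fi
    done
    # Emit the last section.
    if [[ -n $str ]]; then
        echo "${str% }"
    fi
}

# Usage: extract_bytes < output.txt > sequences.txt
```

With the output.txt above, sequences.txt would contain two lines, one per function, each holding the 30 bytes from 80 b5 84 b0 through 04 b0 80 bd. Dropping fold keeps every function on exactly one line, so line N of the result corresponds to the Nth section in the disassembly.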

User contributions licensed under: CC BY-SA