I used objdump to disassemble all the functions in a compiled library file and wrote the output to a text file. In that file, the output for a function called clear_bit is as follows.
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
  0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
  4: 01 46 03 90 andls r4, r3, r1, lsl #12
  8: 03 98 00 22 andhs r9, r0, #196608
  c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
 10: ff f7 fe ff <unknown>
 14: 01 90 ff e7 ldrb r9, [pc, r1]!
 18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
 1c: 80 <unknown>
 1d: bd <unknown>
The output of another function, set_bit, is as follows:
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
  0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
  4: 01 46 03 90 andls r4, r3, r1, lsl #12
  8: 03 98 01 22 andhs r9, r1, #196608
  c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
 10: ff f7 fe ff <unknown>
 14: 01 90 ff e7 ldrb r9, [pc, r1]!
 18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
 1c: 80 <unknown>
 1d: bd <unknown>
Similar to the two functions above, output.txt contains the disassembly of more than 100 such functions. What I need is to extract only the hex byte values [80,b5,84,b0,01,..,b0,80,bd] for each function, without the assembly instructions, function names, offsets, etc. I want one byte sequence per function, so that I can use the sequences to develop a machine learning model. The following is what I expect for these two functions. (The comments are only for explanation; I don't need them in the actual output.)
// byte sequence related to first function
80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff e7 01 98 04 b0 80 bd

// byte sequence related to second function separated by a line
80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff e7 01 98 04 b0 80 bd
I tried the xxd -g 1 command, but it gives me the bytes together with offsets and some other values to the right of the byte values, and it seems to cover all the sections, not only the code in the text section:
00000000: 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20  !<arch>./       
00000010: 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20          0       
00000020: 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20      0     0     
00000030: 30 20 20 20 20 20 20 20 34 37 33 32 34 30 20 20  0       473240  
00000040: 20 20 60 0a 00 00 1c 8c 00 07 aa ea 00 07 aa ea    `.............
00000050: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000060: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000070: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000080: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
00000090: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
000000a0: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea  ................
000000b0: 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a  ................
000000c0: 00 08 1a 1a 00 08 1a 1a 00 08 3a ee 00 08 3a ee  ..........:...:.
I have tried different tools and gone through other similar Stack Overflow questions, but have failed so far. I don't know whether I am using xxd incorrectly or whether there are other tools that would achieve my goal. Any help would be highly appreciated. Thank you!
Answer
Would you please try the following:
#!/bin/bash

# fold $str, print and clear
flush() {
    if [[ -n $str ]]; then
        fold -w 69 <<< "$str"
        echo
        str=""
    fi
}

header='^Disassembly of section'
# offset, colon, then one or more two-digit hex bytes each followed by a space
body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'

while IFS= read -r line; do
    if [[ $line =~ $header ]]; then
        flush
        echo "// $line"
    elif [[ $line =~ $body ]]; then
        # concatenate the byte sequence on $str
        str+="${BASH_REMATCH[1]}"
    fi
done < output.txt
flush
output.txt (as an input to the script above):
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
  0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
  4: 01 46 03 90 andls r4, r3, r1, lsl #12
  8: 03 98 00 22 andhs r9, r0, #196608
  c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
 10: ff f7 fe ff <unknown>
 14: 01 90 ff e7 ldrb r9, [pc, r1]!
 18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
 1c: 80 <unknown>
 1d: bd <unknown>
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
  0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
  4: 01 46 03 90 andls r4, r3, r1, lsl #12
  8: 03 98 01 22 andhs r9, r1, #196608
  c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
 10: ff f7 fe ff <unknown>
 14: 01 90 ff e7 ldrb r9, [pc, r1]!
 18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
 1c: 80 <unknown>
 1d: bd <unknown>
Result:
// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd

// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd
- It detects the header line and the body (byte value) lines using regexes.
- When a body line is found, the regex captures the byte sequence, which bash stores in ${BASH_REMATCH[1]}; the script appends it to $str.
- At the end of each section, it prints the sequence, inserting newlines at the designated width.
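If you want exactly the layout from the question (no header comments, one unfolded line per function, a blank line between functions), the same idea can be sketched as below. This is only a variant under assumptions: extract_bytes is a hypothetical helper name, it reads the objdump text on stdin, and it relies on every byte in output.txt being followed by a space, as in the sample above.

```bash
#!/usr/bin/env bash

# extract_bytes (hypothetical name): read objdump text on stdin and print
# one line of hex bytes per "Disassembly of section" block, with a blank
# line between blocks and no header comments.
extract_bytes() {
    local header='^Disassembly of section'
    # Assumes each byte is followed by a space, as in the sample output.
    local body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'
    local line str="" sep=""

    while IFS= read -r line; do
        if [[ $line =~ $header ]]; then
            # A new section starts: print the bytes collected so far.
            if [[ -n $str ]]; then
                printf '%s%s\n' "$sep" "${str% }"   # drop the trailing space
                sep=$'\n'                           # blank line before later blocks
                str=""
            fi
        elif [[ $line =~ $body ]]; then
            # Append the captured byte sequence of this line.
            str+="${BASH_REMATCH[1]}"
        fi
    done

    # Print the last section, which has no header after it.
    if [[ -n $str ]]; then
        printf '%s%s\n' "$sep" "${str% }"
    fi
}
```

After sourcing the function you could run it as `extract_bytes < output.txt > bytes.txt`, where bytes.txt is just an example file name.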
Hope this is what you want.