
Finding the number of bytes of entered string at runtime

I’m new to x86 assembly. I have written a program that asks the user to enter a number, checks whether it’s even or odd, and then prints a message saying which. The code works fine, but it has one problem: it only works for 1-digit numbers:

; Ask the user to enter a number from the keyboard
; Check if this number is odd or even and display a message to say this


section .text
   global _start          ;must be declared for linker (gcc)

_start:                  ;tell linker entry point

  ;Display 'Please enter a number'
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  ecx, msg1          ; message to be print
  mov  edx, len1          ; message length
  int  80h                ; perform system call

  ;Enter the number from the keyboard
  mov  eax, 3            ; sys_read
  mov  ebx, 2            ; file descriptor: stdin
  mov  ecx, myvariable   ; destination (memory address)
  mov  edx, 4            ; size of the memory location in bytes
  int  80h               ; perform system call


  ;Convert the variable to a number and check if even or odd
  mov eax, [myvariable]
  sub eax, '0' ;eax now has the number value
  and eax, 01H
  jz isEven

  ;Display 'The entered number is odd'
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  ecx, msg2          ; message to be print
  mov  edx, len2          ; message length
  int  80h
  jmp outProg

isEven:
 ;Display 'The entered number is even'
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  ecx, msg3          ; message to be print
  mov  edx, len3          ; message length
  int  80h

outProg:
  mov   eax,1         ;system call number (sys_exit)
  int   0x80          ;call kernel

section .data
  msg1 db "Please enter a number: ", 0xA,0xD
  len1 equ $- msg1

  msg2 db "The entered number is odd", 0xA,0xD
  len2 equ $- msg2

  msg3 db "The entered number is even", 0xA,0xD
  len3 equ $- msg3

segment .bss
  myvariable resb 4

It does not work properly for numbers with more than 1 digit because it only takes into account the first byte (first digit) of the entered number, so that is all it checks. So I would need a way to find out how many digits (bytes) there are in the value the user enters, so I could do something like this:

;Convert the variable to a number and check if even or odd
mov eax, [myvariable+(number_of_digits-1)]

And only check eax, which contains the last digit, to see if it’s even or odd. The problem is that I have no idea how to check how many bytes are in my number after the user has entered it. I’m sure it’s something very easy, yet I have not been able to figure it out, nor have I found any solution on Google. Please help me with this. Thank you!


Answer

You actually want movzx eax, byte [myvariable+(number_of_digits-1)] to only load 1 byte, not a dword. Or just directly test memory with test byte [...], 1. You can skip the sub because '0' is an even number; subtracting to convert from ASCII code to integer digit doesn’t change the low bit.

But yes, you need the least significant digit, which is the last one (highest address) in printing / reading order.

A read system call returns the number of bytes read in EAX. (Or negative error code). This will include a newline if the user hit return, but not if the user redirected from a file that didn’t end with a newline. (Or if they submitted input on a terminal using control-d after typing some digits). The most simple and robust way would be to simply loop looking for the first non-digit in the buffer.
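A rough sketch of that scan loop, assuming EAX still holds a positive return value from read and that the input buffer is called mybuf. The labels scan_digits, scan_done and no_digits are made-up names for illustration; no_digits stands for whatever error handling you choose.

  ; assumes EAX = byte count returned by read (> 0), mybuf = input buffer
  mov    esi, mybuf            ; ESI = pointer to the current byte
  lea    edi, [mybuf + eax]    ; EDI = one past the last byte read
scan_digits:
  cmp    esi, edi
  jae    scan_done             ; reached the end of the input: stop
  movzx  edx, byte [esi]
  sub    edx, '0'
  cmp    edx, 9
  ja     scan_done             ; unsigned compare: anything outside '0'..'9' is a non-digit
  inc    esi
  jmp    scan_digits
scan_done:
  cmp    esi, mybuf
  je     no_digits             ; input didn't start with a digit (hypothetical label)
  test   byte [esi - 1], 1     ; low bit of the last digit: ZF=1 means even

This never reads outside the bytes that read actually stored, whether or not the input ends with a newline.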

But the “clever” / fun way would be to check if [mybuf + eax - 1] is a digit, and if so use it. Otherwise check the previous byte. (Or just assume there’s a newline and always check [mybuf + eax - 2], the 2nd-last byte of what was read. That reads off the start of the buffer if the user just pressed return.)

(To efficiently check for an ASCII digit: sub al, '0' / cmp al, 9 / ja non_digit. See “double condition checking in assembly” and “What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?”)
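Putting the last-byte check and the digit range check together might look something like this. It is only a sketch: it assumes EAX holds the return value of read and that the buffer is called mybuf, and the labels have_digit and no_digits are hypothetical names.

  ; assumes EAX = return value of read, mybuf = input buffer
  cmp    eax, 1
  jl     no_digits               ; read error (negative) or EOF (0)
  lea    esi, [mybuf + eax - 1]  ; ESI -> last byte read
  movzx  edx, byte [esi]
  sub    edx, '0'
  cmp    edx, 9
  jbe    have_digit              ; last byte is itself a digit (no trailing newline)
  cmp    eax, 2
  jl     no_digits               ; only one byte was read and it wasn't a digit
  dec    esi                     ; otherwise try the byte before it
  movzx  edx, byte [esi]
  sub    edx, '0'
  cmp    edx, 9
  ja     no_digits               ; 2nd-last byte isn't a digit either
have_digit:
  test   dl, 1                   ; EDX = digit value 0..9: ZF=1 means even

The signed jl matters for the first compare because read reports errors as negative values; the digit checks use unsigned jbe / ja so that bytes below '0' wrap around to large values and fail the range check.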


Just for fun, here’s a more compact version that always just checks the 2nd-last byte of the read() input. (It doesn’t check for being a digit, and it reads outside the buffer for input lengths of 0 or 1, e.g. pressing control-D or return. The same goes for read errors, e.g. running strace ./oddeven <&- to close its stdin.)

Note the interesting part:

  ; check if the low digit is even or odd
  mov    ecx, msg_even
  mov    edx, msg_odd                 ; these don't set flags and actually could be done after TEST
  test   byte [mybuf + eax - 2], 1    ; check the low bit of 2nd-last byte of the read input
  cmovnz ecx, edx

  ;Display selected message
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  edx, msg_odd.len
  int  80h                ; write(1, digit&1 ? msg_odd : msg_even, msg_odd.len)

I used cmov, but a simple branch over a mov ecx, msg_odd would work. You don’t need to duplicate the whole setup for the system call, just run it with the right pointer and length. (ECX and EDX values, and I padded the odd message with a space so I could use the same length for both.)
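For comparison, the branching version could be sketched like this, using the same mybuf, msg_even and msg_odd names as the snippet above and the same assumption that the digit is the 2nd-last byte read. The label print_msg is a made-up name.

  ; assumes EAX = byte count from read, input ends with digit + newline
  mov    ecx, msg_even                ; default to the even message (mov doesn't touch flags)
  test   byte [mybuf + eax - 2], 1    ; low bit of the last digit
  jz     print_msg                    ; bit clear: keep the even message
  mov    ecx, msg_odd                 ; bit set: switch to the odd message
print_msg:
  mov    eax, 4                       ; sys_write
  mov    ebx, 1                       ; file descriptor: stdout
  mov    edx, msg_odd.len             ; both messages are assumed to be the same length
  int    80h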

And this is a homebrewed static_assert(msg_odd.len == msg_even.len), using NASM’s conditional directives (https://nasm.us/doc/nasmdoc4.html). It’s not just a separate preprocessor like C has, it can use NASM numeric equ expressions.

%if msg_odd.len != msg_even.len
  ; homebrew assert with NASM preprocessor, since I chose to skip doing a 2nd cmov for the length
  %warning we assume both messages have the same length
%endif

The full thing. Outside of the part shown above, I just tweaked comments, simplifying them where I thought they were too redundant, and used meaningful label names.

Also, I put .rodata and .bss at the top because NASM complained about referencing msg_odd.len before it was defined. (You previously had your strings in .data, but read-only data should generally go in .rodata, so the OS can share those pages between runs of the same program because they stay clean.)

Other fixes:

  • Linux/Unix uses 0xa line endings: \n, not \r\n.
  • stdin is fd 0. 2 is stderr. (2 happens to work because terminal emulators normally run the shell with all 3 file descriptors referring to the same read+write open file description for the tty).

; Ask the user to enter a number from the keyboard
; Check if this number is odd or even and display a message to say this

section .rodata
  msg_prompt db "Please enter a number: ", 0xA
  .len equ $- msg_prompt

  msg_odd db  "The entered number is odd ", 0xA    ; padded with a space for same length as even
  .len equ $- msg_odd

  msg_even db "The entered number is even", 0xA
  .len equ $- msg_even

section .bss
  mybuf resb 128
  .len equ $ - mybuf


section .text
   global _start
_start:                  ; ld defaults to starting at the top of the .text section, but exporting a symbol silences the warning and can make GDB work more easily.

  ; Display prompt
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  ecx, msg_prompt
  mov  edx, msg_prompt.len
  int  80h                ; perform system call

  mov  eax, 3            ; sys_read
  xor  ebx, ebx          ; file descriptor: stdin
  mov  ecx, mybuf
  mov  edx, mybuf.len
  int  80h               ; read(0, mybuf, len)

; return value in EAX: negative for error, 0 for EOF, or positive byte count
; for this toy program, let's assume valid input ending with digit\n

; the newline will be at [mybuf + eax - 1].  The digit before that, at [mybuf + eax - 2].
; If the user just presses return, we'll access the byte before the start of mybuf, and may segfault if mybuf is at the start of a page.

  ; check if the low digit is even or odd
  mov    ecx, msg_even
  mov    edx, msg_odd                 ; these don't set flags and actually could be done after TEST
  test   byte [mybuf + eax - 2], 1    ; check the low bit of 2nd-last byte of the read input
  cmovnz ecx, edx

  ;Display selected message
  mov  eax, 4             ; sys_write
  mov  ebx, 1             ; file descriptor: stdout
  mov  edx, msg_odd.len
  int  80h                ; write(1, digit&1 ? msg_odd : msg_even, msg_odd.len)

%if msg_odd.len != msg_even.len
  ; homebrew assert with NASM preprocessor, since I chose to skip doing a 2nd cmov for the length
  %warning  we assume both messages have the same length
%endif

  mov   eax, 1        ;system call number (sys_exit)
  xor   ebx, ebx
  int   0x80          ; _exit(0)

assemble + link with nasm -felf32 oddeven.asm && ld -melf_i386 -o oddeven oddeven.o

User contributions licensed under: CC BY-SA