From C to assembler - example: length of string

This is an account of the Friday Jan 29, 2010 class example.

A string is an array of characters (bytes), identified by its starting address and continuing until the character NUL (0x00) is encountered.
In C, declare a string as follows:

char  mystring[] = "Ace";

This sets aside 4 bytes of storage. mystring is the address of this area, mystring[0] is the 'A', and mystring[3] is the NUL. The length of this string is 3 (we don't count the NUL).
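
As a quick check of the byte counts described above, here is a minimal sketch. The function names storage_bytes and visible_length are made up for this illustration; sizeof and strlen are standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes set aside for the array, NUL included. */
size_t storage_bytes(void)
{
    char mystring[] = "Ace";
    return sizeof mystring;     /* 'A', 'c', 'e' and the NUL */
}

/* Hypothetical helper: length as the C library counts it, NUL excluded. */
size_t visible_length(void)
{
    char mystring[] = "Ace";
    return strlen(mystring);
}
```

Calling storage_bytes() should give 4 and visible_length() should give 3, matching the counts in the text.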

If we wish to instruct a computer to compute the length of mystring, we need to program a loop to look for the NUL, loading each character and counting as we go.
Here is the code for such a loop:
int count = 0;
char letter = mystring[count];
while (letter != 0)
{
    count = count + 1;
    letter = mystring[count];
}
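
To try the loop on more than one string, it can be wrapped in a function. This is only a sketch: the name length_by_index and its parameter are assumptions, not part of the class example. It tests for "not equal to 0" so that bytes with the high bit set are still counted on machines where char is signed.

```c
/* Hypothetical helper: counts the characters before the terminating NUL,
   using the index-based loop from the text. */
int length_by_index(const char text[])
{
    int count = 0;
    char letter = text[count];
    while (letter != 0) {       /* stop only at the NUL byte */
        count = count + 1;
        letter = text[count];
    }
    return count;
}
```

For example, length_by_index("Ace") should return 3, and an empty string should give 0.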

Tricky stuff

C has a lot of "lazy programmer" features. The ones of interest to us are:

  1. while and if statements test an integer value, treating 0 as false and any nonzero value as true. Thus we could have written while(letter)
  2. Assignment, like other operations, returns a value, which can be used further, for instance as the test in a while: while(letter=mystring[count])
  3. ++ is short for "increment", so:
    ++count is equivalent to count = count+1. It increments count and returns the new value.
    count++ also increments count, but the value it returns is the old value of count. This postincrement is very useful.
  4. Addresses, also called pointers, can be dereferenced with the operator *. So *mystring is the 'A'.
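
The four features can be seen in isolation in the small functions below. This is an illustrative sketch; all of the function names are invented for the purpose and do not come from the class example.

```c
/* Feature 1: any nonzero integer counts as true in a test. */
int counts_as_true(int value)
{
    if (value)
        return 1;
    return 0;
}

/* Feature 2: an assignment is an expression whose value is the value stored. */
int assignment_value(void)
{
    int stored;
    int result = (stored = 7);  /* result receives the value just assigned */
    return stored + result;     /* 7 + 7 */
}

/* Feature 3: ++count yields the new value, count++ yields the old one. */
int preincrement_value(void)
{
    int count = 5;
    return ++count;             /* count becomes 6, expression is 6 */
}

int postincrement_value(void)
{
    int count = 5;
    return count++;             /* count becomes 6, expression is 5 */
}

/* Feature 4: dereferencing the array's address gives its first element. */
char first_character(void)
{
    char mystring[] = "Ace";
    return *mystring;
}
```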

Using these features, we can shorten the above to:

int count = 0;
char letter;
while (letter = mystring[count])
    count++;

We don't really need letter here, since we are not actually using the value in this loop. Also, it is common in C to use a pointer variable to hold the address of the characters, and we can step through the array by incrementing the address in the pointer. A pointer is declared with a *, for example char* or int*. So a third way to count our letters is:

int count = 0;
char* next = mystring; // point to each character in turn
while (*next++)
    count++;

Freaky! Let's look at what the while statement does:

  1. Get the character pointed to by next (*next) - save it temporarily, probably in a register.
  2. Increment the address (next++) by 1, since char's are 1 byte long.
  3. Test the character for zero (NUL). If it is zero, exit the loop; if not, execute the loop body (count++;) and repeat.

Now that we hopefully understand what is going on, we can do this in assembly language; see also length.a.
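
The three steps hidden inside while (*next++) can also be written out one per line in C. The sketch below does that. The function name length_by_pointer and the temporary variable current are assumptions introduced for illustration; a compiler would most likely keep current in a register.

```c
/* Hypothetical helper: the same count as while (*next++) count++;
   with each hidden step spelled out. */
int length_by_pointer(const char *text)
{
    int count = 0;
    const char *next = text;    /* point to each character in turn */

    for (;;) {
        char current = *next;   /* step 1: get the character pointed to */
        next = next + 1;        /* step 2: advance the address by one char */
        if (current == 0)       /* step 3: at the NUL, leave the loop */
            break;
        count++;                /* loop body: count the character */
    }
    return count;
}
```

Note that next has already moved past the NUL when the loop exits, which is harmless here because it is never dereferenced again.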
	li	$t0, 0		# t0 = count = 0
	la	$t1, mystring	# t1 = next = address of string, will increment
while01:
	lb	$t2, ($t1)	# t2 = *next, load character
	add	$t1, 1		# next++, point to next character
	beqz	$t2, end01	# if at NUL, get out of loop
	add	$t0, 1		# count++, count the (non-zero) character
	j	while01		# repeat the loop
end01:				# out of the loop

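Ahhh, some new instructions here. To form a loop, we need to modify the flow of control by changing the PC (program counter) to the address of a label. In the code above, beqz $t2, end01 branches to end01 only when $t2 is zero, and j while01 always jumps back to while01.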
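
The same flow of control can be imitated in C with labels and goto, which may make the correspondence with the assembly easier to see. This is a sketch for comparison only, not how one would normally write C. The function name length_with_goto is an assumption; the labels and register comments mirror the assembly listing.

```c
/* Hypothetical helper: one C statement per assembly instruction. */
int length_with_goto(const char *text)
{
    int count = 0;              /* li   $t0, 0 */
    const char *next = text;    /* la   $t1, mystring */
    char current;               /* plays the role of $t2 */

while01:
    current = *next;            /* lb   $t2, ($t1) */
    next = next + 1;            /* add  $t1, 1 */
    if (current == 0)           /* beqz $t2, end01 */
        goto end01;
    count = count + 1;          /* add  $t0, 1 */
    goto while01;               /* j    while01 */

end01:
    return count;
}
```

The conditional goto corresponds to the branch and the unconditional goto to the jump. Each one simply names the label where execution continues.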


To compile and execute:

The command gcc (or g++) takes care of all the steps of compiling a C program and creates an executable file, named a.out by default.
Naturally, if there are syntax errors, you'll have to correct those and recompile.
To execute, use the name of the program with the proper path; dot (.) indicates the current directory.
(System programs, like gcc, are found "automatically" by looking in the directories named in the $PATH environment variable. Your home directory should never be included in this, however.)

gcc myprogram.c
./a.out
--or--
gcc -o myprogram myprogram.c
./myprogram
