CSc 116 notes

6) Instruction formats and addressing modes

We now know enough instructions to write reasonable programs. It is time to look at various ways to refer to data, and at how machine instructions are encoded. This will give us insights into what hardware designers have to do, and explain why they have chosen to omit certain reasonable (from the programmer's point of view) instructions from the instruction set. We have already seen some of these conversions. First, let's look at MIPS instruction formats.

Instruction formats

register
immediate
jump

Addressing modes

immediate - value built in to the instruction
register - register used for data

Memory referencing - used with load and store instructions

label - fixed address built in to the instruction
indirect - register contains the address

Base addressing - field of a record
Indexed addressing - element of an array

Instruction formats

To keep the MIPS processor simple and fast, all instructions fit in 32-bit words, and there are just 3 different instruction layouts. Decoding of instructions starts with the leftmost 6 bits, the operation code. This determines how the rest of the bits will be handled. The formats are:

Register

For operations using only registers as operands, the operation code is 000000

6 bits	5 bits	5 bits	5 bits	5 bits	6 bits
000000	source reg.	target reg.	Destination	shift amt.	function
`000000`	`01010`	`00111`	`00101`	`00000`	`100010`

The last line of the table gives as an example a subtract instruction, sub $5, $10, $7

Three 5-bit fields identify the 3 registers used for the operation. Note that the order is different from assembly language: the 2 sources come first, then the destination. The "target" is usually the second source, but in some instructions it is the destination or has an alternate meaning.
Shift amount is used in the shift instructions we will meet in the next chapter.
Function determines just what is done with the registers; usually it is an arithmetic or logical (boolean) instruction.
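
The field packing described above can be checked with a short sketch. It is written in Python rather than MIPS so that the bits can be printed; the function name encode_r is made up for illustration and is not part of any assembler.

```python
def encode_r(rs, rt, rd, shamt, funct):
    """Pack the R-format fields into one 32-bit word (op code is 000000)."""
    for value in (rs, rt, rd, shamt):
        assert 0 <= value < 32, "register and shift fields are 5 bits wide"
    assert 0 <= funct < 64, "function field is 6 bits wide"
    # Machine order: op, source, target, destination, shift amount, function
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# sub $5, $10, $7: sources $10 and $7, destination $5, function 100010
word = encode_r(rs=10, rt=7, rd=5, shamt=0, funct=0b100010)
print(f"{word:032b}")   # 00000001010001110010100000100010
print(f"0x{word:08x}")  # 0x01472822
```

Reading the printed bits in groups of 6, 5, 5, 5, 5 and 6 gives back the example row of the table.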

This diagram shows how data flows from registers to the ALU (Arithmetic Logic Unit) and the result is stored back into the destination register, for the instruction

add   $t3, $t1, $t2

Immediate

Some instructions include a constant value. This constant is included right in the instruction, so it is called an immediate value. In 32 bits, there is room for a 16 bit immediate, 6 bits of op. code, and 2 register numbers (5 bits each). A good example is add immediate (addi), which adds source register + immediate and stores in target register. Since the immediate value is shorter than the register, it is "sign-extended" to form a 32-bit signed number. Thus addi $t1,$t2, 0xfffe (shown in table) actually adds 0xfffffffe, which is -2.

6 bits	5 bits	5 bits	16 bits
op code	source reg	target reg	immediate value
`001000`	`01010`	`01001`	`1111 1111 1111 1110`
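
As a rough check of the table row and of sign extension, here is a minimal Python sketch. The helper names sign_extend16 and encode_i are assumptions made for this illustration only.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit immediate field as a signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def encode_i(op, rs, rt, imm):
    """Pack op code, source register, target register and 16-bit immediate."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# addi $t1, $t2, 0xfffe  ($t2 is register 10, $t1 is register 9)
word = encode_i(0b001000, rs=10, rt=9, imm=0xFFFE)
print(f"0x{word:08x}")                           # 0x2149fffe
print(sign_extend16(0xFFFE))                     # -2
print(hex(sign_extend16(0xFFFE) & 0xFFFFFFFF))   # 0xfffffffe
```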

This layout is also used for load, store, and branch instructions. (see diagram for store)

In load and store, the immediate value is added to the source register to form the memory address of the data.
In branch, the immediate value is an offset (in words) from the present PC value.

Jump

The jump instructions (j and jal) use all 26 bits following the op code for the address to jump to. Since all instructions are word aligned, the last 2 bits of the address are always 0, so they are left out of the instruction. Upon execution, the 26-bit jump target is shifted left 2 bits and stored in the low 28 bits of the PC (the upper 4 bits of the PC are left unchanged); this causes the jump to take effect.

6 bits - op code	26 bits - jump target (words)
j	0x0040000c [nextCh]
`000010`	`0x0100003`

Effectively this is a 28-bit address, so a jump can "only" reach targets within a 256 Mbyte region of memory; for a program loaded in low memory, that is the first 256 Mbytes. This could be a problem for the operating system of a machine with more than this amount of memory. Such a machine would probably be manipulating massive amounts of data, and so could reserve the "low" memory for programs. Alternatively, compilers could be directed to use jump-register instructions or branch-always instructions instead of jumps.
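
The jump encoding in the table can be reproduced with a small Python sketch. The function names are hypothetical, and the PC value passed to jump_target is just an assumed address in the same region as the target. The sketch keeps the upper 4 bits of the PC when rebuilding the address, as MIPS hardware does.

```python
def jump_field(target_address):
    """26-bit field stored in a j/jal instruction: the word address."""
    assert target_address % 4 == 0, "instructions are word aligned"
    return (target_address >> 2) & 0x03FFFFFF

def jump_target(field, pc):
    """Rebuild the byte address: shift left 2, keep the top 4 bits of the PC."""
    return (pc & 0xF0000000) | (field << 2)

field = jump_field(0x0040000C)
print(f"0x{field:07x}")                          # 0x0100003
print(f"0x{(0b000010 << 26) | field:08x}")       # 0x08100003, the whole j instruction
print(hex(jump_target(field, pc=0x00400020)))    # 0x40000c
```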

OPerator and FUNCTion codes

From page 61

6 bits	op	funct	rt
- - -   - - - 	- - - - - -
000000		sll	bltz
000001	see rt		bgez
000010	j	srl
000011	jal	sra
000100	beq	sllv
000101	bne	
000110	blez	srlv
000111	bgtz
001000	addi	jr
001001	addiu	jalr
001010	slti	
001011	sltiu	
001100	andi	syscall
001101	ori	break
001110	xori	
001111	lui	
010000		mfhi
010001		mthi
010010		mflo
010011		mtlo

011000		mult
011001		multu
011010		div
011011		divu
			-rt-
100000	lb	add	bltzal
100001	lh	addu	bgezal
100010	lwl	sub
100011	lw	subu
100100	lbu	and
100101	lhu	or
100110	lwr	xor
100111		nor
101000	sb	
101001	sh	
101010	swl	slt
101011	sw	sltu
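
To show how the table is used when decoding, here is a minimal Python sketch that looks up the mnemonic of a machine word. It copies only a handful of rows from the table above into two dictionaries (OPS and FUNCTS, names chosen for this example) and ignores the rt column, so it is far from a complete disassembler.

```python
# A few rows copied from the table above (not the full set)
OPS = {0b000010: "j", 0b001000: "addi", 0b001111: "lui",
       0b100011: "lw", 0b101000: "sb", 0b101011: "sw"}
FUNCTS = {0b100000: "add", 0b100001: "addu", 0b100010: "sub", 0b101010: "slt"}

def mnemonic(word):
    """Decode starts with the leftmost 6 bits; op 000000 defers to funct."""
    op = (word >> 26) & 0x3F
    if op == 0:
        return FUNCTS.get(word & 0x3F, "unknown")
    return OPS.get(op, "unknown")

for word in (0x3C011001, 0x00280821, 0xA029000C):
    print(f"0x{word:08x}", mnemonic(word))
```

The three words tested are the lui, addu and sb instructions that appear in the assembled listing near the end of these notes.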

Addressing modes

There are a number of ways to refer to data, either as source operands or as destination locations for storage. This section starts with these different methods from the programmer's point of view, and discusses how they are encoded in the MIPS instruction formats.

Immediate

Source operands can be constants. A constant value is usually encoded directly in the machine language instruction, so that it is available immediately after the instruction is decoded, without fetching it from memory. Understandably, this does not provide any location for storing data.

MIPS instructions encode immediate constants in the lower 16 bits of the immediate instruction layout. For constants larger than 16 bits, the assembler generates 2 machine instructions: the upper half of the constant is loaded into the $at register with the "load upper immediate" instruction, and the lower half is supplied as the immediate of a second instruction.
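
The splitting of a large constant can be sketched in Python as follows. The names split_constant and lui_ori are invented for this illustration, and the second function only models the arithmetic of a lui followed by an ori; it is not how any particular assembler is implemented.

```python
def split_constant(value):
    """Split a 32-bit constant into its upper and lower 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def lui_ori(upper, lower):
    """Model lui (upper half into the top 16 bits) followed by ori (fill in the bottom)."""
    at = (upper << 16) & 0xFFFFFFFF   # lui $at, upper
    return at | (lower & 0xFFFF)      # ori $reg, $at, lower

upper, lower = split_constant(0x12345678)
print(hex(upper), hex(lower))        # 0x1234 0x5678
print(hex(lui_ori(upper, lower)))    # 0x12345678
```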

Examples of instructions with immediate data:

addi $t0,$t1,65
sub $t0,7 #assembled as:

addi $t0, $t0, -7

li $t3, 0x12345678 #assembled as

lui $at, 0x1234
ori $t3, $at, 0x5678 #puts it all together

bgez $t5, 16 #skip 4 instructions ahead if $t5 is non-negative (4 is encoded in immediate field)

The last example is also referred to as PC-relative addressing, because the immediate value is (shifted left 2 bits and) added to the PC when the branch is taken. The constant is often referred to as an offset or displacement. It is a signed number, so the new PC value can be before (a loop) or after the current instruction. This mode is very useful because the operating system can move the program to another part of memory without affecting branch instructions.

Most processors use PC-relative addressing for branch instructions. The number of bits used for the displacement limits the distance to the branch target. In older designs, such as the 6502 and 80x86, 8 bits are used, limiting the displacement to about 128 bytes before or after the branch. This usually caused students some distress. With MIPS, the 16-bit word offset extends the limit to about 128 Kbytes in either direction. Since branch instructions are usually used within procedures, this is unlikely to cause you any problems!
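
The PC-relative calculation and the 128 Kbyte limit can be worked through in a short Python sketch. The name branch_target is hypothetical, and base_pc stands for whatever PC value the hardware adds the offset to (on real MIPS hardware this is the address of the instruction following the branch); the addresses used are assumed values.

```python
def branch_target(base_pc, imm16):
    """Sign-extend the 16-bit word offset, shift left 2 bits, add to the PC."""
    offset_words = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (base_pc + (offset_words << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x00400000, 4)))        # 0x400010: 4 words = 16 bytes ahead
print(hex(branch_target(0x00400020, 0xFFFC)))   # 0x400010: -4 words = 16 bytes back

# Reach of a 16-bit signed word offset, in bytes
print(-(1 << 15) * 4, ((1 << 15) - 1) * 4)      # -131072 131068
```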

Register addressing

Registers may be used as sources and destinations of data. Access to registers is extremely fast, much faster than fetching data from memory. All the instructions listed above reference 1 or 2 registers as well as the immediate constant. In addition, many instructions reference registers exclusively. Most processors require at least one operand of arithmetic instructions to be in a register. Instructions with all operands in registers (or immediate) are the fastest to execute.

MIPS provides 32 registers, and encourages you to load your data into them and work with it there, resulting in very fast execution. Since it takes only 5 bits to specify a register, as opposed to 32 bits for a memory location, the register instruction layout can easily accommodate 3 register designations.

Examples of register instructions:

addu $t3, $t1, $t2 # add (unsigned) $t3 := $t1 + $t2
sub $t0, $t3 # subtract (signed) $t0 := $t0 - $t3

Memory addressing

With 32 bit addresses, a computer can access as much memory as you can afford to buy. Well, up to 4 Gigabytes. You can store lots of data there, and transfer it to and from the processor as needed. In return for large amounts of storage, access is slower. Typically an instruction will allow access to at most one memory address. MIPS, in common with other RISC processors and most supercomputers, restricts this to load and store instructions. There are several ways we could specify the address:

label - fixed address built in to the instruction

This is also called direct addressing: the known address can be built into the instruction as a constant. Programmers usually specify this address using a variable name or label in the data segment; the compiler or assembler assigns it a numeric value. Since it is a 32-bit value, a MIPS assembler will break it into two 16-bit immediate quantities, and assemble 2 machine instructions to do the job. Examples:

lw  $t0, MyNumber       #load the word into $t0
sb  $t9, firstInitial   #store rightmost 8 bits of $t9 in data segment

indirect - register contains the address

Besides holding data, a register can be used as a pointer, holding an address that points to the data. The register's contents are put on the address bus, and the data is transferred on the data bus. This is called indirect addressing because the register itself is not the target; rather, it points to the target in memory indirectly. A common use of this is to step through a string one character at a time. One first loads a register with the address of the string, and then uses it to access the characters in succession, incrementing the register each time. For example, suppose we have the string

catStr:  .asciiz   "cat"

The address can be loaded into a register with the load-address instruction (in machine language it is just like load immediate, except the constant is the address rather than data). We can then refer to the characters indirectly:

la  $t0, catStr
lb  $t1, ($t0)        # 'c' is now in $t1
addi $t0, 1           # point to next character
lb  $t2, ($t0)        # 'a' is now in $t2

Base addressing

A variation of indirect addressing is useful for referring to fields of a record or struct. Suppose we store telephone numbers in a record of 4 short integers (16-bit halfwords), holding area code, prefix, 4-digit number, and extension. Then the extension number starts 6 bytes from the beginning of the record. We can get it by adding the offset 6 to a register pointing to the record:

la    $t0, MyPhone     # $t0 points to a telephone record
lh    $t1, 6($t0)      # $t1 loaded with the extension # in that record
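
The offset arithmetic can be modelled with a small Python sketch that treats memory as a byte array. The record address, the phone number values and the little-endian byte order are all assumptions chosen for the illustration; load_halfword is a made-up helper that mimics what lh does with a base register and an offset.

```python
import struct

memory = bytearray(64)
MY_PHONE = 16   # hypothetical address of the record in our toy memory

# Four 16-bit halfwords: area code, prefix, number, extension (made-up values)
struct.pack_into("<4H", memory, MY_PHONE, 819, 555, 1234, 42)

def load_halfword(mem, base, offset):
    """Mimic lh: read a signed halfword at address base + offset."""
    (value,) = struct.unpack_from("<h", mem, base + offset)
    return value

t0 = MY_PHONE                         # la $t0, MyPhone
print(load_halfword(memory, t0, 6))   # lh $t1, 6($t0)  -> 42
```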

Indexed addressing

Suppose we want to index an array. We know the address of the start of the array, and want a register to index it. Now the address is fixed, and the register will hold a variable offset. The following code will copy 10 bytes (the assumed length of both strings, including the final 0) from str1 to str2, using $t0 going from 9 down to 0:

      li   $t0,9           # t0 indexes arrays
copyloop:
      lb   $t1, str1($t0)  # t1 used to transfer character
      sb   $t1, str2($t0)  #    to str2
      sub  $t0, 1          # decrement array index
      bgez $t0, copyloop   # repeat until t0 < 0

Note that if we were moving words, we would need to decrement the index by 4 instead of 1.
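
For readers who want to trace the loop by hand, here is a line-by-line Python model of it. The toy memory, the addresses STR1 and STR2, and the string contents are assumptions; copy_down is a hypothetical name, and each statement is annotated with the MIPS instruction it stands for.

```python
def copy_down(memory, src, dst, last_index):
    """Copy bytes from src to dst, index running from last_index down to 0."""
    t0 = last_index                 # li   $t0, 9
    while True:
        t1 = memory[src + t0]       # lb   $t1, str1($t0)
        memory[dst + t0] = t1       # sb   $t1, str2($t0)
        t0 -= 1                     # sub  $t0, 1
        if t0 < 0:                  # bgez $t0, copyloop
            break

memory = bytearray(32)
STR1, STR2 = 0, 16                          # assumed addresses of str1 and str2
memory[STR1:STR1 + 10] = b"MIPS rule\0"     # 9 characters plus the final 0
copy_down(memory, STR1, STR2, 9)
print(bytes(memory[STR2:STR2 + 10]))        # b'MIPS rule\x00'
```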

To implement this in MIPS, the assembler needs to produce 3 machine instructions for each indexed instruction, because the 32-bit constant address must be split into two 16-bit immediate values.

The bottom line - MIPS memory addressing

MIPS, in keeping with the RISC philosophy, actually has only one memory addressing mode at the machine level; it corresponds to base addressing. Indirect addressing is simply a special case with offset = 0. Direct and indexed addressing both involve fixed 32-bit addresses; the assembler calculates these numerical addresses and splits them into 16-bit sections. The high-order part goes into $at using the lui instruction, and the low-order part ends up in the immediate field of the machine load or store instruction. The sb instruction above assembled as follows; str2's address was 0x1001000c (0x1001 = 4097):

0x3c011001  lui $1, 4097 [str2]             ; 7: sb   $t1, str2($t0)
0x00280821  addu $1, $1, $8
0xa029000c  sb $9, 12($1) [str2]

You can figure out how this has the same effect as if we had been able to code directly

            sb $t1, 0x1001000c($t0)
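
One way to convince yourself is to model the three machine instructions in Python and compare the result with the address we wanted. This is only a sketch of the arithmetic; the function name expanded_store_address is made up, and registers are modelled as plain integers kept to 32 bits.

```python
def expanded_store_address(t0):
    """Address produced by the lui / addu / sb sequence for index t0."""
    at = (4097 << 16) & 0xFFFFFFFF    # lui  $1, 4097      -> 0x10010000
    at = (at + t0) & 0xFFFFFFFF       # addu $1, $1, $8    -> add the index
    return (at + 12) & 0xFFFFFFFF     # sb   $9, 12($1)    -> add the low half 0x000c

for t0 in range(10):
    assert expanded_store_address(t0) == 0x1001000C + t0

print(hex(expanded_store_address(9)))   # 0x10010015
```

Because the machine instruction sign-extends its 16-bit offset, an address whose low half is 0x8000 or more would need the upper half adjusted by one; that case does not arise for this address.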

Meanwhile, here is the layout of the actual sb $9, 12($1) base-addressed instruction, and a diagram explaining how the immediate 12 is added to $1 (which by now holds the upper part of str2's address plus the index from $t0), the sum placed on the address bus, and $t1 (a character) on the data bus.

op code `sb`	rs `($1)`	rt - `$9`	immediate (offset) 0x000c
`101000`	`00001`	`01001`	`0000 0000 0000 1100`

Prepared by Lin Jensen, Bishop's University, 28 January 1999. OP code FUNCTion table added 16 February 2022
