CSc 116 notes
6) Instruction formats and addressing modes
We now know enough instructions to write reasonable programs. It is time
to look at various ways to refer to data, and at how machine instructions
are encoded. This will give us insights into what hardware designers have
to do, and explain why they have chosen to omit certain instructions that seem
reasonable (from the programmer's point of view) from the instruction set. We
have already seen some of these conversions. First, let's look at MIPS instruction
formats.
- Instruction formats
- Addressing modes
  - immediate - value built in to the instruction
  - register - register used for data
  - Memory referencing - used with load and store instructions
    - label - fixed address built in to the instruction
    - indirect - register contains the address
    - Base addressing - field of a record
    - Indexed addressing - element of an array
Instruction formats
To keep the MIPS processor simple and fast, all instructions fit in 32-bit
words, and there are just 3 different instruction layouts. Decoding of
instructions starts with the leftmost 6 bits, the operation code. This
determines how the rest of the bits will be handled. The formats are:
Register
For operations using only registers as operands, the operation code is 000000.
6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
000000 | source reg. | target reg. | destination | shift amt. | function
000000 | 01010 | 00111 | 00101 | 00000 | 100010
The last line of the table gives, as an example, a subtract instruction:
sub $5, $10, $7
- Three five-bit fields identify the 3 registers used for the operation. Note that the order is different from assembly language: the 2 sources come first, then the destination. The "target" is usually the second source, but in some instructions it is the destination or has an alternate meaning.
- Shift amount is used in the shift instructions we will meet in the next chapter.
- Function determines just what is done with the registers; usually it is an arithmetic or logical (boolean) instruction.
This diagram shows how data flows from registers to the ALU (Arithmetic
Logic Unit) and the result is stored back into the destination register,
for the instruction
add $t3, $t1, $t2
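To check the subtract example in the table above, we can pack the six fields into a 32-bit word with shifts and ORs. This is only a sketch in Python; the helper name encode_r is made up for this illustration and is not part of any MIPS tool.

```python
def encode_r(rs, rt, rd, shamt, funct, op=0):
    """Pack the six register-format fields (6+5+5+5+5+6 bits) into one word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# sub $5, $10, $7: sources $10 and $7, destination $5, function 100010
word = encode_r(rs=10, rt=7, rd=5, shamt=0, funct=0b100010)
print(f"{word:032b}")   # the six fields of the table, run together
print(f"0x{word:08x}")  # 0x01472822
```

Note that the arguments follow the machine order (sources first), not the assembly language order.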
Immediate
Some instructions include a constant value. This constant is included right
in the instruction, so it is called an immediate value. In 32 bits,
there is room for a 16-bit immediate, 6 bits of op code, and 2 register
numbers (5 bits each). A good example is add immediate (addi),
which adds the source register and the immediate and stores the sum in the target register. Since
the immediate value is shorter than the register, it is "sign-extended"
to form a 32-bit signed number. Thus addi $t1, $t2, 0xfffe (shown
in the table) actually adds 0xfffffffe, which is -2.
6 bits | 5 bits | 5 bits | 16 bits
op code | source reg | target reg | immediate value
001000 | 01010 | 01001 | 1111 1111 1111 1110
This layout is also used for load, store, and branch instructions (see
the diagram for store).
- In load and store, the immediate value is added to the source register to form the memory address of the data.
- In branch, the immediate value is an offset (in words) from the present PC value.
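The addi example and its sign extension can be reproduced with a few lines of Python. This is a sketch only; encode_i and sign_extend16 are names invented for this illustration.

```python
def encode_i(op, rs, rt, imm):
    """Pack the immediate-format fields (6+5+5+16 bits); imm is cut to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def sign_extend16(imm):
    """Interpret a 16-bit field as a signed (two's complement) number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

# addi $t1, $t2, 0xfffe  ($t1 is register 9, $t2 is register 10)
word = encode_i(op=0b001000, rs=10, rt=9, imm=0xFFFE)
print(f"0x{word:08x}")                          # 0x2149fffe
print(sign_extend16(0xFFFE))                    # -2
print(hex(sign_extend16(0xFFFE) & 0xFFFFFFFF))  # 0xfffffffe
```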
Jump
The jump instructions (j & jal) use all 26 bits following
the op code for the address to jump to. Since all instructions are word
aligned, the last 2 bits of the address are always 0, so they are left
out of the instruction. Upon execution, the 26-bit jump target is shifted
left 2 bits and stored in the PC; this causes the jump to take effect.
6 bits - op code | 26 bits - jump target (words)
j | 0x0040000c [nextCh]
000010 | 0x0100003
Effectively this is a 28-bit address, so a jump can "only" reach targets within
a 256 Mbyte region. (On actual MIPS hardware the upper 4 bits of the PC are left
unchanged, so the region is the one containing the jump itself; for ordinary
programs that is the first 256 Mbytes of memory.) This could be a problem for the operating
system of a machine with more than this amount of memory. Such a machine
would probably be manipulating massive amounts of data, and so could reserve
the "low" memory for programs. Alternatively, compilers could be directed
to use jump-register instructions or branch-always instructions instead
of jumps.
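The jump example can be checked the same way. In this Python sketch the function names and the PC value 0x00400020 are made up for illustration. The decoder takes the upper 4 bits from the PC of the following instruction, which is how MIPS hardware fills them in; for the low addresses used here they are zero anyway.

```python
def encode_j(op, address):
    """Drop the two always-zero low bits and keep 26 bits of the word address."""
    return (op << 26) | ((address >> 2) & 0x03FFFFFF)

def jump_target(word, pc):
    """Rebuild the byte address: 26-bit field shifted left 2, top 4 bits from PC + 4."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

word = encode_j(0b000010, 0x0040000C)          # j nextCh
print(f"0x{word:08x}")                         # 0x08100003
print(hex(jump_target(word, pc=0x00400020)))   # 0x40000c
```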
OPerator and FUNCTion codes
From page 61
6 bits | op     | funct   | rt
- - - - - - - - - - - -
000000 |        | sll     | bltz
000001 | see rt |         | bgez
000010 | j      | srl     |
000011 | jal    | sra     |
000100 | beq    | sllv    |
000101 | bne    |         |
000110 | blez   | srlv    |
000111 | bgtz   |         |
001000 | addi   | jr      |
001001 | addiu  | jalr    |
001010 | slti   |         |
001011 | sltiu  |         |
001100 | andi   | syscall |
001101 | ori    | break   |
001110 | xori   |         |
001111 | lui    |         |
010000 |        | mfhi    |
010001 |        | mthi    |
010010 |        | mflo    |
010011 |        | mtlo    |
011000 |        | mult    |
011001 |        | multu   |
011010 |        | div     |
011011 |        | divu    |
-rt-
100000 | lb     | add     | bltzal
100001 | lh     | addu    | bgezal
100010 | lwl    | sub     |
100011 | lw     | subu    |
100100 | lbu    | and     |
100101 | lhu    | or      |
100110 | lwr    | xor     |
100111 |        | nor     |
101000 | sb     |         |
101001 | sh     |         |
101010 | swl    | slt     |
101011 | sw     | sltu    |
Addressing modes
There are a number of ways to refer to data, either as source operands
or as destination locations for storage. This section starts with these different
methods from the programmer's point of view, and discusses how they are
encoded in the MIPS instruction formats.
Immediate
Source operands can be constants. A constant value is usually encoded directly
in the machine language instruction, so that it is available immediately
after the instruction is decoded, without fetching it from memory. Understandably,
a constant does not provide any location for storing data, so it can only be a source.
MIPS instructions encode immediate constants in the lower 16 bits of
the immediate instruction layout. For constants larger than 16 bits, the
assembler generates 2 machine instructions: the upper half of the constant
is loaded into the $at register with the "load upper immediate" instruction,
and a second instruction supplies the lower half.
Examples of instructions with immediate data:
- addi $t0, $t1, 65
- sub $t0, 7              # assembled as:
  addi $t0, $t0, -7
- li $t3, 0x12345678      # assembled as:
  lui $at, 0x1234
  ori $t3, $at, 0x5678    # puts it all together
- bgez $t5, 16            # skip 4 instructions ahead if $t5 is non-negative (4 is encoded in the immediate field)
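The way li is broken into lui and ori can be worked through numerically. The Python below is a sketch with made-up helper names; it relies on the fact that ori treats its 16-bit immediate as unsigned (zero-extended), so the two halves simply sit side by side.

```python
def split_li(value):
    """Split a 32-bit constant into the lui and ori immediates."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def lui_ori(upper, lower):
    """lui puts upper in bits 31..16 (low half zero); ori fills in the low half."""
    at = (upper << 16) & 0xFFFFFFFF   # lui $at, upper
    return at | lower                 # ori $t3, $at, lower

upper, lower = split_li(0x12345678)
print(hex(upper), hex(lower))         # 0x1234 0x5678
print(hex(lui_ori(upper, lower)))     # 0x12345678
```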
The last example (bgez) is also referred to as PC-relative addressing,
because the immediate value is (shifted left 2 bits and) added to the PC
when the branch is taken. The constant is often referred to as an offset
or displacement. It is a signed number, so the new PC value can be before
(a loop) or after the current instruction. This mode is very useful because
the operating system can move the program to another part of memory
without affecting branch instructions.
Most processors use PC-relative addressing for branch instructions.
The number of bits used for the displacement limits the distance to the
branch target. In older designs, such as the 6502 and 80x86, 8 bits are
used, limiting the displacement to about 128 bytes before or after the branch. This
usually caused students some distress. With MIPS, the 16-bit displacement counts
words, so the limit is 128 Kbytes in either direction.
Since branch instructions are usually used within procedures, this is unlikely
to cause you any problems!
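The branch arithmetic and the 128 Kbyte limit can be verified with a short Python sketch. The names and the branch address 0x00400010 are hypothetical, and the sketch assumes the offset is counted from the instruction following the branch (the PC has already been advanced when the offset is added).

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed (two's complement) number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def branch_target(pc, imm):
    """Address reached when the branch at address pc is taken."""
    return (pc + 4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF

pc = 0x00400010                          # hypothetical address of the branch
print(hex(branch_target(pc, 4)))         # 0x400024, forward
print(hex(branch_target(pc, 0xFFFD)))    # 0x400008, backward (a loop)
print(sign_extend16(0x7FFF) << 2)        # 131068 bytes forward at most
print(sign_extend16(0x8000) << 2)        # -131072 bytes backward at most
```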
Register addressing
Registers may be used as sources and destinations of data. Access to registers
is extremely fast, much faster than fetching data from memory. All the
instructions listed above reference 1 or 2 registers as well as the immediate
constant. In addition, many instructions reference registers exclusively.
Most processors require at least one operand of arithmetic instructions
to be in a register. Instructions with all operands in registers (or immediate)
are the fastest to execute.
MIPS provides 32 registers, and encourages you to load your data into
them and work with it there, resulting in very fast execution. Since it
takes only 5 bits to specify a register, as opposed to 32 bits for a memory
location, the register instruction layout can easily accommodate 3 register
designations.
Examples of register instructions:
- addu $t3, $t1, $t2    # add (unsigned) $t3 := $t1 + $t2
- sub $t0, $t3          # subtract (signed) $t0 := $t0 - $t3
Memory addressing
With 32-bit addresses, a computer can access as much memory as you can
afford to buy (well, up to 4 Gigabytes). You can store lots of data there,
and transfer it to and from the processor as needed. In return for large
amounts of storage, access is slower. Typically an instruction will allow
access to at most one memory address. MIPS, in common with other RISC processors
and most supercomputers, restricts this to load and store instructions.
There are several ways we could specify the address:
label - fixed address built in to the instruction
Also called direct addressing, the known address can be built into
the instruction as a constant. Programmers usually specify this address
using a variable name or label in the data segment; the compiler or assembler
assigns it a numeric value. Since it is a 32-bit value, a MIPS assembler
will break it into two 16-bit immediate quantities, and assemble 2 machine
instructions to do the job. Examples:
lw $t0, MyNumber #load the word into $t0
sb $t9, firstInitial #store rightmost 8 bits of $t9 in data segment
indirect - register contains the address
Besides holding data, a register can be used as a pointer, holding an address
that points to the data. The register's contents are put on the address bus,
and the data is transferred on the
data bus. This is called indirect addressing because the register
itself is not the target; rather, it points to the target in memory.
A common use of this is to step through a string one character at a time.
One first loads a register with the address of the string, and then uses
it to access the characters in succession, incrementing the register each
time. For example, suppose we have the string
catStr: .asciiz "cat"
The address can be loaded into a register with the load-address instruction
(in machine language it is just like load immediate, except the constant
is the address rather than data), and the characters can then be referred to indirectly:
la $t0, catStr
lb $t1, ($t0) # 'c' is now in $t1
addi $t0, $t0, 1 # point to next character
lb $t2, ($t0) # 'a' is now in $t2
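To see the pointer idea outside of assembly language, here is a toy model in Python. Everything in it is an assumption made for illustration: the 64-byte memory, the address 16 chosen for catStr, and the helper lb, which mimics the sign-extending load byte.

```python
memory = bytearray(64)                   # hypothetical toy memory
cat_str = 16                             # assumed address of catStr
memory[cat_str:cat_str + 4] = b"cat\0"   # .asciiz stores a final 0 byte

def lb(address):
    """Load one byte and sign-extend it, as the MIPS lb instruction does."""
    b = memory[address]
    return b - 256 if b & 0x80 else b

t0 = cat_str                             # la $t0, catStr
chars = []
while lb(t0) != 0:                       # ($t0): the register holds the address
    chars.append(chr(lb(t0)))
    t0 += 1                              # addi $t0, $t0, 1
print(chars)                             # ['c', 'a', 't']
```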
Base addressing
A variation of indirect addressing is useful for referring to fields of
a record or struct. Suppose we store telephone numbers in a record of 4
short integers (16-bit halfwords), holding area code, prefix, 4-digit number,
and extension. Then the extension number starts 6 bytes from the beginning
of the record. We can get it by adding the offset 6 to a register pointing
to the record:
la $t0, MyPhone # $t0 points to a telephone record
lh $t1, 6($t0) # $t1 loaded with the extension # in that record
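The offset of 6 can be confirmed by laying out such a record with Python's struct module. This is a sketch: the phone number values are invented, and little-endian byte order ("<") is an assumption that affects only how the bytes are stored, not the offsets.

```python
import struct

# Hypothetical record: area code, prefix, 4-digit number, extension
record = struct.pack("<4H", 819, 555, 1234, 42)

def lh(base, offset):
    """Load a signed halfword from base + offset, like lh $t1, offset($t0)."""
    return struct.unpack_from("<h", record, base + offset)[0]

t0 = 0                  # la $t0, MyPhone (the record starts at address 0 here)
print(len(record))      # 8
print(lh(t0, 6))        # 42, the extension
print(lh(t0, 0))        # 819, the area code
```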
Indexed addressing
Suppose we want to index an array. We know the address of the (start of)
the array, and want a register to index it. Now the address is fixed, and
the register will hold a variable offset. The following code will copy
10 bytes (the assumed length of both strings, including the final 0) from str1
to str2, using $t0 going from 9 down to 0:
li $t0, 9 # t0 indexes arrays
copyloop:
lb $t1, str1($t0) # t1 used to transfer character
sb $t1, str2($t0) # to str2
sub $t0, $t0, 1 # decrement array index
bgez $t0, copyloop # repeat until t0 < 0
Note that if we were moving words, we would need to decrement the index
by 4 instead of 1.
To implement this in MIPS, the assembler needs to produce 3 machine
instructions for each indexed instruction, because the 32-bit constant
address must be split into two 16-bit immediate values.
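The copy loop can be traced in Python with a toy memory. The addresses chosen for str1 and str2 and the string contents are assumptions for illustration only; each line is commented with the assembly instruction it stands for.

```python
memory = bytearray(64)                     # hypothetical toy memory
STR1, STR2 = 0, 12                         # assumed fixed addresses of str1, str2
memory[STR1:STR1 + 10] = b"nine char\0"    # 10 bytes including the final 0

t0 = 9                                     # li $t0, 9
while t0 >= 0:                             # bgez $t0, copyloop
    t1 = memory[STR1 + t0]                 # lb $t1, str1($t0)
    memory[STR2 + t0] = t1                 # sb $t1, str2($t0)
    t0 -= 1                                # sub $t0, $t0, 1
print(bytes(memory[STR2:STR2 + 10]))       # b'nine char\x00'
```

In each access the fixed part (STR1 or STR2) plays the role of the address built into the instruction, and t0 is the variable offset.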
The bottom line - MIPS memory addressing
MIPS, in keeping with the RISC philosophy, actually has only one memory
addressing mode at the machine level; it corresponds to base addressing.
Indirect addressing is simply a special case with offset = 0. Direct and
indexed addressing both involve fixed 32-bit addresses; the assembler calculates
these numerical addresses and splits them into 16-bit sections. The high
order part goes into $at using the lui instruction, and the low order part ends
up in the immediate field of the machine load or store instruction. The
sb instruction above was assembled as follows; str2's address was 0x1001000c
(0x1001 = 4097):
0x3c011001 lui $1, 4097 [str2] ; 7: sb $t1, str2($t0)
0x00280821 addu $1, $1, $8
0xa029000c sb $9, 12($1) [str2]
You can figure out how this has the same effect as if we had been able
to code directly
sb $t1, 0x1001000c($t0)
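If you would rather check the arithmetic than work it out by hand, the three machine instructions can be followed in Python. This is a sketch with made-up helper names. One detail it has to handle: the offset in a load or store is sign-extended, so when the low half of an address is 0x8000 or more, the high half must be one larger to compensate.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed (two's complement) number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def split_address(address):
    """Split a 32-bit address into a lui value and a signed 16-bit offset."""
    low = sign_extend16(address)
    high = ((address - low) >> 16) & 0xFFFF
    return high, low

def effective_address(high, low, index):
    at = (high << 16) & 0xFFFFFFFF      # lui  $1, high
    at = (at + index) & 0xFFFFFFFF      # addu $1, $1, $8
    return (at + low) & 0xFFFFFFFF      # sb   $9, low($1)

high, low = split_address(0x1001000C)
print(high, low)                                   # 4097 12
print(hex(effective_address(high, low, index=9)))  # 0x10010015
```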
Meanwhile, here is the layout of the actual sb
$9, 12($1) base addressed instruction, and a diagram explaining how
the immediate 12 is added to $1 (that is, $at, which by now holds 0x10010000 + $t0),
the sum placed on the address bus, and
$t1 (a character) on the data bus.
op code sb | rs ($1) | rt - $9 | immediate (offset) 0x000c
101000 | 00001 | 01001 | 0000 0000 0000 1100
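Going the other way, the machine word 0xa029000c from the listing can be taken apart into the fields of this layout. The Python below is a sketch; decode_i is a name invented for this illustration.

```python
def decode_i(word):
    """Split a 32-bit word into the immediate-format fields (6+5+5+16 bits)."""
    return {
        "op": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }

fields = decode_i(0xA029000C)            # sb $9, 12($1)
print(f"{fields['op']:06b} {fields['rs']:05b} "
      f"{fields['rt']:05b} {fields['imm']:016b}")
# 101000 00001 01001 0000000000001100
```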
Prepared by Lin Jensen, Bishop's University, 28 January
1999. OP code FUNCTion table added 16 February 2022