CS6270: Lecture 2 - Background - NicDumZ

CS6270: Virtual Machines Lecture 2: Background Review of Basic Computer Architecture Concepts

Samarjit Chakraborty


Last Week's Class: VM Taxonomy
- Process VMs: support an ABI (user instructions + system calls)
  - Same ISA: Multiprogrammed Systems, Dynamic Binary Optimizers
  - Different ISA: Dynamic Translators, HLL VMs
- System VMs: support a complete ISA
  - Same ISA: Classic OS VMs, Hosted VMs
  - Different ISA: Whole System VMs, Co-Designed VMs

Today: Review of Background Material
- Virtual machines essentially present an interface that is identical to some desired real machine
- Hence, it is important to understand the interfaces that real machines provide and how such interfaces are supported and implemented
- In particular, we will review concepts from:
  - Computer architecture (today's class)
  - Operating systems

Computer System Hardware – Major Components

[Figure: block diagram of a computer system: processor and memory on a local bus; controllers and interfaces connecting a high-speed I/O bus and a low-speed I/O bus; devices shown include expansion slots, a frame buffer with display, network, hard drive, CD-ROM and floppy.]

Basics of Processors
- We will use the MIPS instruction set to illustrate the basic concepts
- This instruction set is used by NEC, Nintendo, Silicon Graphics, Sony, …
- MIPS fields:

    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

  - op: operation of the instruction (opcode)
  - rs: first register source operand
  - rt: second register source operand
  - rd: register destination operand
  - shamt: shift amount
  - funct: function field (selects the specific variant of the opcode)
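Because the field widths are fixed, each field can be recovered from a 32-bit word with a shift and a mask. The sketch below is a minimal illustration in Python (the slides contain no code, so the language and the helper name `decode_r_format` are our own choices). The test word 0x02324020 is the encoding of `add $t0, $s1, $s2` given on a later slide.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit MIPS R-format instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26 (6 bits)
        "rs": (word >> 21) & 0x1F,     # bits 25-21 (5 bits)
        "rt": (word >> 16) & 0x1F,     # bits 20-16 (5 bits)
        "rd": (word >> 11) & 0x1F,     # bits 15-11 (5 bits)
        "shamt": (word >> 6) & 0x1F,   # bits 10-6 (5 bits)
        "funct": word & 0x3F,          # bits 5-0 (6 bits)
    }


print(decode_r_format(0x02324020))
# {'op': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
```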

MIPS Operands: Registers and Memory

Name              | Examples                                                            | Comments
32 registers      | $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic.
2^30 memory words | Mem[0], Mem[4], …, Mem[4294967292]                                  | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
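Byte addressing explains why the last word is Mem[4294967292]. The highest word index is 2^30 - 1, and its byte address is 4 × (2^30 - 1) = 2^32 - 4. Here is a small Python sketch of that arithmetic. The constant and function names are illustrative, not part of any MIPS tool.

```python
NUM_WORDS = 2 ** 30   # memory words addressable with 32-bit byte addresses
BYTES_PER_WORD = 4    # MIPS uses byte addresses, so sequential words differ by 4


def word_address(index: int) -> int:
    """Return the byte address of the index-th 32-bit word, i.e. Mem[4 * index]."""
    if not 0 <= index < NUM_WORDS:
        raise IndexError("word index out of range")
    return index * BYTES_PER_WORD


print(word_address(1))              # 4
print(word_address(NUM_WORDS - 1))  # 4294967292
```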

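MIPS: Addressing Modes
- Register (direct): op | rs | rt | rd; the operand is in a register
- Immediate: op | rs | rt | immed; the operand is the constant immed itself
- Displacement: op | rs | rt | immed; the memory operand is at address register + immed
- PC-relative: op | rs | rt | immed; the memory address is PC + immed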
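Displacement mode is the one used by lw and sw. The sketch below assumes, as the datapath slides later in this lecture show, that the 16-bit immediate is sign-extended to 32 bits before it is added, so negative offsets work. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def displacement_address(base: int, imm16: int) -> int:
    """Effective address for displacement mode: register + sign-extended immed."""
    return (base + sign_extend16(imm16)) & MASK32


print(hex(displacement_address(0x1000, 40)))      # 0x1028
print(hex(displacement_address(0x1000, 0xFFFC)))  # 0xffc (offset -4)
```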

MIPS: Instruction Format
- Fixed-length instruction format: all instructions are 32 bits long
- Very structured
- Only three instruction formats: R, I, J

    R-format:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
    I-format:  op (6 bits) | rs (5 bits) | rt (5 bits) | immed/address (16 bits)
    J-format:  op (6 bits) | address (26 bits)

MIPS: Instruction Format (Contd.)
- R-format: used for instructions with three register operands
- Arithmetic instructions:
  - add $t0, $s1, $s2   # $t0 ← $s1 + $s2
  - Note that $t0 is register 8, $s1 is register 17 and $s2 is register 18.

    R-format  op      rs      rt      rd      shamt   funct
              6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
              000000  10001   10010   01000   00000   100000
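The encoding above is just the six fields shifted into place and OR-ed together. The following minimal Python sketch packs them. The helper name `encode_r` is hypothetical. The field values come from the slide: op = 0, rs = $s1 (17), rt = $s2 (18), rd = $t0 (8), shamt = 0, funct = 100000.

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-format fields into one 32-bit instruction word."""
    for value, bits in ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2
word = encode_r(0, 17, 18, 8, 0, 0b100000)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"{word:#010x}")  # 0x02324020
```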

MIPS: Instruction Format (Contd.)
- I-format: used for data transfer instructions
- Examples: load word (lw) and store word (sw)
- One register operand and one memory address operand (specified by a constant and a register)

    lw $t0, 40($s2)   # load Mem[$s2+40] into $t0

  $t0 is register 8 and $s2 is register 18.

    I-format  op      rs      rt      immed/address
              6 bits  5 bits  5 bits  16 bits
              100011  10010   01000   0000000000101000
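The same packing works for the I-format, with the base register in rs, the destination in rt and the offset in the low 16 bits. Below is a minimal Python sketch. The name `encode_i` is hypothetical. The lw opcode 35 is the value given later in the lecture. Masking the immediate to 16 bits stores a negative offset in two's complement.

```python
def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-format instruction; imm may be negative (stored as 16-bit two's complement)."""
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# lw $t0, 40($s2): op = 35, rs = $s2 (18) is the base, rt = $t0 (8) is the destination
word = encode_i(35, 18, 8, 40)
print(f"{word:032b}")   # 10001110010010000000000000101000
print(f"{word:#010x}")  # 0x8e480028
```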

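MIPS: Instruction Format (Contd.)
- J-format: used for jump instructions

    j Label   # next instruction at Label

- Format: J-format = op (6 bits) | address (26 bits)
- Jump instructions just use the high-order bits of the PC:
  - Address = bits 31-28 of the PC (strictly, of the incremented PC, PC + 4), concatenated with the 26-bit address shifted left by 2 bits
  - Hence a jump target must lie within the same 256 MB (2^28-byte) region as the PC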
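The jump-address rule can be sketched in a few lines of Python. As in MIPS, the sketch takes the upper 4 bits from PC + 4. If you follow the slide literally and use the PC itself, the result differs only when the jump sits in the last word of a 256 MB region. The function name is hypothetical.

```python
def jump_target(pc: int, addr26: int) -> int:
    """Target of a MIPS j instruction located at address pc.

    The upper 4 bits come from the incremented PC (pc + 4), the next 26 bits from
    the instruction's address field, and the low 2 bits are zero (word aligned).
    """
    upper4 = (pc + 4) & 0xF0000000
    return upper4 | ((addr26 & 0x03FFFFFF) << 2)


print(hex(jump_target(0x00400000, 0x0100010)))  # 0x400040
print(hex(jump_target(0x10000000, 0x0000000)))  # 0x10000000
```

Since only 28 bits are replaced, every possible target stays inside the 256 MB region selected by the top 4 bits.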

Execution Time of a Program: Factors
- Instruction count
  - Determined by the compiler and the ISA
- Clock cycle time
  - Determined by the architecture/implementation of the ISA
- Number of clock cycles per instruction (CPI)
  - Determined by the architecture/implementation of the ISA
- We will now look at different possible implementations
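These three factors multiply: CPU time = instruction count × CPI × clock cycle time. Here is a minimal Python sketch. The workload numbers are hypothetical, and the 8 ns cycle matches the single-cycle example later in the lecture.

```python
def execution_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical program: one million instructions, CPI of 1, 8 ns clock cycle
print(execution_time_ns(1_000_000, 1, 8))  # 8000000 (ns, i.e. 8 ms)
```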

The Processor: Datapath & Control
- Implementation of the MIPS ISA
- Simplified to contain only:
  - memory-reference instructions: lw, sw
  - arithmetic-logical instructions: add, sub, and, or, slt
  - control flow instructions: beq, j
- Generic implementation:
  - use the program counter (PC) to supply the instruction address
  - get the instruction from memory
  - read registers
  - use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers. Why?
  - memory-reference?
  - arithmetic?
  - control flow?

Building Blocks
- Different functional units we need for each instruction

[Figure: (a) instruction memory (instruction address in, instruction out); (b) program counter; (c) adder (Add, Sum); (a) register file with two 5-bit read register numbers, a 5-bit write register number, write data, read data 1 and 2, and RegWrite; (b) ALU with a 3-bit ALU control input, Zero and ALU result outputs; (a) data memory unit with Address, Write data, Read data, MemRead and MemWrite; (b) sign-extension unit, 16 bits to 32 bits.]

Incrementing the Program Counter (PC)

[Figure: the PC supplies the read address to the instruction memory, which outputs the instruction; an adder adds 4 to the PC.]

- Fetching instructions and incrementing the PC

Datapath for R-type Instructions

[Figure: (a) register file with two 5-bit read register numbers, a 5-bit write register number, write data and RegWrite; read data 1 and 2 feed (b) the ALU, which has a 3-bit ALU control input and Zero and ALU result outputs.]

R-type fields (bit positions): op [31-26] (6 bits) | rs [25-21] (5 bits) | rt [20-16] (5 bits) | rd [15-11] (5 bits) | shamt [10-6] (5 bits) | funct [5-0] (6 bits)

Datapath for R-type Instructions (Contd.)

[Figure: instruction fields rs and rt select read registers 1 and 2 and rd selects the write register; read data 1 and 2 go to the ALU (3-bit ALU operation), and the ALU result is the write data stored in the register file when RegWrite is asserted.]

Datapath for Load/Store Instructions

[Figure: the register file and ALU of the R-type datapath plus a sign-extension unit (16 to 32 bits) and a data memory (Address, Write data, Read data, MemRead, MemWrite); the ALU adds a register value to the sign-extended immediate to form the memory address.]

I-type fields (bit positions): op [31-26] (6 bits) | rs [25-21] (5 bits) | rt [20-16] (5 bits) | immediate [15-0] (16 bits)

Datapath for Load Instructions

[Figure: register rs is read and added by the ALU to the sign-extended 16-bit immediate; the result addresses the data memory (MemRead), and the read data is written into register rt (RegWrite).]

Datapath for Store Instructions

[Figure: register rs is added by the ALU to the sign-extended 16-bit immediate to form the address; read data 2 (register rt) is the write data for the data memory (MemWrite).]
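The load and store datapaths differ only in where the data flows after the address is computed. In the Python sketch below, the register file is modeled as a list and memory as a dict keyed by byte address. These modeling choices, the function names and the register contents are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extension unit: 16-bit two's-complement value to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_lw(regs: list, mem: dict, rs: int, rt: int, imm16: int) -> None:
    """lw: ALU adds Reg[rs] and the sign-extended offset; MemRead; RegWrite into rt."""
    address = (regs[rs] + sign_extend16(imm16)) & MASK32
    regs[rt] = mem[address]


def execute_sw(regs: list, mem: dict, rs: int, rt: int, imm16: int) -> None:
    """sw: same address computation; read data 2 (Reg[rt]) is written with MemWrite."""
    address = (regs[rs] + sign_extend16(imm16)) & MASK32
    mem[address] = regs[rt]


regs = [0] * 32
mem = {}
regs[18] = 0x1000                  # $s2 holds a base address (hypothetical value)
regs[9] = 99                       # $t1 holds the value to store (hypothetical value)
execute_sw(regs, mem, 18, 9, 40)   # sw $t1, 40($s2)
execute_lw(regs, mem, 18, 8, 40)   # lw $t0, 40($s2)
print(regs[8], mem)                # 99 {4136: 99}
```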

Datapath for Branch Instructions

[Figure: an adder sums PC + 4 (from the instruction datapath) and the sign-extended 16-bit offset shifted left 2 to produce the branch target; read data 1 and 2 go to the ALU (3-bit ALU operation), whose Zero output goes to the branch control logic.]

- The ALU is used to evaluate the branch condition, and a separate adder is used to compute the branch target address as the sum of the incremented PC and the sign-extended lower 16 bits of the instruction shifted left by 2 bits.
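The two parallel computations on this slide can be sketched in Python. A separate adder forms the target, and the ALU subtraction's Zero output decides whether it is used. This minimal sketch assumes the ALU subtracts for beq, which matches the ALUcontrol code 110 given later. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(pc: int, rs_value: int, rt_value: int, imm16: int) -> int:
    """Next PC after a beq at address pc."""
    pc_plus_4 = (pc + 4) & MASK32
    # Separate adder: (PC + 4) + (sign-extended offset shifted left by 2)
    branch_target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    # ALU subtracts the operands; its Zero output drives the branch decision
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target if zero else pc_plus_4


print(hex(beq_next_pc(0x100, 5, 5, 3)))       # 0x110 (taken)
print(hex(beq_next_pc(0x100, 5, 6, 3)))       # 0x104 (not taken)
print(hex(beq_next_pc(0x100, 7, 7, 0xFFFF)))  # 0x100 (taken, offset -1 word)
```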

Memory & R-type Instructions: Combined Datapath

[Figure: the R-type datapath and the memory-instruction datapath side by side; both use the register file (read registers 1 and 2, write register, write data, RegWrite) and the ALU (3-bit ALU operation), and the memory datapath adds a sign-extension unit (16 to 32 bits) and a data memory (Address, Write data, Read data, MemRead, MemWrite).]

Using the Multiplexor


Adding "Instruction Fetch"

[Figure: the PC and a PC + 4 adder feed the instruction memory, whose output drives the register file; a multiplexor controlled by ALUSrc selects read data 2 or the sign-extended immediate as the second ALU input, and a multiplexor controlled by MemtoReg selects the ALU result or the data-memory read data as the register write data; other controls are RegWrite, MemRead, MemWrite and the 3-bit ALU operation.]

- The instruction fetch portion of the datapath has now been added to the previous datapath.

Simple Datapath for the MIPS Architecture
- Finally, adding the datapath for branch instructions

[Figure: the previous datapath plus a second adder that adds the sign-extended offset, shifted left 2, to PC + 4, and a multiplexor controlled by PCSrc that selects either PC + 4 or the branch target as the next PC.]

Simple Control Structure
- All of the logic is combinational
- Wait for everything to settle down so that the right thing gets done
  - The ALU might not produce the "right answer" right away
  - Use write signals along with the clock to determine when to write
- The cycle time is determined by the length of the longest path

[Figure: state element 1 → combinational logic → state element 2, all within one clock cycle.]

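Control: Two-level Implementation

[Figure: the 6-bit opcode (instruction bits 31-26) feeds Control 2, which produces the 2-bit ALUop; ALUop and the 6-bit funct field (bits 5-0) feed Control 1, which produces the 3-bit ALUcontrol input of the ALU.]

- ALUop: 00 = lw, sw; 01 = beq; 10 = add, sub, and, or, slt
- ALUcontrol: 000 = and; 001 = or; 010 = add; 110 = sub; 111 = set on less than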

Designing Control 1

[Figure: opcode (bits 31-26) → Control 2 → 2-bit ALUop; ALUop and funct (bits 5-0) → Control 1 → 3-bit ALUcontrol → ALU.]

- Assume that Control 2 generates the 2-bit ALUop based on the opcode. Using this 2-bit ALUop and the function field of the instruction, Control 1 then generates the 3-bit control signal ALUcontrol.
- ALUop: 00 = lw, sw; 01 = beq; 10 = add, sub, and, or, slt
- ALUcontrol: 000 = and; 001 = or; 010 = add; 110 = sub; 111 = set on less than
- ALUcontrol determines the function that the ALU performs (AND, OR, ADD, etc.)
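Control 1 is a small combinational function of ALUop and funct. Here is a Python sketch of its truth table. The funct values (add 100000, sub 100010, and 100100, or 100101, slt 101010) are the standard MIPS encodings. They do not appear on these slides, so treat them as an outside assumption. The constant and function names are hypothetical.

```python
ALUOP_LW_SW = 0b00
ALUOP_BEQ = 0b01
ALUOP_R_TYPE = 0b10

# funct field -> 3-bit ALUcontrol (funct values are the standard MIPS encodings)
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt (set on less than)
}


def control1(alu_op: int, funct: int) -> int:
    """Combinational mapping (ALUop, funct) -> ALUcontrol."""
    if alu_op == ALUOP_LW_SW:
        return 0b010  # add: base register + offset (funct is ignored)
    if alu_op == ALUOP_BEQ:
        return 0b110  # subtract: Zero is set when the operands are equal
    if alu_op == ALUOP_R_TYPE:
        return FUNCT_TO_ALUCONTROL[funct]
    raise ValueError(f"unexpected ALUop {alu_op:02b}")


print(f"{control1(ALUOP_R_TYPE, 0b101010):03b}")  # 111
print(f"{control1(ALUOP_LW_SW, 0b000000):03b}")   # 010
```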

Deriving Control 2 Signals
Input: the instruction opcode. Output: 9 control signals.

Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format    | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1      | 0
lw          | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0      | 0
sw          | X      | 1      | X        | 0        | 0       | 1        | 0      | 0      | 0
beq         | X      | 0      | X        | 0        | 0       | 0        | 1      | 0      | 1

- Determine these control signals directly from the opcodes: R-format: 0, lw: 35, sw: 43, beq: 4

Similarly for the Other Instructions
- For each opcode, find the values of the control signals
- Construct the truth table
- Determine the logic that implements this truth table

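The three-step recipe above can be sketched in Python. The truth table becomes a lookup keyed by opcode, and one output is then reduced to logic. The rows are the Control 2 table for R-format (0), lw (35), sw (43) and beq (4), with `None` standing for a don't-care X. The names `control2` and `reg_write` are hypothetical, and a real controller would be gates, not a dictionary.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")
X = None  # "don't care" entry in the truth table

# One row per opcode, from the Control 2 truth table
TRUTH_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    43: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def control2(opcode: int) -> dict:
    """Look up the nine control signals for an opcode."""
    if opcode not in TRUTH_TABLE:
        raise ValueError(f"opcode {opcode} is not handled by this simplified control")
    return dict(zip(SIGNALS, TRUTH_TABLE[opcode]))


def reg_write(opcode: int) -> int:
    """Logic read off the RegWrite column: RegWrite = R-format OR lw."""
    return int(opcode == 0 or opcode == 35)


print(control2(35)["MemRead"])  # 1
print(control2(43)["RegDst"])   # None (don't care)
```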

Where Are We Headed?
- Single-cycle problems:
  - What if we had a more complicated instruction, like floating point?
  - Wasteful of area: no sharing of hardware resources
- One solution:
  - use a "smaller" cycle time
  - have different instructions take different numbers of cycles
  - a "multicycle" datapath:

[Figure: multicycle datapath with the PC, a single memory for instructions or data, the instruction register (IR), the memory data register (MDR), the register file, registers A and B, a single ALU and the ALUOut register.]

Why Is a Single-Cycle Implementation Not Used?
- Assume the following access times: memory (2 ns), ALU and adders (2 ns), register file access (1 ns)
- Fixed-length clock: the longest instruction is lw, which requires 8 ns
  - A load uses five functional units: instruction memory, register file, ALU, data memory, and the register file once again (2 + 1 + 2 + 2 + 1 = 8 ns)
- Hence, the clock cycle is 8 ns
- The clock cycle is determined by the longest path in the machine (lw in this case)
- However, several other instructions could fit into a shorter clock cycle

Why Is a Single-Cycle Implementation Not Used? (Contd.)
- R-type: instruction fetch, register access, ALU, register access
- Load: instruction fetch, register access, ALU, memory access, register access
- Store: instruction fetch, register access, ALU, memory access
- Branch: instruction fetch, register access, ALU
- Jump: instruction fetch

Note the difference between load and jump. This difference becomes even more significant if there are floating-point instructions.
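Adding up the unit times along each path makes the waste concrete. The Python sketch below uses the access times assumed on the previous slide. It treats instruction fetch as one memory access and each register access as 1 ns. A single-cycle clock must accommodate the slowest class.

```python
UNIT_NS = {"mem": 2, "alu": 2, "reg": 1}  # access times assumed in the lecture

# Functional units each instruction class passes through, in order
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "load":   ["mem", "reg", "alu", "mem", "reg"],
    "store":  ["mem", "reg", "alu", "mem"],
    "branch": ["mem", "reg", "alu"],
    "jump":   ["mem"],
}

times = {cls: sum(UNIT_NS[u] for u in units) for cls, units in PATHS.items()}
clock_ns = max(times.values())  # a single-cycle clock must fit the slowest path

print(times)     # {'R-type': 6, 'load': 8, 'store': 7, 'branch': 5, 'jump': 2}
print(clock_ns)  # 8
```

A jump needs only 2 ns of work but still occupies a full 8 ns cycle.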

Multicycle Implementation: Basics
- In the previous slide, the execution of each instruction was broken into several steps
- In a multicycle implementation, each such step executes in one clock cycle
- Hence, different instructions require different numbers of clock cycles
- Advantages:
  - More efficient
  - A functional unit can be used more than once per instruction, as long as it is used in different clock cycles (so less hardware is required)
- But the design is more complex

Single-Cycle versus Multicycle

[Figure: the multicycle architecture (PC, a single memory for instructions or data, instruction register, memory data register, register file, registers A and B, a single ALU, ALUOut) shown beside the single-cycle architecture (separate instruction and data memories, one ALU plus two adders, and multiplexors controlled by PCSrc, ALUSrc and MemtoReg).]

In a multicycle architecture:
- A single memory unit is used for both instructions and data
- A single ALU is used, rather than one ALU and two adders
- One or more registers are added after each functional unit to hold the output of that unit until the value is used in the next clock cycle

Multicycle Implementation: Additional Registers
- Instruction Register (IR), Memory Data Register (MDR), registers A and B at the outputs of the register file, and ALUOut at the output of the ALU
- At the end of each clock cycle, the data to be used in subsequent clock cycles is stored in a state element:
  - data used by subsequent instructions in a later clock cycle is stored in a programmer-visible state element, such as the register file, the PC or memory
  - data used by the same instruction in a later cycle is stored in one of the additional registers

Multicycle Implementation
- Each clock cycle can accommodate at most one of the following operations:
  - a memory access
  - a register file access (two reads or one write)
  - an ALU operation
- Hence, any data produced by one of these three functional units must be saved into a temporary register for use in a later cycle

Multicycle Implementation: Additional Registers

[Figure: multicycle datapath with the PC, a single memory (address, instructions or data), instruction register, memory data register, register file, registers A and B, the ALU and ALUOut.]

- All registers except the instruction register (IR) hold data only between a pair of adjacent clock cycles (and hence do not need a write control signal)

Multicycle Implementation: Examples

[Figure: detailed multicycle datapath: PC; a single memory (Address, MemData, Write data); the instruction register with fields [25-21], [20-16], [15-11] and [15-0]; the memory data register; the register file; registers A and B; sign-extension (16 to 32 bits) and shift-left-2 units; multiplexors selecting the memory address, the write register, the register write data and the two ALU inputs (one input choice is the constant 4); the ALU and ALUOut.]

- The ALU is used to compute PC = PC + 4. The same ALU is also used for R-type instructions, branch address computation, and computing the memory address for lw/sw instructions.

Multicycle Approach: Summary
- Break up the instructions into steps; each step takes one cycle
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
- At the end of a cycle:
  - store values for use in later cycles (easiest thing to do)
  - introduce additional "internal" registers
- Notice: we distinguish
  - processor state: programmer-visible registers
  - internal state: programmer-invisible registers (like IR, MDR, A, B and ALUOut)

Multicycle Implementation: Steps
- Instruction fetch (common to all instructions)
- Instruction decode and register fetch (common to all instructions)
- Execution, memory address computation or branch completion
- Memory access or R-type instruction completion
- Memory read completion

Instructions take from 3 to 5 cycles!
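With variable cycle counts, CPI becomes a weighted average over the instruction mix. This Python sketch assumes the usual assignment implied by the step names: branches and jumps finish in step 3, R-type instructions and stores in step 4, and loads in step 5. The instruction mix is purely hypothetical and does not come from the lecture.

```python
from fractions import Fraction

# Cycles per class, assuming completion in steps 3, 4 or 5 as listed above
CYCLES = {"R-type": 4, "load": 5, "store": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix in percent (not from the lecture)
MIX = {"R-type": 45, "load": 25, "store": 10, "branch": 15, "jump": 5}

# Weighted average: sum(percent x cycles) / 100, kept exact with Fraction
cpi = Fraction(sum(MIX[c] * CYCLES[c] for c in CYCLES), 100)
print(float(cpi))  # 4.05
```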

Step 1: Instruction Fetch
- Use the PC to get the instruction and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL ("Register-Transfer Language"):

    IR = Memory[PC];
    PC = PC + 4;

- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?
- This step is common for all instructions (obviously!)
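The two RTL statements of step 1 translate directly into a state update. The Python sketch below models only the state this step touches. The class and function names are hypothetical. In hardware both writes happen at the same clock edge; in the sketch IR is read first, so it still sees the old PC.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF


@dataclass
class MulticycleState:
    """Tiny model of the state step 1 touches (field names are hypothetical)."""
    pc: int = 0
    ir: int = 0
    memory: dict = field(default_factory=dict)  # byte address -> 32-bit word


def step1_instruction_fetch(s: MulticycleState) -> None:
    # IR = Memory[PC]: the single memory unit is read at the current PC
    s.ir = s.memory[s.pc]
    # PC = PC + 4: the shared ALU computes the increment in the same cycle
    s.pc = (s.pc + 4) & MASK32


state = MulticycleState(memory={0: 0x02324020, 4: 0x8E480028})
step1_instruction_fetch(state)
print(hex(state.ir), state.pc)  # 0x2324020 4
```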

Step 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- The previous two actions are done optimistically (no harm is done)
- RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC+(sign-extend(IR[15-0])