
PIPELINING (COMPUTER ARCHITECTURE)

 

Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.

A few properties characterize RISC architectures and simplify their implementation:
a) All data operations are performed on data in registers and typically change the entire register.
b) Load and store are the only operations that affect memory. Loads and stores can transfer an entire register or less than a full register.
c) All instructions are typically one size, and there are only a few instruction formats (e.g. I-type, J-type, R-type).


RISC instruction sets are designed in a way that simplifies the implementation
of pipelining. Typically, there are three classes of instructions.
1. ALU instructions: take either two registers or a register and a sign-extended
immediate as operands.
2. Load and store instructions: access memory through the base-displacement
addressing mode (e.g. LW $9, 4($8)).
3. Branch and jump instructions: transfer control. Branches
are conditional and jumps are unconditional.
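
To make the sign-extended immediate and the base-displacement addressing mode concrete, here is a minimal Python sketch. It assumes 16-bit immediates and 32-bit addresses, as in classic MIPS; the helper names `sign_extend` and `effective_address` are hypothetical and chosen only for illustration.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1          # keep only the immediate field
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(regs, base, displacement):
    """Base-displacement addressing: base register + sign-extended offset."""
    # Assumption: addresses wrap around at 32 bits.
    return (regs[base] + sign_extend(displacement)) & 0xFFFFFFFF


# LW $9, 4($8) with register $8 holding 0x1000 (an illustrative value)
regs = {8: 0x1000}
print(hex(effective_address(regs, base=8, displacement=4)))  # 0x1004
```

A negative displacement is encoded in two's complement, so the field value 0xFFFC means -4 and addresses the word just below the base.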

The five cycles of pipelining

1. Cycle 1, instruction fetch cycle (IF): Fetches the instruction from
memory and updates the program counter (PC) to its new value.
2. Cycle 2, instruction decode/register fetch cycle (ID): Decodes the instruction
and reads the source registers; fixed-field decoding allows both to happen at once.
This cycle also resolves the branch condition, sign-extends the immediate value,
calculates the branch target address, and possibly updates the PC to that target address.
3. Cycle 3, execution/effective address cycle (EX): If the instruction is a
load or store, calculates the effective address. If the instruction is R-type,
performs the specified operation on the two source registers. If
the instruction is I-type, performs the operation on the source register
and the sign-extended immediate value.
4. Cycle 4, memory access (MEM): Performs the read or write at the
effective address calculated in the previous cycle.
5. Cycle 5, write back (WB): Writes the result into the destination
register in the register file.
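
The overlap of these five cycles can be sketched as a timing table. The following Python sketch assumes an ideal pipeline with no hazards or stalls, in which instruction i enters IF in clock cycle i; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """For each instruction, map clock cycle -> stage (ideal, no stalls)."""
    return [
        {i + 1 + s: name for s, name in enumerate(STAGES)}
        for i in range(num_instructions)
    ]


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish n instructions: fill the pipe once, then one per cycle."""
    return num_stages + num_instructions - 1 if num_instructions else 0


def render(schedule):
    """Draw one row per instruction and one 4-character column per clock cycle."""
    cycles = total_cycles(len(schedule))
    lines = []
    for i, row in enumerate(schedule):
        cells = [row.get(c, "").ljust(4) for c in range(1, cycles + 1)]
        lines.append(f"i{i + 1}: " + "".join(cells).rstrip())
    return "\n".join(lines)


print(render(pipeline_schedule(3)))
print(total_cycles(3))  # 7
```

Without pipelining the same three instructions would take 3 x 5 = 15 cycles; with overlap they take 5 + 3 - 1 = 7.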

Three important observations can be drawn.
1. Separate instruction and data memories are used. They are typically
implemented as separate caches, so the conflict between instruction
fetch and data memory access is eliminated.
2. The register file is used in two stages, ID and WB. Thus, two reads and
one write are performed every clock cycle. The write is performed in the first
half of the clock cycle and the reads in the second half.
3. The PC must be incremented and stored every clock cycle in the IF stage. A potential
branch target address must also be computed during ID. Thus,
we are faced with the problem of updating the PC during the ID stage (once
the branch condition and the target address are resolved).
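
The second observation (write in the first half of the cycle, read in the second half) can be modeled in a few lines. This is a simplified sketch, not a hardware description: the `RegisterFile` class and its `clock` method are hypothetical, and it assumes the MIPS convention that register 0 always reads as zero.

```python
class RegisterFile:
    """Toy register file: one write port and two read ports per clock cycle."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def clock(self, write=None, reads=()):
        # First half of the cycle: the instruction in WB writes its result.
        if write is not None:
            reg, value = write
            if reg != 0:  # assumption: register 0 is hardwired to zero
                self.regs[reg] = value
        # Second half of the cycle: the instruction in ID reads its sources.
        return tuple(self.regs[r] for r in reads)


rf = RegisterFile()
# WB writes $9 in the same cycle in which ID reads $9 and $8.
print(rf.clock(write=(9, 42), reads=(9, 8)))  # (42, 0)
```

Because the write lands before the reads, an instruction in ID sees the value written by an instruction that is in WB during the same cycle.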

Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the
stream from executing during its designated clock cycle. Hazards reduce
performance from the ideal speedup gained by pipelining. There are three
classes of hazards:
1. Structural hazards arise from resource conflicts when the hardware cannot
support all possible combinations of instructions simultaneously in
overlapped execution.
2. Data hazards arise when an instruction depends on the result of a
previous instruction that is not yet available in the pipeline.
3. Branch (control) hazards arise from the pipelining of branches and other instructions
that change the PC.
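
As an illustration of data hazards, the sketch below finds read-after-write dependences that are close enough to matter in the five-stage pipeline. It assumes no forwarding and a register file that writes in the first half of the cycle and reads in the second, so only the two instructions immediately after a producer can read a stale value. The `Instr` record and `raw_hazards` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[int]   # register written, or None
    sources: tuple        # registers read


def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each RAW hazard."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].sources:
                hazards.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # a later write supersedes this producer
    return hazards


program = [
    Instr("ADD $1, $2, $3", 1, (2, 3)),
    Instr("SUB $4, $1, $5", 4, (1, 5)),
    Instr("AND $6, $1, $7", 6, (1, 7)),
    Instr("OR  $8, $1, $9", 8, (1, 9)),
]
print(raw_hazards(program))  # [(0, 1, 1), (0, 2, 1)]
```

The OR instruction is three slots behind the ADD, so its ID stage coincides with the ADD's WB stage and it reads the new value without a hazard.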

Solving Hazards

Structural hazards - use instruction buffers.

Data hazards - use a forwarding unit.

Control hazards - reduce branch penalties with one of four techniques:

1. Freeze or flush the pipeline: any instructions after the
branch are held or deleted until the branch target address is known. This is a
very simple solution.

2. Treat every branch as not taken: the hardware is allowed
to continue as if the branch were not executed. The complexity of this
scheme arises because the processor state must be preserved until the
branch outcome is known, so a roll-back mechanism must be
available to reverse any changes in state. In the simple
five-stage pipeline, this scheme is implemented by continuing to fetch
instructions as if the branch were a normal instruction. If the branch is
taken, we need to turn the fetched instruction into a NOP and restart the
fetch at the target address.

3. Predict taken: as soon as the branch is decoded and the target address
is computed, we assume the branch is taken and begin fetching and
executing at the target. In the five-stage pipeline, the branch outcome and
the target address are both computed in ID; thus, there is no advantage
in this approach for this pipeline.

4. Delayed branch: this technique was heavily used in early RISC processors and works
reasonably well in the five-stage pipeline. In a delayed branch, the
execution order with a branch delay of one is
branch instruction
sequential successor
branch target if taken
The sequential successor is in the branch delay slot. This instruction is
executed whether or not the branch is taken. The pipeline behavior of
the five-stage pipeline with a branch delay is shown in figure 2.14. The
instruction in the delay slot (only one slot for MIPS) is always executed. If
the branch is untaken, execution continues with the instruction after
the branch delay instruction. If the branch is taken, execution continues
at the branch target. When the instruction in the branch delay slot is
also a branch, the meaning is unclear. Thus, a branch instruction is not
allowed in the delay slot.
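
The effect of these schemes on performance can be compared with a rough model: effective CPI = 1 + branch frequency x average penalty per branch. The Python sketch below assumes the five-stage pipeline above, where a branch resolved in ID costs at most one cycle. The branch frequency, taken fraction, and slot fill rate are illustrative numbers, not measurements, and `branch_cpi` is a hypothetical helper. Predict taken is left out because it offers no advantage in this pipeline.

```python
def branch_cpi(branch_freq, taken_fraction, scheme, slot_fill_rate=0.0):
    """Effective CPI for an otherwise ideal pipeline (base CPI of 1)."""
    if scheme == "flush":
        penalty = 1.0                    # every branch loses one cycle
    elif scheme == "predict_not_taken":
        penalty = taken_fraction         # only taken branches lose a cycle
    elif scheme == "delayed":
        penalty = 1.0 - slot_fill_rate   # an unfilled slot costs one cycle
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return 1.0 + branch_freq * penalty


# Illustrative workload: 20% branches, 60% of them taken,
# and the delay slot filled with useful work half of the time.
for scheme in ("flush", "predict_not_taken", "delayed"):
    cpi = branch_cpi(0.2, 0.6, scheme, slot_fill_rate=0.5)
    print(f"{scheme}: {cpi:.2f}")
```

Under these assumed numbers the schemes give CPIs of 1.20, 1.12, and 1.10 respectively; different workloads will shift the ranking.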

Scheduling the branch delay slot

There are three ways to fill the slot:

a) With an independent instruction from before the branch (the best choice).

b) With an instruction from the target of the branch. Usually the target
instruction must be copied, since it can be reached by another path. This strategy is preferred
when the probability of a taken branch is high (e.g. a loop branch).

c) With an instruction from the fall-through path.

For optimizations (b) and (c) to be legal, executing the delay-slot
instruction when the branch goes in the unexpected direction must leave the system
in a correct state. In that case the work is wasted, but the program
still executes correctly.
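
Strategy (a) can be sketched as a small scheduling pass. This is a deliberately conservative sketch, not a real compiler pass: it only considers the instruction immediately before the branch and moves it into the delay slot when the branch does not read the register that instruction writes. The `Instr` record and `fill_delay_slot` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[int] = None   # register written, or None
    sources: tuple = ()          # registers read
    is_branch: bool = False


NOP = Instr("NOP")


def fill_delay_slot(block):
    """block ends in a branch; return the block with its delay slot filled."""
    *body, branch = block
    if not branch.is_branch:
        raise ValueError("block must end in a branch")
    if body:
        candidate = body[-1]
        independent = candidate.dest is None or candidate.dest not in branch.sources
        # A branch may not sit in the delay slot.
        if independent and not candidate.is_branch:
            return body[:-1] + [branch, candidate]
    return body + [branch, NOP]  # nothing safe to move: pad with a NOP


beq = Instr("BEQ $2, $0, target", None, (2, 0), True)
ok = fill_delay_slot([Instr("ADD $1, $2, $3", 1, (2, 3)), beq])
print([i.text for i in ok])   # ['BEQ $2, $0, target', 'ADD $1, $2, $3']
bad = fill_delay_slot([Instr("ADD $2, $1, $3", 2, (1, 3)), beq])
print([i.text for i in bad])  # ['ADD $2, $1, $3', 'BEQ $2, $0, target', 'NOP']
```

In the second case the branch tests $2, which the ADD writes, so the ADD must stay ahead of the branch and the slot is padded with a NOP.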