

### **SNS COLLEGE OF ENGINEERING**

Kurumbapalayam (Po), Coimbatore – 641 107

### **An Autonomous Institution**

Accredited by NBA – AICTE and Accredited by NAAC – UGC with 'A' Grade Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai

### **DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING**

### **COURSE NAME : 19IT301 COMPUTER ORGANIZATION AND** ARCHITECTURE II YEAR /III SEMESTER

### Unit 3- PROCESSOR AND PIPELINING

Topics : Pipelining: Basic concepts – Data hazards – Instruction hazards – Influence on Instruction sets – Data path and control consideration.





### Pipelining

### A TECHNIQUE TO IMPROVE THE PERFORMANCE **BY OVERLAPPING THE EXECUTION OF** SUCCESSIVE INSTRUCTIONS.

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





### Pipelining



### Sequential Execution



### **Pipelined Execution**

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE

11/10/2022





### **Hardware Organization**



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE



### Execution Unit



### **Four State Pipeline**

Fetch (F) Read the instruction from memory Decode (D) Decode the instruction and fetch the source operand(s) Execute (E) Perform the operation specified by the instruction Write (W) Store the result in the destination location





# **Four Stage Pipeline**



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





### Hardware Organization







# **Role of cache memory**

- ✓ Pipelining is more effective in improving performance if the tasks being performed in different stages require about the same amount of time.
- Sut access time of main memory is high compared to the access time of registers.
- $\checkmark$  This problem is solved by cache memory.







# **PIPELINE PERFORMANCE** ✓ Data Hazard ✓ Instruction Hazards ✓ Structural Hazard



### **Data Hazard**



- $\checkmark$  It is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.
- Source or destination operands not available at time expected in the pipeline
- $\checkmark$  Execution operation taking more than one clock cycle. Example:

### A=5

- A -----3+A
- B ------ 4\*A





### **Data Hazard**



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





# **Data Dependency** Ex: Mul R2, R3, R4







### **Operand Forwarding**



11/10/2022

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





### **Operand Forwarding**



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE

11/10/2022





# Handling Data Hazards in SW

Compiler detect data dependencies and deal with them Insert NOPs Attempt to reorder instructions to perform useful tasks in NOP slots.

Ex:

- Mul R2,R3,R4 11: NOP NOP
- 12: Add R5,R4,R6







Side effects

- Instruction changes contents of a register other than the named destination
  - ✓ Autoincrement/autodecrement addressing modes. Ex: PUSH & POP OPERATIONS.
  - Condition code flags Ex: Add R1,R2 Addwithcarry R3, R4
- ✓ Give rise to multiple dependencies
- ✓ Should be minimized





# **Instruction Hazards**

It is a situation in which the pipeline is stalled because of a delay in the availability of an instruction.

Delay in the availability of an instruction

- $\checkmark$  Cache miss
- ✓ Branch instructions





# **Instruction Hazard or Control** Hazard---CACHE MISS

| Clock cycle          | 1              | 2              | 3              | 4              | 5              | 6              | 7     |
|----------------------|----------------|----------------|----------------|----------------|----------------|----------------|-------|
| Instruction          | Sec.14         | De la          | and wi         | lanin agi      |                |                |       |
| I1                   | F <sub>1</sub> | D <sub>1</sub> | E <sub>1</sub> | W <sub>1</sub> |                |                |       |
| I <sub>2</sub>       |                |                | I              | <sup>7</sup> 2 |                | D <sub>2</sub> | E     |
| I <sub>3</sub>       |                |                |                |                |                | F <sub>3</sub> | D     |
|                      | (a)            | Instructio     | on execu       | tion steps     | in succe       | essive cl      | ock c |
| Clock cycle<br>Stage | 1              | 2              | 3              | 4              | 5              | 6              | 7     |
| F: Fetch             | F <sub>1</sub> | F <sub>2</sub> | F <sub>2</sub> | F <sub>2</sub> | F <sub>2</sub> | F <sub>3</sub> |       |
| D: Decode            |                | D <sub>1</sub> | idle           | idle           | idle           | D <sub>2</sub> | D     |
| E: Execute           |                |                | E <sub>1</sub> | idle           | idle           | idle           | E     |
| W: Write             |                |                |                | $W_1$          | idle           | idle           | idl   |
| (b) Fu               | unction        | performe       | ed by ead      | ch proces      | sor stage      | e in succ      | essiv |
| Figure 8.4           | Pipeline       | e stall ca     | used by a      | a cache m      | niss in F2     |                |       |

11/10/2022

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE







# **Unconditional branch instructions** Branch Penalty: The time lost as a result of a branch instruction





### **Branch Penalty**



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE

11/10/2022







11/10/2022

### S.S INSTRUCTION HAZ

20/40



# Instruction Queue and Prefetch











11/10/2022





22/40



# **Conditional branch instructions**

Techniques for reducing the branch penalty associated with conditional branch instruction: 1.Delayed branching 2.Branch prediction





| LOOP | Shin_len  | RI    |
|------|-----------|-------|
|      | Decrement | R2    |
|      | Branch=0  | LOOP  |
| NEXT | Add       | R1,R3 |
|      |           |       |

(a) Original program loop

|      | a second s |       |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
| LOOP | Decrement                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | R2    |
|      | Branch=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | LOOP  |
|      | Shift_left                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | R1    |
| NEXT | Add                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | R1,R3 |
|      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |       |

(b) Reordered instructions

Figure 8.12 Reordering of instructions for a delayed branch.

| Clock cycle        | 1         | 2              | 3                | 4                  | 5                                                | 6              |
|--------------------|-----------|----------------|------------------|--------------------|--------------------------------------------------|----------------|
| Instruction        | ini m     | bashu          | Verbak           |                    |                                                  |                |
| Decrement          | F         | Е              | MER. THE         |                    |                                                  |                |
| Contraction of the |           |                | 13.02.00         | atech A.r          |                                                  |                |
| Branch             |           | F              | E                | story is           |                                                  |                |
|                    |           |                |                  |                    |                                                  |                |
| Shift (delay slo   | t)        |                | F                | E                  |                                                  |                |
| Deserves (D        |           |                |                  |                    |                                                  |                |
| Decrement (Bra     | anch take | n)             |                  | F                  | Е                                                | and the second |
| Branch             |           |                |                  |                    | F                                                | Е              |
|                    |           |                |                  | are the            |                                                  |                |
| Shift (delay slo   | t)        |                |                  |                    | SEPECTION AND AND AND AND AND AND AND AND AND AN | F              |
|                    |           |                |                  |                    |                                                  |                |
| Add (Branch no     | ot taken) |                |                  |                    |                                                  |                |
| F. 0.10 F          | Pine      | lining/Compute | r organization a | and architecture/[ | )r K Perivakaru                                  | opan/CSE/S     |

11/10/2022

ExecutiPipelining/Computer organization and architecture/Dr.K.Perivakaruppan/CSE/SNSCE Figure 8.13 passes through the loop in Figure 8.12b.







# **Branch Prediction**

Attempt to predict whether or not a particular branch will be taken

### **Static Branch Prediction**

The result of the branch instruction may be made in hardware by observing whether the target address of the branch is lower than or higher than the address of the branch instruction.

A more flexible approach is to have the compiler decide whether a given branch instruction should be predicted taken or not taken. That is branch instructions include a branch prediction bit, which is set to 0 or 1 by the compiler to indicate the desired behavior.

### **Dynamic Branch Prediction**

Based on execution history







Figure 8.14

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





ST: strongly likely to be taken LT: likely to be taken LNT: likely not to be taken SNT: strongly likely not to be taken



Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE





# **INFLUENCE ON INSTRUCTION** SETS

Two key aspects of machine instructions:

- ✓ Addressing modes
- ✓ Condition code flags





# Addressing modes

Complex addressing :Load (x(R1)),R2 Simple addressing :Add #x,R1,R2 Load (R2), R2 Load (R2),R2

\*Complex addressing modes are not suitable for piplined execution. The addressing modes used in modern processors often have the following features to support pipeline hardware:

- Access to an operand does not require more than one access to the 1. memory.
- Only load and store instructions access memory operands. 2.
- The addressing modes used do not have side effects. 3. \*Register, register indirect, index addressing modes are having all these features.





| Clock cycle 1    | 2               | 3                     |                       | 5 doute to            |
|------------------|-----------------|-----------------------|-----------------------|-----------------------|
| Load F           | D               | X+[R1]                | [X+[R1]]              | [[X+[R1]]]            |
| pices in and     |                 | inot conver           |                       |                       |
| Next instruction | n F             | D                     |                       |                       |
|                  |                 | (a) Complex           | addressing            |                       |
| Add F            | D               | X+[R1]                | w                     | odiastucentilo        |
| and an and a     | Lean x stori of |                       |                       | be on snup            |
| Load             | F               | D                     | [X+[R1]]              | W                     |
|                  |                 |                       |                       |                       |
| Load             |                 | F                     | D                     | [X+[R1]]]             |
|                  |                 | in the second         | index regist          |                       |
| Next instruct    | ion             |                       | F                     | D                     |
|                  | Pipelining/     | Computer organization | andlarchitectore/DCK. | Periyakaruppan/CSE/SI |

11/10/2022





SNSCE

. .

E

W

30/40



# **Condition code flags**

Compiler for a pipelined processor attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur.

The dependency introduced by the condition code flags reduces the flexibility available for the compiler to reorder instructions.

To provide flexibility in reordering instructions,

1. the condition code flags should be affected by as few instructions as possible. 2.the compiler should be able to specify in which instructions of a program the condition codes are affected and in which they are not.

An instruction set designed with pipelining in mind usually provides the desired flexibility.





# **Condition code flags**

| Add<br>Compare<br>Branch=0 | R1,R<br>R3,R   |
|----------------------------|----------------|
| (a) A progra               |                |
| (a) progra                 | am fragment    |
| ( , , , progra             | am fragment    |
| Compare                    |                |
|                            | R3,R4<br>R1,R2 |

(b) Instructions reordered

Figure 8.17 Instruction reordering.

Pipelining/Computer organization and architecture/Dr.K.Periyakaruppan/CSE/SNSCE

11/10/2022



32/40



# **Datapath and control** considerations

Several important changes as compared to the previous diagram (three bus organization):

1.Separate instruction and data caches.

2.PC is directly connected to IMAR(independent ALU operation) 3.The data address in DMAR can be obtained directly from the register file or from the ALU to support the register indirect and indexed addressing modes.





4. Separate MDR registers are provided for read and write operations. Data can be transferred directly between these registers and the register file during load and store operations without the need to pass through the ALU.

5. Buffer registers have been introduced at the inputs and output of the ALU. These registers SRC1, SRC2, and RSLT in Figure 8.7.

6. The instruction register has been replaced with an instruction queue, which is loaded from the instruction cache. 7. The output of the instruction decoder is connected to the control signal pipeline.





The following operations can be performed independently in the processor of Figure 8.18:

- Reading an instruction from the instruction cache
- ✓ Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two registers from the register file
- ✓ Writing into one register in the register file
- Performing an ALU operation





Because these operations do not use any shared resources, they can be performed simultaneously in any combination.

The structure provides the flexibility required to implement the four-stage pipeline . For example, let 11,12,13, and 14 be a sequence of four instructions.

The following actions all happen during clock cycle 4: Write the result of instruction 11 into the register file Read the operands of instruction I2 from the register file. Decode instruction I3 Fetch instruction I4 and increment the Pc.







11/10/2022



### Assessment

a).What is Pipelining?

b) Mention the purpose of

1.Instruction fetch unit\_\_\_\_\_

- 2. Execution unit \_\_\_\_\_
- 3.Decode unit \_\_\_\_\_
- 4.Write unit \_\_\_\_\_
- 5. Data path \_\_\_\_\_





### Reference



- 1. Carl Hamacher, Zvonko Vranesic and Safwat Zaky, "Computer Organization", McGraw-Hill, 6<sup>th</sup> Edition 2012. 2. David A. Patterson and John L. Hennessey, "Computer organization and design", MorganKauffman /Elsevier, 5<sup>th</sup> edition,
- 2014.
- 3. William Stallings, "Computer Organization and Architecture designing for Performance", Pearson Education 8<sup>th</sup> Edition, 2010 4. John P.Hayes, "Computer Architecture and Organization",
- McGraw Hill, 3<sup>rd</sup> Edition, 2002
- 5. M. Morris R. Mano "Computer System Architecture" 3<sup>rd</sup> Edition 2007

