

8/4/2023

### **SNS COLLEGE OF TECHNOLOGY**

**Coimbatore-35 An Autonomous Institution** 

Accredited by NBA – AICTE and Accredited by NAAC – UGC with 'A+' Grade Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai

### **DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING**

#### **19ECB302 – VLSI DESIGN**

III YEAR/ V SEMESTER

UNIT 3 – SEQUENTIAL LOGIC CIRCUITS

**TOPIC 4 – PIPELINES** 

VLSI Design/ M.Pradeepa / AP/ECE/SNSCT





#### **OUTLINE**



- Introduction
- Architectural techniques : critical path
- Synchronous timing
- Self-timed pipelined data path
- Completion signal using current sensing
- Architectural techniques : pipelining
- Activity
- Architectural techniques : fine-grain pipelining
- Unrolling the loop using pipelining
- Architectural techniques : parallel processing
- Assessment
- Summary & thank you





### **INTRODUCTION**

- **Combinational logic**: output depends only on the current inputs
- **Sequential logic**: output depends on current and previous inputs
  - Requires separating previous, current, and future values
  - These stored values are called state or tokens
  - Examples: FSM, pipeline







### **ARCHITECTURAL TECHNIQUES : CRITICAL PATH**

The critical path in any design is the longest path between:

1. Any two internal latches/flip-flops
2. An input pad and an internal latch
3. An internal latch and an output pad
4. An input pad and an output pad

- Use flip-flops (FFs) right after input pads and right before output pads to avoid the last three cases, which include off-chip and packaging delay.
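The recommendation to register signals at the pads can be sketched as follows. This is a minimal illustration, not the course's reference design: the module name `registered_io`, the 8-bit width, and the placeholder core logic are all assumptions.

```verilog
// Hypothetical example: names, widths and the core function are illustrative.
module registered_io (
    input  wire       clk,
    input  wire [7:0] pad_in,    // signal arriving from an input pad
    output reg  [7:0] pad_out    // signal driving an output pad
);
    reg  [7:0] in_ff;            // flip-flop placed right after the input pad
    wire [7:0] core_result;

    // Placeholder for the internal combinational logic.
    assign core_result = in_ff + 8'd1;

    always @(posedge clk) begin
        in_ff   <= pad_in;       // pad-to-FF path carries no core logic
        pad_out <= core_result;  // FF placed right before the output pad
    end
endmodule
```

With this arrangement the only path that contains core logic runs between two internal flip-flops (case 1), so pad and packaging delays do not add to it.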




#### **SYNCHRONOUS TIMING**



#### **Pipelining**:

- Comes from the idea of a water pipe: keep sending water in without waiting for the water already in the pipe to come out
- Used to reduce the critical path of the design



#### **SELF-TIMED PIPELINED DATA PATH**






#### **COMPLETION SIGNAL USING CURRENT SENSING**










Smaller Critical Path





#### **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

Pipeline depth: 0 (No Pipeline)

Critical path: 3 Adders



| Cycle starting at | t<sub>1</sub> | t<sub>2</sub> |
|-------------------|---------------|---------------|
| Input             | X(1)          | X(2)          |
| Output            | Y(1)          | Y(2)          |




    wire w1, w2;
    assign w1 = X + a;
    assign w2 = w1 + b;
    assign Y  = w2 + c;



#### **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

- Pipeline depth: 1 (One Pipeline register Added)
  - Critical path: 2 Adders



| Cycle starting at | t<sub>1</sub> | t<sub>2</sub> | t<sub>3</sub> |
|-------------------|---------------|---------------|---------------|
| Input             | X(1)          | X(2)          | X(3)          |
| Output            |               | Y(1)          | Y(2)          |



    wire w1;
    reg  w2;
    assign w1 = X + a;
    always @(posedge Clk)
        w2 <= w1 + b;   // pipeline register after the second adder
    assign Y = w2 + c;





### **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

- Pipeline depth: 2 (Two Pipeline registers Added)
  - Critical path: 1 Adder



| Cycle starting at | t<sub>1</sub> | t<sub>2</sub> | t<sub>3</sub> |
|-------------------|---------------|---------------|---------------|
| Input             | X(1)          | X(2)          | X(3)          |
| Output            |               |               | Y(1)          |
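A depth-2 version of the same adder chain might look like the sketch below, which follows the style of the depth-0 and depth-1 fragments. The module wrapper, the name `adder_chain_depth2`, and the 8-bit widths are assumptions; `a`, `b` and `c` are assumed to be constant while data flows through.

```verilog
// Hypothetical wrapper: module name and 8-bit widths are assumptions.
module adder_chain_depth2 (
    input  wire       Clk,
    input  wire [7:0] X, a, b, c,   // a, b, c assumed constant over time
    output wire [7:0] Y
);
    reg [7:0] w1, w2;               // the two pipeline registers

    always @(posedge Clk) begin
        w1 <= X + a;                // stage 1: one adder
        w2 <= w1 + b;               // stage 2: one adder
    end

    assign Y = w2 + c;              // output stage: one adder
endmodule
```

Every register-to-register path now contains a single adder, and the result for X(1) becomes visible two clock edges after X(1) was applied.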









Clock period and throughput as a function of pipeline depth:
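A first-order way to read this relationship is sketched below. It assumes the logic is cut into $N$ stages and that every register adds a fixed overhead $t_{reg}$ (clock-to-Q plus setup time); the symbols $T_{stage,i}$, $t_{reg}$ and $t_{add}$ are introduced here for illustration and do not come from the slide.

$$
T_{clk} \;\ge\; \max_{1 \le i \le N} T_{stage,i} + t_{reg},
\qquad
\text{throughput} = \frac{1}{T_{clk}} \ \text{results per second}
$$

For the three-adder chain with adder delay $t_{add}$ this gives:

$$
\begin{aligned}
\text{depth } 0 &: \; T_{clk} \ge 3\,t_{add} + t_{reg} \\
\text{depth } 1 &: \; T_{clk} \ge 2\,t_{add} + t_{reg} \\
\text{depth } 2 &: \; T_{clk} \ge t_{add} + t_{reg}
\end{aligned}
$$

Because $t_{reg}$ does not shrink as stages are added, the gain in throughput from each extra pipeline register becomes smaller as the depth grows.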







#### **General Rule:**

- Pipelining latches can only be placed across feed-forward cutsets of the circuit.

#### **Cutset:**

- A set of paths of a circuit such that if these paths are removed, the circuit becomes disjoint (i.e., two separate pieces).

#### **Feed-Forward Cutset:**

- A cutset is called a feed-forward cutset if the data move in the forward direction on all the paths of the cutset.





- **Example:** 
  - **FIR Filter**







#### **CLASS ROOM ACTIVITY**

#### Group Discussion & Debate






#### Critical Path: 1M + 2A (one multiplier delay plus two adder delays)





#### Critical Path: 2A







| Clock  | Input | 1     | 2     | 3           | 4     | 5                 | Output |
|--------|-------|-------|-------|-------------|-------|-------------------|--------|
| 0      | X(0)  | aX(0) | -     | aX(0)       | -     | aX(0)             | Y(0)   |
| 1      | X(1)  | aX(1) | bX(0) | aX(1)+bX(0) | -     | aX(1)+bX(0)       | Y(1)   |
| 2      | X(2)  | aX(2) | bX(1) | aX(2)+bX(1) | cX(0) | aX(2)+bX(1)+cX(0) | Y(2)   |
| 3      | X(3)  | aX(3) | bX(2) | aX(3)+bX(2) | cX(1) | aX(3)+bX(2)+cX(1) | Y(3)   |





#### Even more pipelining



| Clock | Input | 1     | 2     | 3           | 4     | 5                 | Output |
|-------|-------|-------|-------|-------------|-------|-------------------|--------|
| 0     | X(0)  | -     | -     | -           | -     | -                 | -      |
| 1     | X(1)  | aX(0) | -     | -           | -     | -                 | -      |
| 2     | X(2)  | aX(1) | bX(0) | aX(0)       | -     | aX(0)             | Y(0)   |
| 3     | X(3)  | aX(2) | bX(1) | aX(1)+bX(0) | -     | aX(1)+bX(0)       | Y(1)   |
| 4     | X(4)  | aX(3) | bX(2) | aX(2)+bX(1) | cX(0) | aX(2)+bX(1)+cX(0) | Y(2)   |
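The timing in the table above is consistent with a 3-tap FIR filter that has a register on the input and a register on each multiplier output. The sketch below is one way to write that structure; the module name `fir3_pipelined`, the coefficient values, the bit widths and the synchronous reset are assumptions made for illustration.

```verilog
// Hypothetical 3-tap FIR, Y(n) = a*X(n) + b*X(n-1) + c*X(n-2),
// with a registered input and registered products.
module fir3_pipelined #(
    parameter [7:0] A = 8'd1,     // coefficient a (illustrative value)
    parameter [7:0] B = 8'd2,     // coefficient b (illustrative value)
    parameter [7:0] C = 8'd3      // coefficient c (illustrative value)
) (
    input  wire        clk,
    input  wire        rst,       // synchronous reset, assumed here
    input  wire [7:0]  X,
    output wire [17:0] Y
);
    reg [7:0]  x_r, x_d1, x_d2;   // input register and delay line
    reg [15:0] pa_r, pb_r, pc_r;  // pipeline registers after the multipliers

    always @(posedge clk) begin
        if (rst) begin
            x_r  <= 8'd0;  x_d1 <= 8'd0;  x_d2 <= 8'd0;
            pa_r <= 16'd0; pb_r <= 16'd0; pc_r <= 16'd0;
        end else begin
            x_r  <= X;            // input pipeline register
            x_d1 <= x_r;          // X(n-1)
            x_d2 <= x_d1;         // X(n-2)
            pa_r <= A * x_r;      // node 1, registered
            pb_r <= B * x_d1;     // node 2, registered
            pc_r <= C * x_d2;     // registered c-tap product (node 4)
        end
    end

    // Only two adders remain between the product registers and the output.
    assign Y = pa_r + pb_r + pc_r;
endmodule
```

The product registers cut every multiplier-to-adder path, which is a feed-forward cutset, so the filter function is unchanged apart from the added latency of two clocks.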





### **ARCHITECTURAL TECHNIQUES : FINE-GRAIN PIPELINING**

- Pipelining at the operation level
  - Break the multiplier into two parts
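Breaking the multiplier into two parts can be sketched as below, assuming an unsigned 8x8 multiplier split into two 8x4 partial products followed by a final addition. The module name `mult8_two_stage`, the widths, and the choice of split are assumptions; real designs may cut the multiplier array differently.

```verilog
// Hypothetical fine-grain pipelined multiplier (unsigned, 8 x 8 -> 16 bits).
module mult8_two_stage (
    input  wire        clk,
    input  wire [7:0]  a, b,
    output reg  [15:0] p
);
    reg [11:0] pp_lo, pp_hi;      // registered partial products

    always @(posedge clk) begin
        // Part 1: two smaller 8x4 multiplications
        pp_lo <= a * b[3:0];
        pp_hi <= a * b[7:4];
        // Part 2: shift the upper partial product and add
        p     <= {pp_hi, 4'b0000} + pp_lo;
    end
endmodule
```

The pipeline register now sits inside the multiplication itself, so each stage is shorter than a full 8x8 multiply; the product appears two clocks after the operands are applied.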








### **UNROLLING THE LOOP USING PIPELINING**

- Calculation of X<sup>3</sup> (iterative implementation):
  - Throughput = 8/3, or about 2.7 bits/clock
  - Timing = one multiplier in the critical path
  - No new computation can begin until the previous computation has completed





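    // Iterative X^3: a single multiplier is reused over several clocks.
    module power3(
        output reg [7:0] X3,
        output           finished,
        input      [7:0] X,
        input            clk, start);

        reg [7:0] ncount;
        reg [7:0] XPower, Xin;

        assign finished = (ncount == 0);

        always @(posedge clk)
            if (start) begin
                XPower <= X;              // load the new operand
                Xin    <= X;
                ncount <= 2;              // two multiplications remain
                X3     <= XPower;         // latch the previous computation's result
            end else if (!finished) begin
                ncount <= ncount - 1;
                XPower <= XPower * Xin;   // one multiplication per clock
            end
    endmodule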
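Unrolling the loop replaces the reused multiplier with one multiplier per iteration and places a pipeline register after each step. The sketch below shows one way to do this for X<sup>3</sup>; the module name `power3_pipelined` is an assumption, and results are truncated to 8 bits as in the iterative code.

```verilog
// Hypothetical unrolled, pipelined X^3: a new input is accepted every clock.
module power3_pipelined (
    output reg  [7:0] XPower,
    input  wire       clk,
    input  wire [7:0] X
);
    reg [7:0] XPower1, XPower2;
    reg [7:0] X1, X2;

    always @(posedge clk) begin
        // Stage 1: capture the input
        X1      <= X;
        XPower1 <= X;
        // Stage 2: X^2
        X2      <= X1;
        XPower2 <= XPower1 * X1;
        // Stage 3: X^3
        XPower  <= XPower2 * X2;
    end
endmodule
```

In this sketch a new 8-bit result is produced every clock after a latency of three clocks, and each register-to-register path still contains only one multiplier.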



### **REMOVING PIPELINE REGISTERS (TO IMPROVE LATENCY)**

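Calculation of X<sup>3</sup> with the pipeline registers removed:

- Throughput = 8 bits/clock (3X improvement over the iterative implementation)
- Latency = 0 clocks
- Timing = two multipliers in the critical path

Latency can be reduced by removing pipeline registers, at the cost of a longer critical path.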







    // X^3 with all pipeline registers removed (purely combinational).
    module power3(
        output [7:0] XPower,
        input  [7:0] X);

        reg [7:0] XPower1, XPower2;
        reg [7:0] X2;

        always @* XPower1 = X;

        always @(*) begin
            X2      = XPower1;
            XPower2 = XPower1 * XPower1;   // X^2
        end

        assign XPower = XPower2 * X2;      // X^3: second multiplier in series
    endmodule



### **ARCHITECTURAL TECHNIQUES : PARALLEL PROCESSING**

In parallel processing the same hardware is duplicated. This:

- Increases the throughput without changing the critical path
- Increases the silicon area
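As a minimal sketch of this idea, the three-adder data path can be duplicated so that two samples are processed in the same clock period. The module name `adder_chain_parallel2`, the port names, and the 8-bit widths are assumptions, and the sketch assumes two input samples are available at once.

```verilog
// Hypothetical 2-parallel version of Y = X + a + b + c.
module adder_chain_parallel2 (
    input  wire [7:0] X_even, X_odd,    // two samples supplied together
    input  wire [7:0] a, b, c,
    output wire [7:0] Y_even, Y_odd
);
    // Two copies of the same three-adder data path.
    assign Y_even = X_even + a + b + c;
    assign Y_odd  = X_odd  + a + b + c;
endmodule
```

Each copy still has three adders in its critical path, so the clock period is unchanged, while two results are produced per period at roughly twice the adder area.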








#### **ADVANTAGES**

- Reduction in the critical path
- Higher throughput (number of computed results in a given time)
- Increases the clock speed (or sampling speed)
- Reduces the power consumption at the same speed
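The power claim can be illustrated with the usual first-order CMOS dynamic power model. The symbols below ($C_{total}$, $V_0$, $f$, $M$, $\beta$) are introduced here as assumptions for the sketch and are not taken from the slide.

$$
P_{seq} = C_{total}\, V_0^{2}\, f
$$

With $M$ pipeline stages, the capacitance charged along the critical path in one clock period drops to roughly $1/M$ of its original value. If the clock frequency $f$ is kept the same, that slack can be traded for a lower supply voltage $\beta V_0$ with $0 < \beta < 1$:

$$
P_{pip} = C_{total}\, (\beta V_0)^{2}\, f = \beta^{2}\, P_{seq}
$$

So, under these assumptions, the pipelined design delivers the same sample rate at a fraction $\beta^{2}$ of the original dynamic power.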

VLSI Design/ Dr.B.Sivasankari/Professor/ECE/SNSCT


23/06/2020







#### **ASSESSMENT**

- Define critical path
- List the needs for pipelining
- Draw the FIR filter using pipeline processing
- Compare pipeline and parallel processing





#### **SUMMARY & THANK YOU**


