

# **SNS COLLEGE OF TECHNOLOGY**

**Coimbatore-35 An Autonomous Institution** 

Accredited by NBA - AICTE and Accredited by NAAC - UGC with 'A+' Grade Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai

# **DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING**

# **19ECB302–VLSI DESIGN**

III YEAR/ V SEMESTER

UNIT 3 – SEQUENTIAL LOGIC CIRCUITS

**TOPIC 4 – PIPELINES** 





### **OUTLINE**



- Introduction
- Architectural techniques : critical path
- Synchronous timing
- Self-timed pipelined data path
- Completion signal using current sensing
- Architectural techniques : pipelining
- Activity
- Architectural techniques : fine-grain pipelining
- Unrolling the loop using pipelining
- Architectural techniques : parallel processing
- Assessment
- Summary & thank you





# **INTRODUCTION**

**Combinational logic**

- Output depends on current inputs

**Sequential logic**

- Output depends on current and previous inputs
- Requires separating previous, current, and future values
- These separated values are called state or tokens
- Examples: FSM, pipeline








# **ARCHITECTURAL TECHNIQUES : CRITICAL PATH**

The critical path in any design is the longest path between

1. Any two internal latches/flip-flops
2. An input pad and an internal latch
3. An internal latch and an output pad
4. An input pad and an output pad

- Use flip-flops right after the input pads and right before the output pads to avoid the last three cases (off-chip and packaging delay)
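As a rough worked relation for the first case (register to register), the clock period has to cover the critical path. The symbols and numbers below are illustrative assumptions, not values from these slides: $t_{cq}$ is the flip-flop clock-to-output delay, $t_{logic,max}$ is the delay of the longest combinational path, and $t_{setup}$ is the flip-flop setup time.

$$
T_{clk} \ge t_{cq} + t_{logic,max} + t_{setup}
$$

With hypothetical values $t_{cq} = 0.2\,\text{ns}$, $t_{logic,max} = 4.5\,\text{ns}$ and $t_{setup} = 0.3\,\text{ns}$:

$$
T_{clk} \ge 0.2 + 4.5 + 0.3 = 5\,\text{ns}, \qquad f_{max} = \frac{1}{5\,\text{ns}} = 200\,\text{MHz}
$$

Shortening $t_{logic,max}$ is the only term an architectural technique such as pipelining can change.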








#### **SYNCHRONOUS TIMING**



### **Pipelining:**

- Comes from the idea of a water pipe: keep sending water in without waiting for the water already in the pipe to come out
- Used to reduce the critical path of the design



#### **SELF-TIMED PIPELINED DATA PATH**



23/06/2020

11/2/2022







# **COMPLETION SIGNAL USING CURRENT SENSING**















# **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

Pipeline depth: 0 (No Pipeline)

Critical path: 3 Adders



| t<sub>1</sub> | t<sub>2</sub> | t<sub>3</sub> |
|---------------|---------------|---------------|
|               | X(1)          | X(2)          |
|               | Y(1)          | Y(2)          |

PIPELINES /19ECB302-VLSI DESIGN/SWAMYNATHAN.S.M/ECE/SNSCT



```verilog
// No pipeline registers: the three adders in series form the critical path
wire w1, w2;

assign w1 = X + a;
assign w2 = w1 + b;
assign Y  = w2 + c;
```



# **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

- Pipeline depth: 1 (One Pipeline register Added)
  - Critical path: 2 Adders



| t<sub>1</sub> | t<sub>2</sub> | t<sub>3</sub> |
|---------------|---------------|---------------|
| X(1)          | X(2)          | X(3)          |
|               | Y(1)          | Y(2)          |




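```verilog
wire w1;
reg  w2;

assign w1 = X + a;

always @(posedge Clk)
    w2 <= w1 + b;       // pipeline register after the second adder

assign Y = w2 + c;      // third adder, as in the depth-0 version
```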





# **ARCHITECTURAL TECHNIQUES : PIPELINE DEPTH**

- Pipeline depth: 2 (Two Pipeline registers Added)
  - Critical path: 1 Adder
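A minimal sketch of the depth-2 case, following the pattern of the depth-0 and depth-1 snippets. The module name `add3_depth2`, the `WIDTH` parameter and the port list are assumptions added for illustration, and `a`, `b`, `c` are assumed to be held constant so they need no delay registers.

```verilog
// Sketch only: module name, WIDTH and port list are illustrative assumptions.
module add3_depth2 #(parameter WIDTH = 8) (
    input  wire             Clk,
    input  wire [WIDTH-1:0] X, a, b, c,   // a, b, c assumed constant
    output wire [WIDTH-1:0] Y
);
    reg [WIDTH-1:0] w1, w2;

    always @(posedge Clk) begin
        w1 <= X + a;     // stage 1: one adder, then a pipeline register
        w2 <= w1 + b;    // stage 2: one adder, then a pipeline register
    end

    assign Y = w2 + c;   // stage 3: one adder to the output
endmodule
```

Every register-to-register or register-to-output path now contains a single adder, and `Y` for a given `X` appears two clocks later.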











Clock period and throughput as a function of pipeline depth:
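As a worked illustration of this relation for the three-adder example, let $T_A$ be the delay of one adder and ignore register setup and clock-to-output overhead (a simplifying assumption). For pipeline depth $d \in \{0, 1, 2\}$, with one sample accepted per clock:

$$
T_{clk}(d) = (3 - d)\,T_A, \qquad \text{throughput}(d) = \frac{1}{T_{clk}(d)} = \frac{1}{(3 - d)\,T_A}, \qquad \text{latency}(d) = d \ \text{clocks}
$$

With a hypothetical $T_A = 2\,\text{ns}$:

$$
\begin{aligned}
d = 0 &: \; T_{clk} = 6\,\text{ns}, \; \text{throughput} \approx 166.7 \ \text{Msamples/s} \\
d = 1 &: \; T_{clk} = 4\,\text{ns}, \; \text{throughput} = 250 \ \text{Msamples/s} \\
d = 2 &: \; T_{clk} = 2\,\text{ns}, \; \text{throughput} = 500 \ \text{Msamples/s}
\end{aligned}
$$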







# **General Rule:**

> Pipelining latches can only be placed across feed-forward cutsets of the circuit.

### Cutset:

- A set of paths of a circuit such that, if these paths are removed, the circuit becomes disjoint (i.e., two separate pieces)

### Feed-Forward Cutset:

- A cutset is called a feed-forward cutset if the data move in the forward direction on all the paths of the cutset





- **Example:** FIR filter







### **CLASS ROOM ACTIVITY**

### Group Discussion & Debate






### Critical Path: 1M+2A






### **Critical Path: 2A**






| Clock  | Input | 1     | 2     | 3           | 4     | 5                 | Output |
|--------|-------|-------|-------|-------------|-------|-------------------|--------|
| 0      | X(0)  | aX(0) | -     | aX(0)       | -     | aX(0)             | Y(0)   |
| 1      | X(1)  | aX(1) | bX(0) | aX(1)+bX(0) | -     | aX(1)+bX(0)       | Y(1)   |
| 2      | X(2)  | aX(2) | bX(1) | aX(2)+bX(1) | cX(0) | aX(2)+bX(1)+cX(0) | Y(2)   |
| 3      | X(3)  | aX(3) | bX(2) | aX(3)+bX(2) | cX(1) | aX(3)+bX(2)+cX(1) | Y(3)   |
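The schedule in the table above, where Y(n) appears in the same clock as X(n), can be sketched as a 3-tap FIR filter with no pipeline registers. This is a sketch under assumptions: the module name `fir3_direct`, the width parameter `W`, unsigned arithmetic and the absence of a reset are all illustrative choices, and the wire names `n1` to `n4` only mirror the numbered table columns.

```verilog
// Sketch only: names, widths and unsigned arithmetic are assumptions.
module fir3_direct #(parameter W = 8) (
    input  wire           clk,
    input  wire [W-1:0]   x,         // X(n)
    input  wire [W-1:0]   a, b, c,   // coefficients, assumed constant
    output wire [2*W+1:0] y          // Y(n) = aX(n) + bX(n-1) + cX(n-2)
);
    reg [W-1:0] x_d1, x_d2;          // X(n-1), X(n-2): the filter's own delays

    always @(posedge clk) begin
        x_d1 <= x;
        x_d2 <= x_d1;
    end

    wire [2*W-1:0] n1 = a * x;       // column 1
    wire [2*W-1:0] n2 = b * x_d1;    // column 2
    wire [2*W-1:0] n4 = c * x_d2;    // column 4
    wire [2*W:0]   n3 = n1 + n2;     // column 3

    assign y = n3 + n4;              // column 5 and output
endmodule
```

The longest combinational path runs from `x` through one multiplier and two adders to `y`, which corresponds to the 1M+2A critical path.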





#### Even more pipelining



| Clock | Input | 1     | 2     | 3           | 4     | 5                 | Output |
|-------|-------|-------|-------|-------------|-------|-------------------|--------|
| 0     | X(0)  | -     | -     | -           | -     | -                 | -      |
| 1     | X(1)  | aX(0) | -     | -           | -     | -                 | -      |
| 2     | X(2)  | aX(1) | bX(0) | aX(0)       | -     | aX(0)             | Y(0)   |
| 3     | X(3)  | aX(2) | bX(1) | aX(1)+bX(0) | -     | aX(1)+bX(0)       | Y(1)   |
| 4     | X(4)  | aX(3) | bX(2) | aX(2)+bX(1) | cX(0) | aX(2)+bX(1)+cX(0) | Y(2)   |
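One register placement that reproduces the schedule in this table is sketched below. The original figure is not reproduced here, so the placement is inferred from the table only and should be read as an assumption, as should the module name `fir3_pipelined`, the width parameter `W`, unsigned arithmetic and the absence of a reset.

```verilog
// Sketch only: register placement is inferred from the table, not from a figure.
module fir3_pipelined #(parameter W = 8) (
    input  wire           clk,
    input  wire [W-1:0]   x,         // X(n)
    input  wire [W-1:0]   a, b, c,   // coefficients, assumed constant
    output wire [2*W+1:0] y          // Y(n), available two clocks after X(n)
);
    reg [W-1:0]   x_d1, x_d2;        // X(n-1), X(n-2): the filter's own delays
    reg [2*W-1:0] n1, n2;            // columns 1 and 2: registered products
    reg [2*W-1:0] n4_pre, n4;        // column 4: product delayed to stay aligned
    reg [2*W:0]   n3;                // column 3: registered partial sum

    always @(posedge clk) begin
        x_d1   <= x;
        x_d2   <= x_d1;
        n1     <= a * x;             // pipeline register after each multiplier
        n2     <= b * x_d1;
        n4_pre <= c * x_d2;
        n3     <= n1 + n2;           // pipeline register after the first adder
        n4     <= n4_pre;            // matching delay on the third tap
    end

    assign y = n3 + n4;              // column 5 and output
endmodule
```

No path between registers, or from a register to the output, contains more than one multiplier or one adder. The cost is a latency of two clocks, which is why Y(0) first appears at clock 2.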





# **ARCHITECTURAL TECHNIQUES : FINE-GRAIN PIPELINING**

- Pipelining at the operation level
  - Break the multiplier into two parts
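As a rough illustration of breaking a multiplier into two parts, the sketch below splits an unsigned 8x8 multiplication into a partial-product stage and an addition stage. The exact split used in the original figure is not recoverable, so this particular partition, the module name `mult8_2stage` and the signal names are all assumptions.

```verilog
// Sketch only: one possible way to split a multiplier into two pipeline stages.
module mult8_2stage (
    input  wire        clk,
    input  wire [7:0]  a, b,
    output reg  [15:0] p             // a * b, two clocks after the inputs
);
    reg [11:0] pp_lo, pp_hi;         // a * b[3:0] and a * b[7:4]

    always @(posedge clk) begin
        // Stage 1: two half-size partial products
        pp_lo <= a * b[3:0];
        pp_hi <= a * b[7:4];
        // Stage 2: shift the upper partial product by 4 bits and add
        p     <= pp_lo + {pp_hi, 4'b0000};
    end
endmodule
```

Each stage now holds only part of the multiplier's logic, so the longest register-to-register path is shorter than a full 8x8 multiplication.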









# **UNROLLING THE LOOP USING PIPELINING**

- Calculation of X<sup>3</sup>
- Iterative implementation: no new computation can begin until the previous computation has completed
  - Throughput = 8/3, or about 2.7 bits/clock
  - Timing = One multiplier in the critical path





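```verilog
module power3(
    output reg [7:0] X3,
    output           finished,
    input      [7:0] X,
    input            clk, start);

    reg [7:0] ncount;
    reg [7:0] XPower, Xin;

    // finished goes high once both multiplications are done
    assign finished = (ncount == 0);

    always @(posedge clk)
        if (start) begin
            XPower <= X;         // load the new operand
            Xin    <= X;
            ncount <= 2;         // two multiplications remain
            X3     <= XPower;    // latch the result of the previous computation
        end
        else if (!finished) begin
            ncount <= ncount - 1;
            XPower <= XPower * Xin;
        end
endmodule
```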
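Unrolling this loop replaces the single reused multiplier with one multiplier per step and a pipeline register between steps. The sketch below shows one way to write that. The module name `power3_pipelined` and the stage signal names are illustrative assumptions, and the result is truncated to 8 bits as in the iterative version.

```verilog
// Sketch only: an unrolled, pipelined X^3 with one multiplier per stage.
module power3_pipelined(
    output reg [7:0] XPower,
    input            clk,
    input      [7:0] X);

    reg [7:0] XPower1, XPower2;
    reg [7:0] X1, X2;

    always @(posedge clk) begin
        // Pipeline stage 1: capture the operand
        X1      <= X;
        XPower1 <= X;
        // Pipeline stage 2: X^2
        X2      <= X1;
        XPower2 <= XPower1 * X1;
        // Pipeline stage 3: X^3
        XPower  <= XPower2 * X2;
    end
endmodule
```

A new value of `X` can be accepted on every clock, so the throughput rises to 8 bits/clock while the latency stays at 3 clocks and the critical path is still one multiplier.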



# **REMOVING PIPELINE REGISTERS (TO IMPROVE LATENCY)**

Calculation of X<sup>3</sup>

- Throughput = 8 bits/clock (3X improvement over the iterative implementation)
- Latency = 0 clocks
- Timing = Two multipliers in the critical path

<u>Latency can be reduced by removing pipeline registers</u>







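```verilog
module power3(
    output [7:0] XPower,
    input  [7:0] X);

    reg [7:0] XPower1, XPower2;
    reg [7:0] X2;

    // Purely combinational: no clock, so the latency is zero clocks
    always @*
        XPower1 = X;

    always @(*) begin
        X2      = XPower1;
        XPower2 = XPower1 * XPower1;    // X^2
    end

    assign XPower = XPower2 * X2;       // X^3: second multiplier in series
endmodule
```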



# **ARCHITECTURAL TECHNIQUES : PARALLEL PROCESSING**

In parallel processing, the same hardware is duplicated, which

- Increases the throughput without changing the critical path
- Increases the silicon area

Figure: a single unit takes a(n) and b(n) and produces Y(n) at clock frequency f, with a throughput of M samples. With parallel processing, one unit takes a(2k) and b(2k) and produces Y(2k) while a duplicate unit takes a(2k+1) and b(2k+1) and produces Y(2k+1), giving a throughput of 2M samples at the same clock frequency f.
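A minimal sketch of two-way parallel processing is given below. The operation inside each unit is not identified in the source, so the multiply used here is purely an assumption, as are the module names `unit` and `parallel2`, the width parameter `W` and the port names.

```verilog
// Sketch only: the multiply inside `unit` is a placeholder operation.
module unit #(parameter W = 8) (
    input  wire           clk,
    input  wire [W-1:0]   a, b,
    output reg  [2*W-1:0] y
);
    always @(posedge clk)
        y <= a * b;
endmodule

// Two identical units working on even and odd samples in the same clock.
module parallel2 #(parameter W = 8) (
    input  wire           clk,
    input  wire [W-1:0]   a_even, b_even,   // a(2k),   b(2k)
    input  wire [W-1:0]   a_odd,  b_odd,    // a(2k+1), b(2k+1)
    output wire [2*W-1:0] y_even, y_odd     // Y(2k),   Y(2k+1)
);
    unit #(.W(W)) u_even (.clk(clk), .a(a_even), .b(b_even), .y(y_even));
    unit #(.W(W)) u_odd  (.clk(clk), .a(a_odd),  .b(b_odd),  .y(y_odd));
endmodule
```

The critical path inside each unit is unchanged, so the clock frequency stays the same. Two samples are produced per clock at the cost of roughly twice the area.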




# **ADVANTAGES**

- Reduction in the critical path
- Higher throughput (number of computed results in a given time)
- Increases the clock speed (or sampling speed)
- Reduces the power consumption at the same speed






### **ASSESSMENT**

- Define critical path
- List the needs for pipelining
- Draw the FIR filter using pipeline processing
- Compare pipelining and parallel processing





### **SUMMARY & THANK YOU**


