Timing closure

File:Clocked Synchronous circuit, clock signal, and logic gates with path delay.png (Clocked synchronous circuit, clock signal, and path delay. tCLK1 = tCLK2: both clocks have the same period. tsu / th: the set-up time and hold time of the register. tc-q / tc-q,cd: maximum and contamination (minimum) delay of the register. tlogic / tlogic,cd: maximum and contamination (minimum) delay of the combinational logic.)

Timing closure in VLSI design and electronics engineering is the iterative design process of ensuring that all signals in a clocked synchronous circuit satisfy the timing requirements of its logic gates and storage elements, such as the timing constraints and the clock period, relative to the system clock. The goal is to guarantee correct data transfer and reliable operation at the target clock frequency.

A synchronous circuit is composed of two types of primitive elements: combinational logic gates (NOT, AND, OR, NAND, NOR, XOR, etc.), which compute logic functions without memory, and sequential elements (flip-flops, latches, registers), which store data and are triggered by clock signals. During timing closure, the circuit is adjusted through layout improvement and netlist restructuring{{Citation |last=Kahng |first=Andrew B. |title=Timing Closure |date=2011 |work=VLSI Physical Design: From Graph Partitioning to Timing Closure |pages=219–264 |url=https://link.springer.com/10.1007/978-90-481-9591-6_8 |access-date=2025-05-22 |place=Dordrecht |publisher=Springer Netherlands |language=en |doi=10.1007/978-90-481-9591-6_8 |isbn=978-90-481-9590-9 |last2=Lienig |first2=Jens |last3=Markov |first3=Igor L. |last4=Hu |first4=Jin}} to reduce path delays so that every signal arrives at its destination before the time required by the clock.

As integrated circuit (IC) designs become increasingly complex, with billions of transistors and highly interconnected logic, ensuring that all critical timing paths satisfy their constraints has become more difficult. Failure to meet these timing requirements can cause functional faults, unpredictable behavior, or system-level failures.

For this reason, timing closure is not a simple final validation step, but rather an iterative and comprehensive optimization process. It involves continuous improvement of both the logical structure of the design and its physical implementation, such as restructuring logic gates and refining placement and routing, in order to reliably meet all timing constraints across the entire chip.

Overview

In simple cases, the designer can compute the path delay between elements manually, but this becomes impractical once a design has more than a dozen or so elements. For example, the delay along a path from the output of a D flip-flop, through combinational logic gates, and into the input of the next D flip-flop must be less than the period between the synchronizing clock pulses to the two flip-flops. When the delay through the elements is greater than the clock period, the circuit will not function correctly. Therefore, modifying the circuit to remove the timing failure (by shortening the critical path) is an important part of the logic design engineer's task. The critical path is the longest path, in terms of delay, between two sequential elements in a design. It defines the maximum delay over all register-to-register paths, and this delay must not be greater than the clock cycle time.
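The manual check described above can be sketched in a few lines of Python. This is only an illustration: the path names and delay values are hypothetical, and a real design would obtain them from a timing analysis tool.

```python
def critical_path(path_delays_ps):
    """Return (path_name, delay) for the slowest register-to-register path."""
    name = max(path_delays_ps, key=path_delays_ps.get)
    return name, path_delays_ps[name]


def meets_clock(path_delays_ps, clock_period_ps):
    """True when even the critical path fits within one clock period."""
    _, worst_delay = critical_path(path_delays_ps)
    return worst_delay <= clock_period_ps


# Hypothetical register-to-register path delays, in picoseconds.
paths = {"ff1->ff2": 3200, "ff1->ff3": 4700, "ff2->ff3": 2100}

name, delay = critical_path(paths)
print(name, delay)               # the critical path and its delay
print(meets_clock(paths, 5000))  # check against a 5 ns clock period
print(meets_clock(paths, 4000))  # check against a 4 ns clock period
```

With these assumed numbers the path ff1->ff3 is critical, so the design fits a 5 ns clock period but not a 4 ns one.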

Timing constraints

File:Timing constraints mechanism.png

In the process of IC design, the IC layout should satisfy geometric constraints and timing constraints. Geometric constraints are physical design rules imposed by the manufacturing process, such as correct cell alignment and minimum wire spacing. Timing constraints are the timing requirements that all signal paths should satisfy. At each flip-flop, the data input must remain stable for a period before the clock edge, which is called the setup time, and must remain stable for a period after the clock edge, which is called the hold time. There are two types of timing constraints:

Setup constraints (long-path constraints):

These constraints specify how long before the clock edge of a flip-flop the data input signal must be stable. The data therefore has to propagate through the logic path and reach the next flip-flop early enough before the next clock edge. If the path delay is too long, the setup constraint is violated and incorrect data may be latched.

Hold constraints (short-path constraints):

These constraints specify how long after the clock edge of a flip-flop the data input signal must stay stable. If a path is too short, new data can arrive before the hold time has elapsed. Violating a hold constraint can result in metastability or incorrect behavior.

Hold time constraint: t_{logic}>t_h-t_{c{-}q}

Setup time constraint: t_{logic}<t_{CLK}-t_{su}-t_{c{-}q}

Where:

  • t_{logic} = Combinational Logic Delay
  • t_{CLK} = Clock Period
  • t_{su} = Setup Time
  • t_h = Hold Time
  • t_{c{-}q} = Clock-to-Q Delay of the flip-flop{{Cite web |title=Timing Closure in FPGA |url=https://www.vemeko.com/blog/timing-closure-in-fpga.html |archive-url=http://web.archive.org/web/20240913124036/https://www.vemeko.com/blog/timing-closure-in-fpga.html |archive-date=2024-09-13 |access-date=2025-05-27 |website=www.vemeko.com |language=en-US}}
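As a rough illustration, the two constraints can be turned into slack calculations. The sketch below assumes the usual form of the setup inequality, t_logic < t_CLK - t_su - t_c-q, together with the hold inequality given above. The function names and all delay values are hypothetical.

```python
def setup_slack(t_clk, t_cq, t_logic, t_su):
    """Positive when t_logic < t_clk - t_su - t_cq (use maximum delays)."""
    return t_clk - t_su - t_cq - t_logic


def hold_slack(t_cq, t_logic, t_h):
    """Positive when t_logic > t_h - t_cq (use minimum/contamination delays)."""
    return t_cq + t_logic - t_h


# Hypothetical values in picoseconds.
print(setup_slack(t_clk=1000, t_cq=80, t_logic=800, t_su=50))  # setup met
print(setup_slack(t_clk=1000, t_cq=80, t_logic=900, t_su=50))  # setup violated
print(hold_slack(t_cq=40, t_logic=20, t_h=30))                 # hold met
print(hold_slack(t_cq=40, t_logic=0, t_h=60))                  # hold violated
```

A negative result means the corresponding constraint is violated by that many picoseconds.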

Timing closure iterative process

Timing closure is a vital step that ensures all signals reach their destinations within the required time, so that the circuit works reliably. Designers start with the register-transfer level (RTL) abstraction, written as Verilog or VHDL code that describes the circuit. This is synthesized into a netlist, which is a collection of logic gates and connections, and used to configure the FPGA hardware.{{Cite web |title=Intel Quartus Prime Pro Edition User Guide: Timing Analyzer |url=https://www.intel.com/content/www/us/en/docs/programmable/683243/18-1/introduction.html |access-date=2025-05-27 |website=Intel |language=en}}

Because FPGAs have flexible logic and wiring, signal delays can vary. If signals arrive too late, the design may fail timing. Designers begin by defining accurate and realistic timing constraints that reflect the system's performance goals in the SDC (Synopsys Design Constraints) format.{{Cite web |title=AMD Technical Information Portal |url=https://docs.amd.com/r/en-US/ug903-vivado-using-constraints/Output-Delays |access-date=2025-05-27 |website=docs.amd.com}} These constraints may include the clock period, input/output delays, multi-cycle paths, and setup/hold requirements. It is critical to analyze whether they are achievable, based on the logic architecture and path delays within the design. These constraints guide all downstream timing analysis and optimization processes.

Problems in timing closure and static timing analysis

There are three main delays in the clocked synchronous circuit that are primarily considered:

Gate delay is the time it takes for a change at a gate's input to propagate to its output. It is often measured as the time between a change at the input and the resulting change at the output.{{Cite journal |last=Weste |first=Neil |date=2003-04-23 |title=IC technology trends for wireless local area networks |url=https://doi.org/10.1117/12.512737 |journal=SPIE Proceedings |publisher=SPIE |volume=5117 |pages=1 |doi=10.1117/12.512737}}

Wire delay, also known as interconnect delay, is the time it takes for a data signal to propagate through the metal wires (interconnect) between circuit elements in a synchronous circuit. The delay is mostly caused by the resistance and capacitance of the wire.{{Cite journal |last=Das |first=Shamik |last2=Chandrakasan |first2=Anantha |last3=Reif |first3=Rafael |date=2003 |title=Design tools for 3-D integrated circuits |url=https://doi.org/10.1145/1119772.1119783 |journal=Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC |location=New York, New York, USA |publisher=ACM Press |pages=53 |doi=10.1145/1119772.1119783}}

Clock skew is the difference in arrival time of a clock signal from the same source at different parts of a synchronous circuit. As the clock signal propagates from its source, such as an oscillator or clock generator, through many different paths in the circuit, it experiences different propagation delays, which cause the clock skew. The clock skew between two points i and j on a chip is:

\delta( i,j)=t_i-t_{j}

Here t_i and t_j are the clock arrival times at points i and j, which can be any two positions on the chip. The diagram illustrates the concept of clock skew: the difference in clock arrival times at different flip-flops on a chip. Ideally, all clock signals would reach their destinations simultaneously; however, due to variations in routing, load, and physical placement, this is rarely achieved.

File:Clock Skew.png
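The skew definition \delta(i,j)=t_i-t_j can be evaluated directly from a table of clock arrival times. The sketch below uses hypothetical flip-flop names and arrival times. It also assumes that the global skew is the largest arrival-time difference between any two clock sinks, which is a common convention rather than something stated above.

```python
def skew(arrival_ps, i, j):
    """Clock skew delta(i, j) = t_i - t_j between two clock sinks."""
    return arrival_ps[i] - arrival_ps[j]


def global_skew(arrival_ps):
    """Largest arrival-time difference between any two sinks (assumed definition)."""
    return max(arrival_ps.values()) - min(arrival_ps.values())


# Hypothetical clock arrival times at three flip-flops, in picoseconds.
arrival = {"ff_a": 120, "ff_b": 135, "ff_c": 110}

print(skew(arrival, "ff_a", "ff_b"))  # negative: the clock reaches ff_a first
print(global_skew(arrival))
```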

After logic synthesis and constraints analysis, the design undergoes static timing analysis (STA), a fundamental, iterative process for validating whether the circuit meets its defined timing constraints on the FPGA. (In STA, clock skew is assumed to be negligible, and handling it is postponed to clock tree synthesis.) STA tools (such as Cadence Tempus, Synopsys PrimeTime, and Intel Timing Analyzer) can evaluate all timing paths in the design without requiring simulation, making them ideal for scalable and exhaustive analysis. In STA, the combinational circuit can be represented as a directed acyclic graph (DAG) in which each node carries a weight equal to the corresponding gate or wire delay.

During this process, the STA engine computes:

  • Path delays: Total delay from one register to another through combinational logic.
  • Slack: The difference between required arrival time and actual arrival time.
  • Critical paths: The longest paths with the smallest (or zero) slack.
  • Violations: Paths with negative slack, meaning they fail to meet timing.{{Citation |last=Bhasker |first=J. |title=Timing Verification |date=2009 |work=Static Timing Analysis for Nanometer Designs |pages=227–316 |url=https://doi.org/10.1007/978-0-387-93820-2_8 |access-date=2025-05-23 |place=Boston, MA |publisher=Springer US |isbn=978-0-387-93819-6 |last2=Chadha |first2=Rakesh}}

File:Combinational Circuit as Directed Acyclic Graph.png

File:Computing Slack at Each Node in a Timing Graph using Static Timing Analysis(STA).png

For slack in particular, STA assumes the worst-case scenario in which every gate transitions, and computes the slack for each node:

\mathrm{Slack} = \mathrm{RAT} - \mathrm{AAT}

Where:

  • RAT = Required Arrival Time
  • AAT = Actual Arrival Time

RAT is the latest time at which a transition can occur at a node while still meeting the timing requirement, and AAT is the latest time at which a transition actually occurs. (AAT is defined at the output of every node.) Negative slack at any output means the circuit does not meet timing, while non-negative slack at all outputs means the circuit meets timing.
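The slack computation on a timing graph can be sketched as one forward pass for AAT and one backward pass for RAT. This is a minimal illustration, not how any particular STA tool is implemented. It assumes that all inputs arrive at time 0, that wire delays are folded into the node delays, and that every output shares one required time. The graph, node names, and delays are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compute_slack(delay, preds, required):
    """Return {node: RAT - AAT} for a combinational DAG.

    delay:    node -> gate delay (node weight)
    preds:    node -> list of predecessor nodes
    required: required arrival time at every output node
    """
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: AAT at a node's output is its own delay plus the
    # latest arrival among its predecessors (0 for primary inputs).
    aat = {}
    for v in order:
        aat[v] = delay[v] + max((aat[u] for u in preds.get(v, ())), default=0)

    # Build successor lists for the backward pass.
    succs = {v: [] for v in order}
    for v, ps in preds.items():
        for u in ps:
            succs[u].append(v)

    # Backward pass: RAT at a node is the tightest requirement imposed
    # by its successors; outputs take the externally required time.
    rat = {}
    for v in reversed(order):
        if succs[v]:
            rat[v] = min(rat[s] - delay[s] for s in succs[v])
        else:
            rat[v] = required

    return {v: rat[v] - aat[v] for v in order}


# Hypothetical four-gate circuit: a and b feed c, which feeds d.
delay = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}

print(compute_slack(delay, preds, required=7))  # all slacks non-negative
print(compute_slack(delay, preds, required=5))  # path b -> c -> d fails
```

In the second call the nodes b, c, and d have negative slack, which identifies b -> c -> d as the failing (critical) path.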

Physical design

Once the STA reports are generated, engineers examine them, manually or with design automation tools, to identify the critical or failing paths that need attention, and then apply timing optimization techniques. They also optimize the physical layout by adjusting placement and routing. This loop repeats until all timing constraints are met.

After logic synthesis and initial timing optimization, the design is mapped to a physical layout of the chip. The key steps of placement, clock tree synthesis, and routing alter the physical design, which can change the timing behavior significantly. They are therefore used to reduce path delays and improve the timing of the circuit.{{Cite web |last=anysilicon |date=2022-09-24 |title=Ultimate Guide: Clock Tree Synthesis |url=https://anysilicon.com/clock-tree-synthesis/ |access-date=2025-05-26 |website=AnySilicon |language=en-US}}

= 1. Placement =

File:Placement in Physical Design Example.jpg

The EDA tool assigns a physical location on the silicon die to each standard cell (logic gates, flip-flops, etc.). It can reduce path delays by placing interconnected cells close to each other, which shortens the wires between them.

= 2. Clock Tree Synthesis (CTS) =

A balanced clock distribution network is built to deliver the clock signal to all sequential elements (flip-flops) evenly and synchronously. CTS minimizes clock skew (the difference in arrival time of the clock signal at different points) and controls clock latency (the time the clock signal takes to reach the sequential elements), while satisfying the maximum transition and maximum capacitance limits so that the clock network meets its design constraints. Clock skew affects both setup and hold timing, and is usually divided into local clock skew and global clock skew.
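The effect of skew on setup and hold timing can be made explicit by extending the basic constraints. The forms below are a sketch under an assumed sign convention: i is the launching flip-flop, j is the capturing flip-flop, and \delta(i,j)=t_i-t_j is the skew between them.

Setup time constraint with skew: t_{logic}<t_{CLK}-t_{su}-t_{c{-}q}-\delta(i,j)

Hold time constraint with skew: t_{logic}>t_h-t_{c{-}q}-\delta(i,j)

Under this convention, a positive \delta(i,j) (the clock reaches the launching flip-flop later than the capturing one) tightens the setup constraint and relaxes the hold constraint, while a negative \delta(i,j) does the opposite.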

Commonly there are three types of CTS:

2.1. Single Point CTS

File:Single Point CTS.png

A Single Point clock tree starts from a single clock source and delivers the clock signal to all sequential elements in a tree structure. This method is easy to implement and is appropriate for low-frequency or multi-clock designs. However, it is less suitable for high-frequency or large-scale designs because path asymmetry can lead to larger clock skew.

2.2. Clock Mesh

File:Clock Mesh.png

A Clock Mesh distributes the clock signal through a grid-like structure, providing better clock balance and lower skew, which suits high-frequency designs. However, a clock mesh incurs higher power and area overhead and increases design complexity.

2.3. Multi-source CTS

File:All design metrics with different variants of Clock Tree.png

A Multi-Source clock tree integrates the advantages of single-point trees and clock meshes. The design is partitioned into multiple components, each with its own local clock source. This clock tree achieves low skew while reducing power and area consumption, making it well-suited for large-scale designs.

= 3. Routing =

File:Routing in Physical Design.png

After placement, the design automation tool creates wires to physically connect the cells. The real routing introduces actual parasitic resistance-capacitance (RC) effects, which can increase signal delay. In addition, final routing enables more precise timing analysis because the wire lengths and congestion are now known.

Timing optimization techniques

One common timing optimization technique for improving circuit performance is pipelining: inserting a register into the combinational logic of the critical path. This can shorten the critical path and raise the achievable clock frequency, but it increases the total latency (the maximum number of registers on any path from input to output) of the circuit.{{Cite book |last=Weste |first=Neil H. E. |url=https://www.worldcat.org/title/473447233 |title=CMOS VLSI design: a circuits and systems perspective |last2=Harris |first2=David Money |date=2011 |publisher=Addison Wesley |isbn=978-0-321-54774-3 |edition=4th |location=Boston |oclc=473447233}}
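The trade-off of register insertion can be illustrated with a small model. It assumes that the minimum clock period is the slowest stage delay plus a fixed register overhead (clock-to-Q delay plus setup time). The function names and numbers are hypothetical.

```python
def min_period(stage_delays_ps, t_overhead_ps):
    """Minimum clock period: slowest stage plus register overhead (t_cq + t_su)."""
    return max(stage_delays_ps) + t_overhead_ps


def insert_register(stage_delays_ps, stage, cut_delay_ps):
    """Split one stage into two by placing a register after cut_delay_ps of logic."""
    d = stage_delays_ps[stage]
    if not 0 < cut_delay_ps < d:
        raise ValueError("cut point must lie inside the stage")
    return (stage_delays_ps[:stage]
            + [cut_delay_ps, d - cut_delay_ps]
            + stage_delays_ps[stage + 1:])


# Hypothetical single stage of 900 ps logic with 130 ps register overhead.
before = [900]
after = insert_register(before, stage=0, cut_delay_ps=450)

for stages in (before, after):
    period = min_period(stages, t_overhead_ps=130)
    cycles = len(stages)  # latency in clock cycles
    print(stages, period, cycles, cycles * period)
```

With these assumed values the clock period drops from 1030 ps to 580 ps, while the latency grows from one cycle to two (from 1030 ps to 1160 ps in absolute time).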

In practice, timing optimization usually includes physical synthesis, which eliminates negative slack by applying a set of timing optimizations. Physical synthesis consists of creating timing budgets and performing timing corrections. Timing budgeting allocates target delays along paths or nets during the placement, routing, and timing correction stages. The timing corrections include:

File:Gate sizing effect graph.png

Gate Sizing:

Involves replacing logic gates with equivalent versions of different drive strengths. Larger gates can drive larger loads faster, reducing delays in critical paths. This technique balances speed against area and power.

Consider three versions of a logic gate with sizes Size(V_c) > Size(V_b) > Size(V_a). Gates with larger sizes have smaller output resistance, so R_{out}(V_c) < R_{out}(V_b) < R_{out}(V_a).

According to the RC delay formula, t = R_{out} \times C_{load}, where t represents the propagation delay, R_{out} the output resistance, and C_{load} the load capacitance.

Therefore, when the load capacitance is large, larger gates drive it faster: t(V_c) < t(V_b) < t(V_a).

When the load capacitance is small, the larger intrinsic delay of a larger gate (caused by its own parasitic capacitance) dominates, so smaller gates are faster: t(V_c) > t(V_b) > t(V_a).
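The crossover between small and large gates can be reproduced with a simple delay model. The sketch below adds an intrinsic delay term to the RC formula, which is an assumption made here to explain why smaller gates win at light loads. The resistance and intrinsic delay values for V_a, V_b, and V_c are hypothetical.

```python
# Hypothetical gate library: size -> (output resistance in kOhm, intrinsic delay in ps).
# Units are chosen so that kOhm * fF = ps.
SIZES = {"Va": (3, 10), "Vb": (2, 20), "Vc": (1, 35)}


def gate_delay(size, c_load_ff):
    """Delay = intrinsic delay + R_out * C_load (assumed model)."""
    r_out, t_intrinsic = SIZES[size]
    return t_intrinsic + r_out * c_load_ff


def best_size(c_load_ff):
    """Pick the gate size with the smallest delay for a given load."""
    return min(SIZES, key=lambda size: gate_delay(size, c_load_ff))


for c_load in (5, 50):  # light and heavy load, in fF
    delays = {size: gate_delay(size, c_load) for size in SIZES}
    print(c_load, delays, best_size(c_load))
```

For the 5 fF load the smallest gate is fastest, and for the 50 fF load the largest gate is fastest, matching the two orderings described above.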

File:Buffer Insertion to Reduce Logic Gate Delay and Path Delay.png

Buffer Insertion:

Used to break long wires and reduce RC (resistance-capacitance) delays, especially in high fan-out or physically distant connections. Buffers can also help adjust path timing to fix hold violations. A buffer is typically built from two inverters connected in series. In the diagram, each inverter is drawn as a triangle followed by a small circle: the triangle represents the logic gate and the circle represents logic inversion.

Improvements:

= 1. Speeding up the circuit or serving as delay elements =

Buffers can reduce path delay by driving signals more strongly through long wires and into large load capacitances. In critical paths, inserting a buffer reduces the resistance and capacitance that each stage has to drive and so improves signal propagation. Alternatively, buffers can be placed intentionally to introduce a fixed delay for timing alignment.

= 2. Changing transition times =

A signal with a slow rise/fall time can cause unreliable switching and timing violations. Buffers sharpen the signal edges, improving the slope of the transitions and resulting in more stable digital behavior. This helps prevent glitches, short-circuit current, and false logic triggering.

= 3. Shielding capacitive load =

If a logic gate drives many other gates or long wires, the total load capacitances become large. This large load slows down the gate’s output response. Inserting a buffer between the gate and its heavy load offloads the burden, allowing the original gate to drive only the buffer and not the full load directly.

However, the drawbacks of buffer insertion include increased area usage and increased power consumption.
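The benefit of breaking a long wire with buffers can be estimated with the Elmore delay model, under which the delay of an unbuffered wire grows with the square of its length. The Elmore model, the assumption that the driver and all buffers are identical, and every parameter value below are illustrative choices, not taken from the text above.

```python
def segment_delay(r_drv, c_load, r_wire, c_wire, length):
    """Elmore delay of a driver, a distributed RC wire, and a lumped load."""
    c_w = c_wire * length  # total wire capacitance
    r_w = r_wire * length  # total wire resistance
    return r_drv * (c_w + c_load) + r_w * (c_w / 2 + c_load)


def buffered_delay(n_buffers, length, r_drv, c_in, r_wire, c_wire, t_buf):
    """Total delay with n_buffers identical buffers spaced evenly along the wire."""
    n_seg = n_buffers + 1
    seg = segment_delay(r_drv, c_in, r_wire, c_wire, length / n_seg)
    return n_seg * seg + n_buffers * t_buf


# Hypothetical parameters; kOhm * fF = ps.
params = dict(
    length=4.0,    # wire length in mm
    r_drv=1.0,     # driver/buffer output resistance in kOhm
    c_in=10.0,     # buffer/receiver input capacitance in fF
    r_wire=0.5,    # wire resistance in kOhm per mm
    c_wire=200.0,  # wire capacitance in fF per mm
    t_buf=20.0,    # intrinsic buffer delay in ps
)

for n in (0, 1, 3, 7):
    print(n, buffered_delay(n, **params))
```

With these assumed values, a few buffers shorten the total delay, but adding too many makes it grow again because each buffer contributes its own intrinsic delay, in addition to the area and power costs noted above.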

Netlist Restructuring:

Netlist restructuring refers to the process of modifying the structure of an existing gate-level circuit without changing its logical functionality. It focuses on optimizing timing, area, or power by reorganizing or transforming how existing gates are connected or represented. The transformations include:

File:Netlist Restructuring- Gate Cloning can Reduce Fanout and Downstream Capacitance.png

Cloning: Duplicating gates to reduce load capacitances or balance load across multiple paths.

Redesigning the input/output tree: Changing how signals are distributed or received to improve timing or reduce congestion.

Swapping commutative pins: Reordering inputs of commutative gates (like AND, OR) to optimize critical paths and change connections.{{Cite web |last=ML |date=2024-03-28 |title=Netlist File in Digital VLSI Design Flow |url=https://yogish.com/blog/vlsi-blog/netlist-file-in-digital-vlsi-design-flow |access-date=2025-05-27 |website=Bale Tulu Kalpuga |language=en-US}}

Gate decomposition: Breaking complex gates into simpler forms, such as converting AND-OR logic into NAND-NAND logic by using CMOS inverters to simplify the logic gates and reduce path delay.{{Cite web |last=Banerjee |first=Kaustav |date=May 27, 2025 |title=ECE 225 High-Speed Digital IC Design |url=https://courses.ece.ucsb.edu/ECE125/125_W09Banerjee/Lectures/Lecture14.pdf}}

Boolean restructuring: Applying Boolean algebra rules to simplify or re-express logic equations, which often reduces path delay or leads to smaller implementations.
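Of the transformations listed above, cloning is the easiest to quantify. The sketch below assumes a simple delay model in which a gate's delay is its output resistance times the total input capacitance of the gates it drives. The function names and values are hypothetical.

```python
def driver_delay(r_out_kohm, fanout, c_in_ff):
    """RC delay of a gate driving `fanout` identical inputs (kOhm * fF = ps)."""
    return r_out_kohm * fanout * c_in_ff


def clone_fanouts(fanout, copies):
    """Split the fanout as evenly as possible across `copies` cloned gates."""
    base, extra = divmod(fanout, copies)
    return [base + 1] * extra + [base] * (copies - extra)


# Hypothetical gate: 2 kOhm output resistance driving 8 inputs of 5 fF each.
print(driver_delay(2, 8, 5))  # delay before cloning

loads = clone_fanouts(8, 2)
print(loads, max(driver_delay(2, f, 5) for f in loads))  # delay after cloning
```

In this model, cloning the gate once halves the delay from the gate to its loads. The cost, not modeled here, is that the gate feeding the clones now has to drive two gate inputs instead of one.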

= Reverse transformations are also possible =

Operations such as gate downsizing, merging, or simplifying previously expanded logic structures can also be performed if they benefit overall design metrics (e.g., area or power).

These techniques are often applied automatically by physical synthesis and place-and-route tools (such as Synopsys IC Compiler, Cadence Innovus, or Intel Quartus), but can also be manually guided by designers through constraints and optimization directives.

Design flow

File:Design Flow.png

Utilize STA in iterative verification and validation:

After the routing steps are completed, the physical details of the design, including wire lengths, capacitances, and resistances, are known. Thorough verification such as STA is then run to confirm that the timing optimizations have preserved the function of the design, to identify remaining timing violations and delays, and to verify the effectiveness of the most recent timing closure and optimization steps. Designers can also use simulation, verification, and hardware testing to validate the design's functionality and performance. If the circuit fails to meet timing, the design is sent back through the optimization and STA process, and the loop repeats.

Post-implementation timing analysis:

Once the design has been implemented on the FPGA, post-implementation timing analysis validates that all timing goals are met. This analysis acts as a final check that confirms the timing closure process succeeded and accounts for any implementation-specific factors.

See also

References