LLM aided design
{{Article for deletion/dated|page=LLM aided design|timestamp=20250619085151|year=2025|month=June|day=19|substed=yes|help=off}}
{{Draft article}}
----
LLM-aided design refers to the use of large language models (LLMs) as intelligent assistants or agents throughout the end-to-end process of system design, including conceptualization, prototyping, verification, and optimization. This evolving interdisciplinary field draws on advances in natural language processing (NLP), program synthesis, and automated reasoning to support tasks in domains such as electronic design automation (EDA), software engineering, hardware design, and cyber-physical systems.
Unlike traditional automation tools, LLMs (especially transformer-based models such as GPT-4, Claude,<ref>Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Anthropic Model Card (PDF), 2024. [https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf Available online]</ref> LLaMA, and domain-specialized variants such as [https://ollama.com/library/codellama CodeLlama]) can interpret, generate, and refine both structured and unstructured data. This includes natural language specifications, hardware description language (HDL) and HDL-like code, constraint definitions, tool scripts, and design documentation. LLM-aided design thus represents a shift from tool-assisted engineering toward a form of co-design in which machine intelligence participates actively in architectural exploration, logic synthesis, formal verification, and post-silicon validation. It sits at the intersection of artificial intelligence, computer-aided design (CAD), and systems engineering.
Introduction
Engineering workflows in hardware and software development have traditionally relied on manual translation of high-level design intents into machine-readable specifications. These processes, though robust, are time-consuming and often require significant domain expertise. The introduction of large language models into design workflows aims to streamline this process by enabling natural language interaction, synthesis of domain-specific artifacts, and integration with design toolchains.
In recent years, engineering design has seen a rapid convergence of artificial intelligence (AI) and domain-specific modeling. LLMs such as GPT-4, Claude, and LLaMA can interpret and generate code, documents, and designs from natural language descriptions. This capability opens a new area in which human designers work together with AI systems to improve design correctness and reduce time-to-market. The aim is to let designers express intent in natural language and rely on the model to produce Verilog, VHDL, HLS C, or firmware code.
LLM-aided design differs from earlier forms of automated design through its ability to generalize across tasks and contexts. Unlike rule-based or template-driven systems, large language models can encode domain-specific heuristics and adapt to various inputs (design specifications, codebases, formal properties, and documentation) without requiring extensive retraining. This flexibility supports their use in diverse design settings such as system-on-chip development, embedded systems, robotic control, and cyber-physical system modeling.
LLM-aided design adds a new epistemic layer to the engineering process, in which models contribute to design reasoning rather than only executing commands. This enables applications such as EDA tool-flow automation, formal assertion generation, and template retrieval for HLS code repair. It has also given rise to domain-adapted LLMs known as circuit foundation models (CFMs), which aim to reason about and generate artifacts across the whole RTL-to-GDSII pipeline.
Background and Foundations of LLM-Aided Design
The integration of large language models (LLMs) into electronic design automation (EDA) represents a shift in how hardware systems are specified, verified, and developed. EDA has conventionally relied on predefined workflows, rule-based synthesis tools, and extensive manual intervention; the growth of LLMs introduces an approach driven by reasoning, abstraction, and natural language interaction. This shift follows the broader trajectory of artificial intelligence, in which general-purpose models are increasingly specialized for domain-specific tasks, including tasks that traditionally required expert engineers.
=From Transformers to Circuit Reasoning=
The transformer architecture introduced by Vaswani et al. (2017)<ref>Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; and Polosukhin, Illia. Attention Is All You Need. *Proceedings of the 31st International Conference on Neural Information Processing Systems* (NIPS'17), 6000–6010. Curran Associates Inc. ISBN 9781510860964. [https://arxiv.org/abs/1706.03762 Available online]</ref> is the foundation of LLM-aided design. In natural language processing it largely replaced recurrent neural networks (RNNs) and long short-term memory (LSTM) networks<ref>Sherstinsky, Alex. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena, vol. 404, 2020, p. 132306. [https://doi.org/10.1016/j.physd.2019.132306 Available online]</ref>. Its self-attention mechanism models long-range dependencies directly and allows computation to be parallelized across sequence positions. The transformer underlies the GPT series, from the original GPT and GPT-2 through GPT-4o, and each generation has shown substantially better zero-shot reasoning, code generation, and language understanding.
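For reference, the self-attention operation mentioned above is the scaled dot-product attention defined in Vaswani et al. Here Q, K, and V are the query, key, and value matrices, obtained by learned linear projections of the input sequence. d_k is the key dimension, and the softmax is applied row by row. Multi-head attention runs several such operations in parallel on different projections.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```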
By 2020, GPT-3's ability to produce functional code (including basic HTML, Python, and even Verilog) had drawn the interest of the AI community. This led hardware design researchers to explore whether LLMs could support logic design and verification, given the structural similarities between programming languages and hardware description languages (HDLs). Early experiments using GPT-3 to write or debug Verilog showed promise. They also exposed critical limitations, such as frequent syntax errors, hallucinated constructs, and code that synthesis tools could not accept.
Efforts to address these limitations led to a new direction: domain-specific foundation models tailored to EDA. These models, referred to as circuit foundation models, are trained or fine-tuned on HDL code, simulation traces, synthesis logs, and constraint files. By 2023, projects such as RTLLM<ref>Lu, Yao; Liu, Shang; Zhang, Qijun; and Xie, Zhiyao. RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model. *Proceedings of the 29th Asia and South Pacific Design Automation Conference* (ASPDAC '24), 722–727. IEEE Press, 2024. [https://doi.org/10.1109/ASP-DAC58780.2024.10473904 Available online]</ref> had begun to demonstrate the practical potential of LLM-aided design through carefully engineered prompts, feedback loops, and domain-aligned datasets.
=Decoder vs. Encoder Models in Co-Design=
1. Decoder-Based Autoregressive Models: Based on architectures such as GPT and [https://ollama.com/library/codellama CodeLlama], these models are used for generation tasks. They can translate natural language specifications into HDL, generate testbenches, and repair buggy RTL. Prompt chaining and few-shot learning are among the techniques used to make these models effective at synthesis-aligned code generation.
2. Encoder-Based Graph Reasoning Models: Inspired by encoder models such as BERT and often combined with graph neural networks, these models are suited to inference over structural representations such as netlists or intermediate representations (IRs). They can help estimate timing, identify bottlenecks, and support logic equivalence checking. Related transformer-based approaches also target physical design. For example, ChiPFormer<ref>Lai, Yao; Liu, Jinxin; Tang, Zhentao; Wang, Bin; Hao, Jianye; and Luo, Ping. ChiPFormer: Transferable Chip Placement via Offline Decision Transformer. *Proceedings of the 40th International Conference on Machine Learning* (ICML '23), Article 757, 19 pages. JMLR.org, 2023. [https://proceedings.mlr.press/v202/lai23a.html Available online]</ref> applies an offline decision transformer to chip placement.
The design ecosystem is increasingly adopting hybrid strategies, in which decoder models generate artifacts and encoder models verify or optimize them, forming a closed co-design loop. This division mirrors human design workflows, where generation and validation are closely interdependent.
Methodological Landscape of LLM-Aided Design
LLM-aided design spans multiple stages of the hardware-software co-design pipeline, including natural language specification, HDL synthesis, analog circuit design, formal verification, and layout generation. Foundational techniques such as prompting, supervised fine-tuning (SFT), and retrieval-augmented generation (RAG) cover much of the field, but how they are applied varies with the nature of the task. The following overview groups typical LLM methodologies by EDA task domain and cites a few recently published domain-specific LLMs and tools for each.
= Core Methodologies =
Below are a few core methodologies, with insights from recent tools and frameworks:
== Specification to HDL Translation ==
LLMs can generate synthesizable RTL (Verilog, VHDL) directly from natural language specifications. Output quality can be improved substantially through:
- Prompt engineering and hierarchical prompting, for structured code generation,
- Context window expansion, to provide multi-level module and signal context,
- Self-refinement and feedback from compiler logs, allowing the LLM to repair and converge to synthesizable HDL,
- Supervised fine-tuning (SFT), including score-based variants that use quality signals such as synthesis results, as in VeriGen and RTLCoder, to improve syntactic validity and functional correctness.
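The compiler-feedback self-refinement described above can be sketched as a small loop. This is a minimal illustration, not the method of any specific tool. `generate` and `check` are hypothetical callables supplied by the user: one wraps an LLM, the other wraps a linter or synthesis run and returns a pass flag plus its log. The prompt wording is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RefinementResult:
    code: str          # last HDL text produced by the model
    passed: bool       # whether the checker accepted it
    attempts: int      # number of generate/check rounds used
    logs: list[str] = field(default_factory=list)  # checker log from every round


def refine_until_clean(
    spec: str,
    generate: Callable[[str], str],             # hypothetical LLM wrapper: prompt -> HDL text
    check: Callable[[str], tuple[bool, str]],   # hypothetical tool wrapper: HDL -> (ok, log)
    max_attempts: int = 3,
) -> RefinementResult:
    """Generate HDL from a spec, feeding tool logs back until the checker passes."""
    prompt = f"Write synthesizable Verilog for this specification:\n{spec}"
    code, logs = "", []
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        ok, log = check(code)
        logs.append(log)
        if ok:
            return RefinementResult(code, True, attempt, logs)
        # Feed the failing code and its tool log back so the next round can repair it.
        prompt = (
            f"Specification:\n{spec}\n"
            f"This Verilog failed checking:\n{code}\n"
            f"Tool log:\n{log}\n"
            "Return a corrected version."
        )
    return RefinementResult(code, False, max_attempts, logs)
```

In practice, `check` might run a linter or synthesis tool through `subprocess` and parse its output. Bounding the loop with `max_attempts` keeps a model that never converges from running indefinitely.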
==Testbench and Assertion Generation ==
LLMs can synthesize verification environments, SystemVerilog assertions (SVA), property checks, and test stimuli from design specifications, worked examples, and coverage goals.
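One common way to supply examples and coverage goals is a few-shot prompt. The sketch below assembles such a prompt. The layout, the instruction wording, and the function name `build_sva_prompt` are illustrative assumptions, not the format of any particular tool.

```python
def build_sva_prompt(
    spec: str,
    examples: list[tuple[str, str]],
    coverage_goals: list[str],
) -> str:
    """Assemble a few-shot prompt asking an LLM for SystemVerilog assertions."""
    parts = ["You write SystemVerilog assertions (SVA). Answer with assertions only.", ""]
    # Worked examples: (specification text, known-good assertions).
    for i, (example_spec, example_sva) in enumerate(examples, start=1):
        parts += [
            f"Example {i} specification:", example_spec,
            f"Example {i} assertions:", example_sva, "",
        ]
    parts += ["Specification:", spec]
    if coverage_goals:
        parts.append("Cover each of these goals:")
        parts += [f"- {goal}" for goal in coverage_goals]
    parts.append("Assertions:")  # the model continues from here
    return "\n".join(parts)


prompt = build_sva_prompt(
    "ack must follow req within 3 cycles",
    [("grant is one-hot or zero", "assert property (@(posedge clk) $onehot0(grant));")],
    ["req without ack"],
)
```

Any assertions the model returns would still need to be compiled and checked by a simulator or formal tool before use.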
== HDL Debugging and Repair ==
LLMs assist with both syntactic repair (fixing compilation errors) and semantic repair (correcting logical or functional behavior), drawing on:
- Template libraries and error log parsing,
- Similarity search from past fixes,
- Retrieval-augmented and agent-based pipelines such as RTLFixer and MEIC, which iteratively revise code until it passes lint, synthesis, or formal checks, or until an iteration limit is reached.
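The "similarity search from past fixes" step can be illustrated with Python's standard `difflib`. This is a simple stand-in for the embedding-based retrieval a production RAG pipeline would more likely use. The error messages and fixes in `FIX_DB` are hypothetical and are not taken from any specific simulator.

```python
import difflib

# Hypothetical database of past error messages and the fixes that resolved them.
FIX_DB = [
    ("syntax error, unexpected endmodule, expecting ';'",
     "Add the missing semicolon before endmodule."),
    ("undeclared identifier 'clk_en'",
     "Declare 'clk_en' as a wire or reg before use."),
    ("procedural assignment to a non-register 'q'",
     "Declare 'q' as reg when it is assigned inside an always block."),
]


def retrieve_fixes(error_log: str, db=FIX_DB, k: int = 1, cutoff: float = 0.3):
    """Return up to k (past_error, fix) pairs most similar to error_log."""
    scored = [
        (difflib.SequenceMatcher(None, error_log.lower(), past.lower()).ratio(), past, fix)
        for past, fix in db
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Drop weak matches so unrelated fixes are not injected into the prompt.
    return [(past, fix) for score, past, fix in scored[:k] if score >= cutoff]
```

The retrieved fixes would then be placed in the repair prompt alongside the failing code and its log.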
== HLS Code Refinement ==
Standard C/C++ often uses constructs that HLS tools do not support or synthesize poorly, such as recursion, dynamic memory allocation, and unrestricted pointer use. LLMs help identify and rewrite such constructs by:
- Detecting and rewriting non-HLS-friendly patterns using prompt-repair pipelines,
- Generating test harnesses and compiler hints (e.g., `#pragma HLS unroll`),
- Tools such as GPT4AIGChip<ref>Fu, Y.; et al. GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models. *Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design* (ICCAD '23), San Francisco, CA, USA, pp. 1–9. IEEE, 2023. [https://doi.org/10.1109/ICCAD57390.2023.10323953 Available online]</ref> apply these ideas to convert ML kernels into synthesizable HLS code by combining structural abstraction with loop-pattern rewrites.
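Before prompting a model to rewrite code, a pipeline might first locate the offending constructs. The following is a heuristic sketch, assuming a regex pre-pass is acceptable. It flags dynamic memory calls and direct self-recursion so those spans can be handed to an LLM for rewriting. A real flow would use a proper C/C++ parser, and the names here are hypothetical.

```python
import re
from dataclasses import dataclass

# Heuristic patterns only: comments and strings are not excluded, and
# function-pointer parameters are not recognized.
DYNAMIC_MEMORY = re.compile(r"\b(malloc|calloc|realloc|free|new|delete)\b")
FUNC_DEF = re.compile(r"^\s*[\w*\s]+?\b(\w+)\s*\([^(){};]*\)\s*\{", re.MULTILINE)
CONTROL_KEYWORDS = {"if", "for", "while", "switch"}


@dataclass
class Finding:
    line: int    # 1-based line number in the source
    kind: str    # "dynamic-memory" or "recursion"
    detail: str  # offending line text or function name


def _block_end(src: str, open_brace: int) -> int:
    """Index just past the '}' that closes the '{' at open_brace."""
    depth = 0
    for i in range(open_brace, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return len(src)


def find_hls_hazards(src: str) -> list[Finding]:
    findings = []
    for lineno, text in enumerate(src.splitlines(), start=1):
        if DYNAMIC_MEMORY.search(text):
            findings.append(Finding(lineno, "dynamic-memory", text.strip()))
    for m in FUNC_DEF.finditer(src):
        name = m.group(1)
        if name in CONTROL_KEYWORDS:
            continue  # "if (x) {" and similar are not function definitions
        brace = m.end() - 1  # FUNC_DEF ends on the body's opening brace
        body = src[brace + 1:_block_end(src, brace) - 1]
        if re.search(rf"\b{re.escape(name)}\s*\(", body):
            findings.append(Finding(src.count("\n", 0, m.start(1)) + 1, "recursion", name))
    return findings
```

Each finding, together with its surrounding code, could then be placed in a repair prompt, for example asking for recursion to be replaced by an explicit loop or for heap buffers to become fixed-size arrays.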
==Constraint Generation==
Constraint files are essential for synthesis, placement, and timing closure. LLM-based agents such as ChatEDA support constraint generation through:
- Instruction tuning, enabling fine-grained command generation (e.g., in SDC or XDC format),
- Retrieval-augmented generation (RAG), which retrieves constraints from similar prior designs or databases to keep generated constraints consistent with the domain,
- Generation of timing, placement, and I/O constraints that take the surrounding design context into account.
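One way to keep generated constraints well-formed is to have the model emit structured data rather than raw SDC text, then render and validate that data deterministically. The sketch below assumes a hypothetical JSON schema with fields such as `clocks` and `io_delays`, and assumes nanosecond units. It emits only the standard SDC commands `create_clock`, `set_input_delay`, and `set_output_delay`.

```python
import json


def render_sdc(spec_json: str) -> str:
    """Render SDC clock and I/O-delay commands from a JSON spec (hypothetical schema)."""
    spec = json.loads(spec_json)
    lines, clocks = [], set()
    for clk in spec["clocks"]:
        period = float(clk["period_ns"])
        if period <= 0:
            raise ValueError(f"clock {clk['name']}: period must be positive")
        clocks.add(clk["name"])
        lines.append(
            f"create_clock -name {clk['name']} -period {period:g} [get_ports {clk['port']}]"
        )
    commands = {"input": "set_input_delay", "output": "set_output_delay"}
    for io in spec.get("io_delays", []):
        # Reject references to clocks the spec never defined.
        if io["clock"] not in clocks:
            raise ValueError(f"port {io['port']}: unknown clock {io['clock']}")
        if io["direction"] not in commands:
            raise ValueError(f"port {io['port']}: direction must be 'input' or 'output'")
        delay = float(io["delay_ns"])
        lines.append(
            f"{commands[io['direction']]} -clock {io['clock']} {delay:g} [get_ports {io['port']}]"
        )
    return "\n".join(lines)
```

Because validation happens outside the model, an inconsistent response (such as a delay referencing an undefined clock) fails early instead of reaching the synthesis tool.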
==Floorplan and Layout Synthesis==
Physical design requires careful placement and routing. LLM-based frameworks such as LayoutCopilot and ChatEDA employ techniques including:
- Vision-language modeling to interpret and manipulate layout representations (e.g., views rendered from DEF or GDSII data),
- TCL script generation, customized for tools like [https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/soc-implementation-and-floorplanning/innovus-implementation-system.html Innovus] and [https://www.synopsys.com/implementation-and-signoff/physical-implementation/ic-compiler.html ICC2],
- Automatic power grid and macro placement proposals, based on learned design intents.
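Macro placement proposals from a model still need to be checked for basic legality before they reach a place-and-route tool. The minimal sketch below validates hypothetical proposed macro rectangles against two conditions: die bounds and pairwise overlap. The `Macro` type and the micron units are assumptions, and real legality checks (halos, blockages, pin access) are far more extensive.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Macro:
    name: str
    x: float  # lower-left corner, microns (assumed units)
    y: float
    w: float  # width
    h: float  # height


def _overlaps(a: Macro, b: Macro) -> bool:
    """Axis-aligned rectangle intersection with positive area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def check_placement(macros: list[Macro], die_w: float, die_h: float) -> list[str]:
    problems = []
    for m in macros:
        if m.x < 0 or m.y < 0 or m.x + m.w > die_w or m.y + m.h > die_h:
            problems.append(f"{m.name}: outside die")
    for a, b in combinations(macros, 2):
        if _overlaps(a, b):
            problems.append(f"{a.name} overlaps {b.name}")
    return problems


proposal = [
    Macro("ram0", 0, 0, 40, 30),
    Macro("ram1", 30, 10, 40, 30),
    Macro("rom0", 80, 80, 30, 10),
]
print(check_placement(proposal, die_w=100, die_h=100))
```

The returned problem strings can be fed back to the model as feedback, in the same way tool logs are used for HDL repair.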
==Analog Circuit Synthesis==
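Analog design poses distinct challenges because circuit performance is highly sensitive to device parameters and parasitics, and it lacks the clean abstraction layers of digital design. Tools such as AnalogCoder and LaMAGIC use techniques including:
- Topology suggestion via LLMs, based on matching target specifications (gain, slew rate, bandwidth),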
- Layout constraint prediction, such as symmetry, matching, and parasitic awareness,
- Bayesian optimization and tuning, informed by LLM predictions for transistor sizing and performance trade-offs.
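Proposed transistor sizes are often sanity-checked against hand calculations before more expensive simulation or optimization. The sketch below uses the first-order long-channel square-law model, gm = sqrt(2·k'·(W/L)·I_D), to derive W/L and overdrive voltage from a target transconductance. It is a simple hand calculation, not the Bayesian optimization mentioned above. The model is only an approximation for modern short-channel devices, and the parameter values in the test are hypothetical.

```python
import math


def square_law_sizing(gm_target: float, drain_current: float, k_prime: float) -> dict:
    """First-order sizing of a saturated MOSFET under the long-channel square law.

    gm   = sqrt(2 * k' * (W/L) * I_D)  =>  W/L = gm^2 / (2 * k' * I_D)
    V_ov = 2 * I_D / gm
    """
    if min(gm_target, drain_current, k_prime) <= 0:
        raise ValueError("all inputs must be positive")
    w_over_l = gm_target ** 2 / (2 * k_prime * drain_current)
    v_ov = 2 * drain_current / gm_target
    # Recompute gm from the chosen W/L as a consistency check on the algebra.
    gm_check = math.sqrt(2 * k_prime * w_over_l * drain_current)
    return {"w_over_l": w_over_l, "v_ov": v_ov, "gm": gm_check}
```

The returned values expose the basic trade-off: for fixed current, a higher gm target demands a larger W/L and a smaller overdrive. Such estimates could serve as a starting point or sanity bound for an optimizer.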
Taken together, these methodologies depict LLMs as design agents that integrate with CAD flows, reason over heterogeneous inputs (text, code, specifications, layout), and adapt to domain-specific constraints. As tools mature, the boundaries between synthesis, verification, and optimization are becoming less distinct. This may enable closed-loop, increasingly autonomous hardware design.
Among these, HDL generation has emerged as one of the most deeply investigated tasks in LLM-aided EDA research, serving as a methodological testbed for broader design automation challenges. It captures the full interplay between natural language, symbolic code, feedback refinement, and tool integration. The following case study synthesizes key techniques employed in HDL generation workflows.
=Methodological Classification of HDL Generation: A Case Study=
The following table, compiled from recent papers including the 2025 survey by Pan et al., summarizes the methodologies underlying LLM-aided HDL generation:
class="wikitable"
|+ HDL Generation Methodologies Using LLMs | |||
Project Name | Model Used | Approach Type | Summary |
---|---|---|---|
RTLLM | GPT-3.5 | Prompt Engineering | Multi-step planning-based prompt design with syntax and functional log feedback. |
Chip-ChatBlocklove, Jason; Garg, Siddharth; Karri, Ramesh; and Pearce, Hammond. Chip-Chat: Challenges and Opportunities in Conversational Hardware Design. In: Proceedings of the 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD), IEEE, Sept. 2023, pp. 1–6. [https://doi.org/10.1109/MLCAD58807.2023.10299874 Available online] | ChatGPT-4 | Conversational Co-design | Full pipeline HDL synthesis guided via interactive dialogue with GPT-4. |
VeriGen | CodeGen-16B | Fine-tuning | Trained on textbook + GitHub Verilog, improved synthesis-valid output, syntax robustness. |
ChatEDA | LLaMA-20B | QLoRA + Instruction Tuning | Trained on GPT-4-generated EDA instructions; interprets and executes user commands. |
RTLCoder | Mistral-7B | Scored SFT | Uses synthesis scores to steer SFT toward functionally valid and resource-efficient HDL |
BetterVPei, Zehua; Zhen, Hui-Ling; Yuan, Mingxuan; Huang, Yu; and Yu, Bei. BetterV: Controlled Verilog Generation with Discriminative Guidance. *Proceedings of the 41st International Conference on Machine Learning* (ICML '24), Article 1628, 9 pages. JMLR.org, 2024. [https://proceedings.mlr.press/v202/pei24a.html Available online] | CodeLlama + TinyLlama | Controlled Gen + SFT | Bayesian discriminator modifies token probability for valid HDL output |
RTLFixer | GPT-4 | RAG + Agent Framework | Uses ReAct prompting and error categorization DBs for debug-oriented HDL refinement. |
These methods highlight key trends and research frontiers:
- Prompting + Logs: RTLLM illustrates that prompting alone, combined with feedback from toolchains, can achieve competitive HDL generation without retraining the model.
- Fine-tuning on RTL: VeriGen and RTLCoder show that focused fine-tuning, especially with quality metrics (e.g., synthesis logs, functional correctness), significantly improves output robustness.
- Controlled Generation: BetterV adjusts token probabilities during sampling, steering Verilog generation beyond plain maximum-likelihood decoding.
- Agent Architectures: RTLFixer exemplifies an emerging paradigm in which LLMs act as self-refining agents rather than one-shot code generators. Such agents read tool logs, retrieve relevant guidance, and revise their output. Broader agent frameworks extend this loop to waveform tracing and symbolic analysis.
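Discriminator-guided decoding, the general idea behind controlled generation, can be sketched as a Bayes-rule reweighting of the next-token distribution: p'(t) ∝ p_LM(t) · P(valid | prefix + t)^α. This is a generic illustration, not BetterV's actual implementation. The candidate tokens, probabilities, and discriminator scores in the test are hypothetical.

```python
def guided_distribution(
    lm_probs: dict[str, float],
    disc_scores: dict[str, float],
    alpha: float = 1.0,
) -> dict[str, float]:
    """Reweight LM next-token probabilities by a discriminator's P(valid | prefix + token).

    Computes p'(t) proportional to p_lm(t) * p_disc(t) ** alpha and renormalizes.
    alpha = 0 recovers the unguided LM distribution.
    """
    weights = {t: p * disc_scores.get(t, 0.0) ** alpha for t, p in lm_probs.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("discriminator rejected every candidate token")
    return {t: w / total for t, w in weights.items()}
```

Raising `alpha` strengthens the discriminator's influence. Tokens the discriminator considers unlikely to lead to valid Verilog are suppressed even when the language model favors them.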
The table also points to the growing role of multi-agent collaboration, retrieval-augmented generation (RAG), and tool-in-the-loop frameworks, which move beyond simple completion toward autonomous reasoning and repair. Fine-tuned and multi-modal frameworks outperform traditional prompting on benchmarks such as VerilogEval<ref>Pinckney, Nathaniel; Batten, Christopher; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation. *ACM Transactions on Design Automation of Electronic Systems* (TODAES), Association for Computing Machinery, February 2025. [https://doi.org/10.1145/3718088 Available online]</ref> and PyHDL-Eval<ref>Batten, Christopher; Pinckney, Nathaniel; Liu, Mingjie; Ren, Haoxing; and Khailany, Brucek. PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs. In: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD '24), ACM, 2024, article 10, pp. 1–17. [https://doi.org/10.1145/3670474.3685948 Available online]</ref>. These results suggest that tightly integrated co-evolution of models and tools is needed for engineering-grade HDL generation.
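Code-generation benchmarks of this kind commonly report pass@k, the probability that at least one of k sampled completions passes the tests. The sketch below implements the widely used unbiased estimator 1 − C(n−c, k)/C(n, k). Here n samples are generated per problem and c of them pass. Whether a given benchmark uses exactly this estimator should be checked against its own documentation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score is then the mean of `pass_at_k` over all problems. With n = 10 samples of which 3 pass, pass@1 is 0.3.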
Datasets and Evaluation Infrastructure
Datasets are central to how large language models for EDA are developed, tuned, and evaluated. They range from natural language requirements and performance metrics to tokenized Verilog corpora and annotated tool logs. They support supervised fine-tuning, domain adaptation, and benchmarking of synthesis validity and generation quality.
Beyond increasing dataset volume, recent initiatives have improved granularity and diversity. Instruction-tuning datasets such as ChatEDA's teach LLMs to interact with toolchains. Benchmarks such as VerilogEval assess the quality of model output. Design-level corpora such as RTLCoder and [https://huggingface.co/datasets/GaTech-EIC/MG-Verilog MG-Verilog] provide structural annotations and synthesis metadata. MG-Verilog also pairs Verilog code with natural language descriptions at multiple levels of granularity, supporting abstraction and cross-language translation. The VeriGen dataset uses textbook-derived Verilog tasks to support foundational fine-tuning.
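Instruction-tuning corpora of this kind are often stored as JSON Lines, one description/code pair per record. The sketch below converts (description, Verilog) pairs into such records and applies two simple quality filters: drop samples without a complete `module ... endmodule` block, and drop exact duplicates. The field names `instruction`, `output`, and `module` are a hypothetical schema, not the format of any dataset named above.

```python
import json
import re

# A complete module: "module <name> ... endmodule" (non-greedy, across lines).
MODULE_RE = re.compile(r"\bmodule\s+(\w+).*?\bendmodule\b", re.DOTALL)


def to_sft_records(pairs):
    """Yield one JSON line per (description, verilog) pair that passes basic filters."""
    seen = set()
    for description, code in pairs:
        code = code.strip()
        match = MODULE_RE.search(code)
        if not match or code in seen:
            continue  # skip truncated samples and exact duplicates
        seen.add(code)
        yield json.dumps({
            "instruction": description.strip(),
            "output": code,
            "module": match.group(1),
        })
```

Real pipelines typically add stronger filters, such as running each sample through a compiler or synthesis tool, as the scored-SFT approaches above do.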
Tooling and Infrastructure: Practical Deployments
Several practical tools suggest that LLM-aided design has moved beyond purely theoretical proposals:
- ChatEDA: Serves as a natural language interface for controlling Vivado, [https://www.intel.com/content/www/us/en/products/details/fpga/development-tools/quartus-prime.html Quartus], or [https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/soc-implementation-and-floorplanning/innovus-implementation-system.html Innovus] workflows. It interprets user intent and translates it into tool-specific commands.
- RTLLM-Editor: An IDE that integrates real-time HDL generation, compilation feedback, and syntax repair.
- LLM4DV and AutoSVA: Verification-oriented tools. AutoSVA generates SystemVerilog assertions for formal verification, while LLM4DV uses LLMs to generate test stimuli aimed at coverage goals.
These tools indicate growing practical maturity, and similar approaches are being integrated into prototyping, verification, timing closure, and constraint generation workflows.
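Translating user intent into tool commands, as ChatEDA does, is often made safer by having the model emit a structured plan that is checked against an allowlist before anything runs. The sketch below illustrates this pattern. The plan schema and the `ALLOWED_ACTIONS` mapping are hypothetical, and this is not ChatEDA's actual interface. The templates use the standard Vivado Tcl commands `synth_design`, `opt_design`, `place_design`, `route_design`, and `write_bitstream`.

```python
import json

# Hypothetical allowlist mapping plan actions to Vivado Tcl command templates.
ALLOWED_ACTIONS = {
    "synthesize": "synth_design -top {top} -part {part}",
    "optimize": "opt_design",
    "place": "place_design",
    "route": "route_design",
    "bitstream": "write_bitstream -force {top}.bit",
}


def plan_to_tcl(plan_json: str) -> list[str]:
    """Turn an LLM-produced JSON plan into Tcl commands, rejecting unknown actions."""
    steps = json.loads(plan_json)
    commands = []
    for i, step in enumerate(steps, start=1):
        action = step.get("action")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i}: action {action!r} is not allowed")
        try:
            commands.append(ALLOWED_ACTIONS[action].format(**step.get("args", {})))
        except KeyError as missing:
            raise ValueError(f"step {i}: missing argument {missing}") from None
    return commands
```

Restricting the model to a fixed set of actions means a hallucinated or malicious command is rejected instead of being passed to the tool's Tcl interpreter.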
See also
- Large language model
- Natural language processing
- Electronic system-level design and verification
- Systems design
- Hardware design
- Embedded system
- Design space exploration
- GPT-4
- Llama (language model)
- Hardware description language
- Formal verification
- Artificial intelligence
- Computer-aided design
- Verilog
- VHDL
- System on a chip
- Retrieval-augmented generation
- SystemVerilog
- Transformer (deep learning architecture)
- Fine-tuning (deep learning)
- Register-transfer level