Prompt engineering#Retrieval-augmented generation

{{Short description|Structuring text as input to generative artificial intelligence}}

Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.{{cite web|first=Dina |last=Genkina |title=AI Prompt Engineering is Dead: Long live AI prompt engineering |url=https://spectrum.ieee.org/prompt-engineering-is-dead |website=IEEE Spectrum |date=2024-03-06 |access-date=2025-01-18}}

A prompt is natural language text describing the task that an AI should perform.{{cite web |last1=Radford |first1=Alec |last2=Wu |first2=Jeffrey |last3=Child |first3=Rewon |last4=Luan |first4=David |last5=Amodei |first5=Dario |author-link5=Dario Amodei |last6=Sutskever |first6=Ilya |author-link6=Ilya Sutskever |year=2019 |title=Language Models are Unsupervised Multitask Learners |url=https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf |publisher=OpenAI |quote="We demonstrate language models can perform down-stream tasks in a zero-shot setting – without any parameter or architecture modification"}} A prompt for a text-to-text language model can be a query, a command, or a longer statement including context, instructions, and conversation history. Prompt engineering may involve phrasing a query, specifying a style, choice of words and grammar,{{Cite book |last1=Wahle |first1=Jan Philip |title=Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing |last2=Ruas |first2=Terry |last3=Xu |first3=Yang |last4=Gipp |first4=Bela |date=2024 |publisher=Association for Computational Linguistics |editor-last=Al-Onaizan |editor-first=Yaser |location=Miami, Florida, USA |pages=11004–11033 |chapter=Paraphrase Types Elicit Prompt Engineering Capabilities |doi=10.18653/v1/2024.emnlp-main.617 |editor2-last=Bansal |editor2-first=Mohit |editor3-last=Chen |editor3-first=Yun-Nung |chapter-url=https://aclanthology.org/2024.emnlp-main.617/ |arxiv=2406.19898}} providing relevant context, or describing a character for the AI to mimic.

When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse"{{Cite web|last=Heaven|first=Will Douglas|title=This horse-riding astronaut is a milestone on AI's long road towards understanding|url=https://www.technologyreview.com/2022/04/06/1049061/dalle-openai-gpt3-ai-agi-multimodal-image-generation/|website=MIT Technology Review|access-date=2023-08-14|date=April 6, 2022}} or "Lo-fi slow BPM electro chill with organic samples".{{cite web|last=Wiggers|first=Kyle|title=Meta open sources an AI-powered music generator|url=https://techcrunch.com/2023/06/12/meta-open-sources-an-ai-powered-music-generator/|publisher=TechCrunch|date=2023-06-12|access-date=2023-08-15|quote=Next, I gave a more complicated prompt to attempt to throw MusicGen for a loop: "Lo-fi slow BPM electro chill with organic samples."}} Prompting a text-to-image model may involve adding, removing, or emphasizing words to achieve a desired subject, style, layout, lighting, and aesthetic.{{Cite web |last=Mittal |first=Aayush |date=2023-07-27 |title=Mastering AI Art: A Concise Guide to Midjourney and Prompt Engineering |url=https://www.unite.ai/mastering-ai-art-a-concise-guide-to-midjourney-and-prompt-engineering/ |access-date=2025-05-09 |website=Unite.AI |language=en-US}}

History

In 2018, researchers first proposed that all previously separate tasks in natural language processing (NLP) could be cast as a question-answering problem over a context. In addition, they trained a first single, joint, multi-task model that would answer any task-related question like "What is the sentiment" or "Translate this sentence to German" or "Who is the president?"{{Cite conference |date=2018 |title=The Natural Language Decathlon: Multitask Learning as Question Answering |arxiv=1806.08730 |conference=ICLR}}

The AI boom saw an increase in the amount of "prompting technique" to get the model to output the desired outcome and avoid nonsensical output, a process characterized by trial-and-error.{{Cite journal |last1=Knoth |first1=Nils |last2=Tolzin |first2=Antonia |last3=Janson |first3=Andreas |last4=Leimeister |first4=Jan Marco |date=2024-06-01 |title=AI literacy and its implications for prompt engineering strategies |journal=Computers and Education: Artificial Intelligence |volume=6 |pages=100225 |doi=10.1016/j.caeai.2024.100225 |issn=2666-920X|doi-access=free }} After the release of ChatGPT in 2022, prompt engineering was soon seen as an important business skill, albeit one with an uncertain economic future.

A repository for prompts reported that over 2,000 public prompts for around 170 datasets were available in February 2022.{{Cite conference |date=2022 |title=PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts |url=https://aclanthology.org/2022.acl-demo.9/ |conference=Association for Computational Linguistics}} In 2022, the chain-of-thought prompting technique was proposed by Google researchers.{{cite conference |last1=Wei |first1=Jason |last2=Wang |first2=Xuezhi |last3=Schuurmans |first3=Dale |last4=Bosma |first4=Maarten |last5=Ichter |first5=Brian |last6=Xia |first6=Fei |last7=Chi |first7=Ed H. |last8=Le |first8=Quoc V. |last9=Zhou |first9=Denny |date=31 October 2022 |title=Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |conference=Advances in Neural Information Processing Systems (NeurIPS 2022) |language=en |volume=35 |arxiv=2201.11903}}{{Cite web |last=Brubaker |first=Ben |date=2024-03-21 |title=How Chain-of-Thought Reasoning Helps Neural Networks Compute |url=https://www.quantamagazine.org/how-chain-of-thought-reasoning-helps-neural-networks-compute-20240321/ |access-date=2025-05-09 |website=Quanta Magazine |language=en}} In 2023, several text-to-text and text-to-image prompt databases were made publicly available.{{Cite web|url=https://www.nytimes.com/2023/06/23/technology/ai-chatbot-life-coach.html|title=How to Turn Your Chatbot Into a Life Coach|last=Chen|first=Brian X.|date=2023-06-23|access-date=|website=The New York Times}}{{Cite news |last=Chen |first=Brian X. |date=2023-05-25 |title=Get the Best From ChatGPT With These Golden Prompts |url=https://www.nytimes.com/2023/05/25/technology/ai-chatbot-chatgpt-prompts.html |url-access=registration |access-date=2023-08-16 |work=The New York Times |language=en-US |issn=0362-4331}} The Personalized Image-Prompt (PIP) dataset, a generated image-text dataset that has been categorized by 3,115 users, has also been made available publicly in 2024.{{Cite book |last1=Chen |first1=Zijie |last2=Zhang |first2=Lichao |last3=Weng |first3=Fangsheng |last4=Pan |first4=Lili |last5=Lan |first5=Zhenzhong |chapter=Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting |date=2024-06-16 |title=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |chapter-url=https://doi.org/10.1109/cvpr52733.2024.00738 |publisher=IEEE |pages=7727–7736 |doi=10.1109/cvpr52733.2024.00738|arxiv=2310.08129 |isbn=979-8-3503-5300-6 }}

Text-to-text

Multiple distinct prompt engineering techniques have been published.

= Chain-of-thought =

According to Google Research, chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps before giving a final answer. In 2022, Google Brain reported that chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought.{{Cite web |author=Narang |first1=Sharan |last2=Chowdhery |first2=Aakanksha |date=2022-04-04 |title=Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance |url=https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html |website=ai.googleblog.com}} Chain-of-thought techniques were developed to help LLMs handle multi-step reasoning tasks, such as arithmetic or commonsense reasoning questions.{{cite web |last1=Dang |first1=Ekta |title=Harnessing the power of GPT-3 in scientific research |url=https://venturebeat.com/ai/harnessing-the-power-of-gpt-3-in-scientific-research/ |website=VentureBeat |access-date=10 March 2023 |date=8 February 2023}}{{cite web |last1=Montti |first1=Roger |title=Google's Chain of Thought Prompting Can Boost Today's Best Algorithms |url=https://www.searchenginejournal.com/google-chain-of-thought-prompting/450106/ |website=Search Engine Journal |access-date=10 March 2023 |language=en |date=13 May 2022}}

For example, given the question, "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", Google claims that a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9." When applied to PaLM, a 540 billion parameter language model, according to Google, CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks, achieving state-of-the-art results at the time on the GSM8K mathematical reasoning benchmark. It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability.{{Cite journal |date=2024 |title=Scaling Instruction-Finetuned Language Models |url=https://jmlr.org/papers/volume25/23-0870/23-0870.pdf |journal=Journal of Machine Learning Research}}{{cite web |last1=Wei |first1=Jason |last2=Tay |first2=Yi |date=29 November 2022 |title=Better Language Models Without Massive Compute |url=https://ai.googleblog.com/2022/11/better-language-models-without-massive.html |access-date=10 March 2023 |website=ai.googleblog.com |language=en}}

An example of a CoT prompting:{{Cite journal |date=2022 |title=Large Language Models are Zero-Shot Reasoners |journal=NeurIPS|arxiv=2205.11916 |last1=Kojima |first1=Takeshi |author2=Shixiang Shane Gu |last3=Reid |first3=Machel |last4=Matsuo |first4=Yutaka |last5=Iwasawa |first5=Yusuke }}

Q: {question}

A: Let's think step by step.

As originally proposed by Google, each CoT prompt included a few Q&A examples. This made it a few-shot prompting technique. However, according to researchers at Google and the University of Tokyo, simply appending the words "Let's think step-by-step" was also effective, which makes CoT a zero-shot prompting technique. OpenAI claims that this prompt allows for better scaling as a user no longer needs to formulate many specific CoT Q&A examples.{{cite web |last=Dickson |first=Ben |title=LLMs have not learned our language — we're trying to learn theirs |url=https://venturebeat.com/ai/llms-have-not-learned-our-language-were-trying-to-learn-theirs%EF%BF%BC/ |website=VentureBeat |access-date=10 March 2023 |date=30 August 2022}}

= In-context learning =

In-context learning, refers to a model's ability to temporarily learn from prompts. For example, a prompt may include a few examples for a model to learn from, such as asking the model to complete "maison {{arrow}} house, chat {{arrow}} cat, chien {{arrow}}" (the expected response being dog),{{Cite journal |last1=Garg |first1=Shivam |last2=Tsipras |first2=Dimitris |last3=Liang |first3=Percy |last4=Valiant |first4=Gregory |date=2022 |title=What Can Transformers Learn In-Context? A Case Study of Simple Function Classes |journal=NeurIPS|arxiv=2208.01066 }} an approach called few-shot learning.{{Cite journal |last1=Brown |first1=Tom |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared D. |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |year=2020 |title=Language models are few-shot learners |journal=Advances in Neural Information Processing Systems |volume=33 |pages=1877–1901 |arxiv=2005.14165}}

In-context learning is an emergent ability{{Cite journal |date=October 2022 |title=Emergent Abilities of Large Language Models |journal=Transactions on Machine Learning Research |arxiv=2206.07682 |quote="In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters... The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random" |last1=Wei |first1=Jason |last2=Tay |first2=Yi |last3=Bommasani |first3=Rishi |last4=Raffel |first4=Colin |last5=Zoph |first5=Barret |last6=Borgeaud |first6=Sebastian |last7=Yogatama |first7=Dani |last8=Bosma |first8=Maarten |last9=Zhou |first9=Denny |last10=Metzler |first10=Donald |last11=Chi |first11=Ed H. |last12=Hashimoto |first12=Tatsunori |last13=Vinyals |first13=Oriol |last14=Liang |first14=Percy |last15=Dean |first15=Jeff |last16=Fedus |first16=William }} of large language models. It is an emergent property of model scale, meaning that breaks{{Cite journal |date=2023 |title=Broken Neural Scaling Laws |journal=ICLR|arxiv=2210.14891 |last1=Caballero |first1=Ethan |last2=Gupta |first2=Kshitij |last3=Rish |first3=Irina |last4=Krueger |first4=David }} in downstream scaling laws occur, leading to its efficacy increasing at a different rate in larger models than in smaller models. Unlike training and fine-tuning, which produce lasting changes, in-context learning is temporary.{{cite web |last1=Musser |first1=George |author-link=George Musser |title=How AI Knows Things No One Told It |url=https://www.scientificamerican.com/article/how-ai-knows-things-no-one-told-it/ |access-date=17 May 2023 |website=Scientific American |quote="By the time you type a query into ChatGPT, the network should be fixed; unlike humans, it should not continue to learn. So it came as a surprise that LLMs do, in fact, learn from their users' prompts—an ability known as in-context learning."}} Training models to perform in-context learning can be viewed as a form of meta-learning, or "learning to learn".{{Cite journal |date=2022 |title=What Can Transformers Learn In-Context? A Case Study of Simple Function Classes |url= |journal=NeurIPS |arxiv=2208.01066 |quote=Training a model to perform in-context learning can be viewed as an instance of the more general learning-to-learn or meta-learning paradigm |last1=Garg |first1=Shivam |last2=Tsipras |first2=Dimitris |last3=Liang |first3=Percy |last4=Valiant |first4=Gregory }}

= Self-consistency decoding =

Self-consistency decoding performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts.{{Cite conference |date=2023 |title=Self-Consistency Improves Chain of Thought Reasoning in Language Models |arxiv=2203.11171 |conference=ICLR}}

= Tree-of-thought =

Tree-of-thought prompting generalizes chain-of-thought by generating multiple lines of reasoning in parallel, with the ability to backtrack or explore other paths. It can use tree search algorithms like breadth-first, depth-first, or beam.{{Cite web |last=Mittal |first=Aayush |date=2024-05-27 |title=Latest Modern Advances in Prompt Engineering: A Comprehensive Guide |url=https://www.unite.ai/latest-modern-advances-in-prompt-engineering-a-comprehensive-guide/ |access-date=2025-05-08 |website=Unite.AI |language=en-US}}{{Cite conference |date=2023 |title=Tree of Thoughts: Deliberate Problem Solving with Large Language Models |arxiv=2305.10601 |conference=NeurIPS}}

= Prompting to estimate model sensitivity =

Research consistently demonstrates that LLMs are highly sensitive to subtle variations in prompt formatting, structure, and linguistic properties. Some studies have shown up to 76 accuracy points across formatting changes in few-shot settings.{{Cite conference |date=2024 |title=Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting |arxiv=2310.11324 |conference=ICLR}} Linguistic features significantly influence prompt effectiveness—such as morphology, syntax, and lexico-semantic changes—which meaningfully enhance task performance across a variety of tasks.{{Cite journal |last1=Leidinger |first1=Alina |last2=van Rooij |first2=Robert |last3=Shutova |first3=Ekaterina |date=2023 |editor-last=Bouamor |editor-first=Houda |editor2-last=Pino |editor2-first=Juan |editor3-last=Bali |editor3-first=Kalika |title=The language of prompting: What linguistic properties make a prompt successful? |url=https://aclanthology.org/2023.findings-emnlp.618/ |journal=Findings of the Association for Computational Linguistics: EMNLP 2023 |location=Singapore |publisher=Association for Computational Linguistics |pages=9210–9232 |doi=10.18653/v1/2023.findings-emnlp.618|arxiv=2311.01967 }} Clausal syntax, for example, improves consistency and reduces uncertainty in knowledge retrieval.{{Cite book |last1=Linzbach |first1=Stephan |last2=Dimitrov |first2=Dimitar |last3=Kallmeyer |first3=Laura |last4=Evang |first4=Kilian |last5=Jabeen |first5=Hajira |last6=Dietze |first6=Stefan |chapter=Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models |date=June 2024 |editor-last=Duh |editor-first=Kevin |editor2-last=Gomez |editor2-first=Helena |editor3-last=Bethard |editor3-first=Steven |title=Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) |chapter-url=https://aclanthology.org/2024.naacl-long.201/ |location=Mexico City, Mexico |publisher=Association for Computational Linguistics |pages=3645–3655 |doi=10.18653/v1/2024.naacl-long.201|arxiv=2404.01992 }} This sensitivity persists even with larger model sizes, additional few-shot examples, or instruction tuning.

To address sensitivity of models and make them more robust, several methods have been proposed. FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, offering a more comprehensive performance interval. Similarly, PromptEval estimates performance distributions across diverse prompts, enabling robust metrics such as performance quantiles and accurate evaluations under constrained budgets.{{Cite conference |date=2024 |title=Efficient multi-prompt evaluation of LLMs |arxiv=2405.17202 |conference=NeurIPS}}

= Automatic prompt generation =

== Retrieval-augmented generation ==

Retrieval-augmented generation (RAG) is a technique that enables generative artificial intelligence (Gen AI) models to retrieve and incorporate new information. It modifies interactions with a large language model (LLM) so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data. This allows LLMs to use domain-specific and/or updated information.{{Cite web |date=31 May 2024 |title=Why Google's AI Overviews gets things wrong |url=https://www.technologyreview.com/2024/05/31/1093019/why-are-googles-ai-overviews-results-so-bad/ |access-date=7 March 2025 |website=MIT Technology Review}}

RAG improves large language models (LLMs) by incorporating information retrieval before generating responses. Unlike traditional LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have led to real-world issues like chatbots inventing policies or lawyers citing nonexistent legal cases. By dynamically retrieving information, RAG enables AI to provide more accurate responses without frequent retraining.{{Cite web |date=6 June 2024 |title=Can a technology called RAG keep AI models from making stuff up? |url=https://arstechnica.com/ai/2024/06/can-a-technology-called-rag-keep-ai-models-from-making-stuff-up/ |access-date=7 March 2025 |website=Ars Technica}}

== Graph retrieval-augmented generation ==

File:GraphRAG.svg

GraphRAG (coined by Microsoft Research) is a technique that extends RAG with the use of a knowledge graph (usually, LLM-generated) to allow the model to connect disparate pieces of information, synthesize insights, and holistically understand summarized semantic concepts over large data collections. It was shown to be effective on datasets like the Violent Incident Information from News Articles (VIINA).{{citation |last1=Larson |first1=Jonathan |title=GraphRAG: Unlocking LLM discovery on narrative private data |date=February 13, 2024 |url=https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/ |publisher=Microsoft |last2=Truitt |first2=Steven}}{{Cite web |title=An Introduction to Graph RAG |url=https://www.kdnuggets.com/an-introduction-to-graph-rag |access-date=2025-05-09 |website=KDnuggets |language=en-US}}

Earlier work showed the effectiveness of using a knowledge graph for question answering using text-to-query generation.{{Cite journal |date=2023 |title=A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases |journal=Grades-Nda|arxiv=2311.07509 |last1=Sequeda |first1=Juan |last2=Allemang |first2=Dean |last3=Jacob |first3=Bryon }} These techniques can be combined to search across both unstructured and structured data, providing expanded context, and improved ranking.

== Using language models to generate prompts ==

Large language models (LLM) themselves can be used to compose prompts for large language models.{{Cite conference |date=2023 |title=Explaining Patterns in Data with Language Models via Interpretable Autoprompting |url=https://aclanthology.org/2023.blackboxnlp-1.3.pdf |conference=BlackboxNLP Workshop |arxiv=2210.01848}} The automatic prompt engineer algorithm uses one LLM to beam search over prompts for another LLM:{{Cite conference |date=2023 |title=Large Language Models are Human-Level Prompt Engineers |arxiv=2211.01910 |conference=ICLR}}{{Cite journal |last1=Pryzant |first1=Reid |last2=Iter |first2=Dan |last3=Li |first3=Jerry |last4=Lee |first4=Yin Tat |last5=Zhu |first5=Chenguang |last6=Zeng |first6=Michael |date=2023 |title=Automatic Prompt Optimization with "Gradient Descent" and Beam Search |url=https://aclanthology.org/2023.emnlp-main.494/ |journal=Conference on Empirical Methods in Natural Language Processing |pages=7957–7968 |doi=10.18653/v1/2023.emnlp-main.494 |arxiv=2305.03495}}

There are two LLMs. One is the target LLM, and another is the prompting LLM.
Prompting LLM is presented with example input-output pairs, and asked to generate instructions that could have caused a model following the instructions to generate the outputs, given the inputs.
Each of the generated instructions is used to prompt the target LLM, followed by each of the inputs. The log-probabilities of the outputs are computed and added. This is the score of the instruction.
The highest-scored instructions are given to the prompting LLM for further variations.
Repeat until some stopping criteria is reached, then output the highest-scored instructions.

CoT examples can be generated by LLM themselves. In "auto-CoT", a library of questions are converted to vectors by a model such as BERT. The question vectors are clustered. Questions close to the centroid of each cluster are selected, in order to have a subset of diverse questions. An LLM does zero-shot CoT on each selected question. The question and the corresponding CoT answer are added to a dataset of demonstrations. These diverse demonstrations can then added to prompts for few-shot learning.{{Cite conference |date=2023 |title=Automatic Chain of Thought Prompting in Large Language Models |arxiv=2210.03493 |conference=ICLR}}

Text-to-image

{{see also|Artificial intelligence art#Prompt engineering and sharing|Artificial intelligence art}}

File:Fooocus 2.5.5 screenshot showing the prompt section.webp]]

In 2022, text-to-image models like DALL-E 2, Stable Diffusion, and Midjourney were released to the public. These models take text prompts as input and use them to generate images.{{Cite web |last=Goldman |first=Sharon |date=2023-01-05 |title=Two years after DALL-E debut, its inventor is "surprised" by impact |url=https://venturebeat.com/ai/two-years-after-dall-e-debut-its-inventor-is-surprised-by-impact/ |access-date=2025-05-09 |website=VentureBeat |language=en-US}}

{{multiple image

| direction = vertical

| align = right

| total_width = 200

| image1 = Algorithmically-generated landscape artwork of forest with Shinto shrine.png

| image2 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for green trees.png

| image3 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for round stones.png

| footer = Demonstration of the effect of negative prompts on images generated with Stable Diffusion

{{bulleted list|Top: no negative prompt

|Centre: "green trees"

|Bottom: "round stones, round rocks"}}

}}

= Prompt formats =

Early text-to-image models typically don't understand negation, grammar and sentence structure in the same way as large language models, and may thus require a different set of prompting techniques. The prompt "a party with no cake" may produce an image including a cake.{{Cite web |title=Prompts |url=https://docs.midjourney.com/docs/prompts |access-date=2023-08-14 |website=docs.midjourney.com}} As an alternative, negative prompts allow a user to indicate, in a separate prompt, which terms should not appear in the resulting image.{{Cite web |date=2022-09-07 |title=Why Does This Horrifying Woman Keep Appearing in AI-Generated Images? |url=https://www.vice.com/en/article/why-does-this-horrifying-woman-keep-appearing-in-ai-generated-images/ |access-date=2025-05-09 |website=VICE |language=en-US}} Techniques such as framing the normal prompt into a sequence-to-sequence language modeling problem can be used to automatically generate an output for the negative prompt.{{Cite journal |last1=Goldblum |first1=R. |last2=Pillarisetty |first2=R. |last3=Dauphinee |first3=M. J. |last4=Talal |first4=N. |date=1975 |title=Acceleration of autoimmunity in NZB/NZW F1 mice by graft-versus-host disease |journal=Clinical and Experimental Immunology |volume=19 |issue=2 |pages=377–385 |issn=0009-9104 |pmc=1538084 |pmid=2403}}

A text-to-image prompt commonly includes a description of the subject of the art, the desired medium (such as digital painting or photography), style (such as hyperrealistic or pop-art), lighting (such as rim lighting or crepuscular rays), color, and texture.{{Cite web|url=https://stable-diffusion-art.com/prompt-guide/|title=Stable Diffusion prompt: a definitive guide|date=2023-05-14|access-date=2023-08-14}} Word order also affects the output of a text-to-image prompt. Words closer to the start of a prompt may be emphasized more heavily.{{Cite web |last1=Diab |first1=Mohamad |last2=Herrera |first2=Julian |last3=Chernow |first3=Bob |date=2022-10-28 |title=Stable Diffusion Prompt Book |url=https://cdn.openart.ai/assets/Stable%20Diffusion%20Prompt%20Book%20From%20OpenArt%2011-13.pdf |access-date=2023-08-07 |quote="Prompt engineering is the process of structuring words that can be interpreted and understood by a text-to-image model. Think of it as the language you need to speak in order to tell an AI model what to draw."}}

The Midjourney documentation encourages short, descriptive prompts: instead of "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils", an effective prompt might be "Bright orange California poppies drawn with colored pencils".

= Artist styles =

Some text-to-image models are capable of imitating the style of particular artists by name. For example, the phrase in the style of Greg Rutkowski has been used in Stable Diffusion and Midjourney prompts to generate images in the distinctive style of Polish digital artist Greg Rutkowski.{{Cite web|url=https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/|title=This Artist Is Dominating AI-Generated Art and He's Not Happy About It|last=Heikkilä|first=Melissa|date=2022-09-16|website=MIT Technology Review|access-date=2023-08-14}} Famous artists such as Vincent van Gogh and Salvador Dalí have also been used for styling and testing.{{Cite web |last=Solomon |first=Tessa |date=2024-08-28 |title=The AI-Powered Ask Dalí and Hello Vincent Installations Raise Uncomfortable Questions about Ventriloquizing the Dead |url=https://www.artnews.com/art-news/issue/salvador-dali-vincent-van-gogh-ai-installations-ethics-1234714954/ |access-date=2025-01-10 |website=ARTnews.com |language=en-US}}

Non-text prompts

Some approaches augment or replace natural language text prompts with non-text input.

= Textual inversion and embeddings =

For text-to-image models, textual inversion performs an optimization process to create a new word embedding based on a set of example images. This embedding vector acts as a "pseudo-word" which can be included in a prompt to express the content or style of the examples.{{Cite journal |date=2023 |title=An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion |journal=ICLR |arxiv=2208.01618 |quote=Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. |last1=Gal |first1=Rinon |last2=Alaluf |first2=Yuval |last3=Atzmon |first3=Yuval |last4=Patashnik |first4=Or |last5=Bermano |first5=Amit H. |last6=Chechik |first6=Gal |last7=Cohen-Or |first7=Daniel }}

= Image prompting =

In 2023, Meta's AI research released Segment Anything, a computer vision model that can perform image segmentation by prompting. As an alternative to text prompts, Segment Anything can accept bounding boxes, segmentation masks, and foreground/background points.{{Cite conference |date=2023 |title=Segment Anything |url=https://openaccess.thecvf.com/content/ICCV2023/papers/Kirillov_Segment_Anything_ICCV_2023_paper.pdf |conference=ICCV}}

= Using gradient descent to search for prompts =

In "prefix-tuning",{{cite book | doi=10.18653/V1/2021.ACL-LONG.353 | chapter=Prefix-Tuning: Optimizing Continuous Prompts for Generation | title=Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) | year=2021 | last1=Li | first1=Xiang Lisa | last2=Liang | first2=Percy | pages=4582–4597 | s2cid=230433941|quote="In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning... Prefix-tuning draws inspiration from prompting"}} "prompt tuning", or "soft prompting",{{cite book | doi=10.18653/V1/2021.EMNLP-MAIN.243 | chapter=The Power of Scale for Parameter-Efficient Prompt Tuning | title=Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | year=2021 | last1=Lester | first1=Brian | last2=Al-Rfou | first2=Rami | last3=Constant | first3=Noah | pages=3045–3059 | s2cid=233296808 |arxiv=2104.08691|quote="In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts"...Unlike the discrete text prompts used by GPT-3, soft prompts are learned through back-propagation"}} floating-point-valued vectors are searched directly by gradient descent to maximize the log-likelihood on outputs.

Formally, let $\mathbf{E} = \{\mathbf{e_1}, \dots, \mathbf{e_k}\}$ be a set of soft prompt tokens (tunable embeddings), while $\mathbf{X} = \{\mathbf{x_1}, \dots, \mathbf{x_m}\}$ and $\mathbf{Y} = \{\mathbf{y_1}, \dots, \mathbf{y_n}\}$ be the token embeddings of the input and output respectively. During training, the tunable embeddings, input, and output tokens are concatenated into a single sequence $\text{concat}(\mathbf{E};\mathbf{X};\mathbf{Y})$ , and fed to the LLMs. The losses are computed over the $\mathbf{Y}$ tokens; the gradients are backpropagated to prompt-specific parameters: in prefix-tuning, they are parameters associated with the prompt tokens at each layer; in prompt tuning, they are merely the soft tokens added to the vocabulary.{{Cite conference |date=2024 |title=How Does In-Context Learning Help Prompt Tuning? |arxiv=2302.11521 |conference=EACL}}

More formally, this is prompt tuning. Let an LLM be written as $LLM(X) = F(E(X))$ , where $X$ is a sequence of linguistic tokens, $E$ is the token-to-vector function, and $F$ is the rest of the model. In prefix-tuning, one provides a set of input-output pairs $\{(X^i, Y^i)\}_i$ , and then use gradient descent to search for $\arg\max_{\tilde Z} \sum_i \log Pr[Y^i | \tilde Z \ast E(X^i)]$ . In words, $\log Pr[Y^i | \tilde Z \ast E(X^i)]$ is the log-likelihood of outputting $Y^i$ , if the model first encodes the input $X^i$ into the vector $E(X^i)$ , then prepend the vector with the "prefix vector" $\tilde Z$ , then apply $F$ . For prefix tuning, it is similar, but the "prefix vector" $\tilde Z$ is pre-appended to the hidden states in every layer of the model.{{Citation needed|date=May 2025}}

An earlier result uses the same idea of gradient descent search, but is designed for masked language models like BERT, and searches only over token sequences, rather than numerical vectors. Formally, it searches for $\arg\max_{\tilde X} \sum_i \log Pr[Y^i | \tilde X \ast X^i]$ where $\tilde X$ is ranges over token sequences of a specified length.{{Cite book |last1=Shin |first1=Taylor |title=Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) |last2=Razeghi |first2=Yasaman |last3=Logan IV |first3=Robert L. |last4=Wallace |first4=Eric |last5=Singh |first5=Sameer |date=November 2020 |publisher=Association for Computational Linguistics |location=Online |pages=4222–4235 |chapter=AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts |doi=10.18653/v1/2020.emnlp-main.346 |chapter-url=https://aclanthology.org/2020.emnlp-main.346 |doi-access=free |s2cid=226222232}}

Limitations

While the process of writing and refining a prompt for an LLM or generative AI shares some parallels with an iterative engineering design process, such as through discovering 'best principles' to reuse and discovery through reproducible experimentation, the actual learned principles and skills depend heavily on the specific model being learned rather than being generalizable across the entire field of prompt-based generative models. Such patterns are also volatile and exhibit significantly different results from seemingly insignificant prompt changes.Meincke, Lennart and Mollick, Ethan R. and Mollick, Lilach and Shapiro, Dan, Prompting Science Report 1: Prompt Engineering is Complicated and Contingent (March 04, 2025). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5165270{{Cite news |date=2025-05-06 |title='AI is already eating its own': Prompt engineering is quickly going extinct |url=https://www.fastcompany.com/91327911/prompt-engineering-going-extinct |work=Fast Company}} According to The Wall Street Journal in 2025, the job of prompt engineer was one of the hottest in 2023, but has become obsolete due to models that better intuit user intent and to company trainings.{{Cite news |last=Bousquette |first=Isabelle |date=2025-04-25 |title=The Hottest AI Job of 2023 Is Already Obsolete |url=https://www.wsj.com/articles/the-hottest-ai-job-of-2023-is-already-obsolete-1961b054 |access-date=2025-05-07 |work=Wall Street Journal |language=en-US |issn=0099-9660}}

Prompt injection

{{see also|SQL injection|Cross-site scripting|Social engineering (security)}}

Prompt injection is a cybersecurity exploit in which adversaries craft inputs that appear legitimate but are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). This attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs, allowing adversaries to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.{{Cite web |last=Vigliarolo |first=Brandon |date=19 September 2022 |title=GPT-3 'prompt injection' attack causes bot bad manners |url=https://www.theregister.com/2022/09/19/in_brief_security/ |access-date=2023-02-09 |website=The Register |language=en}}{{Cite web |date=26 March 2024 |title=What is a prompt injection attack? |url=https://www.ibm.com/think/topics/prompt-injection |access-date=7 March 2025 |website=IBM}}

References

Category:Deep learning

Category:Machine learning

Category:Natural language processing

Category:Unsupervised learning

Category:2022 neologisms

Category:Linguistics

Category:Generative artificial intelligence