Automating Prompt Engineering with DSPy: An Overview

The power of Language Models (LMs), often called Large Language Models (LLMs), is frequently harnessed by chaining them together into sophisticated Language Model Programs capable of tackling increasingly complex Natural Language Processing (NLP) tasks. Think of these programs as multi-step recipes in which each step prompts an LM to perform a specific sub-task. Traditionally, building these LM programs relies heavily on manual prompt engineering: the time-consuming process of crafting effective instructions and examples (prompts) for each step through trial and error.
Enter DSPy: a framework that introduces automated prompt engineering for LM programs. It is a framework for programming, rather than prompting, LM pipelines. Instead of painstakingly hand-tuning prompts, DSPy treats the instructions and demonstrations given to the LMs within a program as parameters that can be automatically optimized to maximize performance on a specific task.
DSPy: Programming and Optimizing in One Go
DSPy provides a declarative programming model that allows developers to define what they want their LM program to achieve, without needing to specify exactly how each LM call should be prompted initially. You define the program as a series of modules, each with prompt templates containing open slots or variables for instructions and demonstrations (examples).
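To make the idea of "open slots" concrete, the sketch below shows a module's prompt as a template whose instruction and demonstration slots are the parameters being optimized. This is a simplified stand-in for illustration only; the template and function names are hypothetical, not the DSPy API.

```python
# Simplified illustration: a module's prompt template with open slots.
# The instruction and demonstrations are the "parameters" an optimizer
# tunes; these names are hypothetical, not the actual DSPy API.
PROMPT_TEMPLATE = """{instruction}

{demonstrations}

Question: {question}
Answer:"""

def render_prompt(instruction: str, demos: list[tuple[str, str]], question: str) -> str:
    """Fill the open slots with a candidate instruction and few-shot examples."""
    demo_text = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in demos)
    return PROMPT_TEMPLATE.format(
        instruction=instruction, demonstrations=demo_text, question=question
    )
```

Swapping in a different instruction or a different set of demonstrations changes the rendered prompt without touching the program's structure, which is exactly what makes these slots optimizable.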
The core innovation of DSPy lies in its ability to automatically find the best values for these prompt variables. It does this by using various optimization strategies that don’t require access to the internal workings (e.g., gradients) of the LMs themselves or detailed labels for the intermediate steps within the program. DSPy only needs the LM program, a metric to measure its overall success (e.g., accuracy, precision, recall, F1), and a training dataset of inputs (and optionally, final outputs).
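Concretely, the metric is just a function that scores the program's final output against a training example, and overall performance is its average over the dataset. A minimal sketch, with illustrative names:

```python
def exact_match(example: dict, prediction: str) -> bool:
    # The metric inspects only the program's final output; no labels
    # for intermediate module calls are required.
    return example["answer"].strip().lower() == prediction.strip().lower()

def average_score(program, trainset: list[dict], metric) -> float:
    # Overall task performance: the quantity the optimizer maximizes.
    scores = [metric(ex, program(ex["question"])) for ex in trainset]
    return sum(scores) / len(scores)
```

Any callable with this shape works, which is why DSPy can optimize against accuracy, F1, or a custom domain-specific check without changing the program itself.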
Tackling the Challenges of Prompt Optimization
Optimizing prompts for multi-stage LM programs presents two key challenges that DSPy is designed to address:
•The Proposal Challenge: The space of all possible instructions and combinations of demonstrations is incredibly vast. DSPy needs efficient techniques to propose a small set of high-quality prompt candidates for each module.
•The Credit Assignment Challenge: When an LM program doesn’t perform well, it’s difficult to determine which module’s prompt is the culprit. DSPy needs strategies to infer the impact of prompt choices in each module on the overall program performance to guide the optimization process.
How DSPy Optimizes: Key Strategies
DSPy employs several intelligent strategies to tackle these challenges:
•Bootstrapping Demonstrations: DSPy can automatically generate potential few-shot examples (input/output pairs) for each module by running the initial version of the program on training data. If the program produces a successful final output (according to the defined metric), the input/output traces of each module are treated as valuable demonstrations. The optimizer can then intelligently select combinations of these bootstrapped demonstrations to include in the prompts.
•Grounded Instruction Proposal: To generate effective instructions, DSPy utilizes another LM, a “proposer” LM. This proposer LM is provided with relevant context to help it craft better instructions. This context can include summaries of the training data, the structure of the LM program itself, and even previously evaluated prompts and their scores. By “grounding” the proposer in this information, DSPy aims to generate instructions that are more tailored to the specific task and the role of each module within the program.
•Surrogate Models for Efficient Search: To navigate the vast space of possible prompt configurations efficiently, DSPy can use surrogate models, such as Bayesian optimization. These models learn to predict the performance of different prompt combinations based on past evaluations, allowing DSPy to focus its search on the most promising areas. This reduces the number of costly LM calls needed for evaluation.
•Meta-Optimization of Proposal Strategies: DSPy can even go a step further by learning the best way to propose prompts. By parameterizing the hyperparameters of the proposal process (e.g., the temperature of the proposer LM, which grounding information to use), DSPy can use techniques like Bayesian optimization to find the proposal strategies that yield the best performing prompts for a given task and LM setup.
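The bootstrapping idea in particular is simple to sketch. Assuming a program that returns its final prediction along with a trace of each module's inputs and outputs (a hypothetical interface, not DSPy's actual one), collecting candidate demonstrations looks like this:

```python
import random

def bootstrap_demonstrations(program, trainset, metric, max_demos=4):
    """Run the unoptimized program on training inputs and keep the traces
    of runs whose final output passes the metric as candidate few-shot
    demonstrations for the modules' prompts."""
    candidates = []
    for example in trainset:
        # `program` is assumed to return (final_prediction, per-module trace).
        prediction, trace = program(example["question"])
        if metric(example, prediction):
            candidates.append(trace)
    # A real optimizer searches over combinations of these candidates;
    # here we simply sample one set.
    return random.sample(candidates, min(max_demos, len(candidates)))
```

The key point is that no human labels the intermediate steps: a trace earns its place as a demonstration purely because the run it came from succeeded end to end.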
MIPRO: A Powerful Optimizer in the DSPy Toolkit
MIPRO (Multi-prompt Instruction PRoposal Optimizer) is an algorithm that builds on the insights above and demonstrates strong performance. MIPRO jointly optimizes both the instructions and the few-shot examples for each module in an LM program. By separating the task of proposing prompts from the task of evaluating and selecting the best combinations (credit assignment using a surrogate model), MIPRO can effectively find high-performing prompt configurations.
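As a toy stand-in for this search, one can score a budget-limited set of (instruction, demonstration-set) combinations on a validation set and keep the best. All names below are illustrative; MIPRO replaces the random candidate ordering with a Bayesian surrogate model fitted to past scores so that promising regions of the space are tried first.

```python
import itertools
import random

def search_prompt_configs(build_program, instructions, demo_sets,
                          valset, metric, budget=8):
    """Evaluate up to `budget` randomly chosen (instruction, demos)
    combinations and return the best-scoring configuration."""
    candidates = list(itertools.product(instructions, demo_sets))
    random.shuffle(candidates)  # MIPRO would rank these with a surrogate model
    best_score, best_config = float("-inf"), None
    for instruction, demos in candidates[:budget]:
        program = build_program(instruction, demos)
        score = sum(metric(ex, program(ex)) for ex in valset) / len(valset)
        if score > best_score:
            best_score, best_config = score, (instruction, demos)
    return best_config, best_score
```

The budget matters because each candidate evaluation costs real LM calls; the surrogate model's whole purpose is to spend that budget on configurations likely to score well.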
Key Lessons Learned from DSPy Optimization
The evaluation of DSPy optimizers on a diverse set of tasks yielded several important lessons:
•Optimizing bootstrapped demonstrations is often crucial for achieving the best performance in LM programs. Providing relevant examples can be a very effective way to guide LMs.
•Jointly optimizing both instructions and few-shot examples, as done by MIPRO, generally leads to the best overall performance across a range of tasks.
•Instruction optimization becomes particularly important for tasks with complex, conditional rules that are difficult to convey through a limited number of examples. In such cases, refining the instructions can have a significant impact.
•Providing relevant context (“grounding”) to the instruction proposal process is generally helpful, but the most beneficial type of context can vary depending on the specific task.
•There is still much to learn about the optimal strategies for LM program optimization, and future research can explore the performance of different optimizers under varying resource constraints and with different base LMs.

Conclusion: Towards More Efficient and Effective LM Programs
DSPy represents a significant advancement in how we build and optimize Language Model Programs. By automating the often tedious and error-prone process of prompt engineering, DSPy offers the potential for increased efficiency, better performing NLP solutions, and a reduced reliance on manual trial and error. As LM programs become increasingly central to tackling complex tasks, frameworks like DSPy will play a vital role in making this technology more accessible and powerful.