Digital to AI Transformation in Retail: Breaking Free from the POC Trap

In 2025 and beyond, the retail industry stands at a crossroads. With the rapid evolution of generative AI, retailers worldwide are exploring how this transformative technology can enhance everything from customer service to supply chain logistics. While interest is surging, many retailers remain stuck in the experimentation phase: they are conducting Proofs of Concept (POCs), yet the transition to widespread AI deployment in retail remains limited. A few notable examples:
- A study by IDC revealed that 88% of AI POCs do not progress to production, indicating a significant gap between experimentation and implementation. (CIO)
- Gartner predicts that 30% of generative AI projects will be abandoned after the POC stage by the end of 2025, highlighting the difficulties in sustaining AI initiatives beyond initial trials. (Informatica)
- According to the Boston Consulting Group, 74% of companies struggle to achieve and scale value from AI initiatives, highlighting the hurdles in transitioning from pilot projects to enterprise-wide adoption. (BCG Global)
The Next Digital Transformation
Generative AI in retail is not just a trend—it’s the next wave of digital transformation. Much like the shift to e-commerce, omnichannel strategies, and cloud computing over the past two decades, AI is set to redefine core retail processes. This time, it’s about intelligent automation, real-time personalization, and machine-driven creativity.
Just as digital transformation separated the leaders from the laggards in the last era, AI transformation will create a new divide. Retailers who embrace this shift holistically will shape the future of shopping; those who don’t risk irrelevance and being left behind.
The Promise of Generative AI in Retail
Generative AI has shown immense promise in transforming retail operations:
- Hyper-Personalization: AI-powered engines tailor product recommendations and marketing messages based on individual shopper behavior.
- Content Automation: Retailers use generative AI to craft product descriptions, ad copy, and social media content at scale.
- Visual Innovation: Virtual try-ons, AI-generated product imagery, and visual search tools are transforming consumers’ shopping experiences.
- Operational Efficiency: Demand forecasting, dynamic pricing, and smart merchandising are being enhanced by AI insights.
These innovations aren’t theoretical. Companies like Amazon, Walmart, Zalando, and H&M have launched Gen AI pilots that promise significant returns on investment (ROI). For instance:
- Zalando utilized generative AI to create more than 70% of its campaign marketing images, resulting in a 90% reduction in content creation costs and a decrease in turnaround time from 6–8 weeks to 3–4 days. (Zalando’s use of generative AI in marketing)
- H&M has implemented AI-powered product imagery and virtual models to boost online engagement and scale creative output. (H&M Group and AI inventory management)
- Amazon has utilized generative AI to assist sellers in automatically creating product titles and descriptions, making it faster and easier to list items, which in turn improves seller onboarding and catalog quality. (Amazon Gen-AI Powered Product Listing)
- Walmart, meanwhile, introduced a Gen AI-powered search experience that enhances product discovery and relevance across its e-commerce platform, improving customer engagement and conversion. (Walmart Gen-AI Powered Search)
Yet even among these giants, widespread adoption remains elusive.
The POC Paralysis
So, what’s holding retailers back?
- Budget Constraints: Generative AI solutions often require significant up-front investment. While cloud costs have decreased, training large models or integrating them into legacy systems remains far from inexpensive.
- Tariff and Regulatory Pressures: Retailers operating across borders navigate a web of data residency laws, AI governance frameworks, and digital service taxes, which add legal complexity to AI rollouts.
- Organizational Readiness: Many retail organizations lack the technical maturity, cross-functional alignment, or data infrastructure necessary to operationalize AI beyond POC.
- ROI Uncertainty: Measuring success isn’t straightforward. A virtual try-on may boost engagement, but how does that translate into margin or conversion improvements at scale?
- Trust and Brand Risk: Retail is a brand-sensitive space. A hallucinated product description or a biased recommendation algorithm can quickly erode trust.
From Pilot to Platform: A Strategic Shift is Needed
To overcome the five key challenges above (budget, regulation, readiness, ROI, and brand risk), retailers must take a more disciplined and strategic approach to AI implementation. To move from Gen AI curiosity to competitive advantage, they need to rethink their approach. Below are recommended strategies, along with real-world examples of how leading brands are realizing value:
- Start with Clear Use Case Prioritization: Focus on 1–2 high-impact, low-risk areas (e.g., content generation for product detail pages) and go deep.
  - For example, Amazon is helping sellers accelerate product listing creation with AI-generated titles and descriptions, simplifying catalog management and onboarding while enhancing SEO and conversion rates.
- Create AI Centers of Excellence: Cross-functional teams with product, engineering, data science, and compliance leads can help scale AI responsibly.
  - Walmart’s Gen AI initiatives are aligned across digital, merchandising, and product functions to streamline the rollout of its new AI-powered search experience.
- Invest in Data Foundations: AI outcomes are only as good as the data fueling them. Retailers must invest in clean, labeled, and accessible data across channels.
  - H&M’s use of AI in product imagery relies on structured visual datasets and consistent product tagging to generate scalable creative outputs across markets.
- Measure What Matters: Define metrics that tie AI initiatives to business outcomes, such as conversion, AOV, return rate, and time to market.
  - Zalando, for instance, tracks reduced campaign creation time and cost as core KPIs, seeing a 90% reduction in costs and faster go-to-market with AI-generated marketing visuals.
- Choose the Right Partners: From LLM platforms to retail-specific startups, partnerships can reduce time to value and offer domain expertise.
  - Many retailers collaborate with cloud providers such as Google Cloud, AWS, and Microsoft Azure to leverage Gen AI models tailored to retail workflows, ensuring scalability and governance.
AI Readiness Checklist
Use this quick checklist to assess where your organization stands on the journey from pilot to production, locate your current position, and outline what needs to happen next.
The Road Ahead
Retailers can’t afford to wait on the sidelines. Gen AI won’t just be a competitive edge; it will soon be table stakes. The winners will be those who move decisively, scale responsibly, and measure impact relentlessly. As digital transformation redefined retail over the past two decades, AI transformation will shape the next. The question is no longer whether AI will transform retail, but whether your company is prepared to transform with it.
As we move into the second half of 2025, the challenge isn’t discovering what AI can do; it’s figuring out what you will do with it, and when. Now is the time to move from experiments to execution.
Do you have a perspective on where Gen AI fits into your retail strategy? Let’s start the conversation. If you’re curious to explore more about how generative AI is reshaping retail product management—across search, personalization, merchandising, and customer experience—stay tuned. We’ll be diving deeper into these areas in upcoming posts.
Subscribe to the Founders Creative Substack to receive updates, insights, and frameworks directly in your inbox.
Founders Creative is a community connecting over 10,000 AI founders, investors, engineers, and operators through exclusive events in Silicon Valley, fostering collaboration and innovation.
References
- https://www.aboutamazon.com/news/retail/amazon-generative-ai-product-search-results-and-descriptions
- https://corporate.walmart.com/news/2023/10/03/introducing-walmarts-gen-ai-powered-search-experience
- https://hmgroup.com/media/news/general-news-2024/hm-group-and-ai-inventory-management
- https://www.microsoft.com/en-us/industry/retail/microsoft-cloud-for-retail
GTM Gets a Glow-Up

I. Introduction
Feeling the pressure to unlock the next level of growth for your company? As an executive or early-stage founder, you’re likely exploring every avenue to gain a competitive edge. At a recent industry marketing meetup I attended, the buzz around AI in Go-To-Market was electric – some were already diving in headfirst, while others were understandably cautious. The reality is clear: marketing, sales, and customer success teams are increasingly leveraging AI to transform everything from crafting compelling content to forecasting and retention.
>“In 2024, marketing and sales teams more than doubled their use of GenAI.” — McKinsey, 2024
If you’re wondering where to even begin, you’re in the right place. This article aims to provide you, as a busy leader, with a pragmatic starting point and key questions to ask your teams. And let’s be clear: I’m not offering a definitive, final answer in this rapidly evolving landscape. Instead, think of this as a guide of considerations.
Read on, and you’ll discover:
- The compelling reasons behind AI adoption in GTM.
- Practical use cases and the tools making them possible.
- Essential questions every executive should be considering with their team.
(Coming up next: we’ll explore tailored AI strategies for early-stage ventures versus larger enterprises. In the meantime, feel free to jump to the GTM function most relevant to you.)
- Market Intelligence
- Content Creation
- Product Marketing
- Demand Generation
- Sales Enablement
- Customer Success
II. GTM Functions Being Transformed
1. Market Research
From Manual to Machine-Augmented
Market research has evolved. We’re no longer waiting weeks for survey results or digging through 80-slide PowerPoint decks for insights. AI has changed the tempo. Today’s tools scan customer reviews, forums, and competitor updates in real time—surfacing trends and helping teams pivot fast.
Why the Shift?
- 2–3x faster insights (Gartner)
- 30% better sentiment analysis (Forrester)
- 40% less manual data work (McKinsey)
These gains explain the rising adoption of AI, especially among competitive intelligence (CI) and strategy teams.
Strategic Questions to Ask Your Team
- Are we analyzing unstructured feedback—or just structured forms?
- Can we automate competitive tracking and alerts?
- Are we incorporating win/loss insights into GTM strategy?
The research may point to trends—but it’s what you say next that matters. That’s where content comes in.
2. Content Creation:
Volume Is Easy—Voice Is Everything
Let’s be honest: posting every day doesn’t make you a thought leader. It just makes you noisier. Have you seen what is happening to LinkedIn posts and responses?
AI can churn out content fast—but it still can’t think for you. Thought leadership requires real thought: a sharp perspective on a stale topic, a new take on familiar data, or a well-argued prediction. In that sense, AI is the accelerator—but humans are still the spark.
AI Is Ubiquitous—but Not Autonomous
Today, over 80% of marketers use tools like Jasper, Copy.ai, and ChatGPT to draft blogs, emails, and ad copy. It’s a huge unlock for teams stuck at the blank page. But let’s not confuse speed with originality. AI handles volume—but voice, nuance, and emotional resonance? That still takes a human hand.
>“When deciding between AI and human input in communications, I consider the 4 Rs: research, reach, relevance, and relationships. AI is great at accelerating insights and scaling reach — but relevance and relationships? Those still need human judgment and connection.”
— Michele Landry, President, Tanis Communications
You can see the impact most clearly on LinkedIn:
- 189% spike in AI-generated posts post-ChatGPT launch (Originality.AI, 2023)
- 54% of long-form LinkedIn posts may be AI-generated by late 2024
- Average post length up 107%—but not always better thinking
Ask Yourself (and Your Team)
- Where can AI handle the first pass—without sacrificing quality?
- Are we adapting content for persona, funnel stage, and industry nuance?
- How are we protecting tone, authenticity, and brand voice?
Content doesn’t exist in a vacuum. The best messaging is grounded in sharp product positioning, and that’s exactly where AI is starting to play a more strategic role.
Let’s look at how AI is reshaping the way we position and bring products to market.
3. Product Marketing
From Gut Instinct to Evidence-Backed Positioning
It’s easy to think of AI as a tool for content generation or campaign automation—but it’s quietly transforming one of the most strategic functions in the GTM stack: product marketing.
From market segmentation to win/loss analysis to real-time competitive tracking, AI is making product marketing faster, sharper, and more evidence-based. No more digging through spreadsheets or waiting on static reports.
>“Today’s tools can surface patterns in buyer behavior, test messaging hypotheses, and identify gaps in positioning before your next launch goes sideways.”
— Vasu Madabushi, Product Marketing Executive
Why Use AI in Product Marketing
- 82% of marketers say AI speeds up workflows like research and positioning (CMI, 2024)
- 2–3x faster insight synthesis using AI vs. manual research (Gartner, 2023)
- 25% faster message validation cycles (Gartner, 2023)
Ask Yourself (and Your Team)
- Are we using AI to analyze win/loss trends and fine-tune our positioning?
- Can AI help us build or refine personas faster?
- Are we tracking competitor activity in real time—or waiting for reports?
- Where are we relying on gut instinct when AI could give us actual signals?
- Can we simulate buyer journeys to validate messaging or positioning?
From positioning to pipeline—once your product story is nailed, demand gen turns it into momentum.
4. Demand Generation:
From Campaigns to Co-Pilots
Once your positioning is dialed in, the pressure shifts to growth—and that’s where demand gen teams are finding new power in AI. Instead of guessing which campaigns will hit or which accounts might convert, marketers now have tools that help them see what’s working and why, in real time. With intent signals, predictive scoring, and automated testing, teams can focus less on setup and more on scaling what drives pipeline. It’s like having thermal imaging for your funnel—AI reveals who’s truly warming up to buy, not just who filled out a form.
>“AI is a game-changer for demand gen. It takes care of the repetitive tasks, automates workflows, and scales operations, giving teams the freedom to focus on strategy while AI handles the heavy lifting behind the scenes.”
— Deepa Caveney, B2B SaaS Marketing
Improvements You Can Expect
- Lift in MQL-to-SQL conversion rate after implementing AI scoring (often 10–25% improvement)
- 70% faster time-to-market for campaign content using AI content automation and experience platforms (TechRadar, Adobe Summit 2024)
- 37% increase in campaign response rate using AI real-time analytics and A/B testing tools to optimize in-flight campaigns (McKinsey & Co.)
Ask Yourself (and Your Team)
- Are we using AI to prioritize leads based on intent, behavior, and fit — or still relying on basic scoring?
- How often are we refreshing segmentation based on AI insights or new signals?
- Which parts of our campaign workflows are still manual and could be automated?
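To make the “beyond basic scoring” idea concrete, here is a minimal sketch of prioritizing leads on combined intent, behavior, and fit signals. This is purely illustrative: the signal names, weights, and threshold are invented for the example; a real AI scoring model would learn its weights from historical conversion data rather than hand-setting them.

```python
# Illustrative weights: in a real AI scoring model these would be learned
# from historical conversion data, not hand-set.
WEIGHTS = {
    "visited_pricing_page": 3.0,  # intent signal
    "demo_requested": 5.0,        # intent signal
    "emails_opened": 0.5,         # behavior signal (per email)
    "icp_fit": 4.0,               # firmographic fit score in [0, 1]
}

def lead_score(lead):
    """Combine intent, behavior, and fit signals into a single score."""
    return sum(weight * lead.get(signal, 0) for signal, weight in WEIGHTS.items())

def prioritize(leads, threshold=6.0):
    """Route only high-scoring leads to sales (the MQL-to-SQL handoff),
    hottest first."""
    return sorted(
        (lead for lead in leads if lead_score(lead) >= threshold),
        key=lead_score,
        reverse=True,
    )

leads = [
    {"name": "A", "demo_requested": 1, "icp_fit": 1.0},
    {"name": "B", "emails_opened": 4},
    {"name": "C", "visited_pricing_page": 1, "icp_fit": 1.0},
]
hot = prioritize(leads)  # leads worth a rep's attention, hottest first
```

Even this toy version shows the shift: instead of a single form-fill gate, the score blends several weak signals, and the cutoff becomes a tunable business decision.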
While demand generation fills the funnel with qualified interest, sales enablement ensures those leads are converted—arming sellers with the right insights, content, and tools to close the deal.
5. Sales Enablement
Smarter Reps, Faster Wins
Sales enablement has always been about giving teams the tools, content, and training they need to close more deals — but AI is taking it to the next level. Today, top teams are using AI to surface the right content at the right time, coach reps based on real call data, automate follow-up tasks, and give managers clear visibility into what’s working. From onboarding and readiness to conversation intelligence, deal reviews, and content optimization, AI is helping sales teams move faster, improve performance, and stay focused on selling.
>“The real value of AI in sales enablement is its ability to drive performance at scale to deliver the right insights at the right time, personalizing coaching to each rep’s strengths, and continuously learning what accelerates deals. It’s not about replacing the human element, it’s about amplifying it to boost productivity, consistency, and results across the team.”
— Louisa Morissutti, Enablement Consultant
Improvements with AI:
- 20–40% faster rep ramp time (Gong internal benchmark)
- 25% more productive calls via real-time recommendations (Chorus)
- 15% higher deal close rates using AI-assisted coaching (Salesloft)
Ask Yourself (and Your Team)
- Are we capturing and analyzing rep-customer conversations?
- Can AI suggest the right enablement content based on deal stage?
- How can AI help new reps ramp faster?
The AI-driven insights and tools that equip sales teams are just the beginning; the journey continues with Customer Success, where AI offers equally powerful ways to nurture relationships and maximize customer lifetime value.
6. Customer Success
Transitioning from Reactive Support to Proactive Engagement
In the early stages of building a company, retaining and expanding your customer base is paramount. AI is revolutionizing Customer Success by shifting it from a reactive support role to a proactive, data-driven function. By leveraging AI, startups can anticipate customer needs, personalize experiences, and scale support without proportionally increasing headcount. This proactive approach not only enhances customer satisfaction but also drives retention and growth.
Why Customer Success Functions are looking at AI:
- 20% reduction in churn for companies using AI-driven customer success strategies (Gainsight, The State of Customer Churn in 2024)
- 30–50% of support tickets are now resolved by AI chatbots (Zendesk CX Trends 2024)
- 18% higher Net Promoter Score (NPS) for AI-personalized customer engagements (Intercom, 2024)
By integrating AI into your Customer Success strategy, you can proactively address customer needs, enhance satisfaction, and drive sustainable growth—all crucial factors for early-stage companies aiming to scale efficiently.
Ask Yourself (and Your Team)
- Are we using AI to detect early signs of customer churn?
- Can we automate routine customer support interactions?
- How can AI personalize the CS journey by segment?
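As a concrete illustration of detecting early churn signals, here is a minimal sketch of a rule-based early-warning score. The signals, thresholds, and weights are assumptions invented for the example; an AI-driven system would learn them from historical churn data, but the shape of the output — a ranked list of accounts a CSM should proactively contact — is the same.

```python
def churn_risk(account):
    """Score early churn signals; higher = riskier. Signals and weights
    are illustrative assumptions, not a production model."""
    risk = 0.0
    if account["days_since_last_login"] > 14:
        risk += 2.0  # disengagement
    if account["support_tickets_30d"] >= 3:
        risk += 1.5  # friction
    if account["feature_adoption"] < 0.3:  # fraction of key features used
        risk += 2.5  # shallow adoption
    if account["seats_used"] / account["seats_bought"] < 0.5:
        risk += 1.0  # under-utilized contract
    return risk

def at_risk(accounts, threshold=3.0):
    """Accounts a CSM should proactively reach out to."""
    return [a["name"] for a in accounts if churn_risk(a) >= threshold]

accounts = [
    {"name": "Acme", "days_since_last_login": 30, "support_tickets_30d": 4,
     "feature_adoption": 0.2, "seats_used": 3, "seats_bought": 10},
    {"name": "Globex", "days_since_last_login": 1, "support_tickets_30d": 0,
     "feature_adoption": 0.8, "seats_used": 9, "seats_bought": 10},
]
```

The proactive shift is in who triggers the interaction: instead of waiting for a cancellation email, the score surfaces Acme weeks earlier.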
With AI in their corner, customer success teams can stop playing defense and start driving proactive, personalized experiences that keep customers happy—and coming back for more.
III. Where the Human Still Leads
Despite AI’s speed and scale, humans remain essential for:
- Strategic judgment
- Emotional intelligence
- Brand storytelling and creative inspiration
AI doesn’t replace these functions — it supports them. Human-AI collaboration will become the defining competitive edge of modern GTM teams.
What Leaders Should Do Today
- Ask your GTM functional leaders the questions raised in this article
- Embrace a test-and-learn culture with AI tools
- Reskill teams for AI collaboration (prompting, auditing, training models)
- Set KPIs that reflect both human-led and AI-supported success
Final Thought
The most successful GTM organizations of the future will be those who treat AI not as a threat but as a teammate. The winners won’t be the fastest to automate, but the fastest to collaborate — blending human creativity with machine precision to drive growth at scale.
Watch for my next article in a few weeks, comparing early-stage and large-enterprise approaches to AI GTM tools.
Building Effective AI and GenAI Product Strategy

Having spent two decades building products with data and AI at their core, I’ve decided to share my perspective on how AI/GenAI product strategy development and execution differs from conventional product strategy approaches.
As organizations increasingly integrate these capabilities into their product portfolios, product leaders must adapt their strategic frameworks to address the unique challenges and opportunities that AI and GenAI present.
1. Problem-Centric Market Research
The most effective AI product strategies avoid the trap of pursuing AI/GenAI for its own sake. Instead, they focus relentlessly on customer and business value, using AI’s distinctive capabilities to solve problems that would otherwise remain intractable.
- Identify business problems that cannot be effectively solved without AI/GenAI.
- In larger companies, conduct cross-organizational needs assessments to discover high-value AI/GenAI use cases.
- Look beyond the obvious use cases to uncover areas where AI/GenAI can create distinctive value.
2. Organizational Alignment & Cross-Functional Collaboration
Organizational alignment is equally critical. AI/GenAI initiatives must support broader company objectives while accounting for internal capabilities and limitations.
- Secure executive sponsorship and stakeholder buy-in early in the development process.
- Align AI initiatives with broader company objectives and strategic goals.
- Establish cross-functional teams to navigate technical, ethical, and business challenges.
3. Data Strategy as Core Business Asset
When it comes to AI/GenAI product development, data is king. Weaving data availability and readiness into product strategy is essential.
- Develop deliberate data acquisition, governance, and enhancement strategies.
- Make strategic decisions regarding data partnerships, feedback loops, annotations, and synthetic data creation.
- Position AI product managers as key stakeholders in data readiness and availability.
4. Technical Development Under Uncertainty
The value proposition for AI products demands particular attention. Unlike traditional products, AI solutions often deliver probabilistic rather than deterministic outcomes. This requires careful articulation of how AI/GenAI delivers better solutions than alternatives, with clear success metrics.
- Implement research-prototype-productize development patterns for AI/GenAI solutions. The most effective AI strategies balance innovation and research with practical customer needs, avoiding the trap of pursuing technical sophistication at the expense of user value.
- Establish probabilistic success metrics that account for AI’s experimental nature.
- Plan for ongoing model evaluation, including human-in-the-loop evaluations. Strategic planning must account for how model performance may change over time, including potential performance drift that requires ongoing monitoring.
- Revisit build-vs-buy decisions over time, as home-grown solutions mature.
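One concrete way to operationalize probabilistic success metrics and drift monitoring is to track accuracy over a rolling window of human-reviewed outcomes and flag when it falls below the launch baseline. The sketch below is a minimal illustration; the window size, baseline, and tolerance values are assumptions, and a real system would likely add statistical tests and per-segment breakdowns.

```python
def windowed_accuracy(outcomes, window):
    """Accuracy over the most recent `window` predictions. `outcomes` is a
    list of booleans: True means the model's output was judged correct,
    e.g. by a human-in-the-loop review."""
    recent = outcomes[-window:]
    return sum(recent) / len(recent)

def drift_alert(outcomes, window=50, baseline=0.90, tolerance=0.05):
    """Flag potential performance drift: recent accuracy has fallen more
    than `tolerance` below the accuracy measured at launch (`baseline`)."""
    if len(outcomes) < window:
        return False  # not enough evidence to raise an alert yet
    return windowed_accuracy(outcomes, window) < baseline - tolerance

# Example: a model that launched at ~90% accuracy but has degraded.
history = [True] * 45 + [False] * 15  # last 50 reviews: 35 correct (70%)
```

Framing the metric probabilistically (an accuracy band, not a pass/fail gate) matches how AI products actually behave in production.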
5. Trust-Building and Adoption Strategy
Ethical and regulatory considerations take on heightened importance in AI/GenAI product strategy, especially as regulatory landscapes evolve around AI transparency, fairness, and accountability. GenAI introduces new concerns around authenticity, copyright, and misuse.
- Develop explicit frameworks for responsible and ethical AI development.
- Incorporate explainability features and trust-building elements into product design.
- Create educational components to support human-AI collaboration and overcome adoption barriers.
Each of these points deserves its own article, and many are being published every day. I will share as I come across the ones that resonate with me and, in my opinion, will help my fellow AI product managers create awesome AI/GenAI products and features!
What is a Foundation Model?

I recently moderated a panel on Foundation Models at the Founder’s Creative engineering summit. We started with a simple warm-up question for our audience: “What is a Foundation Model?” Surprisingly, only about 15% of the 100+ engineers in the room could answer. Even within that small group, the definitions varied widely. This experience, along with a similar one at a UC Berkeley paper reading session, highlighted a fundamental gap in understanding. What actually is a foundation model?
I recommend that you pause here and think about how you would define it before proceeding. Don’t forget to add it to the comments section below.
The term Foundation Model was introduced and defined in the 2021 report On the Opportunities and Risks of Foundation Models from the Center for Research on Foundation Models (CRFM) and the Institute for Human-Centered Artificial Intelligence (HAI) at Stanford University. I have pulled out three relevant nuggets from the report:
A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks
On a technical level, foundation models are enabled by transfer learning and scale.
Foundation model designates a model class that is distinctive in its sociological impact and in how it has conferred a broad shift in AI research and deployment.
Taken together, we have a very clear definition of a foundation model, in terms of how it is created, its capabilities, and its impact:
- Massive Scale: Built with a large number of parameters to capture intricate data relationships, and trained on a very large corpus of data.
- Self-Supervised Learning: Trained using self-supervised learning, and hence does not need human annotations at scale.
- Versatile: Can be adapted to new tasks through fine-tuning, prompt-based learning, or few-shot/zero-shot methods without full re-training.
- Impact: Causing a major and broad shift in multiple domains, with real-world impact on people from all walks of life.
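The “adapt without full retraining” idea can be illustrated with a toy sketch: a frozen feature extractor stands in for the pretrained foundation model, and only a small linear head is trained on the downstream task. This is a deliberately simplified illustration, not how real foundation models are built; the featurization, task, and hyperparameters are all invented for the example.

```python
import math

# Toy stand-in for a pretrained "foundation" model: a FROZEN feature
# extractor. In reality this would be a large self-supervised network;
# here it is a fixed ReLU featurization so the sketch stays self-contained.
def extract_features(x):
    feats = []
    for xi in x:
        feats.append(max(0.0, xi))   # relu(xi)
        feats.append(max(0.0, -xi))  # relu(-xi)
    return feats

def adapt(examples, labels, epochs=500, lr=0.5):
    """Adaptation in miniature: train only a small logistic-regression
    head; the base extractor above is never updated."""
    n = len(extract_features(examples[0]))
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            f = extract_features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = extract_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Downstream task: do the coordinates sum to a positive number?
X = [(1.0, 2.0), (-1.0, -2.0), (3.0, -1.0), (-3.0, 1.0), (0.5, 0.5), (-0.5, -0.5)]
Y = [1, 0, 1, 0, 1, 0]
w, b = adapt(X, Y)
```

With a real foundation model, `extract_features` would be replaced by embeddings from the pretrained network, and `adapt` by fine-tuning or prompt-based learning; the structure (big shared base, small task-specific adaptation) is the point.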

The selection of the term “Foundation” was deliberate as well:
The word “foundation” specifies the role these models play: a foundation model is itself incomplete but serves as the common basis from which many task-specific models are built via adaptation. We also chose the term “foundation” to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications
Arguably, we can limit the definition to the capabilities and impact and say that a foundation model is versatile, adaptable to new tasks without full retraining, and has a significant real-world impact on daily life.
That is it: very simple, and yet with a lot behind it. What do you think? Do you agree with this definition? Have you seen the term used in any other context?
Designing for AI: A Designer’s Guide to Building Trust, Adaptability, and Ethics

As artificial intelligence (AI) reshapes industries, product designers face the challenge of crafting user experiences (UX) that align with AI’s capabilities, evolution, and ethical implications. Designing for AI goes beyond visuals: it’s about creating trust, enabling collaboration, and ensuring adaptability while prioritizing ethics and keeping users in control. Here is a designer’s guide to designing AI-driven products, from conventional systems to autonomous AI agents and assistants.
Core Principles for Designing AI Experiences
I always say that AI is UX, because AI is all about user experiences. Here are some core principles of AI experience (AIX) design.
1. Design to Create Human Trust
Trust is psychological, and trust is the foundation of AI interactions. Users must feel confident in the AI’s reliability, with no unknown or unsettling surprises.
- Be Transparent: Clearly communicate the AI’s capabilities and limitations: what it can and cannot do.
- Set Expectations: Use visual cues and confidence scores so that users know how much they can rely on the AI’s recommendations, and explain AI decisions.
- Avoid Over-Anthropomorphizing: Don’t make the AI too human; it is assistant software, not a real human and not a replacement for human connection.
2. Design for Quick Iteration, AI Evolution, and Hyper Personalization
AI is evolving fast and user needs change rapidly, so designs must adapt with trends or risk becoming useless.
- Build Modular Systems: Use flexible, modular frameworks that can accommodate updates and upgrades.
- Prioritize Scalability: A design is never done; it is a living system, not a static user interface, and it needs to update and scale.
- Enable Personalization: Give users control, letting them edit outputs to match their expectations and supporting rapid iteration.
3. Design for Ethical Use
The big risk in AI is bias, which can be amplified easily; AI can also invade privacy or obscure ownership. Designers must consider and mitigate these risks while designing the system.
- Reduce Bias: Check the system’s AI responses and create feedback loops for users to flag issues and raise concerns.
- Protect Data Privacy: Be very transparent about data usage (where data comes from and where it goes) and give users the required details and controls.
- Clarify Ownership: AI-generated outputs should come with well-defined ownership, whether it belongs to the user, the platform, or the provider, so there is no confusion.
4. Design for Collaboration
AI should always be a partner or an assistant, not a dictator or an authority. It should be a good collaborator that helps users accomplish their tasks.
- Enable Refinement: Allow users to adjust AI outputs by tweaking the tone, moving sliders, adjusting values, regenerating with variations, and so on.
- Foster Dialogue: Design interfaces for conversation and interaction, not just consumption. This fetches better results.
- Encourage Co-Creation: AI is a tool that learns from its training; keep training the system so it learns better and serves you better.
Designing for AI Agents
AI agents are autonomous systems that perceive, decide, and act to achieve goals, ranging from simple rule-based bots to complex machine learning models. Poor design can lead to wrong responses, misinterpretations, biased outputs, or unpredictable behavior, risking trust and accuracy. Thoughtful design ensures agents align with human values, handle both the happy path and alternative use cases, and remain transparent and unbiased.
Behavioral Attributes of Agentic AI
Agentic AI systems are defined by six key attributes, which are the foundational elements of agentic user experiences. There is general agreement that these are the key attributes of even the simplest form of an AI agent.
-
Goal-orientation — following given objectives.
-
Perception — understanding the environment and context.
-
Reasoning — ability to deduce.
-
Acting — doing things, either directly or by using “tools”.
-
Learn and Adapt — ability to update its memory and improve reasoning.
-
Autonomy — a degree of independence and self-governance.
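The six attributes above can be sketched as a minimal agent loop. The snippet below is a toy illustration, not any specific framework: the counter environment, tools, and `decide` function are all hypothetical stand-ins.

```python
# Toy agent loop illustrating the six attributes (all names hypothetical).
# Goal: reach a target number by repeatedly choosing a tool.

def make_counter_env(start):
    return {"value": start}

def decide(goal, value):
    # Reasoning: pick the tool that moves the value toward the goal.
    return "increment" if value < goal else "decrement"

def run_agent(goal, state, max_steps=20):
    tools = {  # Acting: the agent changes the world only through tools.
        "increment": lambda s: s.update(value=s["value"] + 1),
        "decrement": lambda s: s.update(value=s["value"] - 1),
    }
    memory = []  # Learn & Adapt: a trace the agent could later reuse.
    for _ in range(max_steps):            # Autonomy: no per-step approval.
        value = state["value"]            # Perception: observe the environment.
        if value == goal:                 # Goal-orientation: stop when satisfied.
            return value, memory
        tool = decide(goal, value)        # Reasoning: choose an action.
        tools[tool](state)
        memory.append((value, tool))
    return state["value"], memory

final, trace = run_agent(goal=5, state=make_counter_env(2))
print(final)       # 5
print(len(trace))  # 3 steps: 2 -> 3 -> 4 -> 5
```

Even at this toy scale, each attribute maps to a distinct design decision — which is exactly why they are useful as a checklist when evaluating a real agent system.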
Agentic UX and Designing for Agents
Agentic UX focuses on intuitive AI experiences that build trust as users interact with autonomous agents. Unlike traditional UX, it accounts for the dynamic, proactive nature of agents. Designing for agents is about building systems that ensure seamless user interaction, crafting the experience around three major sets of principles.
1. Agentic UX Principles
-
Transparency in Autonomy: Clearly communicate what the agent is about to do, aligning with Goal-orientation and Autonomy.
-
Contextual Feedback: Use Perception and Reasoning to give relevant responses and details, tying into Proactivity.
-
User Empowerment: Let users override the agent with contextual actions, balancing Autonomy with user control over what they want to do.
-
Proactive Assistance: Offer relevant, non-intrusive suggestions so Proactivity feels helpful.
-
Adaptive Tone: Adjust the style and tone of language so it can be personal, professional, softer, or firmer as needed.
2. Designing for Agents
-
Support Acting: Enable tool integration, provide the required APIs, and inform users of actions with transparency.
-
Enable Learning: Build memory systems that track and prioritize preferences, reflecting Learn and Adapt.
-
Facilitate Perception and Reasoning: Provide required data inputs for context-aware responses.
-
Design for Proactivity: Use predictive models to anticipate needs and act accordingly.
-
Balance Autonomy: Create permission settings for user oversight and balance between system and human.
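One way to balance autonomy with oversight is a permission layer that lets low-risk actions run automatically while routing risky ones to the user. The sketch below is a minimal illustration; the action names, risk scores, and threshold are hypothetical.

```python
# Minimal permission-settings sketch: autonomous for low-risk actions,
# human approval required above a risk threshold (values illustrative).

RISK = {"summarize_inbox": 0.1, "send_email": 0.6, "delete_files": 0.9}

def execute(action, approve):
    """Run `action` directly if low-risk; otherwise ask `approve` first."""
    if RISK.get(action, 1.0) <= 0.5:        # unknown actions default to high risk
        return f"ran {action} autonomously"
    if approve(action):                     # user oversight for risky actions
        return f"ran {action} with approval"
    return f"blocked {action}"

print(execute("summarize_inbox", approve=lambda a: False))  # runs autonomously
print(execute("delete_files", approve=lambda a: False))     # blocked
```

In a real product the `approve` callback would surface a confirmation UI, and users would be able to tune the threshold — which is precisely the "permission settings for user oversight" the principle calls for.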
3. Integration with AI Assistant Design
-
Optimize Interaction Flows: Add Transparency and Proactivity wherever required, balancing them appropriately.
-
Visualize Context: Support Learning by showing past interactions and data insights.
-
Keep It Simple: Apply Proactivity subtly, through notifications and inline instructions, and streamline toward the right actions.
Conclusion
AI agents are proliferating, and they are the future of intelligent software products. Designing for AI is tricky: it requires balancing trust, adaptability, and ethics. By aligning designs with AI’s capabilities, supporting rapid iteration and evolution, prioritizing ethical use, and fostering collaboration, designers can create experiences that empower users. For AI agents, Agentic UX and thoughtful system design ensure that autonomy serves, rather than disrupts, the user experience. This holistic approach builds AI systems that are capable, ethical, and delightful to interact with, fostering trust and collaboration — and ultimately delivering AI experiences that delight users.
The Unique Responsibilities of AI Product Managers and Implementers: Balancing Innovation, Adoption, and Responsible AI

Introduction
The rapid advancement of artificial intelligence (AI) is reshaping the landscape of product management and IT product implementation. As AI risk has expanded beyond the traditional boundaries of software risk, the AI evolution brings forth new responsibilities and skill requirements essential for building and deploying AI products that are not only effective but also responsible and trustworthy.
Traditional software follows a rule-based, deterministic approach: it operates according to explicitly programmed logic and typically produces predictable outputs. In contrast, AI systems are fundamentally probabilistic, relying on statistical patterns learned from large datasets. This makes their behavior less predictable and more sensitive to variations in inputs, training data quality, and context.
Modern AI systems – particularly those powered by machine learning models like large language models (LLMs) – introduce additional layers of complexity and unpredictability. They are often difficult to interpret (“black box” models), can evolve over time, and may produce outputs that even their developers struggle to explain. Moreover, these systems frequently rely on dynamic feedback loops, operate across diverse contexts, and demonstrate a degree of autonomy and adaptability – especially in multi-agent architectures where independent AI components interact (“AI Fusion”) to achieve complex goals.
The Limits of AI Governance Functions
From an AI governance perspective, this shift presents significant challenges. While governance frameworks, such as the NIST AI Risk Management Framework (AI RMF), offer structured methodologies for managing AI risks, their implementation across organizations is often inconsistent and incomplete.
This is primarily because governance frameworks tend to remain high-level, providing general principles without sufficient guidance on how to operationalize them effectively across diverse business units. In addition, AI governance functions within companies are often relatively new, small, and frequently under-resourced. Even when governance frameworks are established, it is not feasible for AI governance functions to monitor every aspect of AI model development and deployment throughout an entire organization.
Additionally, limited awareness and understanding at the C-level about the complexity of AI risks make it difficult to secure the necessary resources and support to establish robust AI governance. Without strong executive buy-in, governance frameworks are sometimes destined to remain aspirational documents rather than practical tools for managing AI risks.
The combination of resource constraints and lack of executive prioritization can create governance gaps where critical responsibilities are either neglected or pushed onto product managers and implementers without sufficient guidance.
Why Many PMs Struggle with Responsible AI
When frameworks remain high-level and resources are limited, the burden of operationalizing responsible AI often falls on product managers and implementers – individuals and teams who may lack both the authority and the support to address these risks comprehensively. While AI product managers are expected to take on a level of responsibility that goes far beyond traditional product management, a 2025 study by Berkeley and other institutions found that most product managers are unprepared to address AI-specific risks.
Out of 300 surveyed PMs and 25 in-depth interviews, five major barriers to responsible AI emerged: widespread uncertainty about ethical requirements, diffusion of responsibility, lack of incentives, limited leadership support, and a failure to integrate responsible AI principles into everyday workflows. Without structured guidance or incentives, PMs often assume AI ethics or compliance teams will handle these issues, resulting in gaps where no one feels directly accountable.
These findings reveal a troubling reality: while AI product managers are positioned as key actors in ensuring responsible AI, they often lack the tools, incentives, and guidance to fulfill this role effectively. As a result, AI-specific risks are often left unaddressed, increasing the likelihood of failures and unintended consequences.
The Case for a Distributed Responsibility Model with a Central Role for the AI Product Manager
These governance gaps are not just theoretical problems; they are structural challenges that prevent organizations from effectively managing AI risks.
This is where a distributed responsibility model becomes essential. Instead of relying solely on governance teams to oversee every aspect of AI development – or leaving product managers to navigate complex risks on their own – responsibility must be deliberately shared across the organization. A distributed responsibility model involves dividing AI governance tasks across multiple teams – such as data collection, model development, testing, deployment, and security – ensuring that each unit is accountable for specific aspects of ethical and risk management throughout the AI lifecycle.
While such a modular setup promotes specialization and efficiency, it also creates the risk of maintaining gaps in governance when ethical responsibilities are not clearly assigned. Here, the AI product manager or team implementing a new AI solution can be of tremendous help and influence.
AI PMs are uniquely positioned to translate ethical and regulatory principles into practical, actionable product decisions, ensuring responsible AI development is effectively executed across distributed teams. Acting as the bridge between governance frameworks, technical teams, and business leaders, distributed responsibility centers the AI product manager as the key integrator. While governance teams provide policy direction, and business units may own risk, it’s the product manager who ensures these principles are applied – across design, data handling, development workflows, and real-world monitoring.
Empowering AI Product Managers for Responsible AI
Emphasize Traditional vs. AI-Specific Risks in Product Management
AI product managers face the same foundational risks that apply to any product development effort, as outlined by Marty Cagan in his book INSPIRED: value, usability, feasibility, and business viability.
Value risk asks whether customers will actually use or buy the product – an LLM-powered chatbot, for instance, may be technically impressive but fail to meet real user needs. Usability risk concerns whether users can effectively interact with the system; if an AI-powered tool is too complex, it risks being ignored or misused. Feasibility risk challenges whether the product can be built within technical and resource constraints, especially for AI systems requiring specialized expertise or infrastructure. And business viability risk asks whether the product aligns with the broader business model, regulatory requirements, and brand promise.
In addition to these well-known product risks, AI product managers and implementers must navigate a distinct layer of AI-specific risk. These include technical concerns such as model robustness, bias, privacy leakage, and security vulnerabilities, as well as societal risks like discrimination, ethical misalignment, and unintended misuse.
Embedding AI Risk Early: The Shift-Left Imperative
To manage the complexities of AI effectively, product managers and implementers must adopt a “shift-left” approach embedding risk considerations from the earliest stages of product development and continuously revisiting them throughout the AI lifecycle. Drawing inspiration from how security and privacy have been systematically embedded into software development, AI governance must follow a similar transformation.
“Shift-left” approach refers to proactively embedding AI risk considerations – such as bias detection, model robustness, and legal compliance – into the earliest stages of the AI product lifecycle. This involves integrating responsible AI practices during design, model selection, data handling, and iterative testing, rather than treating them as compliance checks at the final stages of deployment.
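As a concrete illustration of shifting risk checks left, a team might wire an automated fairness gate into the test suite that runs on every model iteration, long before deployment. The snippet below is a toy sketch: the demographic-parity metric is a standard fairness measure, but the data, threshold, and function names are hypothetical.

```python
# Toy "shift-left" fairness gate: flag the model early if its
# positive-outcome rate differs too much across groups.

def demographic_parity_gap(predictions, groups):
    """Max difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = sum(predictions[i] for i in idx) / len(idx)
    return max(rates.values()) - min(rates.values())

def passes_fairness_gate(predictions, groups, max_gap=0.2):
    return demographic_parity_gap(predictions, groups) <= max_gap

preds  = [1, 1, 0, 1, 0, 0, 1, 0]           # toy model outputs
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))  # 0.5: rate 0.75 vs 0.25
print(passes_fairness_gate(preds, groups))    # False -> investigate before shipping
```

Because the check runs with every iteration, a widening bias gap is surfaced during design and testing rather than discovered as a compliance finding at deployment.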
Leadership Support for AI PMs: Conditio Sine Qua Non
Under all circumstances, the central role of AI product managers and AI use case implementers requires more than just assigning responsibility to these central roles – it demands that they are empowered with the right resources, skills, training, and leadership support to fulfill their roles effectively.
Organizations must ensure that PMs selected for AI product management are equipped with foundational knowledge in AI ethics, risk assessment, regulatory requirements, and responsible AI principles. Without adequate training and resources, even the most competent PMs will struggle to bridge governance frameworks and practical implementation.
Conclusion
Effective AI product managers and implementers must intentionally design their workflows to address both traditional product risks – such as usability, feasibility, and business viability – and AI-specific challenges like bias, explainability, robustness, and data integrity.
As AI capabilities scale, organizations will need AI product leaders and implementers who can bridge strategy, ethics, and engineering. AI product managers and implementers must be equipped with the right knowledge, tools, and frameworks to address the unique risks posed by AI systems while establishing clear lines of accountability.
Those who proactively embrace their expanded responsibilities and embed responsible AI practices into every stage of development and deployment will be the most successful. The future of AI success lies not just in what we build – but how responsibly we build it.
What’s New in AI SaaS Funding?

SaaS is dead. Or at least, that’s the narrative floating around from VCs, industry leaders, and even Satya Nadella himself. And when the CEO of a $3 trillion company says something, there must be some merit to the statement.
As someone who has sat on both sides of the table, I see a more nuanced reality unfolding. It doesn’t take a genius to see that AI has been looming over our heads, poised to transform SaaS as we knew it — whether yesterday, a year ago, or a decade back. And AI’s influence is extending far beyond traditional software budgets—it’s now making inroads into labor spend, fundamentally reshaping how businesses allocate resources.
Current Market Dynamics
Venture funding to U.S. companies totaled $178 billion — gobbling up 57% of global capital. The Bay Area absorbed $90 billion of this – a 50% jump from 2023, thanks to the boom from AI investing.
According to Crunchbase, funding to AI-related companies crossed $100 billion. That’s an 80% YoY surge from $55.6 billion in 2023. Investor confidence is evident in the numbers as AI SaaS startups are commanding revenue multiples of 37.5x, significantly higher than the average SaaS multiple of 7.6x (finrofca). The speed at which AI is scaling makes every other wave of innovation look slow.
Much of this momentum comes down to R&D investment. It’s no coincidence that China, the U.S., and Japan—home to the highest concentration of AI patent applications – are also leading the charge in developing the most advanced foundational models.
Where’s the Money Moving?
-
Seed Stage – AI companies are raising rounds 28% higher than non-AI startups
Nearly 3 in 4 AI deals (74%) were early-stage in 2024. AI startups raising seed rounds are seeing median rounds of $1.6 million – that’s 28% higher than non-AI startups, who’re pulling in about $1.25 million.
AI startups made up 22%, or $7 billion, of first-time VC financing. Just two years ago, that share was half.
Source: Pitchbook
The barrier to entry for AI has lowered, thanks to accessible cloud resources, open-source models and fine-tuning APIs. The real differentiator at this point is deep technical defensibility and teams’ ability to solve hard enterprise problems, not just layering AI on existing workflows.
-
Series A & B median valuations are 2.5x higher
The median AI pre-money valuations at Series A and Series B are $34.0M and $150.0M, respectively. AI startups in these rounds are seeing median valuations 2.5x higher than in traditional SaaS.
Why? Because if you’ve made it to Series A/B, investors are now betting on you to achieve hyperscale adoption. But not all will make it. The adoption cycles in enterprise AI are brutal, and many of these companies are realizing that just because the tech is promising doesn’t mean it fits into existing workflows. Some will break through. Many won’t.
-
Growth Stage (Series C & Beyond) demands proprietary infrastructure
This is where the pressure is mounting. The biggest AI bets of 2024 include Databricks ($10B round), OpenAI ($6.6B), xAI ($6B), and Anthropic ($4B). These late-stage rounds are dominated by hyperscalers and sovereign wealth funds, signaling that true differentiation now demands proprietary infrastructure, not just incremental improvements.
The capital is there, but so are sky-high expectations. The ones solving real problems will raise massive rounds. The rest? They’ll burn out just as fast as they took off.
Source: finrofca
How different industries are adopting
AI funding across industries tracks the volume of available data, the potential for automation, and the complexity of existing operational workflows. An Accenture report shows the profit potential in these industries after implementing AI integrations.
Source: Springsapps
-
Healthcare & Life Sciences: AI is revolutionizing drug discovery, diagnostics, and clinical workflows. Currently, 79% of healthcare organizations have adopted AI technology, earning $3.20 for every $1 invested. AI is moving from experimental technology to core infrastructure in medical delivery.
-
Finance: From fraud detection to automated underwriting and trading algorithms, generative AI alone could contribute $200–340 billion annually to global banking, primarily through productivity gains.
-
Manufacturing & Supply Chain: AI-driven predictive maintenance, logistics optimization, and robotics are redefining efficiency, reducing downtime, and streamlining operations.
Market Correction in Full Swing
Low-hanging fruit like customer service, automation, and sales is now a dime a dozen. AI in developer tools has been commoditized. And when tech gets cheaper, returns shrink, investors pull back, and price tags start correcting themselves. Incumbents will have the advantage, evident in 384 AI acquisitions in 2024, nearly matching 2023’s 397.
The AI gold rush isn’t over, but the market’s getting a reality check. Unsustainable burn, superficial value propositions, and the advantage held by incumbents are driving this consolidation.
It’s Back to the Fundamentals
At the end of the day, funding metrics and exit multiples make for good headlines, but they’re not the whole story. The enduring companies will be built by teams who understand that AI isn’t just a technical challenge – it’s a transformation challenge. Whether you’re writing checks or cashing them, it’s time to wear the operator’s hat and get back to the basics. Which ones will make it? Here’s a framework to help evaluate that: AI Startups: does your idea have hype, hope, or is it a hard pass?
Chatbots, CoPilots, and Choreographers: Where will you start with AI Agents?

Agentic AI and the Shift Beyond Traditional Automation
Artificial Intelligence is no longer just a tool for automation—it’s becoming an active decision-maker, collaborator, and orchestrator of complex business processes. The rise of Agentic AI marks a fundamental shift from traditional automation, where AI simply follows predefined rules, to intelligent systems that act autonomously, adapt dynamically, and execute tasks with minimal human intervention.
For product leaders, this shift is impossible to ignore. AI is no longer limited to Chatbots handling customer inquiries—it’s evolving into CoPilots that assist humans in high-stakes decision-making and Choreographers that autonomously coordinate business processes across multiple systems and stakeholders. These AI-driven agents are redefining productivity, accelerating workflows, and unlocking new revenue streams across industries.
The real question isn’t whether your business will use Agentic AI—it’s whether you’re prepared to leverage its full potential before your competitors do.
In this post, we explore three critical areas where Agentic AI is making an impact: Chatbots that go beyond scripted conversations, CoPilots that augment human decision-making, and Choreographers that automate complex business processes.
Let’s dive in.
1. Chatbots: From Simple Responders to Intelligent Agents
Chatbots have been a staple of AI applications, particularly in customer service. However, Agentic AI expands their role beyond simple query resolution, enabling them to handle multi-step workflows, dynamically escalate issues, and integrate across business functions—acting more like autonomous agents than mere responders.
Use Cases:
Customer Service Agents: Unlike traditional chatbots that follow scripted responses, Agentic AI-driven bots can diagnose problems, autonomously retrieve relevant data, and execute actions such as processing refunds or modifying orders—reducing human intervention while improving customer satisfaction.
AI-Driven Marketing Assistants: Modern marketing teams leverage AI-powered assistants to analyze consumer behavior, generate personalized content, and optimize campaign performance in real time. These intelligent agents go beyond automation, dynamically adjusting messaging, budget allocation, and audience targeting based on evolving market trends.
Sales and Lead Qualification: AI agents autonomously assess potential leads, schedule follow-ups, and provide tailored product recommendations—functioning as autonomous sales assistants rather than passive responders.
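The difference between a scripted responder and an agentic bot comes down to a dispatch loop like the one below. This is a deliberately simplified sketch: the intents, tools, and escalation rule are hypothetical, and a real system would use an LLM rather than keyword matching.

```python
# Toy agentic chatbot: classify intent, call a tool, escalate when unsure.

def classify(message):
    # Stand-in for an LLM intent classifier.
    if "refund" in message:
        return "refund"
    if "order" in message:
        return "modify_order"
    return "unknown"

TOOLS = {
    "refund": lambda msg: "refund processed",
    "modify_order": lambda msg: "order updated",
}

def handle(message):
    intent = classify(message)
    if intent in TOOLS:
        return TOOLS[intent](message)    # act autonomously via a tool
    return "escalated to a human agent"  # dynamic escalation

print(handle("I want a refund"))  # refund processed
print(handle("please help"))      # escalated to a human agent
```

The scripted bot ends at `classify`; the agentic one continues through `TOOLS`, executing actions such as refunds on the customer’s behalf and escalating only when it cannot act confidently.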
2. CoPilots: AI-Powered Decision-Makers for Humans
Agentic AI is particularly powerful when deployed as a CoPilot—an AI assistant tailored to specific industries and human personas, enhancing efficiency in specialized roles. Rather than replacing humans, these AI companions augment expertise, reduce cognitive load, and improve decision-making by providing real-time insights, automating repetitive tasks, and adapting to unique workflows across sectors like IT, healthcare, HR, and finance.
Use Cases:
Health CoPilot for Clinicians: AI-powered clinical decision support tools assist doctors by summarizing patient histories, flagging anomalies in diagnostics, and suggesting personalized treatment plans based on real-time data.
HR CoPilot for Employee Experience: HR teams leverage AI-driven assistants to proactively address employee concerns, recommend career growth paths, and personalize benefits—enhancing engagement and reducing attrition.
IT & Storage CoPilot (Pure Storage’s IT Manager’s Companion): This AI assistant autonomously monitors infrastructure, predicts failures, and optimizes storage allocation, ensuring seamless IT operations with minimal human oversight.
3. Choreographers: Automating Business Processes with AI
One of the most transformative applications of Agentic AI is in business process automation, where AI systems dynamically identify bottlenecks, orchestrate workflows, and optimize complex operational tasks. These AI-driven Choreographers act as autonomous conductors of business functions—seamlessly coordinating multiple systems, departments, and tasks.
Use Cases:
Autonomous Supply Chain Management: AI agents predict demand fluctuations, automate vendor negotiations, and optimize logistics in real time, significantly reducing inefficiencies in global supply chains.
Finance and Risk Management: AI-driven financial agents autonomously analyze transaction patterns, detect fraud, and make real-time credit decisions—revolutionizing risk assessment processes.
HR Onboarding Automation: Instead of manual onboarding processes, AI agents coordinate across IT, HR, and operations to provision accounts, assign training materials, and personalize onboarding workflows for new employees.
The Imperative for Product Leaders
For product leaders, Agentic AI is not just another efficiency tool—it represents a fundamental shift in how AI-driven systems interact, execute, and learn. The key to unlocking its full potential lies in identifying complex, high-value workflows where AI can operate autonomously—reducing friction, enhancing decision-making, and driving real business impact.
Where will you start? How will you strategize?
The challenge ahead is not just about automation—it’s about redesigning product strategies to embrace autonomy and orchestration. To stay ahead, product leaders must move beyond automating repetitive tasks and intentionally integrate Chatbots, CoPilots, and Choreographers into their business ecosystems. The winners in this AI revolution will be those who leverage AI not just to support but to lead business processes, unlocking new efficiencies, innovations, and competitive advantages.
💡 Are you ready to embrace Agentic AI, or will your competitors beat you to it?
Automating Prompt Engineering with DSPy: An Overview

The power of Language Models (LMs, or oftentimes called Large Language Models or LLMs) is often harnessed by chaining them together into sophisticated Language Model Programs, capable of tackling increasingly complex Natural Language Processing (NLP) tasks. Think of these programs as multi-step recipes where each step involves prompting an LM to perform a specific sub-task. Traditionally, building these LM programs relies heavily on manual prompt engineering: the time-consuming process of crafting effective instructions and examples (prompts) for each step through trial and error.
Enter DSPy: a novel framework that introduces automated prompt engineering for LM programs. It is the framework for programming—not prompting—LM programs. Instead of painstakingly hand-tuning prompts, DSPy treats the instructions and demonstrations given to the LMs within a program as parameters that can be automatically optimized to maximize performance on a specific task.
DSPy: Programming and Optimizing in One Go
DSPy provides a declarative programming model that allows developers to define what they want their LM program to achieve, without needing to specify exactly how each LM call should be prompted initially. You define the program as a series of modules, each with prompt templates containing open slots or variables for instructions and demonstrations (examples).
The core innovation of DSPy lies in its ability to automatically find the best values for these prompt variables. It does this by using various optimization strategies that don’t require access to the internal workings (e.g. gradients) of the LMs themselves or detailed labels for the intermediate steps within the program. DSPy only needs the LM program, a metric to measure its overall success (e.g., accuracy, precision, recall, F1), and a training dataset of inputs (and optionally, final outputs).
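The idea of treating prompts as optimizable parameters can be illustrated without DSPy’s actual API. The snippet below is a toy stand-in, not DSPy code: given a program, a metric, and a small trainset, it searches over candidate instructions and keeps the best-scoring one — the same loop DSPy automates at scale.

```python
# Toy prompt optimization: pick the instruction that maximizes a metric
# on a training set. The "LM" is simulated for illustration.

def fake_lm(instruction, text):
    # Pretend model: only the "UPPERCASE" instruction makes it shout.
    return text.upper() if "UPPERCASE" in instruction else text

def metric(output, expected):
    return output == expected          # exact-match accuracy

trainset = [("hello", "HELLO"), ("dspy", "DSPY")]
candidates = ["Repeat the input.", "Repeat the input in UPPERCASE."]

def optimize(candidates, trainset):
    scores = {
        c: sum(metric(fake_lm(c, x), y) for x, y in trainset)
        for c in candidates
    }
    return max(scores, key=scores.get)

best = optimize(candidates, trainset)
print(best)  # Repeat the input in UPPERCASE.
```

Note that the loop never inspects the model’s internals — only the metric’s score on final outputs — which mirrors DSPy’s gradient-free, label-light setting described above.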
Tackling the Challenges of Prompt Optimization
Optimizing prompts for multi-stage LM programs presents two key challenges that DSPy is designed to address:
•The Proposal Challenge: The space of all possible instructions and combinations of demonstrations is incredibly vast. DSPy needs efficient techniques to propose a small set of high-quality prompt candidates for each module.
•The Credit Assignment Challenge: When an LM program doesn’t perform well, it’s difficult to determine which module’s prompt is the culprit. DSPy needs strategies to infer the impact of prompt choices in each module on the overall program performance to guide the optimization process.
How DSPy Optimizes: Key Strategies
DSPy employs several intelligent strategies to tackle these challenges:
•Bootstrapping Demonstrations: DSPy can automatically generate potential few-shot examples (input/output pairs) for each module by running the initial version of the program on training data. If the program produces a successful final output (according to the defined metric), the input/output traces of each module are treated as valuable demonstrations. The optimizer can then intelligently select combinations of these bootstrapped demonstrations to include in the prompts.

•Grounded Instruction Proposal: To generate effective instructions, DSPy utilizes another LM, a “proposer” LM. This proposer LM is provided with relevant context to help it craft better instructions. This context can include summaries of the training data, the structure of the LM program itself, and even previously evaluated prompts and their scores. By “grounding” the proposer in this information, DSPy aims to generate instructions that are more tailored to the specific task and the role of each module within the program.
•Surrogate Models for Efficient Search: To navigate the vast space of possible prompt configurations efficiently, DSPy can use surrogate models, such as Bayesian optimization. These models learn to predict the performance of different prompt combinations based on past evaluations, allowing DSPy to focus its search on the most promising areas. This reduces the number of costly LM calls needed for evaluation.
•Meta-Optimization of Proposal Strategies: DSPy can even go a step further by learning the best way to propose prompts. By parameterizing the hyperparameters of the proposal process (e.g., the temperature of the proposer LM, which grounding information to use), DSPy can use techniques like Bayesian optimization to find the proposal strategies that yield the best performing prompts for a given task and LM setup.
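Of these strategies, bootstrapping is the easiest to caricature in pure Python (this is an illustration of the idea, not DSPy’s implementation): run the program on training inputs, and keep only the traces whose final output passes the metric as candidate demonstrations.

```python
# Toy bootstrapping: traces from successful runs become few-shot demos.

def program(x):
    # Stand-in for an LM program; imagine each call producing a trace.
    answer = x * 2
    trace = {"input": x, "output": answer}
    return answer, trace

def metric(answer, expected):
    return answer == expected

trainset = [(1, 2), (2, 4), (3, 7)]   # (input, expected output)

demos = []
for x, expected in trainset:
    answer, trace = program(x)
    if metric(answer, expected):       # keep only successful traces
        demos.append(trace)

print(len(demos))  # 2: the (3, 7) run fails the metric and is discarded
```

The surviving traces are then candidate few-shot examples; a real optimizer would go on to search over which subset of them, combined with which instruction, scores best.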
MIPRO: A Powerful Optimizer in the DSPy Toolkit
MIPRO (Multi-prompt Instruction PRoposal Optimizer) is an algorithm built using the above insights that demonstrates strong performance. MIPRO jointly optimizes both the instructions and the few-shot examples for each module in an LM program. By separating the task of proposing prompts from the task of evaluating and selecting the best combinations (credit assignment using a surrogate model), MIPRO can effectively find high-performing prompt configurations.
Key Lessons Learned from DSPy Optimization
The evaluation of DSPy optimizers on a diverse set of tasks yielded several important lessons:
•Optimizing bootstrapped demonstrations is often crucial for achieving the best performance in LM programs. Providing relevant examples can be a very effective way to guide LMs.
•Jointly optimizing both instructions and few-shot examples, as done by MIPRO, generally leads to the best overall performance across a range of tasks.
•Instruction optimization becomes particularly important for tasks with complex, conditional rules that are difficult to convey through a limited number of examples. In such cases, refining the instructions can have a significant impact.
•Providing relevant context (“grounding“) to the instruction proposal process is generally helpful, but the most beneficial type of context can vary depending on the specific task.
•There is still much to learn about the optimal strategies for LM program optimization, and future research can explore the performance of different optimizers under varying resource constraints and with different base LMs.

Conclusion: Towards More Efficient and Effective LM Programs
DSPy represents a significant advancement in how we build and optimize Language Model Programs. By automating the often tedious and error-prone process of prompt engineering, DSPy offers the potential for increased efficiency, better performing NLP solutions, and a reduced reliance on manual trial and error. As LM programs become increasingly central to tackling complex tasks, frameworks like DSPy will play a vital role in making this technology more accessible and powerful.
DeepSeek’s Model Training Methodology

In the previous post, we explored how the DeepSeek team utilized an HPC codesign approach. By enhancing both the model architecture and the training framework, they were able to train DeepSeek models effectively while using fewer resources. In this article, we will delve into the innovative techniques they employed in their training methodology.
Large Scale Reinforcement Learning
DeepSeek-R1-Zero naturally acquires the ability to solve increasingly complex reasoning tasks by leveraging extended test-time computation. This computation ranges from generating hundreds to thousands of reasoning tokens, allowing the model to explore and refine its thought processes in greater depth.
One of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors as the test-time computation increases. Behaviors such as reflection—where the model revisits and reevaluates its previous steps—and the exploration of alternative approaches to problem-solving arise spontaneously.
– DeepSeek-R1 paper
One of the primary innovations from the DeepSeek team is the application of direct, large-scale Reinforcement Learning (RL), without an initial supervised fine-tuning step, for LLM training. Their research demonstrates that RL alone naturally enhanced the model’s reasoning abilities.
Reinforcement learning (RL) has been used extensively in multiple domains, including robotics. It is a machine learning technique where the model learns to make decisions based on feedback. Desired behaviors are rewarded, and undesired behaviors are punished.
At a high level, the RL loop involves an agent (the model being trained) and an interpreter. The interpreter reviews the agent’s actions and the resulting state of the environment to determine the reward signal fed back to the agent.
RL enables an agent to learn optimal strategies in complex environments by interacting directly with the environment and focusing on maximizing long-term rewards. This makes the model particularly adept at handling dynamic, uncertain situations where immediate feedback may not be available. In contrast, traditional machine learning methods often rely on pre-labeled datasets. RL essentially helps the model learn by experiencing the consequences of its actions and adapting its behavior to achieve the best possible outcome.
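The feedback loop described above can be made concrete with a toy example. The following is a minimal, illustrative REINFORCE-style policy update on a two-armed bandit; the setup, names, and numbers here are mine, not DeepSeek's training code. The idea is simply that actions which earn higher reward get reinforced.

```python
import numpy as np

# Toy policy: a softmax over two "arm" logits. The agent samples an arm,
# observes its reward, and nudges the policy toward higher-reward arms.
rng = np.random.default_rng(0)
logits = np.zeros(2)           # policy parameters
true_rewards = [0.1, 0.9]      # arm 1 pays more; the agent must discover this
lr = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(500):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    reward = true_rewards[action]
    # REINFORCE update: move logits along reward * grad(log pi(action))
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad

print(softmax(logits))  # probability mass concentrates on the better arm
```

After training, the policy strongly prefers the higher-paying arm, even though it was never told which arm was better; it learned purely from reward feedback.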
Existing Proximal Policy Optimization (PPO)
While many reinforcement learning algorithms exist, PPO, introduced by OpenAI in 2017, has been the de facto default since around 2018.
PPO’s inner workings
As shown in Fig 2 below, PPO uses four models:
-
Policy Model: This is the LLM model being tuned.
-
Reference Model: This is a frozen copy of the policy model, used to keep the tuned model from drifting too far from its starting point (typically via a KL-divergence penalty).
-
Reward Model: This is a pre-trained model that evaluates the reward for generated text.
-
Value Model: This is trained as part of the RL process to estimate the long-term value for the generated output.

The PPO Process
-
A query (q) is submitted to the policy model, which generates an output (o).
-
The reward model computes a reward (r) for the output.
-
The value model estimates a value (v) for the output.
-
The Generalized Advantage Estimation (GAE) function combines r (which includes a KL penalty computed against the reference model) and v to estimate the advantage (A).
-
The advantage is then used to update the policy model weights.
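Steps 4 and 5 are worth seeing concretely. The following is an illustrative NumPy sketch of GAE and PPO's clipped surrogate loss using toy numbers of my own; it is not DeepSeek's or OpenAI's implementation, and the KL term against the reference model is omitted for brevity.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation:
    A_t = sum_l (gamma*lam)^l * delta_{t+l},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
    advantages = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective: caps how far one update can move the policy."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # negated because we minimize

rewards = np.array([0.0, 0.0, 1.0])   # sparse reward at the end of a sequence
values = np.array([0.3, 0.5, 0.8])    # value model's per-step estimates
adv = gae(rewards, values)
loss = ppo_clip_loss(np.log([0.4, 0.5, 0.6]), np.log([0.3, 0.5, 0.7]), adv)
```

Note that the sketch needs the value model's estimates at every step; this is exactly the dependency GRPO removes, as discussed next.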
Key Takeaway
PPO’s use of four models makes it compute-intensive, presenting challenges for large-scale RL implementations.
DeepSeek’s Group Relative Policy Optimization (GRPO)
The DeepSeek team addressed these scaling challenges by simplifying two key parts of PPO:
-
replacing the learned value model with a group-relative baseline: for each prompt, a group of outputs is sampled, and each output’s advantage is its reward normalized against the group’s mean (with the rewards themselves computed from simple rules, such as answer correctness and format checks, rather than a learned model)
-
simplifying the KL regularization by adding the KL term directly to the loss instead of folding it into the reward
With these optimizations, the team was able to apply RL at scale during post-training of the model.
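The first simplification fits in a few lines. The sketch below (the function name and numbers are mine, not DeepSeek's code) shows how GRPO-style advantages are computed purely from a group of rewards, with no value network involved.

```python
import numpy as np

# For one prompt, several outputs are sampled and scored; each output's
# advantage is its reward normalized within the group.
def grpo_advantages(group_rewards):
    """A_i = (r_i - mean(r)) / std(r), over one group of sampled outputs."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# e.g., 4 outputs sampled for one prompt, scored by a rule-based reward
# (1.0 = correct final answer, 0.0 = incorrect):
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # correct outputs get a positive advantage, incorrect ones negative
```

Because the baseline comes from the group itself, the fourth model in the PPO diagram (the value model) disappears entirely, which is what makes large-scale RL tractable.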

Four Stage Training Pipeline
The use of large-scale unsupervised RL led to the development of a strong reasoning model. However, this model encountered challenges related to readability, language mixing, and generalization to non-reasoning tasks. To address these issues, the team devised a new four-stage training pipeline that incorporates two supervised fine-tuning (SFT) and two reinforcement learning (RL) stages.
The initial SFT stage utilizes high-quality cold start data to stabilize the subsequent RL step, which in turn enhances the model’s reasoning capabilities. This is followed by another SFT stage that employs rejection sampling to further strengthen the model and includes non-reasoning examples to improve its performance on non-reasoning tasks.
The final RL stage incorporates more general tasks to align the model with human expectations and includes a reward for readability and single language usage in its policy optimization step.
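For reference, the four stages described above can be summarized as an ordered structure. The stage labels below paraphrase the text; this is a summary sketch, not an actual training configuration.

```python
# Each entry paraphrases one stage of the pipeline described above.
PIPELINE = [
    {"stage": 1, "kind": "SFT", "data": "high-quality cold-start data",
     "goal": "stabilize the subsequent RL step"},
    {"stage": 2, "kind": "RL",  "data": "reasoning tasks",
     "goal": "enhance reasoning capabilities"},
    {"stage": 3, "kind": "SFT", "data": "rejection-sampled plus non-reasoning examples",
     "goal": "strengthen the model and improve non-reasoning tasks"},
    {"stage": 4, "kind": "RL",  "data": "more general tasks",
     "goal": "align with human expectations; reward readability and single-language output"},
]
```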
DeepSeek-R1 Results
The DeepSeek-R1 model showcased impressive results, with the team summarizing their findings as follows:
-
DeepSeek-R1’s performance was on par with OpenAI-o1-1217 on multiple tasks.
-
The model’s strong document analysis capabilities were evident in its performance on FRAMES, a long-context-dependent QA task.
-
DeepSeek-R1 displayed strong instruction-following capabilities, based on impressive results on the IF-Eval benchmark.
-
The model’s strengths in writing tasks and open-domain question answering were highlighted by good performance with AlpacaEval2.0 and ArenaHard.
-
DeepSeek-R1’s performance on the Chinese SimpleQA benchmark was worse than DeepSeek-V3’s because the added Safety RL leads the model to decline certain queries.
-
Large-scale reinforcement learning was highly effective for STEM-related questions with clear and specific answers.
-
Reasoning models were generally better at handling fact-based queries.
-
Reasoning tasks
  - 79.8% Pass@1 on AIME 2024
  - 97.3% on MATH-500
  - 2,029 Elo rating on Codeforces
-
Knowledge
  - 90.8% on MMLU
  - 84.0% on MMLU-Pro
  - 71.5% on GPQA Diamond
-
Others
  - 87.6% on AlpacaEval 2.0
  - 92.3% on ArenaHard
Model Distillation
Distillation Creates Smaller, More Efficient Models
The DeepSeek team found that smaller dense models trained on data generated by the larger R1 model, through a process called distillation, performed very well on benchmarks. This finding offers a practical path to smaller, more efficient reasoning models across the industry.
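The recipe is simple enough to sketch end to end. The outline below is illustrative only; the function names and toy stand-ins are mine, not DeepSeek's code. The teacher generates a completion for each prompt, and the student is fine-tuned to imitate those completions with plain supervised learning (notably, no RL stage is applied to the distilled models).

```python
def distill(teacher_generate, student_state, prompts, sft_step):
    # 1. Build the distillation dataset from teacher outputs.
    dataset = [(p, teacher_generate(p)) for p in prompts]
    # 2. Supervised fine-tuning: the student imitates the teacher token-for-token.
    for prompt, completion in dataset:
        sft_step(student_state, prompt, completion)
    return student_state

# Toy stand-ins so the sketch runs end to end:
teacher = lambda p: f"<think>...</think> answer({p})"  # pretend reasoning trace
student = {"updates": 0, "seen": []}

def sft_step(state, prompt, completion):
    state["updates"] += 1          # stand-in for one gradient step
    state["seen"].append((prompt, completion))

distill(teacher, student, ["q1", "q2", "q3"], sft_step)
```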

Six Distilled Models Created
Using the Llama and Qwen models, the team created six distilled models:
-
Distill-Qwen-1.5B (1.5 billion parameters)
-
Distill-Qwen-7B (7 billion parameters)
-
Distill-Qwen-14B (14 billion parameters)
-
Distill-Qwen-32B (32 billion parameters)
-
Distill-Llama-8B (8 billion parameters)
-
Distill-Llama-70B (70 billion parameters)
Distilled Models Perform Well
The distilled models also performed very well compared to existing similar models on multiple tasks.
Summary
The DeepSeek team has innovated on multiple facets of model building to create a best-of-breed reasoning model. By open-sourcing their work, they are also enabling innovation across the industry. It will be fascinating to see how these techniques power further improvements in LLMs.