
What is a Foundation Model?

I recently moderated a panel on Foundation Models at the Founders Creative engineering summit. We started with a simple warm-up question for our audience: “What is a Foundation Model?” Surprisingly, only about 15% of the 100+ engineers in the room could answer, and even within that small group the definitions varied widely. This experience, along with a similar one at a UC Berkeley paper reading session, highlighted a fundamental gap in understanding. What actually is a foundation model?

I recommend that you pause here and think about how you would define it before proceeding. Don’t forget to add your definition to the comments section below.

Founders Creative is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

The term Foundation Model was introduced and defined in the 2021 report On the Opportunities and Risks of Foundation Models from the Center for Research on Foundation Models (CRFM) and the Institute for Human-Centered Artificial Intelligence (HAI) at Stanford University. I have pulled out three relevant nuggets from the report:

A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks

On a technical level, foundation models are enabled by transfer learning and scale.

Foundation model designates a model class that is distinctive in its sociological impact and in how it has conferred a broad shift in AI research and deployment.

Taken together, these give a very clear definition of a foundation model in terms of how it is created, its capabilities, and its impact:

  • Massive scale: Built with a large number of parameters to capture intricate relationships in the data, and trained on a very large corpus.

  • Self-supervised learning: Trained with self-supervised objectives, so it does not require human annotations at scale.

  • Versatile: Can be adapted to new tasks through fine-tuning, prompt-based learning, or few-shot/zero-shot methods without full re-training.

  • Impact: Driving a major, broad shift across multiple domains, with real-world effects on people from all walks of life.
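The self-supervision point is worth making concrete: the training labels are derived from the raw data itself. A toy sketch (the tokenizer and masking scheme here are simplified illustrations, not any specific model's actual pipeline) of how a masked-language-modeling example is built:

```python
import random

def make_mlm_example(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Turn raw text into a (masked input, target) training pair.

    The label at each masked position is simply the original token,
    so the supervision signal comes from the data itself -- no human
    annotation is needed, which is what lets training scale.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append(mask_token)
            labels.append(tok)    # target: recover the hidden token
        else:
            inputs.append(tok)
            labels.append(None)   # position is not scored
    return inputs, labels

tokens = "foundation models are trained on broad data at scale".split()
inputs, labels = make_mlm_example(tokens, mask_rate=0.3, seed=1)
print(inputs)
print(labels)
```

Every sentence of raw text yields training pairs this way, which is why a web-scale corpus can be used directly without annotators in the loop.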

Fig 1: A foundation model can centralize the information from all the data from various modalities. This one model can then be adapted to a wide range of downstream tasks. Source: On the Opportunities and Risks of Foundation Models

The selection of the term “Foundation” was deliberate as well:

The word “foundation” specifies the role these models play: a foundation model is itself incomplete but serves as the common basis from which many task-specific models are built via adaptation. We also chose the term “foundation” to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications

Arguably, we can limit the definition to the capabilities and impact and say that:

A foundation model is versatile, adaptable to new tasks without full retraining, and has a significant real-world impact on daily life.

That’s it: very simple, yet with a lot behind it. What do you think? Do you agree with this definition? Have you seen the term used in any other context?
