What are Generative AI models?

Exploring the Impact of Large Language Models

Over the past few months, large language models (LLMs) like ChatGPT have gained significant attention, showcasing their capabilities in diverse tasks such as writing poetry or planning vacations. These advancements highlight the transformative potential of artificial intelligence (AI) in driving enterprise value.

Introduction to Foundation Models

Kate Soule, Senior Manager of Business Strategy at IBM Research, explains that LLMs are a subset of a broader class known as foundation models. The term “foundation models” was coined by Stanford researchers who observed a shift in AI development. Traditionally, AI applications relied on training many separate models, each on task-specific data for a single task. The paradigm is now shifting towards a single foundation model that can be adapted to many different applications.

Generative AI and Foundation Models

Foundation models are trained on vast amounts of unstructured data, enabling them to predict and generate the next word in a sentence. This generative capability categorizes them under generative AI. For example, a model might complete the sentence “no use crying over spilled” with “milk.” Despite being trained for generative tasks, these models can be fine-tuned with minimal labeled data to perform traditional natural language processing (NLP) tasks, such as classification or named entity recognition.
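
As a minimal sketch of this next-word behavior (the specific model, GPT-2, and the prompt are illustrative assumptions rather than details from the article), the Hugging Face transformers library can be used to ask a small open model to complete the idiom:

```python
# Minimal sketch: next-word generation with a small open model (GPT-2).
# The model choice and the prompt are illustrative assumptions, not from the article.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "There is no use crying over spilled"
completion = generator(prompt, max_new_tokens=1)[0]["generated_text"]

# A well-trained language model will most likely continue the idiom with "milk".
print(completion)
```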

Tuning and Prompting Techniques

Foundation models can be adapted to specific tasks through two primary methods:

  1. Tuning: Introducing a small amount of labeled data to update the model’s parameters for specific NLP tasks.
  2. Prompting: Using carefully crafted prompts to guide the model’s responses, even when little or no labeled data is available. For instance, the model can be asked directly whether a sentence expresses positive or negative sentiment, relying purely on its generative capabilities (see the sketch after this list).
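
As a rough sketch of the prompting approach (the instruction-tuned model, google/flan-t5-small, and the prompt template are assumptions chosen for illustration), a generative model can be steered toward sentiment classification without any parameter updates:

```python
# Sketch: task adaptation via prompting alone; no model parameters are updated.
# The model and prompt wording are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("text2text-generation", model="google/flan-t5-small")

sentence = "The keynote was engaging from start to finish."
prompt = (
    "Classify the sentiment of the following sentence as positive or negative.\n"
    f"Sentence: {sentence}\n"
    "Sentiment:"
)

result = classifier(prompt, max_new_tokens=5)
print(result[0]["generated_text"])  # expected output: something like "positive"
```

Tuning, by contrast, would take the same foundation model and update some or all of its parameters on a small labeled dataset for the target task.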

Advantages of Foundation Models

  1. Performance: Due to extensive training on large datasets, foundation models excel in various tasks, outperforming models trained on limited data.
  2. Productivity Gains: These models require less labeled data for specific tasks, leveraging their pre-training on vast unlabeled datasets to deliver efficient results.

Challenges of Foundation Models

  1. Compute Costs: Training foundation models is resource-intensive, putting it out of reach for many smaller enterprises. Running inference on large models also demands significant computational power, often requiring multiple GPUs (see the back-of-the-envelope sketch after this list).
  2. Trustworthiness: The extensive data used to train these models may include biased or toxic content, raising concerns about the reliability of their outputs. Additionally, the exact datasets used for training are often unknown, complicating efforts to ensure data quality.
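
To make the compute-cost point concrete, here is a back-of-the-envelope sketch (the parameter count and numeric precision are illustrative assumptions, not figures from the article) of the memory needed just to hold a large model’s weights at inference time:

```python
# Back-of-the-envelope estimate of the memory needed to hold a model's weights.
# The parameter count and precision below are illustrative assumptions.
def weight_memory_gib(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """GiB required to store the weights, e.g. 2 bytes per parameter for fp16."""
    return num_parameters * bytes_per_parameter / (1024 ** 3)

params = 70e9  # a hypothetical 70-billion-parameter model

print(f"fp16 weights: ~{weight_memory_gib(params):.0f} GiB")     # ~130 GiB
print(f"fp32 weights: ~{weight_memory_gib(params, 4):.0f} GiB")  # ~261 GiB

# Even before activations and attention caches, ~130 GiB of fp16 weights exceeds a
# single 80 GiB accelerator, which is why inference is often sharded across GPUs.
```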

Innovations and Applications

Beyond language applications, foundation models are being applied in various domains:

  • Vision: Models like DALL-E generate custom images from text descriptions.
  • Code: Tools like Copilot assist in code completion.
  • Chemistry: MoLFormer aids in molecule discovery for targeted therapeutics.
  • Climate Change: Earth science foundation models utilize geospatial data for climate research.

These models are being integrated into products, driving innovation across language, vision, code, and scientific research.

Conclusion

Foundation models represent a significant leap in AI capabilities, offering robust performance and productivity benefits while also presenting challenges in compute costs and trustworthiness. Ongoing research and development aim to enhance these models, making them more efficient and reliable for diverse business applications.