Language Model Overview: Software Engineering Full time 13 Phase 3 Hybrid

Learning Goals

Define a language model.
Describe briefly how language models work.
Discuss their uses and how we can utilize pre-trained models.

Introduction

We have learned how to use chatbots like ChatGPT for learning, debugging, and data generation. These chatbots use language models under the hood to generate responses.

In this lesson, we will discuss what these models are, how they work, and how they can be used in programs. Although we don’t have to understand exactly how to build these models in order to use them, understanding some basic principles will make it easier to work with language model APIs.

What is a Language Model?

Language models are artificial intelligence systems designed to understand and generate human language. These models can’t reason from first principles like humans do which can cause them to generate incorrect information. It’s important to understand how they work so we can be aware of their limitations and use them more effectively.

How Do They Work?

There are several complex models available now such as GPT-4 by OpenAI Links to an external site.,LLaMA by Meta Links to an external site., and PaLM 2 by Google Links to an external site.. Although these models have unique properties and development processes, we are going to focus on the basic steps required to develop any language model.

Here are the five main steps that are usually performed when creating a language model from scratch:

Training on Massive Text Data: The model is trained on a huge amount of text data from the internet. This training data helps it learn grammar, vocabulary, and the relationships between words and sentences.
Learning Word Representations: The model breaks down the text into small pieces called tokens. These tokens can be individual words or even parts of words. We will discuss tokens in more detail later since they are the main units of usage for these models.
Understanding Context: The model focuses on understanding the context of words and sentences. It learns by analyzing how words relate to each other in different contexts. For example, it learns that "apple" can refer to a fruit or a technology company based on the context in which it appears.
Generating Text: Once trained, the models can generate text based on the patterns it learned from the training data. We can give it a prompt or a question, and it will generate a response that's contextually relevant and resembles human-written text.
Fine-tuning and Improvements: The model then goes through a process called fine-tuning, where human reviewers provide feedback on the model's responses. This feedback helps improve the model's accuracy, safety, and usefulness.

This video by Google Links to an external site. gives a good overview of language models. And if you’re curious and want to dive a bit deeper into how these models work,this playlist by 3Blue1Brown Links to an external site.provides great visualizations of neural networks.

How Do Software Engineers Use Language Models?

As software engineers, we will mainly work with pre-trained models, i.e., we won’t have to train models from scratch over several months to years. But some of the ideas discussed here such as tokens and fine-tuning will come up when working with language model APIs.

Companies like OpenAI host their language models and allow others to interact with their model through an API. Often they provide different models for different tasks. For example, the GPT-4 model is for understanding and generating human language while the DALL-E model is for generating or editing images based on human language.

Conclusion

We have looked at how language models work and how software engineers can use various APIs provided by the companies that created the models.

In the following lessons, we will look at specific APIs and how to set up an environment for using these APIs.