Please note: this blog has been migrated to a new location at https://jakesgordon.com. All new writing will be published over there, existing content has been left here for reference, but will no longer be updated (as of Nov 2023)

AI, ML, and Large Language Models


I often describe myself as a generalist software engineer with a broad range of experience in web app development, system development, front end, back end, devops, etc, but I must confess I have a somewhat blind spot when it comes to data science.

In the past, I’ve worked with machine learning at Eagleview, constructed analytic dashboards at LiquidPlanner, and tackled an array of random business queries with hastily assembled SQL, R, or Excel at different points in my career. However, the recent surge in AI interest has left me feeling somewhat like I’m missing out.

Therefore, I’ve been dedicating some time to getting back up to speed with machine learning and - unsurprisingly given the current hype - with large language models and natural language processing. You can find some of my thoughts on the subject below…


Artificial Intelligence

The term AI covers a wide range of topics:

When applied correctly, AI techniques can help solve or enhance a variety of business problems across numerous domains. From a software engineering perspective, our primary concerns are:


Traditional Machine Learning

Traditional machine learning algorithms have been successfully used for decades. These algorithms perform well, boasting robust implementations in machine learning libraries like scikit-learn. They should probably remain the go-to solutions for many problems, especially when dealing with structured tabular data, where decision trees (and ensemble techniques like random forests or gradient boosting) continue to prove more accurate than deep learning techniques.


Deep Learning and Neural Networks

… but in the last decade or so, deep learning has taken center stage, using neural networks that can decipher complex non-linear relationships given sufficient parameters and training data. A neural network is considered a universal function approximator - signifying that it can be trained to approximate the result of any function.

A neural network can be defined by:

There exists a variety of neural network architectures, each designed to handle specific tasks in the field of artificial intelligence:


The Transformer Architecture

A 2017 Google paper - Attention is all you need introduced a novel type of neural network architecture known as the transformer.

This was designed to excel at text transformation tasks, such as language translation, and could be trained on a large corpus of text using unsupervised (e.g. unlabelled) training techniques like masking words and training the model to predict the missing word.

Its proficiency in text transformation enables it to easily:

In addition to text transformation the transformer architecture also excels at:

Surprisingly, a language model employing the transformer architecture, when provided with sufficient training data and configured with a large number of parameters, begins to exhibit emergent behaviour that can appear intelligent.

This marked the advent of the Large Language Model, and - perhaps - some of the first real signs of emerging artificial intelligence…


AutoComplete vs AGI

… However, it’s crucial to understand that a large language model is - at its heart - a predictive text model that:

Yes, an LLM can provide many useful abilities that, I believe, can significantly improve our applications and their user experience for the better. And yes, an LLM can exhibit some emergent behaviors that appear intelligent if trained or prompted under controlled circumstances. However, an LLM on its own:

… but these models certainly are a powerful tool to add to our software development toolbelt.


The LLM Landscape

Let’s explore the current large language model landscape with the help of a survey of large language models. The diagram below demonstrates the evolution of OpenAI’s GPT models, culminating in the recent release of GPT-4, which is available in ChatGPT, BingAI, and the Open AI platform.



However, GPT is not the only game in town. This diagram, also derived from a survey of large language models, was published merely a month ago (at the time of writing) and is already outdated due to last week’s release of Meta’s Llama 2 and Stability AI’s Free Willy 2.

The diagram illustrates the evolutionary paths of various large models from prominent companies in recent years. Given the excitement in the field, new models are being released continuously.



Some models are commercial, while others are open source. However it’s necessary to carefully review each model’s license to ascertain whether it can be utilized for your tasks.

Commercial Open
GPT (Open AI) LLAMA 2 (Meta)
CLAUDE (Anthropic) ALPACA (Stanford)
PALM/BARD (Google) MPT (Mosaic ML)
GPT/BING (Microsoft) FREE WILLY 2 (Stability AI)
COLPILOT (GitHub) OPEN LLAMA (Berkeley)
BLOOM (Big Science)
BERT (Google)
T5 and FLAN-T5 (Google)

The latest advancements on the commercial side are GPT-4 and Claude, while on the open-source side, the recent releases of Llama2, Free Willy 2, and MPT are generating excitement. This information will likely be outdated by the time you read this article :-)

There are numerous models being published every week…

huggingface.co is a tech company that originally provided the open-source transformer library for building transformer networks in Python. Now it hosts many open models, provides open datasets, and is evolving into a community for open-source LLM knowledge sharing and tools. They offer a portal to search for models as well as a leader board tracking the performance of various metrics across the most popular models.


Prompt Engineering

NOTE: The prompt examples in this section come from the exceptional online course Generative AI with Large Language Models from Andrew Ng’s deeplearning.ai

If you are reading this article (thank you) - you have likely used ChatGPT and already have a good understanding of what it means to prompt a language model. The most fundamental prompt often comes directly from a user. The prompt is sent to the model, and the LLM attempts to COMPLETE THE TEXT WITH THE MOST PROBABLE SUBSEQUENT WORDS. Thus, it’s crucial to understand that the LLM isn’t exactly “answering the question” as we might perceive it. Instead, it uses the knowledge it acquired from its training data to predict the set of words that best completes the response.


Without structure, a simple prompt coming directly from a user can cause an LLM to face some serious challenges:


To mitigate these issues, we employ a process called prompt engineering.

Being a predictive text model, an LLM responds effectively if the question or instruction aligns with its training data. Prompt engineering is the art of structuring the prompt to guide the model to perform the desired task. In the example below, we provide an instruction to “Classify this review” accompanied by a hint that the text should be followed by the sentiment, but we do not include any examples. This is called zero-shot inference:

If the model struggles to perform the task, we can enhance the prompt to include a single example. This is called one-shot inference:

If the model continues to struggle, we can improve the prompt to include multiple examples. In the example below we include both a positive and a negative example to guide the model towards performing the task we desire, in this case sentiment analysis. This is called few-shot inference:

In more complex cases, for example (below), if we are using the LLM to perform math or draw a conclusion, our one-shot example can include a breakdown of the steps required to reach the answer. In this case our example explains why the answer is 11.

Again, it’s crucial to understand that the LLM is not making any mathematical calculations here. Our one shot example is broken down into smaller steps which guides our model to respond with a similar structure and break its own task down into smaller steps. Simpler steps equate to easier math, and easier math is much more likely to be guessed correctly by the model.

So there are various techniques, suggestions, and patterns that you can use to structure the prompt to guide the model to perform the desired task, including:


In all honesty, this feels a lot like early SEO (search engine optimization), where various patterns emerged to try to rank at the top of the Google search results page. Some of the patterns were based on the underlying algorithm, while others were a bit more speculative - YMMV.


LLM Memory (or lack thereof)

One complication of working with large language models is that they possess no real ability to retain information. They do not learn from their tasks (ChatGPT has to include the recent conversation for context), they only have general knowledge up to their training cut-off date, and they have a limited context window (prompt) size for assimilating new knowledge.

For simple tasks, the solution to this problem might involve breaking the task down into sub-tasks. So for this example we could ask the model to summarize each chapter (or whatever portion fits into the context window) and then ask the model to summarize the summaries.

However, more complex tasks necessitate other strategies…

RAG - Retrieval Augmented Generation

One approach to handling the context window limitation involves storing our company documents in a vector database that can be used to perform a semantic search for the documents related to the user’s question.

  1. The user asks a natural language question
  2. A semantic search is performed, returning related documents
  3. The LLM is provided with the original question plus the related documents
  4. A natural language answer is returned to the user

In this way, we only send a small, relevant subset of our full documentation that is compact enough to fit into the model’s context window.

PAL - Program Aided Language Model

A different, and potentially more powerful pattern, involves augmenting the LLM with an external orchestrator that can perform actions on its behalf.

  1. The user asks a natural language question
  2. The LLM transforms the question into an action (or query)
  3. The action is performed by the orchestrator
  4. The LLM transforms the results into a natural language answer
  5. The natural language answer is returned to the user

In this way, we are only using the LLM as a text transformation engine, transforming the user’s natural language request into actions that our application can perform - such as calling a function, querying a database, or making an API request - and then transforming our application’s responses back into natural language. I believe this could be a very powerful orchestration pattern, avoiding many of the complexities of working with an LLM.

Training a Model

Many of the prompt engineering and RAG patterns are workarounds for the fact that:

A more robust solution is to train your own model, but that can be expensive and time consuming. Alternatively, you can fine tune a pre-trained model. This is much more efficient, and there are various mechanisms for undertaking fine tuning:

Once you start down the path of training (or fine tuning) your own model, you will need a robust way to measure the performance - e.g. accuracy - of your model output. There are a number of different metrics, each designed to measure accuracy for different tasks:


Technical Challenges

Large language models are not a silver bullet. There are a number of technical challenges, not least of which is the cost to train and host the model.

However, given the growing interest and availability of open source models, experiments, prototypes, and concepts can be explored fairly quickly and easily.


Random Ideas

Using language models to provide a natural language user experience (UX) is a very different approach to building applications than traditional forms and rules-based UX. However, now that we have a tool that allows us to understand the semantic meaning behind natural language, you can easily start to imagine new ways of providing values, and new feature and product ideas…


Development Technologies

There are a lot of technologies that enable this eco-system. At the top of the chain is the Open AI platform. Their API and cookbook are driving a lot of innovation. Fast.ai provides a very valuable training course and a high level library for experimentation. Hugging Face is becoming THE hub for the open-source deep learning community, hosting models, datasets and more. Langchain is the latest exciting agent/orchestrator library of the moment, but the underlying python library eco system is vast with fundamentals provided by numpy, pandas, matplotlib, tensorflow, pytorch, keras, and scikit. However, python is not the only language embracing ML. ELixir has a growing machine learning community with Nx, Scholar, Axon, and Bumblebee…

Training Courses

There are many ML training courses available for free. I found the short courses from deeplearning.ai to be excellent starting points, and their more in-depth “Generative AI with LLMs” course delves into further detail. The “Practical Deep Learning” course from fast.ai is outstanding but requires a longer time commitment as it is lengthy and detailed.

Books

… and of course there are some good old fashioned books…

Papers

The ML community is a scientific, open, and sharing community, and almost all the key insights can be found in the scientific papers shared on arxiv.org. Due to the current excitement, new papers appear almost daily. Here are some of the ones I’ve found most insightful:

Blog Articles

… and of course there are an infinite number of blog articles, but here are some of the ones I’ve found helpful:

Conclusion

So, phew, that’s a wrap. This article ended up much longer than I expected, but it still really just scratches the surface of where we might be going with LLM development. There are many established and useful tools in the wider ML ecosystem, and the current focus on LLMs and NLP is exciting to see. Personally, I don’t really consider these techniques as “intelligence” as we might intuitively understand the term, and there is significant work required to make an LLM useful in a robust production environment… but they can be very powerful tools for understanding semantic language. As such they can be used - with the right training and orchestration - to open up new solutions to complex problems.

I think it’s going to be pretty exciting to see if these models can live up to their hype and become another tool in our developer toolbox.