Cracking the Code - ChatGPT and Other LLMs

Alright, let’s talk about it. What is all of this and why should you care about it? The LLM space is much larger than just ChatGPT, but for simplicity’s sake, let’s start there. 

To understand what ChatGPT is, let’s start with what it isn’t:

  1. It is NOT a knowledge base. ChatGPT isn't a repository of pre-existing knowledge like an encyclopedia, or a searchable index like Google.

  2. ChatGPT does NOT have personal experiences, emotions, or consciousness. It doesn’t truly understand the world like a human does.

  3. It does NOT possess critical thinking abilities. It can't evaluate information for accuracy or context the way humans can (one reason you shouldn't lean on this technology completely).

  4. ChatGPT does NOT learn from interactions the way humans do. It doesn't remember previous conversations or adapt its underlying model based on individual chats.

Alright…so then what is it?

ChatGPT fits into a category of technology called Large Language Models (LLMs). Large Language Models are essentially super-smart text generators that have learned how to compose responses from a huge amount of training data. In the last article, we talked about how models need to be trained on massively large data sets - well, ChatGPT was trained on roughly 570GB of text, including web pages, blogs, books, scientific journals, and even sources like Wikipedia and Reddit. This means that ChatGPT has not only picked up an enormous vocabulary, but also a wide array of grammatical structures, context, sentiment, and more.
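To give a feel for what "training on text" actually means, here's a tiny, hand-wavy Python sketch. Real systems work on sub-word tokens across billions of documents, not whitespace-split words, but the core idea is the same: every sentence becomes a stack of next-word prediction examples.

```python
# A toy sketch: turning one sentence into (context, next word) training pairs.
# Real LLM pipelines use sub-word tokenizers and billions of documents;
# this whitespace-split version is purely for illustration.

def make_training_pairs(text: str):
    """Turn a sentence into (context so far, word to predict) pairs."""
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = " ".join(words[:i])   # everything the model has "seen"
        next_word = words[i]            # the word it should learn to predict
        pairs.append((context, next_word))
    return pairs

for context, target in make_training_pairs("Java is a compiled language"):
    print(f"{context!r} -> {target!r}")
```

Multiply that by hundreds of billions of words and you get a model that has seen an awful lot of "what word tends to come next."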

How Does This Work?

In the previous article, we discussed how Neural Networks are the heart of Deep Learning - their architecture takes inspiration from the structure of the human brain. They are responsible for generating the responses we see when prompting ChatGPT. To understand this, let's start with a simple prompt and ask ChatGPT what the difference is between Java and JavaScript:

It will generate a response such as:

The text we provide as a prompt serves as the "Input Layer," while each word ChatGPT generates in its response forms part of the subsequent stages. Within ChatGPT's architecture, deep neural networks construct responses based on the calculated likelihood of each word following the one before it. These weights (such as 0.4, 0.5, 0.02, and so forth) are the pivotal elements of the "Hidden Layer." This process unfolds across the entire response, word after word. That's the whole thing - at its core, this LLM is simply a model that predicts the next word based on statistical probabilities established during its training. If, for instance, the response starts with "Java," a specific set of weighted word choices related to "Java" guides the words that follow, and this pattern repeats until we get our full response.
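To make that concrete, here's a heavily simplified Python sketch. The weight table is completely made up for illustration - a real model computes these probabilities over tens of thousands of tokens using billions of learned parameters - but it captures that word-after-word prediction loop:

```python
# A toy "hidden layer": hand-made weights for which word follows which.
# The numbers are invented for illustration, not taken from any real model.
next_word_probs = {
    "Java": {"is": 0.4, "and": 0.3, "runs": 0.2, "compiles": 0.1},
    "is":   {"a": 0.5, "statically": 0.3, "not": 0.2},
    "a":    {"compiled": 0.6, "popular": 0.4},
}

def predict_greedy(word: str) -> str:
    """Always pick the highest-weighted next word."""
    candidates = next_word_probs[word]
    return max(candidates, key=candidates.get)

sentence = ["Java"]
while sentence[-1] in next_word_probs:          # stop when we run out of weights
    sentence.append(predict_greedy(sentence[-1]))
print(" ".join(sentence))                        # -> "Java is a compiled"
```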

BUT, if you think ChatGPT always selects the word with the greatest weight to follow the preceding one, you're mistaken :) It incorporates an inherent element of randomness, deliberately built into the model to add a touch of creativity to the responses we get. To see this in action, let me prompt the same question again:

Now, since this response starts with "Certainly," the model works with an entirely different set of weighted word choices than we saw in the first response that began with "Java", plus a fresh roll of the randomness dice. The example above is obviously simplified - real models have deep neural networks with many layers and vast vocabularies - but you get the general gist 🙂
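Under the hood, that randomness usually comes from sampling from the weighted choices instead of always grabbing the top one, often controlled by a "temperature" setting. Here's a toy sketch, again with made-up probabilities, for the first word of a response:

```python
import random

# Made-up weights for the opening word of a response - not real model output.
first_word_probs = {"Java": 0.5, "Certainly": 0.3, "Great": 0.2}

def sample_word(probs: dict, temperature: float = 1.0) -> str:
    """Sample a word; higher temperature flattens the odds (more 'creative')."""
    words = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(words, weights=weights, k=1)[0]

for _ in range(5):
    print(sample_word(first_word_probs, temperature=1.0))
```

Run that loop a few times and you'll get a different mix of openings on each run - the same mechanism that gave us "Java" one time and "Certainly" the next.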

Like I said at the beginning of this article, the realm of LLMs stretches far beyond just ChatGPT. And guess what? ChatGPT wasn't even close to being the first LLM around (that idea goes way back to the 1950s). Now we have a bunch of snazzy LLMs like PaLM and Claude, each bringing their own cool stuff to the table. But let me tell you, right now might just be the most exciting time ever for this tech to take off.

The Current State of LLMs - Why is NOW so exciting?

For the longest time, all the major Large Language Models (such as the few I listed above) were closed-source. "Closed-source" means that the larger tech community outside of the organizations that built the technology can't see the models' original source code - so the general public doesn't know exactly how these models were trained and can't iterate on the original code to build upon the product further. But Llama2 (released by Meta, in partnership with Microsoft, in July of 2023) became the first major LLM to be released openly with a license permitting commercial use - and the gates have opened. "Open-source" means that people outside of the corporations that develop these models can work with the original source code and model weights, and even improve/edit early iterations of the models.

There is a plethora of talent and passion for cool tech outside of the organizations that create it, so once that larger tech community gets its hands on it, it's game over. The open-source community doesn't work under the restrictions you see in large corporations, such as strict product pipelines or compliance considerations, so we get a lot of advancement very quickly (and I mean QUICKLY). You can see here some of the top iterations that have been made to models such as Qwen 7B (Alibaba's recent open-source LLM) and Llama2:

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
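If you want to try one of these open models yourself, here's a minimal sketch using Hugging Face's transformers library. The model name below is just one example (and Llama2 requires accepting Meta's license on huggingface.co before the weights will download); any open model from the leaderboard above can be swapped in.

```python
# A minimal sketch of running an open-source LLM locally with Hugging Face's
# `transformers` library (pip install transformers torch). Fair warning: a
# 7B-parameter model needs a beefy GPU, or a lot of RAM and patience, to run.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
result = generator(
    "What is the difference between Java and JavaScript?",
    max_new_tokens=100,   # cap the length of the generated reply
    do_sample=True,       # keep the element of randomness discussed above
    temperature=0.7,
)
print(result[0]["generated_text"])
```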

All of this is important as we watch what unfolds over the course of the next year. The ChatGPT you interact with is an application that sits on top of an underlying LLM (GPT-3.5 or GPT-4). And not every LLM has an accompanying application that sits on top of it - so what we're seeing right now is the explosion of the infrastructure behind future applications.

Great AI Podcasts that cover the biggest trends and advancements in the space:

Good video resources: