The AI chatbot ChatGPT, created by the startup OpenAI, has captured the public's interest and imagination. Some of its capabilities, such as explaining difficult subjects or holding lengthy conversations, are quite astounding.
It's hardly surprising that rival AI firms have been scrambling to release their own large language models (LLMs), the term for the systems that power chatbots like ChatGPT. Some of these LLMs will be built into other products, such as search engines.
Given these impressive capabilities, I decided to test the chatbot on Wordle, the word game from the New York Times that I have been playing for a while. Players get six attempts to guess a five-letter word. After each guess, the game indicates which letters, if any, are in the word, and whether they are in the right places.
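The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration of the game's feedback, not the New York Times' implementation; the markers ("G" for right place, "Y" for in the word but misplaced, "." for absent) are my own.

```python
def score_guess(guess: str, answer: str) -> str:
    """Score a Wordle guess: 'G' = right letter in the right place,
    'Y' = letter in the word but elsewhere, '.' = not in the word."""
    result = ["."] * 5
    remaining = []
    # First pass: mark exact matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining.append(a)
    # Second pass: mark letters present in the answer but misplaced.
    for i, g in enumerate(guess):
        if result[i] == "." and g in remaining:
            result[i] = "Y"
            remaining.remove(g)
    return "".join(result)

print(score_guess("leant", "mealy"))  # YGG..
```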
I found that the performance of the most recent version, ChatGPT-4, on these puzzles was surprisingly poor. You might expect word games to be easy for GPT-4. LLMs are "trained" on text, meaning they are fed information that helps them get better at what they do. ChatGPT-4 was trained on about 500 billion words: the entirety of Wikipedia, all books in the public domain, huge volumes of scientific literature, and text from many other websites.
AI chatbots may become a significant part of our lives. Understanding why ChatGPT-4 struggles with Wordle offers insight into how LLMs represent and work with words, and into the limits this imposes.
First, I tested ChatGPT-4 on a Wordle puzzle where I knew the positions of two letters in the word. The pattern was "#E#L#", with "#" standing for the unknown letters. The answer was "mealy".
Five of ChatGPT-4's six responses failed to match the pattern. The responses were: "Beryl", "Feral", "Herald", "Merle", "Revel" and "Pearl".
With other combinations, the chatbot occasionally found workable answers, but overall it was hit and miss. For a word fitting the pattern "##OS#", it found five viable options. But for the pattern "#R#F#", it suggested two words without the letter F, and a word, "Traff", that isn't in dictionaries.
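Checking a candidate against such a pattern is mechanically trivial, which is what makes the failures striking. A minimal sketch of the check (the pattern notation is the one used above; the word list is just the responses quoted earlier):

```python
def matches_pattern(word: str, pattern: str) -> bool:
    """Check a candidate against a Wordle-style pattern in which
    '#' is an unknown letter and anything else is a fixed letter."""
    if len(word) != len(pattern):
        return False
    return all(p == "#" or p == w
               for p, w in zip(pattern.upper(), word.upper()))

# The responses reported above, plus the actual solution:
for w in ["BERYL", "FERAL", "HERALD", "MERLE", "REVEL", "PEARL", "MEALY"]:
    print(w, matches_pattern(w, "#E#L#"))
```

Of the chatbot's six responses, only "Merle" passes this check ("Herald" even has the wrong length), which is why five of the six counted as failures.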
Under the bonnet
At the heart of ChatGPT lies a deep neural network: a sophisticated mathematical function, or algorithm, that converts inputs into outputs. Both the inputs and the outputs must be numbers. Since ChatGPT-4 works with words, these must be "translated" into numbers for the neural network to operate on them.
The translation is performed by a computer programme called a tokenizer, which maintains a huge list of words and letter sequences, known as "tokens". These tokens are identified by numbers. A word such as "friend" has a token ID of 6756, so a word like "friendship" is broken down into the tokens "friend" and "ship", denoted by the IDs 6756 and 6729.
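A toy tokenizer makes the idea concrete. Real tokenizers use byte-pair encoding and vocabularies of tens of thousands of tokens; this sketch just does greedy longest-match lookup in a two-entry vocabulary, using the two IDs quoted above.

```python
# Illustrative vocabulary: only the two token IDs mentioned in the text.
VOCAB = {"friend": 6756, "ship": 6729}

def tokenize(text: str) -> list[int]:
    """Greedily match the longest known token at each position
    and return the sequence of token IDs."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return ids

print(tokenize("friendship"))  # [6756, 6729]
```

Note what is lost: the output [6756, 6729] carries no trace of the individual letters f-r-i-e-n-d-s-h-i-p, which is the root of the problem described next.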
When a user enters a question, the words are converted into numbers before ChatGPT-4 even begins handling the request. The deep neural network does not have access to the words as text, so it cannot really reason about the letters.
ChatGPT-4 is good at working with the first letters of words. I asked it to write a poem where the opening letter of each line spelled out “I love robots”. Its response was surprisingly good. Here are the first four lines:
I am a fan of gears and steel
Loving their movements, so surreal,
Over circuits, they swiftly rule
Vying for knowledge, they’re no fool,
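The first-letter trick in the poem above can be verified mechanically, which is what makes it notable that the model gets this particular letter-level task right:

```python
poem = """I am a fan of gears and steel
Loving their movements, so surreal,
Over circuits, they swiftly rule
Vying for knowledge, they're no fool,"""

# Take the first character of each line; the four lines quoted
# should spell out the start of "I love robots".
target = "I love robots".replace(" ", "").upper()  # "ILOVEROBOTS"
initials = "".join(line.lstrip()[0].upper() for line in poem.splitlines())
print(initials)  # ILOV
print(initials == target[:len(initials)])  # True
```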
ChatGPT-4's training data includes a huge number of textbooks, many of which have alphabetical indexes. This could have been enough for GPT-4 to learn the associations between words and their first letters.
The tokenizer also appears to have been modified to recognise requests like this, and seems to split phrases such as "I Love Robots" into individual tokens when users enter their request. However, ChatGPT-4 could not handle requests to work with the last letters of words.
Given that nearly all of its vocabulary appears in its training data, it may seem strange that a huge language model like ChatGPT-4 struggles to solve simple word puzzles or produce palindromes.
However, this is because all text inputs must be encoded as numbers, and the process that does this does not capture the structure of the letters within words. Since neural networks operate only on numbers, the need to represent words this way will not go away.
Future LLMs could get around this in two ways. First, since ChatGPT-4 is known to understand the first letter of every word, its training data could be augmented to include mappings of every letter position within every word in its vocabulary.
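The first fix amounts to synthesising training text that spells out letter positions explicitly. The exact format such text would take is not specified anywhere, so the rendering below is purely illustrative:

```python
def letter_positions(word: str) -> str:
    """Render a word as explicit letter-position facts, the kind of
    synthetic training text the first fix would add. The phrasing
    is a made-up example, not a documented training format."""
    return ", ".join(f"letter {i + 1} of '{word}' is '{c}'"
                     for i, c in enumerate(word))

print(letter_positions("mealy"))
```

Run over the whole vocabulary, this would give the model direct exposure to letter-position facts it currently has to infer, if it can at all, from tokenized text.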
The second option is more exciting and more general. As I have shown, future LLMs could generate code to solve problems like these. A recent study presented an approach called Toolformer, in which an LLM uses external tools to carry out tasks at which LLMs usually struggle, such as arithmetic.
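In the Toolformer spirit, the letter-level work could be delegated to an external tool while the model only decides when to call it. The sketch below assumes a made-up tool interface and a stand-in word list; it is not the approach from the Toolformer paper, just an illustration of the division of labour.

```python
import re

# Stand-in word list; a real tool would consult a full dictionary.
WORD_LIST = ["mealy", "beryl", "pearl", "merle", "revel", "feral"]

def wordle_tool(pattern: str) -> list[str]:
    """Hypothetical external tool an LLM could call: turn a pattern
    like '#E#L#' into a regular expression and filter the word list,
    so the model never has to reason about letters itself."""
    regex = re.compile("^" + pattern.lower().replace("#", "[a-z]") + "$")
    return [w for w in WORD_LIST if regex.match(w)]

print(wordle_tool("#E#L#"))  # ['mealy', 'merle']
```

The tool handles letters as text, sidestepping the tokenizer entirely, which is precisely the capability the neural network itself lacks.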