AI Autumn. Winter comes next. Soon.

Stojancho Tudjarski

Taken from Pixabay

First comes spring, and it’s glorious: many new leaves on the trees. Then comes summer, and it’s hot. Next is autumn, and the leaves grown in spring start falling. Then comes winter: it’s cold and frozen out there for a long time, and nothing happens outside.

Mapping seasons to current generative AI, even if not necessarily accurate in detail, might look like this:

The last AI spring: 2017 with the Transformer, then 2018 and 2019 with BERT and GPT-2.

AI summer, 2020: GPT-3 and many startups relying on it, then 2022 with ChatGPT, with even more startups relying on it and cannibalising many previous ones. 2024: LLMs are bigger and brighter, Anthropic’s Claude was, or still is, better than ChatGPT, Meta shines with fine-tunable Llama models, and even Google finally managed to join the race. There are also many open-source LLMs and a rise in small language models (SLMs).

AI autumn 2025: Things are calming down. Next is winter. Let’s elaborate on this statement.

In the way LLMs use them, transformers are good at only one thing: predicting the next, most probable word in a given text. GPT-3 stopped there. Making ChatGPT meant fine-tuning GPT-3 on massive amounts of human-generated data, showing it how to react when somebody asks something. The keyword is instruction tuning, and the work under the hood is an enormous amount of human labeling and of generating examples of tasks and how they can be solved. OpenAI paid a lot for this, sourcing it where labor was cheap, and it was still well orchestrated and conducted.
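To make the “predicting the next most probable word” point concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 checkpoint as a stand-in for any decoder-only LLM:

```python
# Minimal next-token prediction sketch: a decoder-only model scores
# "which token most likely comes next"; everything else is built on top of this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # small stand-in for any LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "First comes spring, then comes"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                    # shape: (1, seq_len, vocab_size)

next_token_id = logits[0, -1].argmax()                 # greedy: the single most probable token
print(tokenizer.decode(next_token_id))                 # whichever token the model scores highest
```

Instruction tuning does not change this mechanism; it only changes which continuations the model considers most probable.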

GPT-4: another WOW. It is better than ChatGPT. Reportedly, that is what multiple instances (experts), each trained separately on a different instruction set, delivered to us.

The latest GPT is a Ph.D.-level expert, highly skilled at math, problem-solving, reasoning, and coding, among other things. Yup, another WOW. OpenAI invested a lot of money into generating fine-tuning data that contains instructions for reasoning step by step, together with the appropriate results.

Now, several BUTs.

LLMs occasionally diverge from the truth; in other words, they hallucinate. This is a feature, not a bug. It’s how they operate.

LLMs are becoming more and more brilliant, and everybody is happy. Hey, even SLMs have started to perform quite acceptably; it seems we have finally figured out how to train them :-)

That opens the door for the first BUT: how and where to use them?

A current, plausible use case:

“I want to talk about something; I have an idea on my mind, but I’m unsure how to formulate it. ChatGPT will give me a few ideas.” It works like a charm; I use it this way from time to time. ChatGPT is a storyteller given to us by the gods, and I mean it. But how many people would pay, and how much money, for storytelling? ChatGPT is nothing more than a chit-chat engine, and judging by OpenAI’s bank account, only a few people are willing to pay for this technology’s greatness. And that is just great, considering the cost of every hundred automatically generated words.

How do you make it worthwhile, something that brings value to somebody?

A few implemented use cases are already in production: copilots, RAG, and agents. But … they hallucinate. That means a person asks something of them, and they provide text that looks incredibly convincing but may contain hallucinations. Whether the chance of hallucination is 1% or 10% doesn’t really matter: it might happen. That further implies that the person who asked something and was given an answer MUST check and validate the response, because there is a chance it is wrong in some part, even if only in a few words. That further requires that if a person uses a cooking copilot to compile a recipe, she must know that glue is not used as a food ingredient. And if a person using the MS Word copilot to write something about quantum mechanics gets text that sounds heavily scientific, she MUST check its validity; in other words, she must have a solid background in the theme she is chit-chatting about with the ChatGPT used under the hood by the copilot. Using a coding copilot? A convenient thing, I must admit. But relying on the autogenerated code without validation and putting it into production … well, may the force be with you. In other words, nothing (yet) saves you from the need to know how to code, software engineering, and everything around it.

The more text that is retrieved, and its length increases with each subsequent iteration, the more text must be validated by a human expert. Using autogenerated ChatGPT text in any form without carefully checking it is simply irresponsible and dangerous. RAGs produce answers to your questions and provide the sources on which those answers are based. Unless she reads the sources and convinces herself that she would reach the same answer from them, using the answer is not a good idea.
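As a minimal sketch of that idea (the retriever here is a toy word-overlap scorer, and call_llm() is a hypothetical placeholder, not any real API), the point is simply that the answer always travels together with the sources the human is expected to read:

```python
# Toy RAG sketch: retrieve sources, build a prompt, and return the answer
# *together with* the sources, so a human can verify the answer against them.
def retrieve(question, documents, k=2):
    """Rank documents by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt):
    """Hypothetical placeholder for whatever LLM API is actually used."""
    return "<generated answer, possibly containing hallucinations>"

def answer_with_sources(question, documents):
    sources = retrieve(question, documents)
    prompt = ("Answer using only these sources:\n"
              + "\n".join(sources)
              + "\nQuestion: " + question)
    return {"answer": call_llm(prompt), "sources": sources}   # reader must check the sources

docs = [
    "Glue is not a food ingredient.",
    "Elephants are large land mammals.",
    "Transformers predict the next token in a text.",
]
print(answer_with_sources("Is glue a food ingredient?", docs))
```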

Agents generate a lot of text as steps. Read carefully what they plan to do, and avoid giving them access to anything physical.

In short, everything that comes out of the mouth of any LLM needs constant supervision by trustworthy human professionals. Therefore, instead of letting humans do the job directly, we need to implement processes where LLMs are used to produce something; the result is then exposed to humans, who review it, with the possibility of altering it where hallucinations occur, before the LLM-generated content is allowed further down the pipeline. The latest news on the topic is that autogenerated code on GitHub has increased rapidly, but so has the rate at which it gets modified shortly afterward. Let it be your lucky guess why. In many situations where LLMs were put into production, they slowed down previously established processes.
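A sketch of such a human-in-the-loop gate, with generate_draft() as a hypothetical stand-in for the LLM call and a plain console prompt as the review step: nothing generated leaves the pipeline until a person has approved or corrected it.

```python
# Human-in-the-loop gate: LLM output is only a draft until a person approves or edits it.
def generate_draft(task):
    """Hypothetical placeholder for the LLM call that produces a draft."""
    return f"<LLM draft for: {task}>"

def human_review(draft):
    """Show the draft to a reviewer; they accept it or type a corrected version."""
    print("DRAFT:\n" + draft)
    corrected = input("Press Enter to accept, or type the corrected text: ").strip()
    return corrected or draft

def pipeline_step(task):
    draft = generate_draft(task)
    approved = human_review(draft)   # nothing continues without this step
    return approved                  # only reviewed content goes further down the pipeline

if __name__ == "__main__":
    print("FINAL:", pipeline_step("summarize the quarterly report"))
```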

So, what is their business value?

What is coming next in 2025? I bet on better reasoning and better agents, naturally served by OpenAI, Anthropic, or Google. But at what cost? Microsoft and Google are already preparing the ground to restart nuclear power plants that they would pay for. That could happen in the next 2 to 4 years, and it may cover the needs of the next step forward. Since each step costs exponentially more than the previous one, how many nuclear power plants will be needed for the big step after the next big step? I am not advocating against nuclear plants: when a big plane crashes, 250 passengers die at once, while car crashes on the road kill people a few at a time (cars (still) don’t fly), yet planes are still the safest way to travel. However, this situation demonstrates how unsustainable the current growth rate on the current path is, without a clear idea of how it can make money.

All of these are technologically impressive achievements. But as the LLM engines become more and more intelligent, the use cases where businesses will need them become fewer and fewer, while the cost of training and inference is 10 times higher each time. We are talking about Ph.D.-level smart LLMs, but how many Ph.D.s are currently needed in business?

Another case that shows how delusional the situation is: three months ago, the smartest models weren’t able to count the r letters in the word strawberry. Are we talking about some “intelligence”? Why? Because nobody showed them how to count letters; they were never told how to do that.
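For contrast, the same task is trivial for ordinary, deterministic code, which actually manipulates letters instead of predicting tokens:

```python
# Counting letters is a one-liner in ordinary code; a next-token predictor
# has no such counting procedure built in.
word = "strawberry"
print(word.count("r"))   # 3
```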

Sam Altman recently said that superintelligence will be reached in several thousand days. According to Wikipedia (https://en.wikipedia.org/wiki/Superintelligence), “A superintelligence is a hypothetical agent that possesses intelligence surpassing that of the brightest and most gifted human minds.” That is scary, because who or what would validate something generated by something more brilliant than the most intelligent humans? That’s the AI we should be afraid of, not today’s AI. Luckily, “several thousand days” translated into years is several (5 to 15) thousand days times roughly 3 years per thousand days, i.e. 15 to 45 years. We’ll see. Anyway, by saying this, Sam is positioning himself as a salesman selling something nobody knows how to make. The Anthropic CEO recently joined Sam with similar “predictions”.

Until then, let’s see how far we are from the intelligence of a 3-year-old kid, which does not hallucinate. Kids can learn to recognize an elephant in a photo from, let’s say, three example images. How many pictures do you need to show today’s computer to teach it to correctly label an elephant in a photo? Thousands.

We must pursue that kind of intelligence, and it requires more than generating the most probable word in a text. Because, you know, intelligence is not only repeating or imitating already seen content; it needs to be able to learn new things. As of now, we are still trying to figure out where to start.

Final words:

  • 2025 will show a slower rate of AI growth.
  • Many ChatGPT-based startups will die.
  • The technology, which is questionable in accuracy, is still far from massive adoption in business and industry, except for marketing and consultancy agencies and similar storytellers, where nobody dies from occasional mistakes.

After that comes AI winter: we’ll have to figure out something better than guessing the next probable word that needs nuclear plants to run.

--

Stojancho Tudjarski

ML and AI enthusiast, learning new things all the time and looking at how to make something useful with them.