GPT-3 is not the way to go!


You simply couldn’t miss the GPT-3 hype. Everybody has loved it from the very first day it was “shown”. “Shown” is in quotes because the number of people who were able to play with it was, and still is, very limited. I gained access to it a couple of months ago.

So the first reactions were all WOW, and not without reason: GPT-3 is strikingly good at the job it was trained for, which is automatically generating text similar to the texts it was exposed to during its training phase. And it was exposed to a lot of texts from very different contexts. I…


It's a cool tool for explaining what happened, and when. But it can't predict the future, that's for sure!

Yes. Just like RNNs and LSTMs. Just scaled up to the next level, where the first layer here is covered by self-attention heads.

Guys, when will somebody figure out that we need hierarchical attention, i.e. attention over attention?

Finally, nicely said about the GPT-x stuff, x >= 2.

They are just statistically averaged human writers that generate a "lot" of text in one step, with the context preserved. All the rest is: take the last several words/sentences and make the next step. They can't help but lose the initially assigned context.
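To make the "take the last several words and make the next step" point concrete, here is a toy sketch of such a loop. It is not how GPT-x is implemented: `dummy_next_token` is a hypothetical stand-in for a model, and the window of 4 tokens is an assumption chosen only to keep the example small (real models use far larger windows). It just counts how many of the original prompt tokens are still visible at each step.

```python
def generate(prompt, next_token, steps, window):
    """Toy autoregressive loop: each step sees only the last `window` tokens."""
    tokens = list(prompt)
    visible_prompt = []  # number of prompt tokens still inside the window, per step
    for _ in range(steps):
        start = max(0, len(tokens) - window)
        context = tokens[start:]  # everything before `start` is invisible
        visible_prompt.append(max(0, len(prompt) - start))
        tokens.append(next_token(context))
    return tokens, visible_prompt


def dummy_next_token(context):
    """Hypothetical stand-in for a language model; always emits the same token."""
    return "tok"


prompt = ["the", "topic", "is", "cats"]
tokens, visible_prompt = generate(prompt, dummy_next_token, steps=5, window=4)
print(visible_prompt)  # [4, 3, 2, 1, 0]
```

After a few steps the window holds only generated tokens, so nothing of the initially assigned context is left to condition on.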

Until April 2020, GPT-2 was the king of AI, with its stunning 1.5B parameters.

It is not easy to deal with. It takes 6 GB on your disk, but that’s not the problem. The problem is processing speed: you have to wait several minutes for a single inference running on the CPU. With a GPU, it would be at least ten times faster, provided you have an NVIDIA GPU with at least 24 GB of video RAM.
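The 6 GB figure is consistent with a quick back-of-the-envelope calculation, assuming the weights are stored as 32-bit floats (4 bytes per parameter). That storage format is an assumption, not something stated above, and `weights_size_gb` is a helper name made up for this sketch.

```python
def weights_size_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


gpt2_params = 1_500_000_000  # the 1.5B parameters quoted above

print(weights_size_gb(gpt2_params))     # float32 (assumed): 6.0
print(weights_size_gb(gpt2_params, 2))  # float16, for comparison: 3.0
```

This counts only the stored weights; memory needed at inference time for activations comes on top of it.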

At some point, you start wishing to fine-tune its behavior. That is not so hard with the smallest version, with its only 124 M…

Stojancho Tudjarski

ML and AI enthusiast, learning new things all the time and looking at how to make something useful with them.

