Finally, something well said about the GPT-x stuff (x >= 2).
They are just statistically averaged human writers that generate a "lot" of text one step at a time, with whatever context fits. All the rest boils down to: take the last several words/sentences and make the next step. It's bound to lose the initially assigned context.
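The loop described above can be sketched like this. Purely illustrative: the window size and the toy lookup table are made up, standing in for a real model; it only shows how a fixed context window eventually drops the original instructions.

```python
import random

WINDOW = 4  # hypothetical context size, in tokens (real models use thousands)

def next_token(context, model):
    # Toy stand-in for a language model: pick a likely follower
    # of the last visible token from a hand-made table.
    candidates = model.get(context[-1], ["..."])
    return random.choice(candidates)

def generate(prompt, model, steps):
    tokens = prompt.split()
    for _ in range(steps):
        context = tokens[-WINDOW:]  # only the last WINDOW tokens are visible
        tokens.append(next_token(context, model))
    return " ".join(tokens)

# A deterministic toy table: each token has exactly one follower.
model = {"the": ["cat"], "cat": ["sat"], "sat": ["the"]}
print(generate("obey the", model, 6))
```

After a few steps the initial token ("obey" here) falls outside the window and can no longer influence generation, which is the loss of context the comment points at.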