In the first part of the post (link here) we briefly presented the Transformer model – its architecture and the idea behind how it works. As we mentioned, it was a revolution in NLP, and its building blocks are used in modern language models such as GPT and BERT. In this part, we focus on a specific language model, GPT-2, and show how we use it in Literacka to generate texts.
GPT-2 is the second language model we describe, after BERT. It was developed by a team of researchers at OpenAI. As a curiosity, it is worth mentioning that one of the founders of this organization is none other than Elon Musk, co-founder of PayPal, SpaceX, Tesla, Neuralink and The Boring Company. GPT-2 is a generative model for natural language. Its name comes from ‘Generative Pre-Training’, after the paper ‘Improving Language Understanding by Generative Pre-Training’ that introduced the original GPT. Like BERT, its architecture is built from elements of the Transformer model. Below is a reminder of the simplified scheme of the Transformer’s operation:
The BERT architecture is built from stacked encoders of the Transformer model. GPT, in contrast, is built from stacked decoders. This is illustrated in the figure below:
Depending on the number of decoder layers, four variants of the GPT-2 model are distinguished. The “Small” model consists of 12 layers and has 117 million trainable parameters, while the “Extra Large” model has 48 layers and over 1.5 billion trainable parameters. For our task, the 12-layer model turned out to be sufficient. The variants of the model and their sizes are shown below:
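If you want to check these numbers yourself, the sketch below loads the publicly released English GPT-2 checkpoints with the Hugging Face `transformers` library and counts their parameters. This is only an illustration: the exact totals of the public checkpoints differ slightly from the rounded figures quoted above.

```python
# Inspect the four GPT-2 variants: number of decoder layers and parameter count.
from transformers import GPT2LMHeadModel

for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
    model = GPT2LMHeadModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {model.config.n_layer} layers, {n_params / 1e6:.0f}M parameters")
```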
The main difference from BERT is that GPT works auto-regressively: instead of processing a whole sequence at once, it produces one token at a time, with each new token conditioned on the tokens before it. As with BERT, pre-training is unsupervised and requires large text corpora. The resulting model is used to generate text.
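To make this concrete, here is a minimal sketch of auto-regressive generation with the publicly available English GPT-2 checkpoint (in Literacka we use a Polish model, so the checkpoint name and prompt are only an illustration): the model repeatedly predicts the next token and appends it to its own input.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Start from a short prompt and generate 20 tokens, one at a time.
input_ids = tokenizer.encode("The book fair in Frankfurt", return_tensors="pt")

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits   # (1, sequence_length, vocab_size)
        next_id = logits[0, -1].argmax()   # greedy choice of the next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```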
The “raw” GPT model simply continues whatever text it is given as input. In our case, the goal was to generate notes describing a book based on its literary category and keywords, which gives much more control over the output; the keywords are named entities (NERs) extracted from the work. To achieve this, we had to fine-tune a GPT-2 model pre-trained for Polish, i.e. tune the model to this specific task. Fine-tuning, of course, requires an appropriate dataset. Ours consisted of over 500,000 book excerpts of 400 to 600 words, each with an assigned literary category (biography, fantasy, historical, horror, crime, drama, travel, reportage, romance) and the named entities extracted from it. Of the 12 layers of the pre-trained model, the 6 lower layers were frozen and the remaining 6 were trained on the prepared dataset. The pre-trained model already carries a lot of knowledge about the language itself, and training only half of the parameters significantly reduced training time. After this dose of data, the model is able to generate text from the keywords and the category of the work, as in the sketch below.
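Below is a minimal sketch of that layer-freezing step and of how a training example can be assembled. The checkpoint name and the `category | keywords | excerpt` prompt format are illustrative assumptions, not the exact configuration we used.

```python
from transformers import GPT2LMHeadModel

# In our setup a GPT-2 model pre-trained on Polish text is loaded here;
# "gpt2" (the English model) stands in for it in this sketch.
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze the 6 lower decoder blocks; only the upper 6 remain trainable.
for block in model.transformer.h[:6]:
    for param in block.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / 1e6:.0f}M of {total / 1e6:.0f}M")

# Each training example is an excerpt prefixed with its literary category and
# keywords (NERs), so that the same prefix can later steer the generated note.
def build_example(category, keywords, excerpt):
    return f"{category} | {', '.join(keywords)} | {excerpt}"
```

At inference time only the category-and-keywords prefix is fed to the model, which then continues it into a note – this is how the Frankfurt example below was produced.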
We had a little fun trying to read the future from texts generated by artificial intelligence. In October we will be at the Frankfurt Book Fair, so we gave the AI the following keywords: fair, books, Frankfurt, presentation, scientist, guest, research, passions, stall, hotel, room, road, receptionist, people. We asked for a text from the “reportage” category. Here is what the artificial intelligence generated on this basis:
- “When it comes to the book fair in Frankfurt am Main, I can confidently say that this was the first time I had the opportunity to encounter something so different from anything I have known so far. I am convinced that my presentation will be very positively received by both foreign visitors and among Polish readers. All this makes me feel fulfilled as a literary research scientist.”
- “Fortunately, it turned out that you can meet many interesting people there – those who are willing to share their passions or interests. So I decided to take this opportunity to share my favorite book topics with the world.”
- “It turned out that no one else was at the book stand except me. I got up and continued on my way. When I got to the hotel, it turned out that the hotel was closed. I asked the receptionist to let me know when I had a free room.”
Author of the text: Sebastian Jankowski