Named Entity Recognition (NER) is a popular technique for identifying the spans of text that name entities and classifying them into predefined categories, e.g. people, time, objects, occupations, emotional states and many more. What do NERs give publishers? They provide knowledge about the content of a book within a minute. Thanks to NER, the publisher will find out, before reading the book, where and when the action takes place, what events appear in the text, which characters appear and what they do (e.g. a young blogger, Kajtek, who is a werewolf), and even the emotional states of the characters.
As we wrote in this article, NERs are the result of the use of artificial intelligence algorithms.
An example of NERs identified in a text:
Mariusz (person) felt happy (emostate) when meatballs (food) appeared on the table (product) on Sunday (date-period). After a hearty lunch (event), he went for a walk (event) around Olsztyn (gpe). He took with him his dog Arnold (person-animal) and the book (product) “Mrs. Zosia walks around Olsztyn” (art). He sat on the bank (loc) of the Łyna (gpe) near the castle (object) and planned a trip (event). From the corner of his eye he watched the amused (emostate) guests (person-type) of the Warmia Brewery (brand). He recalled the evenings (date-period) he spent here in 2020 (date) with Germans (norp) from Bavaria (gpe). He thought back to those summer weeks (date-period) spent with them on an event trip through Warmia (gpe).
The example above contains several different NER categories:
- PERSON – named heroes of the book,
- PERSON-TYPE – denoting a type of person or group of people, as distinct from the named PERSON above,
- NORP – denoting nationality or a named group that has its customs, culture, values,
- OBJECT – a closed space where the heroes can stay,
- GPE – places that can be geolocated, i.e. found on a map of the world,
- LOC – an open, unspecified space in which the heroes can stay (one that has no geographical location of its own),
- ART – titles of books, films, names of musical bands, titles of art works, etc.,
- EVENT – meaning events – e.g. battles, wars, sports and cultural events, festivals, elections etc.,
- PRODUCT – denoting things, objects, products, man-made goods,
- DATE-PERIOD – indicating a time period or time,
- PERSON-ANIMAL – meaning animals (real or fictional),
- FOOD – meaning food – names of dishes or food products that can be eaten,
- EMOSTATE – denoting emotional states and feelings, e.g. joy, sadness, depression, fatigue,
- BRAND – identification of the producer, brand, company, institution, organization etc.
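The annotated sentence above can be thought of as a list of (text, category) pairs. The following is a minimal sketch of that idea in Python, not a description of how any particular NER system stores its output: the pairs are copied by hand from the example, and the names `ENTITIES` and `group_by_label` are hypothetical.

```python
from collections import defaultdict

# Hand-written annotations taken from the example sentence above.
# A real NER model would produce such pairs automatically (assumption).
ENTITIES = [
    ("Mariusz", "PERSON"),
    ("happy", "EMOSTATE"),
    ("meatballs", "FOOD"),
    ("Sunday", "DATE-PERIOD"),
    ("Olsztyn", "GPE"),
    ("Warmia", "GPE"),
]


def group_by_label(entities):
    """Collect entity texts under their category, keeping text order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(ENTITIES)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

Grouping by category is the starting point for the uses described in the rest of this article, since each of them looks at one category at a time.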
How can we practically use NER analysis in a publishing house? Some examples below:
1. The most common words in a given category
When we look at the NER analysis of Maja Lunde’s novel “The History of Bees”, the top of the GPE list (the category of location words to which specific geographic coordinates can be assigned) contains places related to England, China and the United States.
A look at LOC (the category of words defining a space to which specific geographic coordinates cannot be assigned) shows that the most common words include forest, meadow and field, along with city and street. This gives an interesting picture of the space in which the book takes place. Those who know “The History of Bees” know that this is a story with a strong ecological tone, drawing a dark vision of a world suffering the consequences of human actions.
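Finding the most common words in a category comes down to counting. Here is a minimal sketch, assuming the entities are already available as (text, category) pairs. The sample list is invented for illustration and is not the actual analysis of any book, and `most_common_in_category` is a hypothetical helper name.

```python
from collections import Counter


def most_common_in_category(entities, label, n=3):
    """Return the n most frequent entity texts for one category.

    Texts are lower-cased so that "Forest" and "forest" count together.
    """
    counts = Counter(
        text.lower() for text, ent_label in entities if ent_label == label
    )
    return counts.most_common(n)


# Invented sample data, for illustration only.
SAMPLE = [
    ("forest", "LOC"), ("meadow", "LOC"), ("Forest", "LOC"),
    ("England", "GPE"), ("field", "LOC"), ("forest", "LOC"),
    ("China", "GPE"), ("England", "GPE"),
]

top_loc = most_common_in_category(SAMPLE, "LOC", n=2)
top_gpe = most_common_in_category(SAMPLE, "GPE", n=1)
print(top_loc)
print(top_gpe)
```

The same function answers the "book geography" question: the top GPE entry and its count show which place dominates the text.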
2. Book geography
Where does the action of Agnieszka Pietrzyk’s book “Stay at home” take place? Artificial intelligence tells us: the most common location word is ELBLĄG, which appears in the book 31 times. Such information may help the publisher identify the region whose readers will be particularly interested in the book.
3. Where is the action of the book taking place?
In the novel “Dżozef” by Jakub Małecki, the most common professions are: doctor, doctor, nurse and patient. On the basis of just these few words, we are able to predict where the action of the novel is taking place.
See for yourself how important the analysis of text by artificial intelligence algorithms is. On the basis of the most common words in a given category, we can really tell a lot about a book, even if we have not read it yet.
What about a piece in which the most common professions / activities are: priest, bishop, king, rabbi, peasant, emperor, dean, scholar, merchant, prior, archbishop, pilgrim, nuncio, Talmudist, sultan, translator, count, servant, messenger?
Play with us and select all of the answers below that match the most common activities in the book. You can find the correct answers at the end of this article *.
A) Religion is an important plot point.
B) The heroes of the novel came from the future.
C) Women play a key role in the piece.
D) The action takes place at a time when society was divided into estates.
E) Trading plays a significant role in the lives of the heroes.
What do NERs give us? There are many uses of the categories of words in the text. For example, they can be used to tag titles in an online bookstore, making your marketing work easier.
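As a rough sketch of how such tagging could work, the most frequent entities in a few chosen categories can be turned into tags. Everything here is an assumption made for illustration: the `category:word` tag format, the choice of categories, the `build_tags` helper and the sample data are all invented and do not describe any existing bookstore system.

```python
from collections import Counter


def build_tags(entities, wanted_labels=("GPE", "EVENT", "EMOSTATE"), per_label=2):
    """Build 'category:word' tags from the top entities of each category."""
    tags = []
    for label in wanted_labels:
        counts = Counter(
            text.lower() for text, ent_label in entities if ent_label == label
        )
        for text, _count in counts.most_common(per_label):
            tags.append(f"{label.lower()}:{text}")
    return tags


# Invented sample data, for illustration only.
SAMPLE = [
    ("Elbląg", "GPE"), ("Elbląg", "GPE"), ("walk", "EVENT"),
    ("fear", "EMOSTATE"), ("Elbląg", "GPE"), ("trip", "EVENT"),
    ("walk", "EVENT"),
]

tags = build_tags(SAMPLE)
print(tags)
```

Limiting the number of tags per category keeps the result short enough to be useful as bookstore metadata.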
Magdalena Koperska, head of the ANAGRAM Publishing House, when asked what she thought might interest readers in the results of an analysis of a work by artificial intelligence, replied: “The NER analysis looks very interesting. I was intrigued by the Book Map of a title published by our publishing house, Bohdan Kołomijczuk’s ‘Hotel Wielkie Prusy’, translated by Ryszard Kupidura. NERs have said a lot about the book, and even if we have not read it yet, we are able to imagine where and when the action of the piece takes place, what events dominate the text and what the main characters are like. It’s very interesting.”
NERs make publishers’ work easier by allowing a preliminary assessment of what a proposal submitted by an author is about. We offer NER analysis of text within the BookScout.ai system created by Literacka. Contact us to take advantage of the opportunities that technology offers modern publishers.
* Correct answers: A, D, E.