On June 24–25, 2021, we took part in the online conference “CLARIN-PL-Biz – language technologies for science and business II”. Its speakers included scientists and entrepreneurs who use the CLARIN-PL infrastructure. Literacka Technologie is a member of the “CLARIN – Common Language Resources and Technology Infrastructure” consortium.
The scientific part of the event took place on Thursday, and Friday was devoted to commercial applications of the CLARIN infrastructure.
The first day featured presentations of:
- the Corpus of Four Bards – a project that aims to create a modern resource containing the complete works of the great poets of the Romantic era,
- new text processing tools, including Punctuator, a tool for inserting punctuation into text, and a new thematic text classifier,
- the COMBO dependency parser – a language preprocessing system that performs, e.g., morphological analysis, tagging and lemmatization,
- the Korpusomat application, which lets users build language corpora from their own text resources.
Topics discussed on the second day included:
- speech recognition and evaluation,
- Anonymizer – a tool for automatic text anonymization,
- business applications of wordnets and ontologies,
- information extraction from texts and classification of qualifications,
- Chronopress – exploration of a diachronic press corpus.