CLARIN – Literacka Sp. z o.o.

Literacka Technologie is a business partner in the project “CLARIN – common language resources and technological infrastructure”. The project is led by the Wrocław University of Technology. CLARIN is co-financed by the Smart Growth Operational Programme 2014-2020 (Priority IV: Increasing the scientific and research potential, Measure 4.2: Development of modern infrastructure).

Project budget: PLN 136.1 million.
Co-financing: PLN 105 million.
Entrepreneurs’ contribution: PLN 19.8 million.

CLARIN (Common Language Resources & Technology Infrastructure) is a pan-European research infrastructure that enables researchers in the humanities and social sciences to work comfortably with very large collections of texts. The CLARIN ERIC consortium consists of 22 countries. Poland is one of the 8 founding members of CLARIN.

CLARIN-PL is the consortium that carries out the Polish contribution to the construction and maintenance of the European CLARIN ERIC. The consortium in Poland consists of 5 scientific units, headed by the Wrocław University of Technology (the others are the Institute of Computer Science of the Polish Academy of Sciences, the Institute of Slavic Studies of the Polish Academy of Sciences, the University of Łódź, and the University of Wrocław), and 22 entrepreneurs, including Literacka Technologie, which provide their resources and software. CLARIN-PL-Biz therefore works for the benefit of science and scientists as well as the economy and entrepreneurs, and its goal is to put the developed technological infrastructure to use in business.

CLARIN-PL maintains databases describing natural language and its use. The consortium develops computer programs for text and speech analysis at various levels of natural language description, as well as research applications that support work in the humanities and social sciences. The beneficiaries of the consortium’s activities are all scientific units and scientists in Poland. The developed tools are publicly available, and anyone can use them free of charge.

The aim of CLARIN-PL-Biz’s business-oriented activities is to contribute to the development of artificial intelligence by providing resources and tools. CLARIN-PL-Biz is working on an IT architecture for building effective and efficient systems for mining large linguistic data (text and speech) and multimodal data. The first stage of the work will produce a system for collecting and permanently storing language data, followed by the adaptation of language tools to commercial standards by extending the scope of their functionality. CLARIN-PL-Biz will build basic linguistic resources for the Polish language, combined with resources for the English language, and develop tools for analyzing sentiment polarity and emotions. In the next stages, an IT environment for creating dialogue systems and tools for extracting information from text data will be created. A further aim is to design methods and prepare tools based on semantic text analysis and elements of discourse analysis and pragmatics, to support the extraction of knowledge from data. CLARIN-PL-Biz also aims to develop a general, base system for answering questions in Polish.

60% of the knowledge developed by CLARIN-PL-Biz will be publicly available and free of charge. To businesses, on a commercial basis, the consortium will offer a supercomputer focused on multi-scale natural language engineering and artificial intelligence. The cost of work on the supercomputer is estimated at about PLN 40 million. The services provided for entrepreneurs will include the collection and permanent storage of large linguistic data (text, speech, multimodal data, and related numeric and symbolic resources). It will also be possible to extract information and knowledge from large linguistic data, to generate answers to questions based on a corpus of texts, and to perform automatic reasoning based on linguistic data. CLARIN-PL-Biz will offer semantic indexing and searching of data and will provide an IT environment for creating dialogue systems of various modalities (including text and speech generation and speech recognition).