# 2nd interdisciplinary workshop Physics meets Humanities

### Verba et numeri (quantitative approaches to the analysis of language and text)

### Conference Hall of the Ukrainian Catholic University (Lviv, 2a Kozelnitska st.), March 28, 2017, 15:30 - 19:00

### Poster

The goal of the workshop is to demonstrate how concepts and methods of the natural sciences are applied in the social sciences and humanities. The workshop aims to serve as a meeting place for academics and students in the humanities, social sciences, and natural sciences. Besides the obvious popularisation of scientific ideas, the workshop promotes the search for common interdisciplinary approaches to various problems.

### Program:

### Quantitative methods and models in linguistics.

*Alexey Vasiliev (Taras Shevchenko National University of Kyiv)*

In modern science, problems requiring interdisciplinary approaches emerge increasingly often, and linguistics is no exception. First of all, this concerns bringing mathematical tools into linguistic studies. Among the problems solved by mathematical methods are "classical" ones, such as the construction of frequency dictionaries and of the laws that determine rank distributions. However, there are also new challenges, driven largely by the rapid development of computer technology. On the one hand, it has opened opportunities to apply powerful computer systems to the analysis and processing of large data sets such as text corpora. On the other hand, the development of computer technology itself raises interesting problems. Of significant applied value are problems of machine translation, the creation of systems with elements of artificial intelligence, the development of military technologies, the building of expert systems, and various others. The arsenal of techniques used in linguistics includes probability theory and mathematical statistics, set theory, the theory of differential equations, graph theory, modeling based on neural networks, etc. The interdisciplinary nature of these tasks requires the involvement of various specialists: linguists, mathematicians, programmers, psychologists, historians, sociologists, and physicists. The purpose of the talk is to illustrate the capabilities of modern mathematical methods in linguistics and to identify areas of cooperation between researchers from different scientific fields in solving complex problems.
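As a minimal illustration of the "classical" problems mentioned in this abstract, the following Python sketch builds a frequency dictionary from a short text and lists the words by rank. The sample sentence, the tokenization rule, and the function names are illustrative assumptions and are not taken from the talk.

```python
import re
from collections import Counter


def frequency_dictionary(text):
    """Count how often each lowercase word form occurs in the text."""
    # Assumption: a word is a maximal run of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def rank_frequency(freqs):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Ties are broken alphabetically so that the ranking is deterministic.
    ordered = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Hypothetical sample text, far too short for any statistical conclusion.
sample = "the cat saw the dog and the dog saw the cat"
table = rank_frequency(frequency_dictionary(sample))
for rank, word, count in table:
    print(rank, word, count)
```

Rank-frequency laws such as Zipf's law describe how the frequency in such a table falls off with rank; observing that behaviour requires a large corpus rather than a toy sentence like this one.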

#### Presentation

### Complex network concepts in language and text analysis

*Yurij Holovatch (Institute for Condensed Matter Physics of the National Acad. Sci. of Ukraine, Lviv)*

Zipf’s law is perhaps the most widely known law of quantitative linguistics. However, it does not provide insight into how language is organized, since sentences are made of words that interact with each other. Recently, numerous attempts have been made to describe this organization within the framework of complex network science. The goal of my talk is to give a short introduction to this science and to demonstrate its application to language and text analysis.
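One simple way to turn a text into a network, sketched here in Python, is to treat each distinct word as a node and to link two words whenever they occur next to each other in a sentence. This adjacency rule, the sample sentences, and the function name are assumptions chosen for illustration; the talk and the references listed for it may use different constructions.

```python
import re
from collections import defaultdict


def word_adjacency_network(sentences):
    """Build an undirected network: nodes are words, links join neighbouring words."""
    neighbours = defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        for left, right in zip(words, words[1:]):
            if left != right:  # ignore self-loops such as "very very"
                neighbours[left].add(right)
                neighbours[right].add(left)
    return neighbours


# Hypothetical sample sentences, used only to make the sketch runnable.
sentences = [
    "The fox saw the crow",
    "The crow saw the cheese",
    "The fox took the cheese",
]
network = word_adjacency_network(sentences)

# The degree of a node is the number of distinct words it is linked to.
degree = {word: len(linked) for word, linked in network.items()}
for word in sorted(degree, key=lambda w: (-degree[w], w)):
    print(word, degree[word])
```

Node degree is the simplest characteristic of such a network; in this toy example the function word "the" ends up as the hub.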

1. Yu. Holovatch, R. Kenna, S. Thurner. Complex systems: physics beyond physics. Eur. Journ. Phys. 38 (2017) 023002 (19pp)

2. Yu. Holovatch, C. von Ferber, O. Olemskoi, T. Holovatch, O. Mryglod, I. Olemskoi, V. Palchykov. Complex networks (in Ukrainian). Journ. Phys. Stud. 10 (2006) 247-2912.

3. Yurij Holovatch, Vasyl Palchykov. Complex Networks of Words in Fables. In: Maths Meets Myths: Complexity-science approaches to folktales, myths, sagas, and histories. R. Kenna, M. Mac Carron, P. Mac Carron (Editors), Springer, 2016, pp. 159-175

4. Yu. Holovatch, V. Palchykov. Fox Mykyta and networks of language (in Ukrainian). Journ. Phys. Stud. 11 (2007) 22-33

#### Presentation

### Quantitative analysis of writing systems: The Nko alphabet

*Andrij Rovenchak (Ivan Franko National University of Lviv)*

The alphabetic writing system Nko was created in 1949 by the Guinean educator Sòlomáana Kántɛ for the Manding languages of West Africa [1,2]. The script has been studied in various aspects that can be parametrized quantitatively. In particular, it is possible to analyze whether the complexity of graphic shapes correlates with the frequency of symbols [3]. Quantitative linguistic laws (for instance, the Menzerath–Altmann law) give hints towards a correct definition of the mora in the Maninka language [4,5], for which Nko is mostly used. Frequency analysis can also serve as an additional justification for some orthographic principles of the Nko script [5].

[1] Dalby, David (1969). Further indigenous scripts of West Africa: Manding, Wolof and Fula alphabets and Yoruba ‘holy’ writing. African Language Studies 10: 161–181.

[2] Vydrine, Valentin (1999). Manding–English Dictionary (Maninka, Bamana). Vol. 1: A, B, D–DAD, supplemented by some entries from subsequent volumes (St. Petersburg: Dimitry Bulanin Publishing House).

[3] Rovenchak, Andrij & Vydrin, Valentin (2010). Quantitative properties of the Nko writing system. In P. Grzybek, E. Kelih, J. Mačutek (eds.), Text and Language: Structures - Functions - Interrelations. Quantitative perspectives (Wien: Praesens), 171–181.

[4] Rovenchak, Andrij (2015). Quantitative studies in the corpus of Nko periodicals. In A. Tuzzi, M. Benešová and J. Mačutek (eds.), Recent Contributions to Quantitative Linguistics (Berlin–Boston: Mouton de Gruyter), 125–138.

[5] Rovenchak, Andrij (2011). Phoneme distribution, syllabic structure, and tonal patterns in Nko texts. Mandenkan: Bulletin semestriel d'études linguistiques mandé 47: 77–96.

#### Presentation

### The quantitative analysis and qualitative findings (evaluation of Borys Grinchenko's purism)

*Orest Drul (Portal Zbruch)*

http://zbruc.eu/node/57752

#### Presentation

### Some quantitative features of works by West Ukrainian writers

*Igor Kulchytsky (National University of Lviv Polytechnic)*

#### Presentation

### Analysis of word embedding techniques in small Ukrainian text corpora

*Andriy Romanyuk (Ukrainian Catholic University)*

Linguistic models based on word embeddings are widely used to solve many linguistic problems [1, 2]. The accuracy of these models depends on the amount of text data on which they are built [3, 4]. The purpose of the talk is to introduce word embedding techniques and to analyze models derived from small corpora.

1. Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, Bing Qin. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1555–1565

2. Cataldo Musto, Giovanni Semeraro, Marco De Gemmis, Pasquale Lops. Word Embedding techniques for Content-based Recommender Systems: an empirical evaluation. RecSys 2015 Poster Proceedings

3. Edgar Altszyler, Mariano Sigman, Diego Fernández Slezak. Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database

4. Antoine J.-P. Tixier, Michalis Vazirgiannis, Matthew R. Hallowell. Word Embeddings for the Construction Domain

#### Presentation

### Archive of the workshop

**Organizers:**

Ukrainian Catholic University

Institute for Condensed Matter Physics NAS of Ukraine