New Information Technologies
in Systematization and Analysis of Literary Texts
(On the Example of the Russian Short Story of the Twentieth Century)

G.Y. Martynenko
St. Petersburg State University
Address: St. Petersburg, 199034, Universitetskaya nab., 11
Phone: +7 (812) 328-9519
Fax: +7 (812) 312-2246

The Digital Anthology of the Russian Short Stories of the XX Century is currently being created at the Department of Computational, Applied and Mathematical Linguistics of St. Petersburg State University. The corpus is a full-text database of Russian short stories, divided into so-called synchronic groups (sub-anthologies) according to the traditional periodisation of Russian literature. Each sub-anthology includes selected works by as many writers as possible who were active in the corresponding literary period. For the most prominent writers (Chekhov, Bunin, Kuprin, Gorkij, Sologub, Platonov, Bulgakov, Zoschenko, Shukshin and others), individual author anthologies are being compiled. A system of frequency dictionaries is built for the whole Anthology, for particular literary periods and for the works of individual writers. Each dictionary is then structured according to a system of parameters. From the information contained in a frequency dictionary, statistical distributions of a particular type can be constructed, depending on which parameters serve as dependent and independent variables. An analysis of recent scholarly work, together with the results of our own investigations, allowed us to compile a fairly complete list of parameters that may be used to describe the lexicostatistical structure of texts and corpora. All parameters were tested, and as a result we obtained a list of statistically consistent parameters that can be recommended for text systematisation and analysis. The Digital Anthology and the results of the textual analysis carried out on its material are of significant interest both for traditional research in Russian literature, linguistic poetics and the stylistics of literary texts, and for experts in cultural heritage and specialists in new information technologies.
The original methodology for systematising texts and investigating them at the lexicostatistical level can also be applied successfully to texts in any language and of various genres (not only literary, but also business, journalistic, scientific texts, etc.).
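
The hierarchy of frequency dictionaries described above (whole Anthology, literary period, individual writer) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy corpus, the period and author labels, the simple word-form tokenizer, and the choice of parameters (type-token ratio, share of hapax legomena, maximum frequency, frequency spectrum). The abstract does not enumerate the parameters actually tested, so these are only common lexicostatistical measures, not the author's list.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (period, author, text) records.
# Labels and texts are placeholders, not material from the Anthology.
CORPUS = [
    ("period_1", "author_a", "the sea was calm and the sky was grey"),
    ("period_1", "author_b", "the old man looked at the sea"),
    ("period_2", "author_c", "the train left and the station was empty"),
]


def tokenize(text):
    """Lowercase the text and split it into word forms (no lemmatisation)."""
    return re.findall(r"\w+", text.lower())


def build_dictionaries(corpus):
    """Build frequency dictionaries for the whole corpus, per period and per author."""
    whole = Counter()
    by_period = {}
    by_author = {}
    for period, author, text in corpus:
        tokens = tokenize(text)
        whole.update(tokens)
        by_period.setdefault(period, Counter()).update(tokens)
        by_author.setdefault(author, Counter()).update(tokens)
    return whole, by_period, by_author


def parameters(freq):
    """Compute a few illustrative parameters of one frequency dictionary."""
    n_tokens = sum(freq.values())
    n_types = len(freq)
    hapaxes = sum(1 for count in freq.values() if count == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapax_share": hapaxes / n_types if n_types else 0.0,
        "max_frequency": max(freq.values(), default=0),
    }


def frequency_spectrum(freq):
    """Map each frequency value to the number of word types with that frequency."""
    return Counter(freq.values())


whole, by_period, by_author = build_dictionaries(CORPUS)

if __name__ == "__main__":
    print("whole corpus:", parameters(whole))
    for period, freq in sorted(by_period.items()):
        print(period, parameters(freq))
    for author, freq in sorted(by_author.items()):
        print(author, parameters(freq))
    print("spectrum:", sorted(frequency_spectrum(whole).items()))
```

Because the same `parameters` function is applied at every level, the resulting values can be compared across periods and authors. The frequency spectrum is one example of a distribution derived from a dictionary, with frequency as the independent variable and the number of word types as the dependent one.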

Martynenko Gregory Y. - Doctor of Science in Computational Linguistics, Professor at the Department of Mathematical, Computational and Applied Linguistics of St. Petersburg State University, founder and head of the St. Petersburg Stylometrics School.
