A Master of the Polish Word
In his lifetime, the writer and artist Bruno Schulz witnessed the reins of power change hands five times in his hometown of Drohobych. Today, Drohobych is a town in western Ukraine. However, for over one hundred years it was located in the region called Galicia, which was under the imperial control of the Austro-Hungarian Empire from 1772. For a year in 1919 it became part of the West Ukrainian People’s Republic, and then of the Second Polish Republic for 20 years until 1939. After the Soviet invasion of Poland it was part of Soviet Ukraine until it came under Nazi German control after the German invasion of the Soviet Union in 1941.
Changing tides of power were not new to this region. Evolving borders and alternating sovereignty characterized the eastern “borderlands” of Europe for hundreds of years. While I have blogged before about the tumultuous history of this region and about its diversity, I have not yet considered the cultural impact of these changes. Taking an in-depth look at the small body of Bruno Schulz’s work is an interesting way to do so.
Schulz’s identity was as complicated as the political dynamics in his hometown. Schulz wrote his only two volumes of stories in Polish, and despite his small oeuvre many consider him a master of the Polish language. He was born to a Jewish family and was immersed in Jewish culture, and though he was fluent in German he did not know Yiddish—a rarity for Jews in the region. However, Schulz’s complicated identity was not unique for the time. Like many cities in the “borderlands”, Drohobych was a miniature Tower of Babel — inhabitants included not just Jews and Poles but also Ukrainians, Germans and Russians, many of whom spoke several languages on a daily basis. The plaque that remains on Schulz’s childhood home speaks to this linguistic and cultural diversity:
In Ukrainian, Polish and Hebrew the inscription reads: “In this house between 1910-1941 lived and worked Bruno Schulz, an artist and writer and a master of the Polish word.”
Bruno Schulz’s life was cut short in 1942, when he was shot dead on a street in Drohobych. Schulz had just finished creating a mural in the home of a Gestapo officer who recognized his artistic talent. Another Gestapo officer killed Schulz in retaliation, because the first officer had killed that officer’s own “personal Jew”.
Schulz left behind only two small collections of short stories entitled Sanatorium Under the Hourglass and Cinnamon Shops (published in English as Street of Crocodiles). He also left a number of remarkable drawings. Like his writings, they are evocative, unsettling and deeply erotic. His drawings feature recurring themes. A common one features indifferent women set in the foreground in a place of power while ghoulish male figures in the background gape and ogle desiringly. You can see across these drawings the artist’s own face amongst the crowd of contorted male faces.
Just as his drawings feature common elements, there are characters and themes that reappear throughout Schulz’s stories. His father, an eccentric man who collected rare bird specimens, appears in many of them, and Schulz uses his father’s disturbed mind as a way to cast the stories in a surreal light. Adela the house servant also appears throughout his work, often in an erotic context. I do not think it is a stretch to say her image appears throughout his art, and that Adela is the same detached muse at the center of his drawings. Here is one arresting passage, in both the original Polish and English:
Adela wracała w świetliste poranki, jak Pomona z ognia dnia rozżagwionego,
wysypując z koszyka barwną urodę słońca lśniące, pełne wody pod przejrzystą
skórką czereśnie, tajemnicze, czarne wiśnie, których woń przekraczała to,
co ziszczało się w smaku; morele, w których miąższu złotym był rdzeń długich popołudni…
Adela returned on luminous mornings, like Pomona from the fire of the enkindled
day, tipping from her basket the colourful beauty of the sun: glistening wild
cherries, full of water under their transparent skins, mysterious black cherries
whose aroma surpassed that which would be realised in their taste, and apricots,
in whose golden pulp lay the core of the long afternoons…
Though his body of work is small, some of his stories are among the best I’ve read. They elevate his small town of Drohobych to a place both surreal and sublime. To grow more familiar with his work, I’ll now use natural language processing (NLP) to peer even deeper into his writings. In keeping with the multilingual nature of his writings and upbringing, I performed the analysis using the original Polish text as well as a translated Russian text.
The writings in the original and translated texts are here. The script I used to scrape the text is here. I performed the analysis in a Jupyter notebook here. That notebook also contains many of the NLP mechanics I describe below, along with the code used to produce the plots.
NLP Investigation
Text Processing
Before analyzing any text, we must do some data cleaning. The first step is usually to remove any punctuation. The second step is removing so-called “stop words”. Stop words are the most commonly occurring words in a language. While there is no single agreed-upon way of identifying these words, there are various collections online that allow us to remove common words (such as articles and pronouns) so that we can focus on words that convey greater meaning.
Here is a collection of English stop words:
'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because'
Here is a collection of Polish stop words:
'a', 'aby', 'ach', 'acz', 'aczkolwiek', 'aj', 'albo', 'ale', 'alez', 'ależ', 'ani', 'az', 'aż', 'bardziej', 'bardzo','bedzie', 'bez'
And a collection of Russian stop words:
'c', 'а', 'алло', 'без','будем', 'будет', 'будете', 'будешь','бы'
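As a rough sketch of this cleaning step (not the notebook’s actual code), the snippet below strips punctuation and drops stop words using only the Python standard library. The function name `clean_tokens`, the tiny stop-word set, and the sample sentence are all made up for illustration; a real stop-word list would run to a few hundred entries and would also catch words like “był” and “się”.

```python
import string

# Tiny illustrative stop-word set drawn from the Polish examples above.
POLISH_STOP_WORDS = {"a", "aby", "ale", "albo", "ani", "bardzo", "bez"}

# string.punctuation only covers ASCII, so add typographic quotes,
# the ellipsis, and long dashes that show up in literary Polish text.
PUNCTUATION = string.punctuation + "\u201e\u201d\u201c\u2019\u2026\u2014\u2013"


def clean_tokens(text, stop_words):
    """Lowercase the text, replace punctuation with spaces, drop stop words."""
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    tokens = text.lower().translate(table).split()
    return [tok for tok in tokens if tok not in stop_words]


# Made-up sample sentence, not a quotation from Schulz.
sample = "Ojciec był bardzo cichy, ale Adela śmiała się bez przerwy."
print(clean_tokens(sample, POLISH_STOP_WORDS))
```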
Besides removing stop words and punctuation, there is another important step in preprocessing text for analysis. This step is called normalization. In any language, the same word takes many forms, e.g. “smile”, “smiles”, “smiling”, “smiled” in English. There are two common ways to standardize language:
Stemming
Stemming reduces words to their root word (‘stem’) often by removing any endings or prefixes to the base word.
Lemmatization
Lemmatization maps each inflected form of a word to its dictionary form (the ‘lemma’).
For example, take the English word “create”: The lemma is “create”, but the stem is “creat-”. Lemmatization yields better standardization in my opinion because the result of the normalization is itself a real word. “Creat-” isn’t a word, but “create” is.
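To make the difference concrete, here is a toy comparison. Both functions are deliberately naive stand-ins that I wrote for illustration: `toy_stem` strips a hard-coded list of suffixes, and `toy_lemmatize` looks words up in a hand-written table. Real stemmers and lemmatizers are far more elaborate, but the contrast in their output is the same.

```python
# Hypothetical suffix list and lemma table, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s", "e")
LEMMAS = {"creates": "create", "created": "create", "creating": "create"}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a table of known forms; unknown words pass through."""
    return LEMMAS.get(word, word)


for form in ["create", "creates", "created", "creating"]:
    print(form, "| stem:", toy_stem(form), "| lemma:", toy_lemmatize(form))
```

Every form collapses to the stem “creat” but to the lemma “create”, which is why I prefer lemmas when the output needs to be readable.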
This matters especially for Slavic languages like Polish and Russian, because a single word can have a great number of endings. Linguists call such languages “highly inflected”. For example, the base word “jabłko” (“apple”) can take several different forms depending on its role in the sentence: jabłko, jabłku, jabłkom, jabłka, jabłek, jabłkach, jabłkiem, jabłkami. Lemmatization allows us to link all of these forms to the base word jabłko instead of treating them as different words.
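The effect on word counts is easy to see in a small sketch. The lookup table below is hand-written from the forms listed above, and the token list is invented; a real Polish lemmatizer would derive the mapping from a morphological dictionary rather than a hard-coded dict.

```python
from collections import Counter

# Hand-written lookup built from the forms of "jabłko" listed above.
JABLKO_FORMS = [
    "jabłko", "jabłku", "jabłkom", "jabłka",
    "jabłek", "jabłkach", "jabłkiem", "jabłkami",
]
LEMMA_OF = {form: "jabłko" for form in JABLKO_FORMS}

# Made-up tokens standing in for a cleaned text.
tokens = ["jabłka", "jabłek", "jabłkiem", "jabłko", "jabłka"]

raw_counts = Counter(tokens)
lemma_counts = Counter(LEMMA_OF.get(tok, tok) for tok in tokens)

print("distinct surface forms:", len(raw_counts))
print("occurrences of the lemma:", lemma_counts["jabłko"])
```

Without lemmatization the five tokens look like four unrelated words; with it they count as five occurrences of one word.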
Textual Statistics
If we treat all 26 of Schulz’s stories as one corpus, we can retrieve some interesting statistics. Had Schulz written his works on a MacBook Pro as I am now, he would have pressed the keys of his computer 333905 times! Not including stop words, he typed 36509 words, of which 10043 were distinct. Here is a plot of total words by each story:
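Counts like these can be gathered in a few lines. The sketch below is my own minimal version, assuming the stories are held in a dict that maps each title to its raw text; the two “stories” and the one-word stop list are placeholders, and the `\w+` tokenizer is a simplification of whatever cleaning the notebook performs.

```python
import re


def tokenize(text):
    """Split lowercased text into runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def corpus_stats(stories, stop_words=frozenset()):
    """Return character, word, and distinct-word counts for a dict of stories."""
    per_story = {
        title: [tok for tok in tokenize(text) if tok not in stop_words]
        for title, text in stories.items()
    }
    all_tokens = [tok for toks in per_story.values() for tok in toks]
    return {
        "characters": sum(len(text) for text in stories.values()),
        "words": len(all_tokens),
        "distinct_words": len(set(all_tokens)),
        "words_per_story": {title: len(toks) for title, toks in per_story.items()},
    }


# Placeholder texts, not Schulz's own sentences.
stories = {
    "A": "Ojciec i ptaki. Ojciec milczy.",
    "B": "Adela i ojciec.",
}
print(corpus_stats(stories, stop_words={"i"}))
```

The `words_per_story` entry holds the numbers one would feed to a bar chart of words per story.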
If we divide the number of distinct words by the total number of words, we arrive at a statistic called lexical richness. For the whole corpus, we calculate a lexical richness of about 18%. This means that roughly 1 in every 5 words that Schulz typed was a word he had never used before! This is surprisingly high. However, the sample is relatively small compared to other writers with larger canons, and lexical richness tends to fall as a text grows longer. Further, Schulz is known for using creative words to detail his stories, and this statistic attests to this originality.
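In code, lexical richness is a one-line ratio. This is a minimal sketch with an invented token list; the function name `lexical_richness` is mine, and the result depends heavily on how the text was tokenized and whether stop words were removed first.

```python
def lexical_richness(tokens):
    """Distinct words divided by total words (the type-token ratio)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Made-up tokens: 4 distinct words among 10 in total.
tokens = [
    "ojciec", "noc", "ojciec", "dzień", "noc",
    "ojciec", "oko", "ojciec", "noc", "ojciec",
]
print(f"{lexical_richness(tokens):.0%}")
```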
These original words, ones that are used only once in a work or body of work, have the technical name hapax legomena. After removing stop words, there are 5069 of these words in the whole corpus. Here are a few examples: zgryźliwy, zmurszeć, brzęk. These translate roughly to the adjective snappy (as in cranky), the verb to moulder (as in to decay), and the onomatopoeic noun clink. Writers rarely use such specific words, and in Schulz’s case he used them only once.
Lexical Insights
The most common words Schulz uses across his work are also revealing. The most common is a pronoun meaning ‘me or mine own’, a nod to the author’s first-person narrative style. The second most common is the word for ‘father’, who, as we will see, is one of the most frequently recurring characters in Schulz’s stories. ‘Dzień’ and ‘noc’, or ‘day’ and ‘night’, are also common: the contrast between day and night is a recurring theme in Schulz’s work. Further, we have two words for the human face and its features: eye (‘oko’) and face (‘twarz’). Most of the stories are character studies of individuals, and Schulz goes to great lengths to describe the changing expressions of his characters as they wrestle with their inner anxieties.
Another interesting linguistic term is collocation, or a sequence of words that co-occur more often than one would expect by chance. Here are a few notable pairings: mój ojciec, rzecz dziwna, wyrzucać siebie, szklane drzwi.
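One common way to score “more often than chance” is pointwise mutual information (PMI), which compares how often a pair occurs with how often it would occur if the two words were independent. The sketch below is my own stdlib-only version and is not necessarily the measure used in the notebook; the function name, the `min_count` threshold, and the sample tokens are all assumptions.

```python
import math
from collections import Counter


def top_collocations(tokens, min_count=2, top_n=5):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs get inflated PMI scores, so skip them
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        scored.append((math.log2(p_pair / (p_w1 * p_w2)), (w1, w2)))

    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top_n]]


# Made-up token stream, not a quotation.
tokens = (
    "mój ojciec otworzył szklane drzwi "
    "mój ojciec zamknął szklane drzwi ojciec milczał"
).split()
print(top_collocations(tokens))
```

In this toy stream “szklane drzwi” outranks “mój ojciec” because “ojciec” also appears on its own, which makes the pairing with “mój” a little less surprising.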
These pairings reveal a great deal about the content of Schulz’s stories. “Mój ojciec” means “my father”, and as we’ll see many of the stories are about the author’s father. “Rzecz dziwna” means “a strange thing”, and his stories often center on surreal phenomena.
The last two are more abstract. “Wyrzucać siebie” is actually “wyrzucać z siebie”, which roughly translates to “blurt something out”, or to let out something that you’ve been straining to keep inside. This small phrase gets to the heart of Schulz’s writings. His stories feature characters whose pent-up anxieties sizzle and occasionally erupt, as when his father descends into a fit of madness in Nawiedzenie (A Visitation). Finally, szklane drzwi simply means “glass door”. In his story Wiosna (Spring), a glass door symbolizes the entrance to the mysterious world inside one’s head. Everyday items such as chairs, doors, and desks play a large symbolic role in Schulz’s works, where they contrast with the surreal images of the imagination. Doors recur often and contrast the safety of one’s room with the chaos of what exists outside.
Word Dispersion
While Schulz uses many novel words, his stories involve only a small number of characters. His family members recur throughout his works. To understand when and how frequently Schulz mentions each character, we can look at a lexical dispersion plot, which traces where a word appears in a text. Since ojciec is the second most common word, one might guess that the father would appear most often. The following plot bears this out:
Mother (‘matka’) appears frequently throughout his work, while uncle (‘wuj’) and aunt (‘ciotka’) recur but to a lesser extent. A notable case is Adela the house servant, whom I mentioned earlier as Schulz’s muse. Together with his father and mother, Adela recurs the most often throughout the entire body of work. There are entire stories devoted to the interactions between her and his father, and you can see clearly that her name often appears in the text in conjunction with the word ‘ojciec’.
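The data behind a dispersion plot is just the list of token positions at which each word occurs. The sketch below collects those positions and prints a crude text version of the plot; the function names and the token list are invented for illustration, and a plotting library would draw the same offsets as tick marks.

```python
def dispersion_offsets(tokens, targets):
    """Map each target word to the token positions where it occurs."""
    offsets = {word: [] for word in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


def render(offsets, length):
    """Print one row per word, with '|' wherever the word occurs."""
    for word, positions in offsets.items():
        marks = set(positions)
        strip = "".join("|" if i in marks else "." for i in range(length))
        print(f"{word:>8} {strip}")


# Made-up tokens standing in for the lemmatized corpus.
tokens = ["ojciec", "adela", "noc", "ojciec", "matka", "adela", "ojciec"]
offsets = dispersion_offsets(tokens, ["ojciec", "matka", "adela"])
render(offsets, len(tokens))
```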
To figure out exactly in which stories these characters appear, we can plot vertical lines where each story begins. While ‘ojciec’ appears often, we can see that he does not appear in every story. For example, none of these familial characters appears in the longest story (‘Emeryt’):
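The story boundaries are the running totals of the story lengths, and a binary search over them tells us which story any given token position falls in. This is a sketch under my own assumptions: the titles are real Schulz stories, but the lengths and word positions are invented, and the helper names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def story_starts(story_lengths):
    """Token offset at which each story begins in the concatenated corpus."""
    return [0] + list(accumulate(story_lengths))[:-1]


def story_of(position, starts, titles):
    """Title of the story containing the given token position."""
    return titles[bisect_right(starts, position) - 1]


def stories_mentioning(positions, starts, titles):
    """Titles, in corpus order, of the stories in which a word occurs."""
    found = {story_of(p, starts, titles) for p in positions}
    return [title for title in titles if title in found]


titles = ["Sierpień", "Nawiedzenie", "Emeryt"]
lengths = [4, 3, 5]        # invented token counts per story
starts = story_starts(lengths)

ojciec_positions = [1, 5]  # invented offsets of the word 'ojciec'
print(starts)
print(stories_mentioning(ojciec_positions, starts, titles))
```

The values in `starts` are where the vertical lines go, and a story missing from the returned list is one in which the word never occurs.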
At the bottom of the Jupyter notebook you can find an analysis of the text translated to Russian. I could not find a translation of every story, but it’s still interesting to see which words are more and less common in the translations than in the originals. It was actually easier to perform lemmatization in Russian than in Polish since there is a specific package devoted to this in Russian.
Conclusion
In performing this analysis I gained an even deeper appreciation of the work of one of my favorite writers. Each writer has their own style and signature, and it’s through data science that we are able to investigate these sensibilities at scale and bring them to light. In a sense, I view NLP as analyzing the nuts and bolts of an artist’s work to reveal patterns that even the writers themselves may not know are there.
However, I also like to remember that works do not descend from the ether. Though individuals produce works that we can universally enjoy and relate to, great artists are, like all of us, the result of a set of specific circumstances. When zooming in on the minute, it’s equally important to pull back and view Schulz’s work in the context of the whole gestalt of the age. Schulz is very much a byproduct of a time (a uniquely tumultuous and anxious one) and a place (one characterized by cultural and linguistic diversity). It is unsurprising but still fascinating to see how this existential apprehension and cultural variety shape the themes of his work. It is perhaps not a coincidence that an enigmatic mix of identities and circumstances produced one of the most enigmatic figures in world literature.