The fall break started two days ago, and I have finally had
the leisure to get back to writing a short Python script. I
have been working on this project for a while, but as a newbie I make
progress pretty slowly. The script I am working on is supposed to analyse any
text, but in fact every modification I introduce into it results from the
problems I face when I run the script to analyse the quarto
edition of Shakespeare’s Much Ado About Nothing quantitatively.
I am wondering whether the script has to be tuned for every text. But if so,
comparing different texts would seem to be impossible. This question, however,
would lead too far, so let me instead mull over a specific problem.
In this post I am going to share one insight
into the text that I gained while working with the quarto of Much Ado About Nothing. When running the
script I encountered a problem with the hyphens in the text: words divided
with a hyphen at the end of a line were counted as two separate
words. To overcome this, I tried to remove these line-end hyphens
automatically, but then I ran into a further problem. Either the script
simply removed the hyphens but left the words divided, which was no good,
as the two halves remained two separate strings; or it removed the hyphens
and united the two halves, which was no better, because then the two lines
in which the halves were located were merged, too, and this distorted
the number of lines. So in the end I removed the hyphens and
united the words manually so as to avoid merging lines. Doing this by hand
was beneficial on a further account as well, as I could
decide case by case in which line the reunited word should be placed.
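The line-preserving join described above was done by hand. As a rough illustration of how it might be automated, here is a minimal Python sketch, assuming the play is read as a list of lines and assuming one fixed policy: the second half of the word is always moved up to the end of the first line, whereas the manual method allows deciding case by case. The function name `join_line_end_hyphens` is hypothetical, not part of the original script. Note that such a rule cannot tell a mere line-break hyphen from a genuine compound hyphen that happens to fall at the end of a line, which is another argument for checking these cases manually.

```python
import re

# Assumption: a line ending in "word character + hyphen" marks a word
# divided at the line break (dashes used as punctuation are not handled).
TRAILING_HYPHEN = re.compile(r"(\w)-$")

def join_line_end_hyphens(lines):
    """Rejoin words split across lines without changing the line count.

    Hypothetical helper: the second half of the divided word is moved up
    to the end of the first line and the hyphen is dropped.
    """
    result = list(lines)
    for i in range(len(result) - 1):
        line = result[i].rstrip()
        if not TRAILING_HYPHEN.search(line):
            continue
        # Split the following line into its first token and the rest.
        parts = result[i + 1].split(maxsplit=1)
        if not parts:
            continue
        second_half = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        # Drop the hyphen and glue the two halves together on line i.
        result[i] = line[:-1] + second_half
        # Keep line i + 1 (possibly empty) so the line count is unchanged.
        result[i + 1] = rest
    return result

lines = ["a short exam-", "ple of a line", "next line"]
print(join_line_end_hyphens(lines))
# ['a short example', 'of a line', 'next line']
```

Because line `i + 1` is kept even when it becomes empty, line-based counts stay comparable with the original layout of the quarto.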
While working on this manual task, which did not take long (approximately
15 minutes), I noticed that compound words joined
with hyphens also appeared in mid-line position. So next I
wrote a short script to collect all these hyphenated compounds,
count the number of lines in which they occur, and
also count the total number of lines of the play. Once I had these numbers, I also
calculated the relative frequency of the lines containing compounds.
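A minimal sketch of such a script in Python might look like the following. It is not the original script: the rule for what counts as a hyphenated compound (a hyphen between two word characters), counting only non-blank lines, and counting words as whitespace-separated tokens are all assumptions, and the function names `find_compounds` and `compound_stats` are hypothetical. Splitting each line on whitespace keeps trailing punctuation attached to the token, as in the list of compounds below.

```python
import re

# Assumption: a token is a hyphenated compound if a hyphen sits between two
# word characters, e.g. "turne-coate," or "heart-burn'd"; "--" does not match.
COMPOUND = re.compile(r"\w-\w")

def find_compounds(text):
    """Return one list of hyphenated tokens for every line that has any."""
    per_line = []
    for line in text.splitlines():
        hits = [tok for tok in line.split() if COMPOUND.search(tok)]
        if hits:
            per_line.append(hits)
    return per_line

def compound_stats(text):
    """Count lines, compounds and words, and compute both relative frequencies."""
    lines = [line for line in text.splitlines() if line.strip()]  # non-blank lines only
    words = text.split()  # words = whitespace-separated tokens (assumption)
    per_line = find_compounds(text)
    n_compounds = sum(len(hits) for hits in per_line)
    return {
        "lines": len(lines),
        "lines_with_compounds": len(per_line),
        "compounds": n_compounds,
        "words": len(words),
        "line_rate": len(per_line) / len(lines) if lines else 0.0,
        "word_rate": n_compounds / len(words) if words else 0.0,
    }

sample = "A turne-coate, and a Hare-finder,\nno compounds here\n\nout-facing, fashion-monging boyes\n"
print(find_compounds(sample))
# [['turne-coate,', 'Hare-finder,'], ['out-facing,', 'fashion-monging']]
```

Printing one list per line, as `find_compounds` returns them, yields the bracketed format used in the list of compounds.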
The hyphenated compound words, in their order of appearance in the quarto
edition of Much Ado About Nothing, are the following (one list per line of the play):
['turne-coate,'], ['Hare-finder,'],
['Ballad-makers'], ['warre-thoughts,'], ['ouer-heard'], ['March-chicke,'], ['start-vp'],
["heart-burn'd"], ['mid-way'], ['ouer-masterd'], ['day-light.'], ['Schoole-boy,'],
['ouer-ioyed'], ['tooth-picker'], ['sun-burnt,'], ['working-daies,'], ['loue-gods,'],
['kid-foxe'], ['night-rauen,'], ['out-rage'], ['ouer-heardst'], ['hony-suckles'],
['heare-say:'], ['wood-bine'], ['bow-string,'], ['hang-man'], ['tooth-ach.'], ['tooth-ach.'],
['Dutch-man'], ['French-man'], ['lute-string,'], ['tooth-ake,'], ['hobby-horses'],
['Ote-cake', 'Sea-cole,'], ['Sea-cole.'], ['Hot-blouds,'], ['worm-eaten'], ['cod-peece'],
['gentle-woman,'], ['night-gown'], ['Sea-cole,'], ['eie-liddes'], ['ouer-whelmd'],
['candle-wasters:'], ['tooth-ake'], ['milke-sops.'], ['out-facing,',
'fashion-monging'], ['trans-shape'], ['vnder-neath,'], ['gossep-like'], ['Lacke-beard,'],
['grey-hounds'], ['carpet-mongers,'], ['witte-crackers'].
It seems that, out
of the 2,589 lines of the play, hyphenated compounds appear in 54 lines, and
two of these lines contain two compounds each, so altogether there are 56
hyphenated compound words in the text. The relative frequency of the lines
containing hyphenated compounds is 0.0208574739282 (about 2.1%). Furthermore, as there
are 22,171 words in the text, the relative frequency of hyphenated compound
words in the text is 0.00252582201976 (about 0.25%).
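Both relative frequencies are simple ratios of the counts reported above. The following Python lines use only those figures to reproduce the two values and to express them as percentages, which are easier to read than the long decimals.

```python
# Figures reported above for the quarto of Much Ado About Nothing.
total_lines = 2589
lines_with_compounds = 54
compound_words = 56
total_words = 22171

line_rate = lines_with_compounds / total_lines  # share of lines with a compound
word_rate = compound_words / total_words        # share of words that are compounds

print(f"{line_rate:.4f} ({line_rate:.2%} of lines)")  # 0.0209 (2.09% of lines)
print(f"{word_rate:.5f} ({word_rate:.2%} of words)")  # 0.00253 (0.25% of words)
```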
Now why are these
numbers important? Their significance can only be gauged if they are
compared to another text or to other texts, because then a pattern may emerge. But
then which texts should they be compared and contrasted with? Those of
Shakespeare? Or those of the printer? If Shakespeare’s, should they be only the quarto
editions, as these are close in time, or all the early prints, i.e. the First
and Second Folios as well as other books of the same period, or only those early printed
editions that, like Much Ado About Nothing, go back to some form of manuscript, because these may reveal something
about Shakespeare? Or only those that were published by Andrew Wise and William
Aspley, as they were the publishers of the quarto edition of the play, or those
that were printed by Valentine Simmes, as it was ultimately his employees who created the
printed text? Or perhaps these features have nothing
to do with Shakespeare but rather with the publishers, i.e. Wise and Aspley, or
the printer, i.e. Simmes, in which case they should be compared only to books that one
of these parties produced, not necessarily ones authored by Shakespeare, as these
are the people who are responsible for the text that we can witness today.
In other words, is this statistical analysis more relevant to the
history of the book, or to the history of spelling, than to the study of Shakespeare? Answering
these questions seems unavoidable when looking for texts to compare the
quarto of Much Ado About Nothing with.
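Whichever corpus one settles on, the comparison itself could be mechanised. The Python sketch below assumes that each text comes with metadata fields such as `author`, `publisher` or `printer` (hypothetical field names), reuses an assumed compound rule (a hyphen between two word characters), computes both rates for every text, and averages them within each group. The functions `rates` and `compare_by` are illustrative only.

```python
import re
from collections import defaultdict

# Assumed rule: a hyphen between two word characters marks a compound.
COMPOUND = re.compile(r"\w-\w")

def rates(text):
    """Return (share of non-blank lines with a compound, share of words that are compounds)."""
    lines = [line for line in text.splitlines() if line.strip()]
    words = text.split()
    lines_hit = sum(1 for line in lines if any(COMPOUND.search(t) for t in line.split()))
    compounds = sum(1 for w in words if COMPOUND.search(w))
    return (lines_hit / len(lines) if lines else 0.0,
            compounds / len(words) if words else 0.0)

def compare_by(corpus, key):
    """Average the per-text rates within each group, e.g. key="printer".

    `corpus` is a list of dicts holding a "text" field plus metadata
    fields; the field names are hypothetical.
    """
    groups = defaultdict(list)
    for entry in corpus:
        groups[entry[key]].append(rates(entry["text"]))
    return {
        group: (sum(r[0] for r in rs) / len(rs), sum(r[1] for r in rs) / len(rs))
        for group, rs in groups.items()
    }

corpus = [
    {"printer": "A", "text": "a well-known line\nplain line"},
    {"printer": "B", "text": "no compounds\nat all"},
]
print(compare_by(corpus, "printer"))  # {'A': (0.5, 0.2), 'B': (0.0, 0.0)}
```

Averaging per-text rates gives every text equal weight; pooling the raw counts within a group would instead let longer texts dominate. Either way, such a comparison presupposes that all texts were prepared in the same way, for instance with line-end hyphens resolved as described above.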