Dorothy Pennyman’s Cookbook 1698-1754
How my first stab at Python took me on a trip through Shakespearian English via a medieval lady’s kitchen
Back in 2020 I began playing around with different interfaces and digitization projects, including the Early Modern Recipes Online Collective (EMROC) project. EMROC gathers and transcribes handwritten recipe books from the early modern period into a searchable database. This database lives within the Folger Library in LUNA. Using this resource, I wrote and produced a podcast and wrote an essay about the ways in which technology mediates food culture.
For the podcast, I interviewed historian Mairi Cowan from the Department of Historical Studies at UTM, who is involved in this semi-diplomatic transcription process. This is the first layer of data curation: a human being needs to read the texts, write them down, and enter them into the database. As Professor Cowan explained, the process is mostly objective, but as in any collection of data, when humans are involved, there are decisions to be made as to what is important. For example, do we annotate that a particular recipe, its pages flat and clean, seems never to have been used, or that another was so covered with marks that it was obviously a favourite? As someone interested in food history, I think this is important information, but these points were not noted anywhere in the transcription notes.
The content of the writing, in this case, superseded the importance of the object itself, and the additional context the physical object could provide was not captured, at least not by using the search function. So what I would be working with was just the words, although images of the scanned pages are available on the website, providing another digital resource to further interpret the results of the data study later.
In preparation for my next project – in which I used Python to tokenize another medieval recipe text – I spoke to Elspeth Brown, a scholar of queer history, the Director of the Digital Humanities Network at the University of Toronto, and the Associate VP, Research at UTM. I wanted to know how scholars see databases of this kind, and in what ways they could be useful beyond the study of food culture. In our conversation, we agreed that they are very valuable, and so I decided to tackle a section of the EMROC database, perform some text analysis, and discover what, if anything, could be learned from the writings of one woman, Dorothy Pennyman, between 1698 and 1754.
The scope of this particular project – the analysis of one text, 80 or so pages out of a corpus of 2731 – seems small. I’m not sure what this small amount of information, from one person, will show, but I think diving in is a good way to start building some systems for doing more, with more manuscripts.
One of the limitations of the LUNA collection is that it is searchable, but the transcripts are not downloadable. To obtain this transcribed book, I had to go into each page and copy and paste the transcription into one document to create my text file. To do this with all 2731 pages in the corpus would have been, obviously, prohibitive. So while the results may not provide as much information as would an entire library of writing from female cooks in the 17th century, the project will give me the framework and the code to apply to the entire corpus when it becomes available for download, a project presently in the works.
EMROC is also adding more and more texts to their transcription schedule as they become available. I consider mine to be a beta project, and plan to create a system for tackling the text and developing some code to analyse the entire collection in the future, perhaps for another digital humanities project.
As Catherine D’Ignazio and Lauren Klein, the authors of Data Feminism, have noted, what gets counted, counts. The reason we have so few female-authored books and manuscripts from the early and late modern periods to work with is that the study of food and cooking from a female perspective has not been a priority. The recipes of famous male chefs Careme and Escoffier were coveted, but a housewife’s handwritten recipe book was never considered valuable to anyone but her and maybe her family. As both a woman and a person with a keen interest in food culture and history, I’m interested in what was going on behind the closed doors of Dorothy Pennyman’s kitchen, and whether what she was making, and what she wrote about, could tell us more about who she was. When I can analyse the full corpus, as well as texts from other sources, I want to research whether food culture is influenced to a greater degree by the domestic practices of women at home than by food trends or the work of famous, mostly male, chefs.
From the manuscript, I’d like to know:
What ingredients were available to her and what did she use the most?
Where did these recipes come from?
What was her economic status and place in society?
Was she, and her family, eating in a balanced way?
Did any of these change over time?
I also want to look specifically at the words she used. During the transcription events (known as transcribathons), the organizers throw out some hashtags to share on Twitter. One of these was #feministOED. The idea was to look at words, ingredients, and dishes that women were using in their writings but that were not represented in the OED until much later. A quick and disappointing search on Twitter turned up only a few entries, but this could be due either to a lack of “female-usage only” words, or to the fact that transcribers don’t tweet when they’re busy reading medieval texts! Cross-checking a list of words against the OED will yield a more scientific result.
Observations
POS limitations: Part-of-speech (POS) tagging was not great at picking out nouns that were food ingredients. After tagging the words and manually picking out any nouns that were food words, I used Excel to compare that list to the overall cleaned text. I found an additional 122 words that the POS tagger did not catch.
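The comparison above was done in Excel, but the same check could be sketched in Python. The snippet below is a minimal illustration using only the standard library. The sample sentence, the `tagged_nouns` set, and the `known_food_words` list are all invented for the example and do not come from the manuscript or the project notebook.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def missed_food_words(cleaned_text, tagged_nouns, known_food_words):
    """Return food words that occur in the text but were not tagged as nouns."""
    tokens_in_text = set(tokenize(cleaned_text))
    food_in_text = tokens_in_text & set(known_food_words)
    return sorted(food_in_text - set(tagged_nouns))

# Hypothetical sample data (not from the manuscript).
cleaned_text = "Take a pound of butter and some mace, then add anchovies and cream"
tagged_nouns = {"pound", "butter", "cream"}  # what a POS tagger might have returned
known_food_words = ["butter", "mace", "anchovies", "cream"]  # hand-built food list

missed = missed_food_words(cleaned_text, tagged_nouns, known_food_words)
print(missed)  # ['anchovies', 'mace']
```

The hand-built food list still has to come from someone who knows the subject, so this only automates the comparison, not the judgment.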
Spellchecker: I know I can build a spellchecker within Python, but without one, I had to manually edit my food list to account for different spellings. For example, anchovy was listed six different ways, under “anchovie”, “anchovies”, “anchovy”, “anchoves”, “anchovy”, and “anchoveys”, each with its own count. There were also 5 spellings of “caraway”. It’s interesting to note that the varied spellings as transcribed give context when looking at the original manuscript but become a problem when analysing the words as tokens. I decided to keep the spelling that had the most occurrences, because a word written one way 20 times and another way only once probably indicates an error in transcription. A more balanced divide could mean a word changing over time, or two people writing in the book, each spelling it a different way.
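The rule of keeping the most frequent spelling could be sketched as follows. This is only an illustration, not the code from the notebook: the counts are made up, and the `variant_groups` list is assumed to be put together by hand, since deciding which spellings belong together is itself a judgment call.

```python
from collections import Counter

def merge_variants(counts, variant_groups):
    """Fold each group of spellings into its most frequent member."""
    merged = Counter(counts)
    for group in variant_groups:
        present = [word for word in group if word in merged]
        if not present:
            continue
        # The most frequent spelling wins; ties go to the spelling listed first.
        canonical = max(present, key=lambda word: merged[word])
        for word in present:
            if word != canonical:
                merged[canonical] += merged.pop(word)
    return merged

# Hypothetical counts; the spellings are from the food list, the numbers are not.
counts = Counter({"anchovy": 20, "anchovie": 3, "anchovies": 4,
                  "anchoves": 1, "anchoveys": 1, "butter": 40})
variant_groups = [["anchovy", "anchovie", "anchovies", "anchoves", "anchoveys"]]

merged = merge_variants(counts, variant_groups)
print(merged["anchovy"])  # 29
```

Where the split between two spellings is more balanced, it is probably worth looking at the manuscript before merging, for the reasons given above.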
Plurals and quantitative data: In recipe writing, a recipe may be titled “Gooseberry Pie”, and then the ingredients list “1 pound gooseberries”. In this case, it doesn’t affect the count much, because it’s easy to see the difference. You would probably never prepare 1 gooseberry. In other cases, it does matter, as in “Beef Pie”, which calls for “1 pound beef”: 2 instances of the word would be counted for a single dish.
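One possible way around the “Beef Pie” double count is to count each word at most once per recipe. The sketch below is not something done in this project. It assumes the text has already been split into one string per recipe, and the two sample recipes are invented.

```python
import re
from collections import Counter

def recipes_mentioning(recipes):
    """Count, for each word, the number of recipes it appears in at least once."""
    counts = Counter()
    for recipe_text in recipes:
        # A set keeps each word once, however often the recipe repeats it.
        unique_words = set(re.findall(r"[a-z]+", recipe_text.lower()))
        counts.update(unique_words)
    return counts

# Hypothetical recipes, already split one per string.
recipes = [
    "Beef Pie. Take 1 pound beef and season it",
    "Gooseberry Pie. Take 1 pound gooseberries",
]

per_recipe = recipes_mentioning(recipes)
print(per_recipe["beef"])  # 1
print(per_recipe["pie"])   # 2
```

This answers a slightly different question (how many recipes use beef, rather than how often the word appears), which may be closer to what the research questions are asking.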
Language: Doing this type of work on a text provides interesting insights into language.
Why, in English, do we say “gooseberry pie” and not “gooseberries pie”?
The transcriber: As I was downloading sections of the manuscript, there were clues to when another transcriber had taken over, such as the way the text was formatted. Interesting!
Knowledge of the subject: I found it very helpful to have culinary knowledge when working with this text. Recipe writing is its own language with its own syntax, and knowing a codfish from a codlet, or what a comfit is, made the text easier to work with.
Conclusion
I didn’t get all the answers I wanted, but I learned much more about the ways in which a manuscript like this can be recorded and stored, and about how different things (a table, a bunch of tweets, a recipe, a story) need to be handled in different ways. I think a combination of the transcripts and clean texts, used together, would get me closer to the answers.
My ambition for what I wanted to do was greater than my coding skills for making it happen. The final Jupyter notebook contains very simple coding on this text, but that’s not to say I didn’t do a lot of coding. I spent a lot of time researching, trying various strategies, and looking back on all the exercises and notes from a class I took in Python. In the end, I wanted to present code that worked, and some visual representations of the foods most represented in Dorothy Pennyman’s book, including the circle map of ingredients I made using Tableau, at top.
I would really like to take this further, answer more of my research questions, and learn enough coding to make this work more efficient.
The corpus I chose needed a lot more work on it to make it usable, such as picking out the specific nouns I wanted to look at. This project taught me a lot about what tools to use, and how to better tackle raw data. Not everything comes as a neat .csv file.
The source itself and its usefulness in text analysis became the focus of my questions. As it turns out, the scanning of the pages and their transcriptions are only the very first step in creating something that is useful for textual analysis. I have a few thoughts as to why.
Prose vs instructional writing. This manuscript and the Shakespeare corpus were written around the same time, in early modern English. I believe there is a difference between prose and something instructional or technical, like a recipe. The language is different, and the same language patterns do not appear.
POS tagging was not very accurate. I speculate that the raw text, with its early modern English words and multiple strange spellings, made it difficult for the tagger to work.
I had to make a lot of subjective decisions. For example, as I went through the list of nouns, I had to decide:
Which spellings of the words to keep
Whether to combine the plural instances of the words with the singular
If something was actually an ingredient, e.g. charcoal is for cooking over, but also used for purifying
Whether mistakes in spelling were the author’s, or the transcriber’s
One application does not fit all. When analyzing texts like this, and I think recipes in general, one application cannot do everything. I was limited as to what I could do in Python, and the text needed processing in other applications before I could use it there. I used Excel, which was actually quite efficient for cleaning the text to use in Python.
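Moving a word list back and forth between Python and Excel can be done through a CSV file, which Excel opens directly. The sketch below uses the standard `csv` module. The counts are invented, and an in-memory buffer stands in for a real file so the example runs on its own.

```python
import csv
import io
from collections import Counter

def counts_to_csv(counts, handle):
    """Write word counts as rows of word,count with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["word", "count"])
    for word, count in counts.most_common():
        writer.writerow([word, count])

def csv_to_counts(handle):
    """Read word,count rows (for example after editing in Excel) into a Counter."""
    reader = csv.DictReader(handle)
    return Counter({row["word"]: int(row["count"]) for row in reader})

# Hypothetical counts, not taken from the manuscript.
counts = Counter({"butter": 40, "anchovy": 29, "mace": 7})

buffer = io.StringIO()  # stands in for open("food_words.csv", "w", newline="")
counts_to_csv(counts, buffer)
buffer.seek(0)
round_trip = csv_to_counts(buffer)
print(round_trip == counts)  # True
```

With a real file, the list can be cleaned by hand in Excel, saved as CSV again, and read back in with `csv_to_counts`.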
Transcription isn’t enough. The database is searchable, and LUNA is working on incorporating spelling variations into their search engine, but more needs to be done to create different layers of usability to make these texts useful for different researchers. The text transcribed word for word as Dorothy Pennyman wrote it is useful, but I envision another transcription where spelling is made consistent, missing words are filled in, and the text is “cleaned up” so it reads as a modern document. This would provide a purely textual object that would work better for text analysis, allowing for more study of the words as objects, with less context.