Overview

1.1.JPG

Volume 1 of the Preußisches Urkundenbuch, containing about 331 documents ranging in date from 1140 to 1257

I. Overview: Obtaining and Formatting Data from Published Documentary Evidence

One main goal of my project is to map data drawn from the published documentary sources related to my period and area of study: Prussia during the thirteenth, fourteenth, and fifteenth centuries. Documents dating from the early end of this timeline are relatively scarce, but they grew in number over the course of the thirteenth century as Christian presence and administration expanded, and document production then exploded in the fourteenth century. Two major developments explain this change. First, in 1309 the Teutonic Order moved its headquarters from Venice (where it had been temporarily headquartered following the fall of Acre in 1291) to the castle of Marienburg, the heart of its Prussian holdings. Second, the advent of paper in European documentary culture around the year 1300 precipitated a sharp rise in the number, density, and types of documents produced.

A portion of the Prussian documentary evidence has been published in volumes compiled by German historians over the course of the late nineteenth and early twentieth centuries.[1] For my purposes, extracting information about these thousands of documents from the published volumes will provide a representative sample set to experiment with before diving into the unpublished material lying in wait in the archives.

But here is the problem: this wealth of published material is contained in old leather-bound volumes with pages that look like this:

1.2.JPG

Complete with difficult-to-read late-nineteenth-century Gothic print!

So how do we get a product that looks like this?

1.3.JPG

Not perfect, but a good start...

In the remainder of this essay I will explain how I got to the final product with a little cheating (scraping material already available online using OutWit Hub) and a lot of hard work (formatting the scraped data using regular expressions and tenacity).
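To give a flavour of what formatting with regular expressions involves, here is a minimal Python sketch. It is not the workflow used in this project. The entry layout (number, year, month, day, place), the sample lines, and the names ENTRY and parse_entry are all assumptions invented for illustration, and real scraped text would almost certainly be messier.

```python
import re

# Assumed (hypothetical) layout of one scraped register line:
#   "<number>. <year> <month> <day>. <place>."
ENTRY = re.compile(
    r"^(?P<number>\d+)\.\s+"      # running document number
    r"(?P<year>\d{4})\s+"         # four-digit year
    r"(?P<month>[^\W\d_]+)\s+"    # month name, letters only (umlauts allowed)
    r"(?P<day>\d{1,2})\.\s+"      # day of the month
    r"(?P<place>.+?)\.?\s*$"      # place of issue, trailing full stop dropped
)


def parse_entry(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in ("number", "year", "day"):
        row[key] = int(row[key])
    return row


# Invented sample lines, not transcribed from the Urkundenbuch.
sample_lines = [
    "12. 1233 Dezember 28. Kulm.",
    "  13.  1243 Juli 29.  Anagni. ",
    "an unparseable fragment",
]

rows = [parse_entry(line) for line in sample_lines]
for row in rows:
    print(row)
```

Lines that do not fit the pattern come back as None rather than being silently dropped, which makes it easy to collect the stragglers and refine the pattern by hand. That is where the tenacity comes in.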

 


[1] A. Seraphim, M. Hein, and E. Maschke, eds., Preussisches Urkundenbuch, vol. 2, 2 vols. (Königsberg: Hartungsche Verlagsdruckerei, 1882): two volumes in the series.

by Patrick Meehan