Planning the Database

Importing my data from the spreadsheet into a new Filemaker database took planning, creativity, and a change from “thinking in a spreadsheet” into “thinking in a database”—in other words, thinking relationally. From the start of the database’s planning, I considered what both my immediate and long-term needs for the database were, and thought about what would be the most flexible and intuitive way to relate my data. In this initial stage of brainstorming, I determined that I would want the database to fulfill the following tasks:

1) Keep track of information about each source, namely:

  • Date: when was the document issued?
    Need to be able to identify year/month/day, date ranges, and ambiguity
  • Issuing agent: at whose behest was the document issued?
    There can be one or more parties per document, with a variety of identities: people, institutions, cities, and so on
  • Receiving agent: who was the intended recipient of the document?
    Tracks the same information as for the issuing party
  • Origin: where does the document come from?
    Sometimes the place is not given, there are multiple places, or there is ambiguity
  • Destination: where is the document sent?
    In most cases, not explicitly noted by the editors, but certainly can be ascertained by examining each document
  • Document type: what kind of document is it?
    Medieval documents can usually be categorized according to scholarly typologies: charters, letters, privileges, and so on
  • Image: what does the document look like?
    Whether as a photo, scan, or link, an image of the document can be useful in the long-term for many reasons: transcription, illuminations, script, and so on

2) Be able to incorporate future sources from archive, a key utility for future research on the dissertation and beyond:

  • Archival sources will have reference numbers to track, possibly relating to several documents at the same time
  • Scanned images and/or transcriptions, since the sources will usually be unpublished
  • My notes and research organization will be in English (as opposed to German, which I mainly stuck to for consistency with the scraped data)

3) Be able to incorporate future sources from published collections other than the PrUB.


4) Keep track of notes

 

Having determined these tasks, the database’s skeletal structure began to become clearer. In particular, two organizing principles emerged: first, I was able to translate the proposed functions into unique, related categories or, in the terminology of the database, tables; and second, the database’s structure would ultimately revolve around the document, which would form the main “trunk” from which the database’s various branches (the tables) would stem.

With these two organizing principles in mind, the next step was to characterize the relationships between each table and the central document, as well as relationships between certain related tables. Outlining these relationships was a necessary step in the database’s creation in order to determine what pieces of information were simply attributes (fields)of the tables (rather than tables themselves). Sketching out the relationships also indicated one-to-many and many-to-many relationships.

An illustrative example of this last step is the relationship between the Document, its Origin,and its Destination:

Document and Origin:

  • Each document comes from one place
  • An origin can be associated with multiple documents

Document and Destination

  • Each document has one or more destinations, even if this is not indicated
  • A destination can be associated with multiple documents

Origin and Destination

  • Each origin can also be a destination, and vice-versa
  • The origin and destination can be the same for a select document

These observations, simple as they are, in fact reveal key information about these categories. In particular, there are three pieces of vital structural information that emerge. First, the interchangeable nature of Origin and Destination indicate that it makes the most sense to categorize both under the umbrella of Places. Second, it follows from here that a document can be associated with multiple places, and a place can be associated with multiple documents: in other words, this is a many-to-many relationship that will need a join table (which we will call DocumentPlaces). And finally, designating a place as a document’s origin or destination is specific to the relationship between a particular document and its associated places. In other words, if the join table DocumentPlaces tracks the unique relationship between a specific document and its associated places, the indication of a place as Origin or Destination can be tracked in the join table as an attribute of the relationship.

As it happens, the relationship between the Document and Places resembles the one between the Document and its associated Agents: any agent can create or receive a document, and their designation as a Creator or Recipient can be tracked as an attribute in a DocumentAgents join-table.

Walking through the planning process gives a sense of the complexity involved in designing a relational database, even for a relatively simple project. In the end, the structure that resulted is represented in this ER diagram, which formed the basis of my Filemaker database. In fact, the image below is taken directly from Filemaker, and thus visualizes the skeleton of my database.

1.4.JPG

The ER diagram

First, observe the overall structure of the database and the relationships among various categories: all information is ultimately related to the central Document, giving the database a “hub and spokes” structure. Second, note that each piece of information is assigned a unique ID. The ID is a serial number that Filemaker generates automatically when the data is imported. You can see the Document ID in four different tables, which illustrates the relational nature of the database. That is, each of the join-tables track information directly related to a select document, indicated by the Document ID. The Agents and Places tables, on the other hand, are separate categories populated with data (namely, agents and places) independent of the documents. The relationship of agents and places is tracked via the join-tables. In fact, the DocumentPlaces table is a good example of the utility of a join-table (which, to repeat, represents the specific relationship between a particular document and its associated places of origin and destination). Attributes of this relationship include a place’s relationship to the document (as origin or destination) and “qualifiers” (for instance, a document might come from “nearby” a certain place). Planning, understanding, and populating the join-tables was perhaps the most difficult task in creating the database.

by Patrick Meehan