Templates for People, Places, and Organizations

Matthew Milner: May 19, 2016 at 23:17

Open Data is great - despite the fact that most historians have no idea what it is or how to use it. It uses standard vocabularies, namespaces, and taxonomies to describe data, allowing researchers to move data easily from one context to another. Yet it's also rather complicated for the average humanities scholar (let's be honest here), since it requires familiarity with data types and formats like XML, JSON, Turtle, etc. These aren't usually picked up by those with a mind towards prose and manuscripts, though they're not difficult, really. In the bibliographic world, we now have excellent tools for quickly creating lists and generating metadata as needed: Zotero, EndNote, RefWorks, and Mendeley are among the many that allow users to store citations and documents quickly, and that make exporting standard outputs that adhere to Open Data standards easy for the end user. But this is on account of a very simple fact: end users don't have to *write* that data - they export data that already exists, and that has (ideally) been cleaned and tweaked. At the click of a button Zotero will allow a user to create an RIS file for any record, something that can be imported and used in other systems and software. There's an example below.

The real issue for historical scholars is that while there are tools and Open Data standards for other entities, like agents and places, there are no quick, easy tools for creating portable data in the same kind of way. Moreover, there's no real standard that is easily written by hand and that lets scholars do their work quickly and efficiently, with minimal disruption to their well-honed methods and practices. In short - there are tools and standards, but the hurdle of use creates real impediments, both to the creation of the data itself and to ongoing scholarship that might not be data-esque in orientation, but could easily be so if the hurdle were removed.

A good example of this is FOAF. Friend of a Friend is a well-established standard for describing biographical and some limited social data. But it is often rendered in XML: someone has to a) know FOAF elements and standards, and b) create compliant XML on the fly. While this might seem trivial to a DH-er, the majority of scholars working on women's networks in 17th-century Spain aren't going to bother creating compliant FOAF for each of their individuals and agents. Doing so would actually be a distraction from their overall research objective, which is women's networks in 17th-century Spain, not FOAF XML. At the same time, such a scholar is likely creating lists and tables of those individuals and agents, in order to make sense of the network and to store basic prosopographical data that *could* be rendered in an Open Data format like FOAF. The question is how we get that data into a usable state a) without disrupting the scholar and b) without a complex tool or format.

Another important consideration in the age of Open Data is that we need to empower individual historical scholars doing their work in small, underserviced, and often insular archives. While larger repositories are well on the path to digitizing, employing Open Data standards, and even, in the case of Europeana, sharing their data widely and usefully, the reality is that much historical research takes place where such standards and transformations have yet to be fully realized. In many cases, there's also no internet access. Imagine a scholar in a private archive in a remote village or town in rural Italy, or Maharashtra, and you get the picture: great finds, no internet access, and nothing is digital. What to do? In such cases online transcription tools are moot, unless you've got a local version of EndNote or Zotero. But more to the point, a scholar isn't going to want to work on XML when they could be using their valuable energy to sift through materials.

And yet, if we're working towards true Open Data, we need to empower such scholars and give them the tools to turn such research efforts into a) critiques of existing scholarship and data and b) new Open Data sets of their own.

Nanohistory allows scholars to do just this - but it depends on one very simple thing: a standardized template for creating biographical, organizational, and geographical data quickly and easily, one that doesn't disrupt scholars' work and can be used offline in a variety of ways. The solution has limitations (for instance, you can't make complex network associations), but it's definitely more usable for humanities researchers on the go. The key: data created using the template is easily imported into Nanohistory, where the import process allows quick and easy disambiguation of the new data against what already exists in the platform. Users can add new records or update existing ones, linking both to new or existing projects and sources.

Each template employs Nanohistory's fields for the particular entity involved (person, organization, or place), as well as secondary data (dates of birth or creation, death or dissolution, places, external URLs, and identifiers). In the XLS template, these define columns (with headers); in the TXT format, they are semi-colon (;) delimited. Here they are, in case you want to create your own, or you can download the files:

For a person, the XLS columns (in order), each with an example value:

  ID                              23
  Prefix Name, Suffix [Titles]    Sir John Smith, of Wessex [Earl of Suffolk|Baron of Wilstone]
  Gender                          M
  Date of Birth                   1875
  Place of Birth                  Basingstoke
  Date of Death                   1931-09?
  Place of Death                  Hong Kong
  Urls                            http://www.johnsmithofwessex.com|http://en.wikipedia.org/john_smith,_of_wessex
  Identifiers                     VIAF:12121213,DNB:31212932
  Keywords                        British Businessmen, English Barons

Or as a single text line:

Sir John Smith, of Wessex [Earl of Suffolk|Baron of Wilstone];M;1875;Basingstoke;1931-09?;Hong Kong;http://www.johnsmithofwessex.com|http://en.wikipedia.org/john_smith,_of_wessex;VIAF:12121213,DNB:31212932;British Businessmen, English Barons
  • ID: Internal reference ID
  • Name, consisting of
    • Prefix: 'Sir','Lady','Rev.' etc. Optional
    • Name: Forename or alias phrase, followed by middle names or initials and surname, e.g. "John M. Smith" or "A Proud Puritan"
    • Suffix: Epithets or other information, like 'The Younger' or 'of Wessex', but not titles! Optional
    • Titles: Placed in square brackets following the name, these indicate an office or station of some kind, such as "King of England", "Abbot of Cirencester" etc. Delimited by a pipe |. Optional
  • Gender: M, F, O (other), U (unknown), or blank
  • Dates: YYYY-MM-DD; month and day are optional. Ambiguous dates are marked with a trailing ?
  • URL: Delimited by a pipe |
  • Identifiers: Identifiertype+:+[identifier], e.g. "VIAF:12121213", delimited by comma. Identifier Types must be included in Nanohistory prior to importing.
  • Keyword: Comma delimited phrases
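
To illustrate how the person format above breaks down, here is a minimal Python sketch that splits one TXT line into its fields. The function names, the dictionary keys, and the short prefix list are my own assumptions for illustration; this is not Nanohistory's import code, and it follows the text example in leaving out the ID column.

```python
import re

# Field order of the person TXT line (the text example carries no ID column).
PERSON_FIELDS = [
    "name", "gender", "birth_date", "birth_place",
    "death_date", "death_place", "urls", "identifiers", "keywords",
]

# Hypothetical, deliberately short list of prefixes; extend as needed.
PREFIXES = ("Sir", "Lady", "Rev.")


def split_list(value, delimiter):
    """Split a delimited cell into trimmed, non-empty items."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


def parse_name(raw):
    """Break 'Prefix Name, Suffix [Title|Title]' into its parts."""
    titles = []
    match = re.search(r"\[(.*)\]\s*$", raw)
    if match:
        titles = split_list(match.group(1), "|")
        raw = raw[:match.start()]
    # Everything after the first comma is treated as the suffix.
    name, _, suffix = raw.partition(",")
    name, suffix = name.strip(), suffix.strip()
    prefix = ""
    for candidate in PREFIXES:
        if name.startswith(candidate + " "):
            prefix, name = candidate, name[len(candidate):].strip()
            break
    return {"prefix": prefix, "name": name, "suffix": suffix, "titles": titles}


def parse_person_line(line):
    """Parse one semicolon-delimited person line into a dictionary."""
    cells = [cell.strip() for cell in line.split(";")]
    # Pad short lines so missing trailing fields come back as blanks.
    cells += [""] * (len(PERSON_FIELDS) - len(cells))
    record = dict(zip(PERSON_FIELDS, cells))
    record["name"] = parse_name(record["name"])
    record["urls"] = split_list(record["urls"], "|")
    identifiers = {}
    for item in split_list(record["identifiers"], ","):
        id_type, _, value = item.partition(":")
        identifiers[id_type.strip()] = value.strip()
    record["identifiers"] = identifiers
    record["keywords"] = split_list(record["keywords"], ",")
    return record
```

Dates are left as plain strings here, and the sketch assumes that no field (including URLs) contains a semi-colon of its own.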

For an organization, the XLS columns (in order), each with an example value:

  ID                     241
  Name                   City and Corporation of Basingstoke
  Type                   Civic
  Date of Creation       1675
  Date of Dissolution    (blank)
  Urls                   http://www.basingstoke.gov.uk|http://en.wikipedia.org/basingstoke
  Identifiers            VIAF:3124321
  Keywords               British Cities, English Towns

Or as a single text line:

City and Corporation of Basingstoke;civic;1675;;http://www.basingstoke.gov.uk|http://en.wikipedia.org/basingstoke;VIAF:3124321;British Cities, English Towns
  • ID: Internal reference ID
  • Name: String of organization name, e.g. "Society of Antiquarians of Bristol"
  • Type: One of the designated organization types in Nanohistory. Most common are: civic, society, religious, judicial.
  • Dates: YYYY-MM-DD; month and day are optional. Ambiguous dates are marked with a trailing ?
  • URL: Delimited by a pipe character |
  • Identifiers: Identifiertype+:+[identifier], e.g. "VIAF:12121213", delimited by comma. Identifier Types must be included in Nanohistory prior to importing.
  • Keyword: Comma delimited phrases
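
The date convention used in these templates (YYYY-MM-DD, with month and day optional and a trailing ? for ambiguous dates) is simple enough to check before importing. The following Python sketch is only an illustration: `parse_template_date` and the keys it returns are hypothetical names of mine, and it assumes four-digit years and two-digit months and days.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD, optionally followed by "?" for an ambiguous date.
DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(\?)?$")


def parse_template_date(text):
    """Return the parts of a template date, or None for a blank cell."""
    text = text.strip()
    if not text:
        return None
    match = DATE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD style date: {text!r}")
    year, month, day, mark = match.groups()
    month = int(month) if month else None
    day = int(day) if day else None
    # Only coarse range checks; month lengths and leap years are not verified.
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {text!r}")
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {text!r}")
    return {
        "year": int(year),
        "month": month,
        "day": day,
        "ambiguous": mark is not None,
    }
```

For example, `parse_template_date("1931-09?")` gives year 1931, month 9, no day, and the ambiguous flag set, while a blank cell gives `None`.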
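
For a place, the XLS columns (in order, named here after the fields described below), each with an example value:

  ID                      23
  Name                    Stratford-upon-Avon
  PartOf                  Warwickshire England
  Latitude / Longitude    52.19180/-1.70800
  Urls                    http://en.wikipedia.org/wiki/Stratford-upon-Avon
  Identifiers             GEONAMES:2636713
  Keywords                English Towns, Warwickshire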

Or as a single text line:

Stratford-upon-Avon;Warwickshire England;52.19180/-1.70800;http://en.wikipedia.org/wiki/Stratford-upon-Avon;GEONAMES:2636713;English Towns, Warwickshire
  • ID: Internal reference ID
  • Name: String of placename without parent or other information
  • PartOf: String denoting the most recent parent-place
  • Latitude / Longitude: Decimals for a point. Complex polygons are permissible, but must follow Nanohistory guidelines.
  • URL: Delimited by a pipe character |
  • Identifiers: Identifiertype+:+[identifier], e.g. "VIAF:12121213", delimited by comma. Identifier Types must be included in Nanohistory prior to importing.
  • Keyword: Comma delimited phrases
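
The latitude / longitude cell for a simple point can be read as two decimals separated by a slash. Here is a small Python sketch of that reading; `parse_lat_long` is a hypothetical helper, and it deliberately does not attempt polygons, since those follow Nanohistory guidelines not described here.

```python
def parse_lat_long(cell):
    """Parse a 'latitude/longitude' point into a (lat, lon) tuple of floats."""
    cell = cell.strip()
    if not cell:
        return None  # blank cell: no coordinates recorded
    parts = cell.split("/")
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude/longitude', got {cell!r}")
    # float() raises ValueError itself if either part is not a decimal number.
    lat, lon = float(parts[0]), float(parts[1])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {cell!r}")
    return lat, lon
```

Applied to the Stratford-upon-Avon example, `parse_lat_long("52.19180/-1.70800")` returns the pair `(52.1918, -1.708)`.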

Any missing data should just be left blank. Where there's only a name for a person, for instance, the name is followed by a row of eight semi-colons, one for each of the remaining (empty) fields:

John Smith;;;;;;;;
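
If your notes already live in a script or spreadsheet export, a line with blanks in the right places can be generated rather than typed. This Python sketch assumes the nine person fields of the full example line above (which means eight separators per line); the field keys and `format_person_line` are hypothetical names chosen for illustration.

```python
# Assumed field order, taken from the full person example line (no ID column).
PERSON_FIELDS = [
    "name", "gender", "birth_date", "birth_place",
    "death_date", "death_place", "urls", "identifiers", "keywords",
]


def format_person_line(record):
    """Join a partial record into one semicolon-delimited line, leaving gaps blank."""
    cells = []
    for field in PERSON_FIELDS:
        value = record.get(field)
        text = "" if value is None else str(value).strip()
        if ";" in text:
            # A stray semi-colon would shift every later field on import.
            raise ValueError(f"{field} must not contain ';': {text!r}")
        cells.append(text)
    return ";".join(cells)
```

Calling `format_person_line({"name": "John Smith"})` produces the name followed only by separators, so a record with nothing but a name still lines up with fuller records in the same file.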

I'm working on a limited internal event / referencing version, which will add another column or field after keywords so that users can create internal references within a single dataset using the ID field. This is complicated by two issues at the moment. First, the importing tool becomes fairly sluggish with more than 300 records to parse at a time (700 records take about 30 seconds to render, and return about 5MB of data for some reason). Second, the IDs can only be referenced within the particular set being imported. In other words, you can't reference existing data within Nanohistory, or data within another subset of the data you're importing, only records that are being imported at the same time.
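
The second constraint can be checked before import. Since the referencing column doesn't exist yet, everything in this Python sketch is an assumption: it supposes each record is a dictionary with an "id" and a list of referenced IDs under a hypothetical "references" key, and it simply reports references that point outside the set being imported.

```python
def find_dangling_references(records):
    """Return (record_id, missing_id) pairs for references outside this batch."""
    known_ids = {record["id"] for record in records}
    dangling = []
    for record in records:
        # "references" is a hypothetical field; records without it are skipped.
        for ref in record.get("references", []):
            if ref not in known_ids:
                dangling.append((record["id"], ref))
    return dangling
```

An empty result means every internal reference resolves to a record in the same import set; anything else would need to be removed or moved into the same file first.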


22 June 2016 update: I had to change the file format for Excel worksheets to XLSX due to problems with parsing. The new links reflect the changes.