Dates and Dating

Matthew Milner: October 23, 2017 at 11:32

You'd think that historians would be experts on time and chronometry. While most are able to rattle off dates significant to their particular objects of study, time itself often appears only as an element of the framing of an historical study or narrative. Rarely do historians discuss time in ways that permit the comparison of time across cultures, let alone of historical phenomena. It becomes even messier when historians talk about epochs or periods themselves as discernible historical "events", as if the mere declaration of a start and a stop somehow automatically circumscribes a bona fide and worthwhile object of scholarly attention. What constitutes a temporal start or a stop, moreover, is contextual - not just in terms of the matter at hand (as in picking an event that denotes a boundary), but in its granularity or specificity. Are the points that bound historical phenomena best described using seconds, hours, or even a month or a year? Even a second is a span of time, rather than a point. Added to all of this is the need for a constant or stable dating scheme which allows for cross-cultural comparison of time.

Nanohistory uses Julian days for dates alongside standardized date formats (YYYY-MM-DD) and free text. Like the Unix timestamp, which counts seconds before or after 1 Jan 1970, Julian days provide a stable numeric solution for handling dates outside of the usual culturally determined calendars. While a Unix timestamp stored as a signed 32-bit integer is limited to a span of around 136 years (see https://en.wikipedia.org/wiki/Year_2038_problem ), Julian days are the basis for astronomical reckoning of time, and their continuous count runs unbroken across calendrical changes and revisions from well before the Roman Empire. Programmers have long used them as a solution for cross-calendrical dating. They're currently the basis for astronomical chronometric systems like the Truncated Julian Day (TJD) or the Heliocentric Julian Day (HJD). Julian days are geocentric; the date changes at noon, rather than midnight; and day subdivisions are stored as decimal fractions. This produces a floating-point number, in computational terms, which is much friendlier for comparisons and calculations than standardized calendrical date formats.
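
As a rough illustration of why a day count is easier to compare than a calendar string, here is a minimal Python sketch. It is not Nanohistory's code (which relies on PHP's Julian Day functions); the function names are hypothetical, and the integer arithmetic is a widely published conversion formula. The result is the Julian Day Number, i.e. the whole-number label of the day that begins at noon.

```python
def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12          # 1 for January/February, otherwise 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a Julian (old style) calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# The same physical day written in two calendars yields one number.
new_style = gregorian_to_jdn(1582, 10, 15)
old_style = julian_to_jdn(1582, 10, 5)
print(new_style, old_style)

# Time of day is a decimal fraction added to the day number, which
# itself turns over at noon: 0.25 of a day after noon is 18:00.
six_pm = gregorian_to_jdn(2000, 1, 1) + 0.25
print(six_pm)
```

Because both calendars map onto the same number line, sorting or comparing dates becomes ordinary numeric comparison.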

This precision allows Nanohistory a more flexible approach to dating than most humanistic research platforms. Rather than forcing users to assert a full date, with a single day standing in for a month or year (say the 1st or 15th of a month, or January 1 or June 30 for a year), Nanohistory calculates dates as ranges where needed. At the moment, it does not store subdivisions of days, but could easily do so if users need it. Nanohistory's calendar handling is powered by PHP's Julian Day functions, and an extension created by http://www.ar-php.org/ for Islamic dates. We're using the tabular Islamic calendar variant, rather than the astronomical one. On the front end (in the rare instance we process dates client-side) we're using Fourmilab's calendar converter.

When a user declares a date for an entity, the data is processed as follows:

If there is an 'original date' or free text 'date' field, it is stored separately.
A year is required, but months and days are not. Users must select a calendar for the date. These date parts are used to build a date which complies with the standardized format of YYYY-MM-DD. If a month and day are provided, they are used. Otherwise, the default month is set to 01, as is the default day, as needed. Date precision is cascading: it is not possible to provide a day without providing a month.
At the same time as the default date is calculated, a default range of Julian dates is also calculated. If a user provides a full date as YYYY-MM-DD, the Julian date is calculated using the declared calendar. If only a portion of a date is provided, the highest and lowest Julian dates are both calculated from that same portion. If a year is provided without a month, the lowest date is YYYY-01-01, while the highest date is calendar dependent (usually between YYYY-12-29 and YYYY-12-31; French revolutionary calendars are different).
All of this data is stored in a separate table, which declares whether the date is a start or end date. The original or free text data is stored as a string; the standardized date in a date format; the Julian date for the standardized date as a floating-point number; and the low and high values as a hyphen-delimited string. If no original or free text date is provided, a string is created from the portion of the date provided by a user, e.g. 1903-09 renders as Sept. 1903. Lastly, users can declare the date type, noting if it is exact, ambiguous, or some other indicator like a floruit date for a person.
When dates appear in Nanohistory, users see the original or free text version of the date. The code, however, employs the Julian days and the calendar to filter and transform data as needed for various visualizations and presentations.
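
The processing steps above can be sketched roughly as follows. This is a minimal Python illustration, not Nanohistory's PHP code: it handles only the Gregorian calendar, the names DateRecord and build_record are hypothetical, the month abbreviations are made up to match the "Sept. 1903" example, and treating a full date as a range whose low and high coincide is an assumption.

```python
import calendar
from dataclasses import dataclass
from typing import Optional

# Hypothetical display abbreviations, chosen to match "Sept. 1903".
MONTH_ABBR = ("Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
              "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec.")


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


@dataclass
class DateRecord:
    original: str      # free text shown to users
    standardized: str  # YYYY-MM-DD, defaults filled in
    julian: float      # Julian date of the standardized date
    low_high: str      # hyphen-delimited low and high Julian dates
    role: str          # "start" or "end"
    date_type: str     # e.g. "exact", "ambiguous", "floruit"


def build_record(year: int, month: Optional[int] = None,
                 day: Optional[int] = None, original: Optional[str] = None,
                 role: str = "start", date_type: str = "exact") -> DateRecord:
    # Precision cascades: a day is meaningless without a month.
    if day is not None and month is None:
        raise ValueError("a day cannot be given without a month")

    # Missing parts default to 01 for the low bound ...
    low_month = month if month is not None else 1
    low_day = day if day is not None else 1
    # ... and to the last month / last day of month for the high bound.
    high_month = month if month is not None else 12
    high_day = (day if day is not None
                else calendar.monthrange(year, high_month)[1])

    standardized = f"{year:04d}-{low_month:02d}-{low_day:02d}"
    low = gregorian_to_jdn(year, low_month, low_day)
    high = gregorian_to_jdn(year, high_month, high_day)

    if original is None:
        # Build a display string from only the parts the user supplied.
        if day is not None:
            original = standardized
        elif month is not None:
            original = f"{MONTH_ABBR[month - 1]} {year}"
        else:
            original = str(year)

    return DateRecord(original, standardized, float(low),
                      f"{low}-{high}", role, date_type)


record = build_record(1903, 9)
print(record.original, record.standardized, record.low_high)
```

A month-only date such as 1903-09 thus keeps a single standardized date for display and sorting, while the low and high values preserve the fact that the whole month is meant.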

Dates are an important element of Nanohistory's data validation model. Rather than storing endless recurrences of duplicate events or historical interactions, dates filter when a particular event is 'valid' or 'usable'. To put it another way, Nanohistory does not store a separate record for each time 'John Smith'->'bought'->'cow': it uses dates to turn an event 'on' or 'off'. This allows for more precise tracking of ambiguities around John Smith's purchase of cattle.
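
One way to picture this 'on'/'off' behaviour is a single stored statement that carries a list of date spans. The Python sketch below is only an illustration under that assumption; the Statement class, its fields, and the sample Julian day values are hypothetical and do not reflect Nanohistory's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    subject: str
    verb: str
    obj: str
    # Each tuple is one (start, end) span, in Julian days, during
    # which the statement is considered valid.
    valid_spans: List[Tuple[float, float]] = field(default_factory=list)

    def is_on(self, julian_day: float) -> bool:
        """True if the statement is valid on the given Julian day."""
        return any(start <= julian_day <= end
                   for start, end in self.valid_spans)


# One statement, two occasions: no duplicate 'bought' records needed.
purchase = Statement(
    "John Smith", "bought", "cow",
    valid_spans=[(2416359.0, 2416388.0), (2416724.0, 2416724.0)],
)

print(purchase.is_on(2416370.0))  # falls inside the first span
print(purchase.is_on(2416500.0))  # falls between the two spans
```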

Timelines

This dating method allows Nanohistory a much richer and more nuanced approach to cross-calendrical timelines. Data for each timeline (like those in the Webs or Maps tools) is processed according to the calendar declared in the timeline itself. Users can 'flip' between calendar systems using the same data, regardless of the calendar in which the data was originally stored.
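
To make the 'flip' concrete, here is a minimal Python sketch that renders one stored Julian Day Number in either the Gregorian or the Julian calendar. It is an illustration rather than Nanohistory's code: the function name jdn_to_date is hypothetical, only two calendars are covered, and the arithmetic is a commonly published integer conversion.

```python
from typing import Tuple


def jdn_to_date(jdn: int, calendar: str = "gregorian") -> Tuple[int, int, int]:
    """Convert a Julian Day Number to (year, month, day) in a calendar."""
    f = jdn + 1401
    if calendar == "gregorian":
        # Correction for the Gregorian century leap-year rule.
        f += (((4 * jdn + 274277) // 146097) * 3) // 4 - 38
    elif calendar != "julian":
        raise ValueError(f"unsupported calendar: {calendar}")
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


stored = 2299161  # one value in the data, two ways of displaying it
print(jdn_to_date(stored, "gregorian"))
print(jdn_to_date(stored, "julian"))
```

Since the stored value never changes, switching a timeline's calendar only changes how each number is rendered.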

The other critical aspect of dating in the data validation model is how storing dates as ranges offers four filtering options for timelines. The usual approach to timeline dates looks something like the following:

The six scenarios for timeline events are:

Begins before the start date, and ends after it
Begins before the start date, and ends before it
Begins before the end date, and ends after it
Begins after the end date, and ends after it
Begins after the start date, and ends before the end date
Begins before the start date, and ends after the end date
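
For comparison, the conventional check on formatted dates can be sketched in Python as below. This is a generic illustration, not code from Nanohistory; the function name is hypothetical and boundaries are assumed to be inclusive. The scenarios in which an event touches the query window reduce to a single overlap test.

```python
from datetime import date


def overlaps(event_start: str, event_end: str,
             query_start: str, query_end: str) -> bool:
    """True if an event's span intersects the query window (inclusive)."""
    es, ee = date.fromisoformat(event_start), date.fromisoformat(event_end)
    qs, qe = date.fromisoformat(query_start), date.fromisoformat(query_end)
    # The event must begin no later than the window ends,
    # and end no earlier than the window begins.
    return es <= qe and ee >= qs


window = ("1900-01-01", "1900-12-31")
print(overlaps("1899-06-01", "1900-03-01", *window))  # straddles the start
print(overlaps("1899-01-01", "1899-06-01", *window))  # entirely before
print(overlaps("1900-03-01", "1900-04-01", *window))  # entirely inside
print(overlaps("1899-01-01", "1901-12-31", *window))  # spans the window
```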

Usually filtering for dates means taking the start and end dates of a query in a standardized format of YYYY-MM-DD and walking through each scenario, comparing formatted dates. Nanohistory's method, however, expands the options for which dates can be used:

Narrow: the highest possible Julian start date, and the lowest possible Julian end date
Lazy: the lowest high Julian start date, and the highest low Julian end date
Wide: the highest low Julian start date, and the lowest high Julian end date
Greedy: the lowest low Julian start date, and the highest high Julian end date

These are calculated using the dates declared by any user for an event. This means that, depending on the filtering method used, data validation could include or exclude the same event according to whether a potential match's dates fall across the start or end parameters of a filter or query. Visually, the resulting filter method looks something like a box plot of the kind used for showing highs and lows alongside medians, usually for financial purposes.
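
Read literally, the four options can be sketched in Python as follows. This is an interpretation of the definitions above rather than Nanohistory's implementation: the Assertion class and filter_windows function are hypothetical names, each assertion is assumed to carry a low/high pair for both its start and end date, and the Julian day values are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Assertion:
    """One user's dating of an event, as low/high Julian days."""
    start_low: float
    start_high: float
    end_low: float
    end_high: float


def filter_windows(assertions: List[Assertion]) -> Dict[str, Tuple[float, float]]:
    """Derive the (start, end) pair used by each filtering option."""
    start_lows = [a.start_low for a in assertions]
    start_highs = [a.start_high for a in assertions]
    end_lows = [a.end_low for a in assertions]
    end_highs = [a.end_high for a in assertions]
    return {
        # highest possible start, lowest possible end
        "narrow": (max(start_highs), min(end_lows)),
        # lowest high start, highest low end
        "lazy": (min(start_highs), max(end_lows)),
        # highest low start, lowest high end
        "wide": (max(start_lows), min(end_highs)),
        # lowest low start, highest high end
        "greedy": (min(start_lows), max(end_highs)),
    }


# Two users date the same event slightly differently.
declared = [
    Assertion(start_low=100, start_high=110, end_low=200, end_high=210),
    Assertion(start_low=105, start_high=120, end_low=190, end_high=215),
]
for name, window in filter_windows(declared).items():
    print(name, window)
```

A query date of 102, for example, falls inside the greedy window for this event but outside the other three, which is how the same event can be included or excluded depending on the option chosen.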