Apr 15 2011
 

One of the goals of publishing State Records NSW’s archival control data is to promote creative re-use of that information. But what does that mean in practice… and how do you do it? In this post I give a small example of the type of mashup that can be created using the datasets on this site.

State Records NSW’s online catalogue, Archives Investigator, is often regarded simply as a gateway to archival records but it is a much richer resource than that. Archives Investigator provides information about the people or agencies that created records and their business reasons for doing so. This contextual information helps us identify and interpret records. It also has a lot of untapped secondary value. For example, the ministries data is perhaps the simplest of the datasets State Records has released. On their own, the ministry entities in Archives Investigator simply list the names and date ranges of government ministries, with links to the portfolios they contained. In aggregate, however, this data provides a complete timeline of political change in New South Wales since self-government.

Timeline of NSW ministries


This timeline is an example of a mashup. A mashup is simply the combination of one set of data (in this case, information on ministry changes in New South Wales) with other data and/or applications. The great thing about mashups is that by combining data and applications together you can often create something that is greater than the sum of the individual parts.

Our starting point in creating this mashup is the ministries.xml file (you can find it in the context.zip file on the Datasets page). The structure of this file is very simple. For each ministry we have four fields of information:

  • the ministry number (which we can use to create links back to ministry entities in Archives Investigator),
  • the ministry title,
  • and start and end dates.
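As a rough illustration of reading those four fields, here is a minimal Python sketch. It is not the code behind the timeline (that is in the gist linked at the end of this post). The field names come from the ministry dataset description, but the `ministries`/`ministry` wrapper elements, the date format and the sample values are assumptions made up for the example, so check them against the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element nesting, date format and values are placeholders,
# not copied from the real ministries.xml.
SAMPLE_XML = """<ministries>
  <ministry>
    <Ministry_number>1</Ministry_number>
    <Ministry_title>Example Ministry A</Ministry_title>
    <Start_date>1900-01-01</Start_date>
    <End_date>1901-01-01</End_date>
  </ministry>
  <ministry>
    <Ministry_number>2</Ministry_number>
    <Ministry_title>Example Ministry B</Ministry_title>
    <Start_date>1901-01-01</Start_date>
    <End_date>1902-01-01</End_date>
  </ministry>
</ministries>"""

FIELDS = ("Ministry_number", "Ministry_title", "Start_date", "End_date")


def parse_ministries(xml_text, record_tag="ministry"):
    """Return one dict per ministry record, keyed by the four field names."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter(record_tag):
        # A missing field becomes an empty string rather than raising an error.
        records.append({name: (node.findtext(name) or "").strip() for name in FIELDS})
    return records


ministries = parse_ministries(SAMPLE_XML)
```

Each record ends up as a plain dictionary, which is easy to pass on to whatever builds the timeline.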

To make the timeline, we use TimelineSetter, an application recently released as open source by ProPublica. The four fields in the ministries.xml file are enough to establish the framework for a TimelineSetter timeline: we have dates, titles for the events, and we can use the ministry numbers to create links back to Archives Investigator (the ‘Read More’ links in the timeline).
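The mapping from ministry fields to timeline events could be sketched like this in Python. The column names follow my understanding of the spreadsheet format TimelineSetter reads (date, display_date, description, link, series, html), so check them against the TimelineSetter documentation for your version. The `link_base` value and the sample ministry are placeholders, not the real Archives Investigator address or real data.

```python
import csv
import io

# Column names as I understand TimelineSetter's CSV format; verify before use.
COLUMNS = ["date", "display_date", "description", "link", "series", "html"]


def to_timeline_rows(ministries, link_base, series="Ministry"):
    """Turn ministry records into one timeline event per ministry."""
    rows = []
    for m in ministries:
        rows.append({
            "date": m["Start_date"],
            "display_date": f'{m["Start_date"]} to {m["End_date"]}',
            "description": m["Ministry_title"],
            # The ministry number is appended to a base URL to form the 'Read More' link.
            "link": link_base + m["Ministry_number"],
            "series": series,
            "html": "",
        })
    return rows


def write_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder record and placeholder link base, for illustration only.
sample = [{"Ministry_number": "1", "Ministry_title": "Example Ministry A",
           "Start_date": "1900-01-01", "End_date": "1901-01-01"}]
rows = to_timeline_rows(sample, link_base="http://example.org/ministry/")
csv_text = write_csv(rows)
```

The `series` column is left as a single value here; it is the natural place to put a party label once we have one.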

This is a great start, and we already have a usable timeline, but we can make it better by mixing our data with other sources of publicly available data.

One thing our ministry data lacks is information about the political affiliation of different ministries: were they Labor or Coalition or something else? By matching the names of the leaders of each ministry to lists of political leaders for the two major political parties we can make a pretty good guess. Luckily enough, Wikipedia has two such lists.
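One way to do the name matching, sketched in Python. It assumes ministry titles take the form "Surname Ministry" (or two surnames for a coalition), and the leader lists below are invented placeholders rather than the Wikipedia data. When a title matches both lists, or neither, the function returns no guess at all.

```python
import re


def leader_surnames(ministry_title):
    """Pull candidate surnames out of a title such as 'Smith Ministry'."""
    words = re.findall(r"[a-z']+", ministry_title.lower())
    return [w for w in words if w != "ministry"]


def guess_party(ministry_title, leaders_by_party):
    """Return the single party whose leader list matches the title, else None."""
    surnames = leader_surnames(ministry_title)
    matches = {
        party
        for party, leaders in leaders_by_party.items()
        if any(name in leaders for name in surnames)
    }
    # Only commit to a guess when exactly one party matches.
    return matches.pop() if len(matches) == 1 else None


# Placeholder leader lists (lower-cased surnames), made up for the example.
LEADERS = {"Labor": {"smith"}, "Coalition": {"jones"}}

labor_guess = guess_party("Smith Ministry", LEADERS)
unknown_guess = guess_party("Brown Ministry", LEADERS)
ambiguous_guess = guess_party("Smith-Jones Ministry", LEADERS)
```

Matching on surname alone will confuse leaders who share a name, so a real version would probably also compare the ministry dates with each leader's term.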

If you are making a mashup using Wikipedia you should consider using a service such as DBpedia which provides an interface to the structured data in Wikipedia.  For our timeline, however, we can take a simpler approach by screen scraping the Wikipedia pages directly. Screen scraping just means extracting data from a web page’s HTML code. The party leader information in Wikipedia is neatly arranged in particular columns within tables: we can use that structure to identify the cells we need to pull out the names.
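To show what pulling names out of a table column looks like, here is a small Python sketch using only the standard library. The HTML is a made-up table, not a copy of a Wikipedia page, and the column position is an assumption. Real Wikipedia tables often use merged cells, which this simple approach does not handle.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (such as links) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def column(html, index, skip_header=True):
    """Return the text of one column, optionally skipping the header row."""
    parser = TableCells()
    parser.feed(html)
    rows = parser.rows[1:] if skip_header else parser.rows
    return [row[index] for row in rows if len(row) > index]


# Made-up table for illustration.
SAMPLE_HTML = """<table>
<tr><th>No.</th><th>Leader</th><th>Term</th></tr>
<tr><td>1</td><td><a href="#">Alex Smith</a></td><td>1900-1905</td></tr>
<tr><td>2</td><td><a href="#">Sam Jones</a></td><td>1905-1910</td></tr>
</table>"""

names = column(SAMPLE_HTML, 1)
```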

Our timeline is looking better now. Along with the ministry information, we’ve been able to automatically categorise a good proportion of the ministries according to party affiliation. But it is still a little bare and it would be great to fill it out with some descriptive information about the different ministries.

The National Library of Australia’s Trove: Newspapers service is brilliant. If you haven’t seen it already, go take a look. It is a database of digitised Australian newspapers that runs from 1803 right up to the mid-1950s (and until the 1980s for the Australian Women’s Weekly). By searching in New South Wales papers on the dates of ministry change for the names of new premiers, we are sure to discover many links to relevant newspaper articles that will enrich our timeline.

In the same way that we screen scraped Wikipedia entries, we could probably also screen scrape Trove. This is a bit fiddly, however, and Tim Sherratt has provided a much better solution with his unofficial Australian newspapers API. An API, or application programming interface, is just a set of rules that defines how one computer service can talk to another. On the internet, APIs enable users to write programs that can interact with web sites. Tim Sherratt’s API allows us to query the Trove: Newspapers database and get back the results in formats (XML and JSON) that an application can read.
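The general shape of working with an API like this is: build a query URL, fetch it, and read the structured response. The Python sketch below shows the first and last steps only. The endpoint, the parameter names and the JSON layout are all placeholders I have invented for illustration; they are not the real interface of the unofficial newspapers API, so consult its documentation for the actual details.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real API address.
API_BASE = "http://example.org/newspapers/search"


def build_query_url(premier, date, state="nsw"):
    """Build a search URL for a premier's name on a given date (hypothetical parameters)."""
    params = {"q": premier, "date": date, "state": state, "format": "json"}
    return API_BASE + "?" + urlencode(params)


def article_links(json_text):
    """Extract (title, url) pairs from a response, assuming a 'results' list of articles."""
    data = json.loads(json_text)
    return [(article["title"], article["url"]) for article in data.get("results", [])]


# Made-up response in the assumed layout.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"title": "The New Ministry", "url": "http://example.org/article/1"},
    ]
})

url = build_query_url("Smith", "1900-01-01")
links = article_links(SAMPLE_RESPONSE)
```

Because the response is already structured, there is no need to pick through HTML as we did with the Wikipedia tables.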

And that’s all it takes! By displaying State Records’ ministry data with the TimelineSetter tool and by connecting it with information from Wikipedia and Trove: Newspapers, we’ve quickly created a pretty useful tool. What ideas do you have for mashups using State Records’ data?

If you are interested in seeing the code that created this timeline, check out: https://gist.github.com/1005698

Mar 21 2011
 

This post describes the entities and fields in the catalogue data published by State Records NSW. To download this data, visit the Datasets page.

What is missing from the catalogue data?

Before describing what the dataset contains, it is important to note that the dataset is missing a key element: information about relationships between entities that aren’t simple one-to-one links (e.g. a series may have links to any number of preceding series). A means of publishing these relationships is currently being investigated. If you have any ideas about how this could best be achieved, please share them as comments on this post.

Understanding the entities

State Records NSW’s online catalogue (Archives Investigator) contains information about records held as archives (in Item and Series descriptions) as well as information about the contexts in which records were created and used. Contextual information includes information about the creators of records (in Agency, Person, Organisation, Ministry, and Portfolio descriptions) as well as information about the business purpose of records (in Function and Activity descriptions). 

 

Information about records

Series

A record series is a group of (one or more) record items accumulated by an agency or person which have a common identity and system of control, and are generally in the same format.

The dataset contains 15202 series descriptions.

The series data released by State Records NSW has the following fields:

  • Series_number
  • Series_title
  • Start_date_qualifier
  • Start_date
  • End_date_qualifier
  • End_date
  • Contents_start_date_qualifier
  • Contents_start_date
  • Contents_end_date_qualifier
  • Contents_end_date
  • Descriptive_note
  • Format
  • Arrangement
  • Copies
  • Bridging_aids
  • Series_control_status
  • Repository

For examples of the information contained in these fields, see this series example: Semi-official papers of Mr G.A. Robinson, Chief Protector of Aborigines

Items

A record item is an individual unit within a record series, and the smallest entity listed in Archives Investigator. A record item may be in any format: (for example) a file, card, volume, plan or drawing, photograph or videotape. Some record items (such as files) may contain multiple individual documents but these are not normally listed as individual entities.

To fully understand the significance of a record item it is vital to know which record series it forms part of. There is usually no way to determine the context, content or format of a record item without learning about the record series.
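In practice this means looking up each item's series through the Series_number field that both entities share. A minimal Python sketch, assuming the item and series records have already been loaded as dictionaries keyed by the published field names (the sample values are placeholders, not real catalogue entries):

```python
def attach_series(items, series):
    """Add the parent series title to each item, matched on Series_number."""
    by_number = {s["Series_number"]: s for s in series}
    joined = []
    for item in items:
        parent = by_number.get(item["Series_number"])
        # Items whose series is missing from the data get None instead of a title.
        title = parent["Series_title"] if parent else None
        joined.append({**item, "Series_title": title})
    return joined


# Placeholder records for illustration.
SERIES = [{"Series_number": "100", "Series_title": "Example series"}]
ITEMS = [
    {"ID": "1", "Series_number": "100", "Item_title": "Example item"},
    {"ID": "2", "Series_number": "999", "Item_title": "Orphan item"},
]

joined = attach_series(ITEMS, SERIES)
```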

The dataset contains 342622 item descriptions.

The item data released by State Records NSW has the following fields:

  • ID (this field can be used to construct links to Archives Investigator, i.e. by appending it to http://investigator.records.nsw.gov.au/Entity.aspx?Path=\Item\. The ID cannot be used to retrieve items in State Records NSW’s reading rooms; use the Series_number and Item_number_or_control_symbol fields together for that purpose.)
  • SeriesType
  • Series_number
  • Item_number_or_control_symbol
  • Item_title
  • Descriptive_Note
  • Start_date
  • End_date
  • AccessDirectionNo
  • ImagesCount
  • Availability

For examples of the information contained in these fields, see this item example: Mendooran – Tooraweenah: detail & permanent survey (PF25)
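The link construction described for the ID field can be sketched in Python as follows. The base address is the one given in the field list above; the function names and the sample record are my own, and the ID value is a placeholder.

```python
# Base address from the ID field description; backslashes are kept exactly as given.
ITEM_LINK_BASE = "http://investigator.records.nsw.gov.au/Entity.aspx?Path=\\Item\\"


def item_link(item_id):
    """Build an Archives Investigator link by appending the item ID to the base address."""
    return ITEM_LINK_BASE + str(item_id)


def reading_room_reference(item):
    """Return the two fields used to request an item in the reading rooms."""
    return (item["Series_number"], item["Item_number_or_control_symbol"])


# Placeholder item record for illustration.
item = {"ID": "12345", "Series_number": "100", "Item_number_or_control_symbol": "[1/23]"}
link = item_link(item["ID"])
reference = reading_room_reference(item)
```

Keeping the two functions separate reflects the point made above: the ID is only good for links, while the series number and control symbol are what identify the item in the reading rooms.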

Information about creators

Agencies

An agency is an administrative or business unit which has responsibility for carrying out some designated activity.

The dataset contains 3521 agency descriptions.

The agency data released by State Records NSW has the following fields:

  • Agency_number
  • Agency_title
  • Start_date_qualifier
  • Start_date
  • End_date_qualifier
  • End_date
  • Category
  • Creation
  • Abolition
  • Administrative_history_note

For examples of the information contained in these fields, see this agency example: Department of Prisons (1874-1970)

Persons

A person is an individual who creates records, usually in an official capacity, but whose records have not been maintained in the records of the associated agency.

The dataset contains 179 person descriptions.

The person data released by State Records NSW has the following fields:

  • Person_number
  • Surname
  • Given_names
  • Birth_date_qualifier
  • Birth_date
  • Death_date_qualifier
  • Death_date
  • Alternative_name
  • Prenomial_honorifics
  • Postnomial_honorifics
  • Offices_held
  • Biographical_note
  • Minister

For examples of the information contained in these fields, see this person example: Dovey, Wilfred Robert

Organisations

An organisation is a whole government, municipal council, incorporated company, church or other body that is generally regarded as independent and autonomous in the performance of its normal functions.

The dataset contains 62 organisation descriptions.

The organisation data released by State Records NSW has the following fields:

  • Organisation_number
  • Organisation_title
  • Start_date_qualifier
  • Start_date
  • End_date_qualifier
  • End_date
  • Creation
  • Abolition
  • Administrative_history_note
  • Commonwealth_Organisation_CO_number

For examples of the information contained in these fields, see this organisation example: Colony of New South Wales

Ministries

A ministry is the body of ministers who hold warrants from the Head of State as members of the Executive Council. A ministry comprises a number of portfolios. A ministry is often named for the Premier who led it. Coalition ministries are often named after both leaders.

The dataset contains 93 ministry descriptions.

The ministry data released by State Records NSW has the following fields:

  • Ministry_number
  • Ministry_title
  • Start_date
  • End_date

For examples of the information contained in these fields, see this ministry example: Donaldson Ministry

Portfolios

A portfolio is the responsibility, or combination of responsibilities, assigned to a particular minister. Portfolios administer agencies.

The dataset contains 266 portfolio descriptions.

The portfolio data released by State Records NSW has the following fields:

  • Portfolio_number
  • Portfolio_title
  • Start_date_qualifier
  • Start_date
  • End_date_qualifier
  • End_date
  • Descriptive_note

For examples of the information contained in these fields, see this portfolio example: Colonial Secretary (1856-1889)

Information about business purpose

Functions

A function is a major area of responsibility, authority or jurisdiction assigned to or assumed by an organisation. Functions derive from mandates usually given in legislation. Functions can be permissive or prescriptive. They constitute the principal themes of business of any organisation.

The dataset contains 14 function descriptions.

The function data released by State Records NSW has the following fields:

  • Function_number
  • Function_title
  • Start_date_qualifier
  • Start_date
  • Descriptive_note

For examples of the information contained in these fields, see this function example: Law and Order

Activities

An activity is a part of a function. Activities are used in Archives Investigator to provide more specific functional context for record series than can be provided by a function.

The dataset contains 182 activity descriptions.

The activity data released by State Records NSW has the following fields:

  • Activity_number
  • Activity_title
  • Start_date_qualifier
  • Start_date
  • End_date_qualifier
  • End_date
  • Creation
  • Abolition
  • Descriptive_note

For examples of the information contained in these fields, see this activity example: Legal Opinions

Questions?

If you would like more information about this dataset or any of the information it contains, please post your questions as comments to this post.