May 24, 2013
 

State Records NSW is excited to be a part of GovHack 2013, which runs from 31 May to 2 June. We’re offering three great datasets for developers to work with: check out the Datasets page.

GovHack 2013

State Records NSW’s catalogue API

With over a million entries, http://search.records.nsw.gov.au is our largest dataset. The entire catalogue, as well as its search functionality, is accessible via calls to a Rails-style web API described here: http://search.records.nsw.gov.au/usage. The record series and items entities describe the holdings of the State Archives collection and are, in their own right, a fantastic historical resource. Those descriptions are linked to entities representing the agencies and people who made the records and the business functions and activities that informed their work. These contextual entities give a rich picture of government in New South Wales and its activities right back to the colonial period.

OpenGov API

State Records NSW manages the OpenGov NSW website. This website is a repository for information published by NSW Government agencies, including Annual Reports and open access information released under the Government Information (Public Access) Act 2009 (GIPA Act). The site currently contains over 2000 publications. Metadata for those publications, links to the PDFs, and their extracted full text contents are accessible via calls to a web API described here: https://www.opengov.nsw.gov.au/api.

Soldier settlement indexes

Specially released for the GovHack 2013 event, these indexes were created as part of State Records NSW’s Volunteer Program, and are finding aids for series of records that relate to New South Wales’s soldier settlement scheme for discharged soldiers who served in World War I. To get a picture of soldier settlement, check out the Wikipedia article: Soldier Settlement (Australia). The indexes have the names and locations of returning soldiers and would be a great resource for a Centenary of Anzac project. They are also well suited to geolocation and visualisation. The indexes are in CSV format and are available on the Datasets page.

Interested in using State Records NSW’s data during GovHack 2013?

I’ll be at the Sydney event: please grab me if you are interested in working with any of our datasets. If you are at one of the other locations, you can reach me on the weekend at @richardlehane.

Oct 11, 2011
 

State Records NSW is now inviting our regular users to trial http://api.records.nsw.gov.au as a new search tool for accessing the State Archives collection.

If you have tried this new search tool, and have feedback to give, we would love to hear it. We are actively developing the tool and would like to make it as useful and as intuitive as possible. So please post any feedback you have as comments to this blog post.

From time to time we’ll post project updates to this blog. Any posts of particular interest for regular users using http://api.records.nsw.gov.au as a search tool are being marked with the “Regular users” category (in the right-hand column).

So far there have been posts on:

So what’s the whole API thing about anyway?

On this site, and in other places, you may find that the new search tool is also being described as an API, or application programming interface. This is because http://api.records.nsw.gov.au isn’t just a search tool: it is also an interface for making the raw data underlying the catalogue accessible, particularly for re-use by developers.

It’s a bit like toy trucks. If most online catalogues are toy trucks that you can play with, but only using the features built in by the manufacturer, then http://api.records.nsw.gov.au is a toy truck built from Lego bricks.

Because it is an API, you can take the search tool apart and use its “bricks” (i.e. XML or JSON versions of the search results and entities) to create other things (such as this mashup of ministries entities), mix it with other sources of data (e.g. to create federated search portals), or even upload your own data (by creating applications that automatically tag or add comments to items in the catalogue).

Lego truck, by monkeyc.net (flickr)

This “API approach” also has a lot of value for State Records because it means we can make better use of our own data (for example, it makes it much easier for us to contemplate creating new tools like mobile phone applications that integrate with the catalogue).

That said, if you just want a toy truck (a simple but powerful search tool), and don’t want to worry about all this API business, that’s OK, because, at the end of the day, it is a toy truck too!

 

 

Aug 25, 2011
 

Opening Hours, by bbodien (flickr)

State Records’ API used to have the odd distinction of being one of the few web services with opening hours (Mon-Fri 9am to 5pm, open weekends too). This wasn’t anything intentional, just a compromise we’d had to make in order to connect the API to our online catalogue’s live database, which required a network link that is, unfortunately, routinely shut down.

Anyway, thanks to the hard work of a number of our staff (Nott, Damien and Ninh), I am pleased to announce that the API is now available 24/7, like any proper web service should be.

This will enable:

  • after hours research
  • late night hacking
  • and overseas visitors!

It also marks a point at which the API can be considered to offer a reliable and stable service on top of which other stuff can be built. Of course, this doesn’t mean we will stop experimenting; in fact, we’ve recently added a lot of cool new read-write functionality (tags and comments) that we’re busily documenting and will properly announce soon.

 

Aug 19, 2011
 

The API is currently (Fri 19 August) undergoing maintenance and won’t be available again until Monday (22 August).

The good news

Since launching we’ve been forced to close down at 5pm each evening as a result of a regular, scheduled shutdown of our network connection. This problem is being addressed by today’s maintenance and (fingers crossed) from next week the API will be accessible full time. This will enable more widespread use of the API and will provide the stability required by developers wishing to develop against it.

Aug 08, 2011
 

Zotero is a free tool that helps you collect and manage research notes and references. It has many useful features including the automatic creation of bibliographies and footnotes, online back-up and syncing, and search and tagging of your notes and references. Zotero also integrates with many different websites (such as library catalogues, online journals or newspapers, and reference sites like Wikipedia) to automatically record appropriate citations when you are doing online research.

State Records’ new API (http://api.records.nsw.gov.au) supports the automatic capture of series and item citations by Zotero.

To try this out:

  1. download Zotero. If you use Mozilla Firefox as your browser, you can install it as a browser plug-in. Otherwise, install the standalone version.
  2. navigate to the series or item in the API that you would like to cite. E.g. http://api.records.nsw.gov.au/series/1
  3. click the scroll icon in your browser’s address bar to automatically capture the citation.

Add series to Zotero

To check that the citation has been correctly added, bring up the Zotero screen (Ctrl-Alt-Z) and you should have a new item in your Zotero library. You can append your research notes or attach digital images to this item.

In Zotero

Jul 07, 2011
 

Relationships are key to the way that State Records NSW describes archives. Descriptions of individual items and series depend on their links to each other, to the agencies or people that created them, and to their role in government business (functions and activities), for their full meaning.

I am therefore very pleased to announce that relationships between entities are now included in State Records NSW’s API.

Relationships are visible when you visit pages for individual entities in the API, e.g.:

Entity relationships in the API

And of course, because this is an API, developers can access these relationships (in multiple formats such as XML and JSON) through logical URLs such as http://api.records.nsw.gov.au/series/1/persons.xml. For full details, see the documentation: http://api.records.nsw.gov.au/usage.
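As a small illustration of those logical URLs, here is a minimal Python sketch that builds the address of a relationship listing. The function name `related_url` is my own, and the only relationship name confirmed above is `persons`; any other related kind you pass in is an assumption to check against the usage documentation.

```python
BASE_URL = "http://api.records.nsw.gov.au"


def related_url(kind, entity_id, related_kind, fmt="xml"):
    """Build the URL listing entities related to one entity.

    Follows the pattern of the documented example
    /series/1/persons.xml; other combinations are untested assumptions.
    """
    return f"{BASE_URL}/{kind}/{entity_id}/{related_kind}.{fmt}"


print(related_url("series", 1, "persons"))
# prints http://api.records.nsw.gov.au/series/1/persons.xml
```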

Many thanks to Wisanu Promthong (aka Nott), State Records NSW’s new Systems developer, Digital Archives, for implementing these additions to the API.

Jun 27, 2011
 

I am excited to announce the addition of a new ‘Open archival data’ link (on the right of this page): Archival Data – Public Record Office Victoria.

As part of its project to release raw archival data for re-use, Public Record Office Victoria (PROV) is making agency and government function descriptions available for download as XML (agency data in the EAC-CPF format, function data in a format based on EAC-CPF). PROV is also considering future steps such as the publication of series and items data and the development of an API.

State Records NSW’s agency and function data is available in XML from the datasets page and through the new API. There is fantastic scope for combining this data with PROV’s (and indeed with similar data released by the National Archives of Australia). By doing so, what can we learn about Australian administrative history? How do the functions of Victorian and New South Wales government compare? Get mashing!

Jun 23, 2011
 

State Records NSW’s new API is designed primarily as a framework to allow the development of new web services (both internally by State Records staff and by external developers). Nevertheless the creation of the API has provided us with an opportunity to experiment with new ways of presenting collection search results and this aspect of the project may be of interest to all researchers using the collection.

In this post I describe key features of the API’s collection search and also some of the more advanced functionality you can access ‘under the hood’. To try the collection search yourself, go to: http://api.records.nsw.gov.au.

An example

Example search results using the new API

What, why and who

Rather than presenting search results as a simple list, the API’s collection search provides a structured view, clustering results according to three questions:

  • what records (both record series and individual items) relate to the query?
  • why might records relating to the query have been created by Government (Government functions and activities)?
  • who in Government (agencies and people) might have created records relating to the query?

(For those interested in archival theory, this three-part division matches Australian archivist Chris Hurley’s conception of archival description as comprising three essential types of entity: documents, deeds and doers.)

Simple search, but not too simple

The new search box might look ‘simple’ but sophisticated searching of the catalogue is still possible.

Swamped with too many hits? Use the two ‘filters’ in the right-hand column of the results page to drill down to more relevant results. The date filter narrows results by date range. The series filter allows you to see at a glance the key record series relating to your query and narrow your results to particular series.

If you are a ‘power user’ you can include these filters in your initial search by adding the following special keywords to your query:

  • entities:[Item,Series,Function,Activity,Person,Agency]
  • series:[series id number]
  • from:[year]
  • to:[year]

For example, the following query will just return record items dating between 1900 and 1950:

Custom search
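If you want to compose queries like this from a script, a rough Python sketch follows. It assumes that the square brackets in the keyword list above are placeholders rather than literal syntax, that multiple entity types are comma-separated, and that keywords are separated from the search terms by spaces. The helper name `build_query` and the search term "bridge" are mine, for illustration only.

```python
def build_query(terms, entities=None, series=None, from_year=None, to_year=None):
    """Combine free-text terms with the special filter keywords.

    The exact keyword syntax is an assumption based on the list above.
    """
    parts = [terms] if terms else []
    if entities:
        parts.append("entities:" + ",".join(entities))
    if series is not None:
        parts.append(f"series:{series}")
    if from_year is not None:
        parts.append(f"from:{from_year}")
    if to_year is not None:
        parts.append(f"to:{to_year}")
    return " ".join(parts)


# Record items only, dated between 1900 and 1950
print(build_query("bridge", entities=["Item"], from_year=1900, to_year=1950))
# prints bridge entities:Item from:1900 to:1950
```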

Incorporating the new collection search into your browser

Fallen in love with the new collection search? If you use a modern version of Firefox or Internet Explorer, you can take it with you anywhere you go on the internet by including it amongst the ‘search providers’ in your browser’s search box (next to the main address box). This will allow you to quickly search State Records’ collection wherever you happen to be browsing. To do this:

  1. go to http://api.records.nsw.gov.au
  2. if you are on Internet Explorer, do this:

Or, if you are on Firefox, do this:

Got suggestions?

The API’s collection search is, like the API itself, still in an experimental mode. If you have any suggestions for how it might be improved we would love to hear them (and we might try to implement them). Please post your ideas as comments to this post.

Jun 21, 2011
 

I am pleased to announce a new product from the Open Data project: http://api.records.nsw.gov.au

What is an API?

APIs, or ‘application programming interfaces’, provide platforms for developers to write applications that connect with other services. If you’ve used a Twitter client or a custom Flickr application, you’ve indirectly benefited from the web APIs that both of those companies provide. For State Records NSW, a web API has value both as a flexible base upon which we can build new and richer means of accessing the collection and also as an interface that external developers and services (such as federated search portals) can use to connect with the collection.

Features of State Records NSW’s API

Today’s release is an ‘alpha’ one, which is to say that it is a first draft and we would really appreciate feedback about the direction you would like us to head with it.

Broadly speaking, the API provides access to the ‘nouns’ of State Records’ catalogue (the entities: series, items, activities, agencies, etc.) and to one ‘verb’, search. Entities can be accessed either singly or as lists through logical URLs (e.g. http://api.records.nsw.gov.au/agencies or http://api.records.nsw.gov.au/agencies/1) and are available in multiple formats (XML, JSON and some additional standardised formats such as MODS). Search results are available in XML (the OpenSearch format) and JSON.

Developer documentation is available here: http://api.records.nsw.gov.au/usage.
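To give a feel for the ‘nouns’, here is a minimal Python sketch of fetching an entity. The two URL shapes (a list and a single entity) come from the examples above; selecting a format by appending a suffix such as `.json` is an assumption on my part, so check the usage documentation before relying on it. The function names are illustrative only.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://api.records.nsw.gov.au"


def entity_url(kind, entity_id=None, fmt=None):
    """Build the URL for a list of entities or for a single entity.

    kind is the plural name used in the URLs, e.g. "agencies".
    fmt, if given, is appended as a suffix (assumed convention).
    """
    url = f"{BASE_URL}/{kind}"
    if entity_id is not None:
        url += f"/{entity_id}"
    if fmt:
        url += f".{fmt}"
    return url


def fetch_json(kind, entity_id=None):
    """Download an entity or list and decode it as JSON (needs network access)."""
    with urlopen(entity_url(kind, entity_id, "json")) as response:
        return json.load(response)


print(entity_url("agencies", 1))
# prints http://api.records.nsw.gov.au/agencies/1
```

Calling `fetch_json("agencies", 1)` would then return whatever structure the API sends back; I have not described that structure here because it is defined by the API, not by this sketch.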

I’m not a developer, can I use it too?

Absolutely. All the resources in the API are available as normal web pages too and, in implementing the ‘search’ feature, we’ve been experimenting with new ways of presenting search results (try the ‘collection search’ box at http://api.records.nsw.gov.au). Please note however that although the API connects directly with Archives Investigator it is not yet complete: in particular, information about the relationships between entities is not yet represented (we’re working on it). Some other neat features of the API for regular users include Zotero and OpenSearch integration (I’ll describe these in a follow-up post).

Future directions

Today’s release is a starting point but we hope to develop the API into a platform capable of delivering access to a range of State Records NSW’s online resources (such as images, indexes, disposal authorities, and, in the future, digital archives). We hope to extend the range of formats supported (especially archival formats such as EAC-CPF). And perhaps make it a writeable API (capable of capturing user contributed data such as tags, comments and images as well as agency contributed data such as transfers of digital archives). If you have any ideas about features you’d like implemented, please suggest them as comments below.

Apr 15, 2011
 

One of the goals of publishing State Records NSW’s archival control data is to promote creative re-use of that information. But what does that mean in practice… and how do you do it? In this post I give a small example of the type of mashup that can be created using the datasets on this site.

State Records NSW’s online catalogue, Archives Investigator, is often regarded simply as a gateway to archival records but it is a much richer resource than that. Archives Investigator provides information about the people or agencies that created records and their business reasons for doing so. This contextual information helps us identify and interpret records. It also has a lot of untapped secondary value. For example, the ministries data is perhaps the simplest of the datasets State Records has released. On their own, the ministry entities in Archives Investigator simply list the names and date ranges of government ministries, with links to the portfolios they contained. In aggregate, however, this data provides a complete timeline of political change in New South Wales since self-government.

Timeline of NSW ministries

Click on image to view interactive timeline

This timeline is an example of a mashup. A mashup is simply the combination of one set of data (in this case, information on ministry changes in New South Wales) with other data and/or applications. The great thing about mashups is that by combining data and applications together you can often create something that is greater than the sum of the individual parts.

Our starting point in creating this mashup is the ministries.xml file (you can find it in the context.zip file on the Datasets page). The structure of this file is very simple: for each ministry we have four fields of information:

  • the ministry number (which we can use to create links back to ministry entities in Archives Investigator),
  • the ministry title,
  • and start and end dates.

To make the timeline, we use the TimelineSetter application, an application recently released as open source by ProPublica.org. The four fields in the ministries.xml file are enough to establish the framework for a TimelineSetter timeline: we have dates, titles for the events, and we can use the ministry numbers to create links back to Archives Investigator (the ‘Read More’ links in the timeline).
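The code linked at the end of this post is the real thing; what follows is only a Python sketch of the idea. It makes several assumptions: the element names (`ministry`, `number`, `title`, `start`, `end`) and date format in the sample XML are guesses, not copied from ministries.xml; the link template is a placeholder, not a real Archives Investigator address; and the column names follow TimelineSetter's CSV layout as I understand it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample only: element names and date format are assumed.
SAMPLE_XML = """
<ministries>
  <ministry>
    <number>1</number>
    <title>Donaldson</title>
    <start>1856-06-06</start>
    <end>1856-08-25</end>
  </ministry>
</ministries>
"""

# Placeholder: substitute the real Archives Investigator URL pattern.
LINK_TEMPLATE = "http://example.org/ministries/{number}"

# Column names as I understand TimelineSetter's CSV input.
COLUMNS = ["date", "display_date", "description", "link", "series", "html"]


def ministries_to_rows(xml_text):
    """Turn each ministry element into one timeline event row."""
    rows = []
    for ministry in ET.fromstring(xml_text.strip()).iter("ministry"):
        start = ministry.findtext("start")
        end = ministry.findtext("end")
        rows.append({
            "date": start,
            "display_date": f"{start} to {end}",
            "description": ministry.findtext("title"),
            "link": LINK_TEMPLATE.format(number=ministry.findtext("number")),
            "series": "",  # left empty here; could hold a party label
            "html": "",    # left empty here; could hold extra description
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(rows_to_csv(ministries_to_rows(SAMPLE_XML)))
```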

This is a great start, and we already have a useable timeline, but we can make it better by mixing our data with other sources of publicly available data.

One thing our ministry data lacks is information about the political affiliation of different ministries: were they Labor or Coalition or something else? By matching the names of the leaders of each ministry to lists of political leaders for the two major political parties we can make a pretty good guess. Luckily enough, Wikipedia has two such lists:

If you are making a mashup using Wikipedia you should consider using a service such as DBpedia, which provides an interface to the structured data in Wikipedia. For our timeline, however, we can take a simpler approach by screen scraping the Wikipedia pages directly. Screen scraping just means extracting data from a web page’s HTML code. The party leader information in Wikipedia is neatly arranged in particular columns within tables: we can use that structure to identify the cells we need to pull out the names.
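To show what pulling names out of a table column can look like, here is a minimal sketch using only Python's standard library. It is not the code behind the timeline. The sample table is a simplified stand-in that I wrote for illustration, not the actual markup of either Wikipedia page, so the column index would need adjusting for a real page.

```python
from html.parser import HTMLParser

# Simplified stand-in for a Wikipedia leaders table (not the real markup).
SAMPLE_HTML = """
<table>
<tr><th>No.</th><th>Leader</th></tr>
<tr><td>1</td><td><a href="/wiki/James_McGowen">James McGowen</a></td></tr>
<tr><td>2</td><td><a href="/wiki/William_Holman">William Holman</a></td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_column(html, column, skip_header=True):
    """Return the text of one column (0-based) from the table rows."""
    parser = TableRows()
    parser.feed(html)
    rows = parser.rows[1:] if skip_header else parser.rows
    return [row[column] for row in rows if len(row) > column]


print(extract_column(SAMPLE_HTML, 1))
# prints ['James McGowen', 'William Holman']
```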

Our timeline is looking better now. Along with the ministry information, we’ve been able to automatically categorise a good proportion of the ministries according to party affiliation. But it is still a little bare and it would be great to fill it out with some descriptive information about the different ministries.
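The matching step itself can be very simple. Here is a hedged Python sketch that assumes each ministry title contains the leader's surname (I have not checked this against every title in the dataset) and that the leader lists have already been extracted. The function name and the tiny sample lists are illustrative; a real run would use the full lists.

```python
import re


def guess_party(ministry_title, leaders_by_party):
    """Guess a party by looking for a known leader's surname in the title.

    Returns None when no party, or more than one party, matches.
    """
    words = set(re.findall(r"[a-z']+", ministry_title.lower()))
    matches = [
        party
        for party, leaders in leaders_by_party.items()
        if any(name.split()[-1].lower() in words for name in leaders)
    ]
    return matches[0] if len(matches) == 1 else None


# Tiny illustrative sample, not the full lists.
LEADERS = {
    "Labor": ["James McGowen", "William Holman"],
    "Coalition": ["Robert Askin"],
}

print(guess_party("McGowen ministry", LEADERS))    # prints Labor
print(guess_party("Donaldson ministry", LEADERS))  # prints None
```

Matching on surname alone is only a guess: it will miss ministries whose leaders appear on neither list and can mislabel leaders who changed parties or shared a surname, which is why the function returns nothing when the match is ambiguous.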

The National Library of Australia’s Trove: Newspapers service is brilliant. If you haven’t seen it already, go take a look. It is a database of digitised Australian newspapers that runs from 1803 right up to the mid-1950s (and until the 1980s for the Australian Women’s Weekly). By searching in New South Wales papers on the dates of ministry change for the names of new premiers, we are sure to discover many links to relevant newspaper articles that will enrich our timeline.

In the same way that we screen scraped Wikipedia entries, we could probably also screen scrape Trove. This is a bit fiddly, however, and Tim Sherratt has provided a much better solution with his unofficial Australian newspapers API. An API, or application programming interface, is just a set of rules that defines how one computer service can talk to another. On the internet, APIs enable users to write programs that can interact with web sites. Tim Sherratt’s API allows us to query the Trove: Newspapers database and get back the results in formats (XML and JSON) that an application can read.

And that’s all it takes! By displaying State Records’ ministry data with the TimelineSetter tool and by connecting it with information from Wikipedia and Trove: Newspapers, we’ve quickly created a pretty useful tool. What ideas do you have for mashups using State Records’ data?

If you are interested in seeing the code that created this timeline, check out: https://gist.github.com/1005698