Datasets

 

1. Catalogue Data from Archives Investigator

Extracted 2011-02-28.

Description

For a description of this data, see this post: Making sense of the catalogue data.

Downloads

This dataset is available for download as:

  1) a set of XML files:

  2) an SQLite database:

  • ai_sqlite.zip (contains the combined content of the XML files) [18.8Mb]
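
Once ai_sqlite.zip is unzipped, a quick way to see what the database holds is to list its tables and their row counts. The Python sketch below assumes nothing about the schema; it only reads SQLite's built-in sqlite_master catalogue. The function name is illustrative, and the path you pass in is whatever the unzipped database file turns out to be called (not stated on this page).

```python
import sqlite3
from contextlib import closing


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's own catalogue of tables, indexes, etc.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
```

Calling table_row_counts("path/to/unzipped-database-file") returns a dictionary you can print to get a first overview before writing real queries.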

 


License:
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Australia License.

4 comments on “Datasets”

  1. Lynnette Fiddick said:

    If the idea of this new project was to simplify Archives Investigator for the general public (i.e. those without a degree as an archivist), it has failed dismally. Archives Investigator itself is difficult enough to navigate, and I cannot see how this “Data Project” is going to help the general public. At least the old Concise Guide was intuitive and most “ordinary” people were able to get some benefit from it. The jargon used seems to imply this new project is designed for bodies other than the general public.

    • Richard Lehane said:

      Hi Lynnette

      Thank you for commenting. You raise a very important point, and I could probably set this out more clearly on the “About” page, so thanks for the opportunity to clarify.

      The intended audience for this project comprises, as you rightly note, specialists (both archivists and computer programmers). This project won’t be directly relevant to general researchers, but hopefully the outcomes of this project will benefit everybody.

      The datasets on this page comprise the raw data underlying Archives Investigator. By making that data available, it becomes possible for specialists to develop their own interfaces to the collection, federate our collection information with other collection databases (so you can search in other web portals to access our data), and use that information for other purposes (e.g. create computer applications such as The History Wall – http://defining.net.au/wall/).

      As a general researcher you probably won’t be able to use these datasets yourself but if you do have ideas about how access to State Records’ collection can be improved, please do share them as they might inspire specialists working with the data to create new ways of accessing the archives.

  2. I notice your agencies XML file does not include Preceding / Succeeding Agencies data. I thought to look as I have had ideas of a visual timeline showing consolidation and fragmenting of agencies or departments over time (perhaps something along the lines of this graphic).

    I say this as I’ve just spent a few hours manually compiling a timeline of Queensland Courts, and it really makes apparent the changes in relevant legislation, and regionalisation as well.

    I imagine perhaps such a timeline could also serve as a more intuitive way of finding records from the top down in the archival hierarchy — people often don’t know the historical name of a relevant department, but timelines of agencies grouped by function could visually give them a place to start.

    I’ve found your API project quite inspiring. There is so much potential to expand the ways we present archival material! Thanks for sharing your progress.

    • Richard Lehane said:

      Hi Amanda
      Firstly, I’m sorry I have taken a long time in approving your comment – I’ve just returned from a short break.

      This sounds like a great project. And it should be possible!

      The agencies.xml file doesn’t include relationships to preceding/succeeding agencies because of the structure of the underlying database (Archives Investigator). That database holds all relationship information in a single table (i.e. all the different relationships between all the different entities in the catalogue), and we didn’t have access to that table when compiling these datasets (long story!).

      The good news is that through the API we do have access to that table and you can access these relationships through logical routes: http://api.records.nsw.gov.au/agencies/:agency_id/succeeding and http://api.records.nsw.gov.au/agencies/:agency_id/preceding. You can add format suffixes to get results in different formats (XML, JSON etc.). For full details see: http://api.records.nsw.gov.au/usage. So one approach you could try would be to write a script that would iterate through all the agencies in the agencies.xml file, hit those preceding/succeeding routes in the API, and compile the results.
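
      As a rough illustration of that script, here is a minimal Python sketch. It is not tested against the live API and makes a few assumptions that you should check first: that each agency's identifier sits in an element named id in agencies.xml, that the format suffix is appended to the end of the route (e.g. .../preceding.json), and that the responses are plain JSON. The function names are made up for this example; the usage page linked above has the actual details.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

API_BASE = "http://api.records.nsw.gov.au"
DIRECTIONS = ("preceding", "succeeding")


def agency_ids(xml_path, id_tag="id"):
    """Yield agency ids from agencies.xml (the element name is an assumption)."""
    for _event, elem in ET.iterparse(xml_path):
        if elem.tag == id_tag and elem.text:
            yield elem.text.strip()


def relationship_url(agency_id, direction, fmt="json"):
    """Build the preceding/succeeding route; suffix placement is an assumption."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    return f"{API_BASE}/agencies/{agency_id}/{direction}.{fmt}"


def fetch_json(url):
    """Fetch a URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def compile_relationships(xml_path, fetch=fetch_json):
    """Return {agency_id: {"preceding": ..., "succeeding": ...}}.

    Responses are stored exactly as returned, because their structure
    is not described here.
    """
    results = {}
    for agency_id in agency_ids(xml_path):
        results[agency_id] = {
            direction: fetch(relationship_url(agency_id, direction))
            for direction in DIRECTIONS
        }
    return results
```

      Running compile_relationships("agencies.xml") makes two requests per agency, so it would be polite to add a short pause between requests when running it over the whole file.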

      An alternative approach would be to use the EAC-CPF format that the API returns. (EAC-CPF is an international standard for encoding information about people/organisations related to archives – http://eac.staatsbibliothek-berlin.de/.) This format includes relationships to other agencies (preceding/succeeding as well as parent/child) and embeds them in a single file. You can grab agencies in this format in bulk by using the route: http://api.records.nsw.gov.au/agencies.eac_cpf. Because this format is quite verbose, we only return 25 results at a time. You can grab all the results by adding a page parameter to the URL and iterating. Or, if you use curl (http://curl.haxx.se/), you could try this one-liner:
      curl -O "http://api.records.nsw.gov.au/agencies.eac_cpf?page=[1-142]"
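
      If you would rather iterate over the page parameter in a script, the same idea looks roughly like this in Python. This is only a sketch: the page count of 142 is taken from the curl command above and will change as agencies are added, and the output file names are invented for the example.

```python
import urllib.request
from pathlib import Path

BASE_URL = "http://api.records.nsw.gov.au/agencies.eac_cpf"


def page_url(page):
    """URL for one page of EAC-CPF results (25 agencies per page)."""
    return f"{BASE_URL}?page={page}"


def page_urls(last_page, first_page=1):
    """URLs for every page from first_page to last_page inclusive."""
    return [page_url(n) for n in range(first_page, last_page + 1)]


def download_pages(last_page, out_dir="."):
    """Save each page to its own file and return the paths written."""
    saved = []
    for n in range(1, last_page + 1):
        # Hypothetical file naming: agencies_page_001.xml, agencies_page_002.xml, ...
        target = Path(out_dir) / f"agencies_page_{n:03d}.xml"
        with urllib.request.urlopen(page_url(n)) as response:
            target.write_bytes(response.read())
        saved.append(target)
    return saved
```

      Calling download_pages(142) would fetch the same pages as the curl command, one file per page.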

      An added benefit of taking the EAC-CPF approach is that any timeline you create for NSW agencies using that format could probably also be generated for Public Record Office Victoria’s agencies. They used EAC-CPF when releasing their agency data last year: http://prov.vic.gov.au/publications/archival-data

      Good luck, please let me know if you need any help, and tell us how you get on!

      Cheers
      Richard
