Data Integration for Spreadsheet Applications via Linked Data Standards
The Spreadsheets Preservation Plan from the National Archives and Records Administration (NARA) is a guide to the long-term preservation of spreadsheet data in federal records. It sets out digital preservation standards for a range of spreadsheet file types, with particular attention to the Microsoft Excel and OpenDocument Spreadsheet formats.
According to NARA’s Linked Open Data portal, the plan covers the following spreadsheet file formats:
- Microsoft Excel (.xls, .xlsx)
- OpenDocument Spreadsheet (.ods)
- Lotus 1-2-3 Worksheet (.wks, .wk1, .wk2, .wk3, .wk4, .wk5)
- Lotus Improv Spreadsheet (.imp)
- Lotus 1-2-3 Graph (.gph)
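As a rough illustration of how the list above might feed an intake workflow, the Python sketch below maps a file name's extension to the format family the plan covers. Matching on extensions is an assumption of this sketch and only a first pass: extensions can be wrong or missing, so signature-based identification tools are more reliable in practice. The names `COVERED_EXTENSIONS` and `covered_format` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Extensions as listed in the plan; the family labels are shorthand for this sketch.
COVERED_EXTENSIONS = {
    ".xls": "Microsoft Excel",
    ".xlsx": "Microsoft Excel",
    ".ods": "OpenDocument Spreadsheet",
    ".wks": "Lotus 1-2-3 Worksheet",
    ".wk1": "Lotus 1-2-3 Worksheet",
    ".wk2": "Lotus 1-2-3 Worksheet",
    ".wk3": "Lotus 1-2-3 Worksheet",
    ".wk4": "Lotus 1-2-3 Worksheet",
    ".wk5": "Lotus 1-2-3 Worksheet",
    ".imp": "Lotus Improv Spreadsheet",
    ".gph": "Lotus 1-2-3 Graph",
}


def covered_format(filename: str) -> Optional[str]:
    """Return the format family for a file name, or None if its extension is not in the list.

    Only the final suffix is checked, case-insensitively; file contents are never read.
    """
    return COVERED_EXTENSIONS.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    for name in ["budget.XLSX", "ledger.wk3", "notes.txt"]:
        print(name, "->", covered_format(name))
```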
The Digital Preservation Framework IDs for several of these formats are as follows:
- Microsoft Excel 2.x: NF00259
- Microsoft Excel 2000-2003: NF00260
- Microsoft Excel 3.0: NF00261
- Microsoft Excel 4.0: NF00262
- Lotus 1-2-3 Worksheet 1.0 and 1A: NF00228
- Lotus 1-2-3 Worksheet 2.0 and 2.x: NF00229
- Lotus 1-2-3 Worksheet 3.0: NF00230
- Lotus 1-2-3 Worksheet 4.0/5.0: NF00231
- Lotus 1-2-3 Graph: NF00722
- Lotus Improv Spreadsheet: NF00873
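The ID list lends itself to a simple lookup table. The sketch below, using the hypothetical names `DPF_IDS`, `FORMAT_BY_ID`, and `ids_for_family`, stores the IDs exactly as listed and adds a reverse index so that an ID found in record metadata can be resolved back to a format name. Only the formats listed above are included; no IDs are supplied for formats the list omits, such as .xlsx or .ods.

```python
from typing import Dict, List

# Digital Preservation Framework IDs, copied from the list above.
DPF_IDS: Dict[str, str] = {
    "Microsoft Excel 2.x": "NF00259",
    "Microsoft Excel 2000-2003": "NF00260",
    "Microsoft Excel 3.0": "NF00261",
    "Microsoft Excel 4.0": "NF00262",
    "Lotus 1-2-3 Worksheet 1.0 and 1A": "NF00228",
    "Lotus 1-2-3 Worksheet 2.0 and 2.x": "NF00229",
    "Lotus 1-2-3 Worksheet 3.0": "NF00230",
    "Lotus 1-2-3 Worksheet 4.0/5.0": "NF00231",
    "Lotus 1-2-3 Graph": "NF00722",
    "Lotus Improv Spreadsheet": "NF00873",
}

# Reverse index: resolve an ID seen in metadata back to its format name.
FORMAT_BY_ID: Dict[str, str] = {dpf_id: name for name, dpf_id in DPF_IDS.items()}


def ids_for_family(family: str) -> List[str]:
    """Return the sorted IDs of every listed format whose name starts with `family`."""
    return sorted(dpf_id for name, dpf_id in DPF_IDS.items() if name.startswith(family))


if __name__ == "__main__":
    print(ids_for_family("Lotus 1-2-3 Worksheet"))
    print(FORMAT_BY_ID["NF00873"])
```

For example, `ids_for_family("Lotus 1-2-3 Worksheet")` returns the four worksheet IDs, NF00228 through NF00231, while the Lotus 1-2-3 Graph entry is excluded because its name does not share that prefix.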
The plan emphasizes maintaining the authenticity, usability, and accessibility of spreadsheet data over time. For detailed format-specific information and technical specifications, consult NARA’s Linked Open Data portal or the relevant preservation documentation directly.
The Structured Data: Spreadsheets Preservation Plan also documents the significant properties of spreadsheet records. NARA publishes these details as Resource Description Framework (RDF) files serialized in Terse RDF Triple Language (Turtle), a plain-text format that can be opened in any text editor. The Turtle files are available through NARA’s Linked Open Data portal.
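Because Turtle is plain text, a few lines of standard-library Python are enough to list the namespace prefixes a file declares before loading it into a full RDF toolkit (rdflib's `Graph().parse(..., format="turtle")` is one option for complete parsing). The sketch below handles only `@prefix` declarations, not SPARQL-style `PREFIX` lines, and the sample document, including its example.org namespaces and `NF00000` subject, is hypothetical rather than taken from NARA's files.

```python
import re
from typing import Dict

# Matches Turtle @prefix declarations such as "@prefix ex: <http://example.org/> ."
# The prefix name is optional so the default prefix ":" is also captured.
PREFIX_RE = re.compile(r"^\s*@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>\s*\.", re.MULTILINE)


def turtle_prefixes(text: str) -> Dict[str, str]:
    """Map each declared prefix (empty string for the default prefix) to its namespace IRI."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_RE.finditer(text)}


# Hypothetical sample; real NARA files will use their own namespaces.
SAMPLE = """\
@prefix ex: <http://example.org/formats/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.org/base/> .

ex:NF00000 rdfs:label "Hypothetical format" .
"""

if __name__ == "__main__":
    for prefix, iri in turtle_prefixes(SAMPLE).items():
        print(f"{prefix or '(default)'} -> {iri}")
```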
Beyond raw data, spreadsheet files may contain charts, visualizations, and formulae. The plan can serve as a set of test criteria for tools and processes that perform format transformations. The Linked Open Data version of the Digital Preservation Framework contains the same elements as the Preservation Plans published on GitHub.
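Using the plan as test criteria can be sketched as a before-and-after comparison: extract the same set of properties from the source file and from its transformed copy, then report any that changed. Extracting those properties from a real .xls or .ods file is out of scope here; plain dictionaries stand in for that step, and the property names (`cell_values`, `formulas`, `chart_count`) are hypothetical, loosely following the raw data, charts, and formulae mentioned above rather than the plan's own vocabulary.

```python
from typing import Any, Dict, List

# A workbook's extracted properties; keys are hypothetical property names.
Properties = Dict[str, Any]

# Sentinel so a property that is missing is distinguished from one whose value is None.
_MISSING = object()


def failed_properties(source: Properties, converted: Properties) -> List[str]:
    """Return, sorted, each property of `source` that is missing from or differs in `converted`."""
    return sorted(
        key for key, value in source.items() if converted.get(key, _MISSING) != value
    )


if __name__ == "__main__":
    original = {
        "cell_values": {"A1": 10, "A2": 32},
        "formulas": {"A3": "=SUM(A1:A2)"},
        "chart_count": 1,
    }
    # Simulated transformation that kept the values but dropped the formula.
    converted = {"cell_values": {"A1": 10, "A2": 32}, "formulas": {}, "chart_count": 1}
    print(failed_properties(original, converted))
```

Comparing only the keys present in the source means extra properties added by the conversion are ignored; a stricter test could also flag those.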
The Spreadsheets Preservation Plan draws on data and cloud computing technologies to support the long-term preservation of spreadsheet data. It outlines technological approaches to digital preservation for specific formats, such as Microsoft Excel and OpenDocument Spreadsheet, each identified by its own Digital Preservation Framework ID.