Weekly Update

Archiving our cultural heritage in the cloud

DuraSpace, a not-for-profit consortium of universities, libraries and museums, just launched a SaaS solution to simplify the preservation of digitized cultural objects in the cloud. Using public cloud resources from both Amazon and Rackspace, DuraCloud hides the complexity of ensuring that valuable resources are preserved for the future. This launch of a subscription service takes the DuraSpace organization in new directions, and possibly to the private cloud.

University and national libraries, as well as museums and archives, have been digitizing their collections since the earliest days of the web. This work both increases access to rare and delicate material and serves to preserve something for future generations if disaster should befall the original work. Digital copies of cultural artifacts — and the metadata used to describe them — have typically been stored in digital repositories such as DuraSpace’s DSpace and Fedora, or the UK’s ePrints. For richly funded institutions such as MIT, Columbia or Cambridge, these systems have worked well. But in smaller institutions, projects have been more likely to use software like Microsoft’s Access database. Although less suitable for the task, these simple databases have been easier to use than the free but complex repository systems offered by DuraSpace and others. DuraCloud promises to take capabilities previously reserved for the rich and well-staffed institutions and make them available in a web browser to anyone.

Packaged as a hosted service that removes the need to configure hardware or patch software, DuraCloud initially appears expensive, costing $375 per month. This includes an Amazon or Rackspace virtual machine (worth about $70) and 500 GB of storage (worth $60–$70), as well as support and updates. Additional storage is billed at your chosen cloud provider’s list price and is added to the DuraCloud invoice. DuraSpace CEO Michele Kimpton sees this consolidated invoicing as one way that DuraCloud delivers real value to subscribers. Purchasing rules in many libraries, for example, prevent the use of credit cards. Invoices from DuraCloud are far easier for libraries to deal with, as they fit entrenched processes based on purchase orders, approvals and invoices in ways that a traditional SaaS application’s use of credit cards or PayPal does not.
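
Taking the figures above at face value, a quick back-of-the-envelope calculation shows how much of the monthly fee is left over for support and updates. This is only a sketch: the variable and function names are illustrative, and the component values are the approximate figures quoted above, not provider list prices.

```python
# Approximate monthly figures quoted in the article (US dollars).
MONTHLY_FEE = 375
VM_VALUE = 70                   # "worth about $70"
STORAGE_VALUE_RANGE = (60, 70)  # 500 GB, "worth $60-$70"


def support_share(fee: int, vm: int, storage: int) -> int:
    """Portion of the fee not accounted for by the VM and storage."""
    return fee - vm - storage


# The higher storage estimate gives the lower remainder, and vice versa.
low = support_share(MONTHLY_FEE, VM_VALUE, STORAGE_VALUE_RANGE[1])
high = support_share(MONTHLY_FEE, VM_VALUE, STORAGE_VALUE_RANGE[0])

print(f"Support and updates: ${low}-${high} per month")
```

On these assumptions, roughly $235 to $245 of each monthly payment goes toward support, updates and the hosted service itself.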

And let’s not forget redundancy, a key principle of digital archiving: The more copies of a document, the less chance there is of losing something forever. However, many institutions struggle to achieve this in a cost-effective manner. DuraCloud’s management interface offers a solution to the problem by letting institutions redundantly store data in multiple Amazon regions or replicate across both Amazon and Rackspace. A sync service ensures that copies remain identical and notifies administrators if data loss occurs. Copies held in other regions could replace lost data.
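
The article does not describe how DuraCloud's sync service works internally. As a rough illustration of the general idea, the sketch below uses a common digital-preservation technique: record a checksum when an object is ingested, compare each stored copy against it, and restore damaged copies from an intact one. All function names and store names here are hypothetical and are not part of any DuraCloud API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a stored object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(expected: str, copies: dict[str, bytes]) -> list[str]:
    """List the stores whose copy no longer matches the checksum from ingest."""
    return sorted(
        store for store, data in copies.items() if sha256_hex(data) != expected
    )


def repair_from_good_copy(expected: str, copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return a new mapping in which damaged copies are replaced by an intact one."""
    damaged = find_damaged_copies(expected, copies)
    intact = [store for store in copies if store not in damaged]
    if not intact:
        raise ValueError("no intact copy left to restore from")
    source = copies[intact[0]]
    repaired = dict(copies)
    for store in damaged:
        repaired[store] = source
    return repaired


# Hypothetical example: one object replicated to two providers.
original = b"digitized manuscript page"
checksum_at_ingest = sha256_hex(original)
copies = {"amazon-us-east": original, "rackspace": b"corrupted bytes"}

damaged = find_damaged_copies(checksum_at_ingest, copies)
print("Notify administrators about:", damaged)
copies = repair_from_good_copy(checksum_at_ingest, copies)
```

The more independent copies there are, the more likely it is that at least one still matches the recorded checksum, which is the practical payoff of the redundancy principle described above.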

As well as supporting Amazon and Rackspace, DuraCloud will soon add Microsoft’s Windows Azure. There is also an adapter for Eucalyptus, and Kimpton says she is “looking for a partner” interested in running a Eucalyptus-powered option. She is also interested in OpenStack and tracks Rackspace’s transition to OpenStack code. A UK project exploring the feasibility of running centralized cloud infrastructure for universities might be one place in which an OpenStack-based DuraCloud installation could be tested. As a dedicated academic private cloud, it should drive costs lower than the DuraCloud service itself can, by hosting its own version of the (open-source) DuraCloud software on virtual machines and storage that can then be rented to partners at lower rates than commercial cloud services can match.

As private academic clouds like the OpenStack-powered one at the San Diego Supercomputer Center (SDSC) begin to appear, DuraCloud would be wise to evaluate the cost of basing future services on a network of similar installations at big cultural and academic institutions, rather than depend on the more commercial public cloud services. Shared — but private — cloud infrastructure running in a small number of larger cultural institutions might be capable of reaching sufficient scale to cost-effectively compete with the public infrastructure upon which DuraCloud relies today. Given the scale of the cultural sector and its long-term perspective on preservation, could this be a case in which the private cloud proves better than the public?

Question of the week

Are public cloud services the right place to preserve cultural treasures for future generations?