Digital Preservation is People: Thinking About Digital Skills for Archivists

Dec 20, 2018


This piece was originally presented as part of the afternoon panel on digital preservation at the Archives Association of Ontario’s 2018 Institutional Issues Forum on October 25, 2018. Many thanks to the AAO for hosting me and giving me a space to think through some issues that have been on my mind for a while. My thanks also go to Ryan Kirkby for shepherding the session, and to Heather Ryckman, my excellent fellow panelist.

Note: I thought I was being original when I came up with the title “Digital Preservation is People,” but there are at least three other examples of this usage. (Thanks to Sarah Romkey and Joshua Ranger for pointing out a few more that were missing from the original post.) The aforementioned Joshua Ranger published a prescient 2014 blog post for AVP on resource management in the field with the same title. Trevor Owens opens his recently published book The Theory and Craft of Digital Preservation (which is also available as a preprint) using the turn of phrase, and Ross Spencer’s project of the same name is looking very promising! You should consult all of these great sources.

And of course, I am not alone in approaching this subject: there’s been much work on it in the past and much work to come. The DigCurV research project (2011-13) is a major one worth mentioning in the digital preservation field. DigCurV sets out a fairly well-articulated set of skills and competencies for digital archives work. Since I gave the initial presentation, I’ve been happy to see several other works on the subject of learning digital skills. Edith Halvarsson at the Digital Preservation at Oxford and Cambridge project wrote an excellent blog post on the subject in relation to the 2018 Memory Makers conference “Digital Preservation Skills and How to Get Them.” A blog post by Margot Note at the vendor Lucidea also summarizes some of the previous work in the field. And the IEEE Big Data 3rd Workshop on Computational Archival Science included a presentation and paper by William Underwood et al. about introducing computational principles into archival studies programs, including an IMLS workshop being convened on the subject this April. Clearly this subject is in the air!

Getting Started

When I was a kid, I was a frequent watcher of the 1990s television show Ghostwriter. The show was about a group of precocious middle school students who solved mysteries with the help of the eponymous character Ghostwriter. As the name suggests, Ghostwriter is a ghost who is able to gather information and communicate it to the students through writing. The basic point of the show was to teach literacy: kids watching would be reading the ghost’s clues along with the characters in the show to help solve the mystery. Ghostwriter ran for three seasons on PBS between 1992 and 1995. One plot line that stood out for me at the time was a series of episodes called “Who is Max Mouse,” which aired between December 1993 and January 1994. The story arc concerns a hacker who takes over the kids’ school computer system via a computer virus. The show did an able job of introducing viewers to something that many people were then just beginning to get in their homes: desktop computers and internet connections, not to mention computer viruses. I was lucky to have access to a computer at the time due to my Dad’s purchase of what I believe was a Tandy 1000 from Radio Shack, and I remember playing games on it from an early age. But things really started blowing up with the introduction of Windows 95, when personal computers became much more affordable and user-friendly to the average North American family. Educational shows like Ghostwriter were on the cutting edge of introducing kids to computing. But with the excitement of this new technology, Ghostwriter also introduced me to the feeling of intimidation that comes from being unfamiliar with technologies. “Who is Max Mouse” features a young Julia Stiles as a know-it-all middle school paper editor, who blasts the other kids with a stream of tech-sounding language copped from William Gibson’s novel Neuromancer: “Can you jam with the console cowboys in cyberspace? Ever experienced the new wave? Next wave? Dream wave? Or cyberpunk? I didn’t think so,” she says condescendingly. A popular comment below the YouTube clip of this scene summarizes the speech this way: “The original cyberbullying was literally bullying someone in real life by using intimidating computer jargon.”

[Image: Julia Stiles in the Ghostwriter episode "Who is Max Mouse?"]

We’ve been living with computer systems as part of our daily realities for a long time now. Archivists have been talking about the problem of preserving digital materials for decades. OAIS started to get underway in 1995. The Report of the Task Force on Archiving of Digital Information was published in 1996 — almost as old as Ghostwriter. Why is it that the subject seems perpetually new? Why have institutions in Canada been so slow to move forward on their digital stewardship responsibilities? The reasons are complex, and are bound up with the way archives are funded, and how they operate as fairly conservative, slow-moving institutions usually attached to even more conservative, even slower-moving organizations like universities and government bureaucracies. But a major reason, to my mind, is the continued intimidation and anxiety that comes with digital things — a set of baggage that archivists have yet to let go of.

This baggage may begin with the predisposition of most archivists towards humanities fields, and history in particular. Though it’s now nearly twenty years old, a 1999 survey of students in archival programs found just 5% came to programs with a past degree in the sciences (Wallace, 1999). The rest were mostly from humanities and social sciences backgrounds, especially the usual suspects: history, literatures and languages. I’d argue — without concrete evidence — that most individuals who come to archives from these fields tend to privilege the physical historical materials that typically animate studies of history and literatures. Though this is changing with areas of interest such as the use of web archives for historical analysis, or the maturation of the broad field of digital humanities in general, my experience has been (and a new survey would confirm this if someone is interested in doing such a thing) that the would-be archivists coming into the field today are still mostly doing so with an interest in working with physical historical materials. Even if student interests are changing (and they probably will as new generations of students who have never known life without computers come into graduate programs), it is definitely the case for most practicing archivists. I’ve heard it many times before, and it came up again in the conversation that followed the initial delivery of this talk: working with digital materials was compared to eating broccoli: good for you, but not nearly as enjoyable as other archival foodstuffs, so to speak.

There is pleasure to be found in the certainty of working with analogue media: it’s known and understood, and therefore comfortable. By contrast, the idea of processing digital archival materials seems to be synonymous with fear and uncertainty: where to begin, how to do it the “right” way in the absence of any clear right way at all, the terrifying potential scale (thousands of emails, so many hard drives!) and consequent difficulties in determining what ought to be retained when donors or depositors themselves might not know what they are giving. I think it’s fair to say that there are generations of archivists that feel unprepared to undertake the activities to do digital preservation. It has meant that archives, if they are accepting digital materials at all, have left them largely unprocessed, and it means that administrators, themselves unfamiliar with digital archives, have not been investing in the various pieces of systems and infrastructure that doing digital preservation requires. As an aside, I should note that there are exceptions to this circumstance. I’ve met some passionate individuals who have come to archival studies as second careers after time spent in various information technology-related fields. There are other individuals who have taught themselves a great degree of technological literacy out of an interest and drive to do this work. These people are naturally drawn to the area of digital preservation based on their previous skill sets and are the ones who are currently doing most of the work right now, particularly in the area of building new tools and approaches. But they are a rarity, and an administrator should not wait for one of them to walk into their institution and solve their problems. It’s my firm belief that archivists all need to be, at some basic level, computer literate going forward if they are going to continue to do the work. 
Not everyone needs to be an expert, but practitioners should have at least a general knowledge if they are tasked with handling digital things. And right now, on the whole, I don’t believe this knowledge is at all general. In my view, much of this boils down to good training, which should begin at least in graduate school (if not earlier in high school) and extend into professional life. For the rest of this post, I am going to set out the current context and problem situation when it comes to digital preservation. I define the problem situation from a couple of angles, some of them based on my personal experience as a service provider and occasional instructor, and others with a firmer grounding in the data from the Canadian Association of Research Libraries’ recent digital preservation survey, on which I serve as lead. I’ll point to a brief survey of the current state of training, and then discuss what I think some possible next steps might be.

A final note of introduction: I am not speaking from the position of being a comfortable traveler in the digital space. I am not trained as a programmer or systems administrator and have not been educated in computer science. I started my career in a small county archives in high school and pursued literature through undergraduate and graduate degrees, followed by archival studies and library studies at the University of British Columbia. This kind of path is common for many in the field, really. My interest in working in digital preservation comes from my interest in being an archivist as a whole: how stories can get told, through time, and in connection and conversation with the communities of people who need them. I should also note that for a topic this big, I can’t address all aspects - please take the following as provisional at best! Someone could easily write a PhD thesis or more on such a subject.

Experiences as a Service Provider

My current work role is as Digital Preservation Librarian at Scholars Portal. Scholars Portal is in the business of providing shared technology services to members of the Ontario Council of University Libraries. Until relatively recently, preservation services within this mandate have focused primarily on the preservation of shared licensed content. These are journal articles as well as books and other materials that libraries buy en masse from content providers, as well as gather from open access sources. The rationale of course is that libraries invest heavily in purchasing and curating these materials as core to their operations, and having a producer-independent mechanism for ensuring continued access to these materials is of paramount importance. As a result of the efforts of my predecessor, Steve Marks, we have a collection approaching 40 million journal articles that have been preserved, and the journals repository was certified as a trustworthy digital repository in 2013. These materials are acquired from content providers and preserved by us on behalf of members, who grant us the right to do so and monitor our operations via OCUL’s governance structure. Individual librarians at institutions do not have to concern themselves with the technical aspects by which they are preserved unless they are interested in doing so, in which case our documentation is open (and I’m working to migrate it to a nicer-looking platform at the moment).

While my focus is still on preserving these kinds of materials, Scholars Portal staff and partners within OCUL also instituted work to start developing a suite of preservation services that moved beyond shared content to locally created and managed content. A major one was the development of a cloud network hosted between five Ontario universities and managed by Scholars Portal called the Ontario Library Research Cloud. This provides members with a robust storage infrastructure that can be used for a number of functions, preservation among them. Steve also initiated the idea of the Permafrost hosted digital preservation service, which offers a set of tools for preservation purposes, including at its core hosted instances of the popular preservation processing workflow engine Archivematica. I have since taken the development of this project on and turned it from an idea into a functioning service. Dawas Zaidi, a talented member of the systems support team, and I manage the day-to-day operations of the service. The way the service works is that we offer the technical infrastructure, while the subscriber to the service uses it for their institutional purpose: to process digital materials for preservation. I have been largely working with archives and special collections units at Ontario universities. None of the archivists I’ve worked with in the development of the service through a year-and-a-half-long pilot had anything beyond theoretical knowledge of digital preservation. I can say that the process was definitely a huge learning curve for them as we worked through many large and small issues and questions to get Archivematica-based workflows up and running. I was learning a lot of this as I went along myself too! I expected that the participants would mostly learn by doing hands-on processing: the “follow the steps to success” type of approach. 
In practice, I found that most could have used a crash course or refresher in digital preservation principles first, followed by the guided application of those principles. In their use of the service, archivists wanted to understand in a detailed way what they should be doing to digital materials to process them for preservation, why these things matter, and how to make preservation decisions going into the future. Sustaining a useful service without clients having this baseline of knowledge would be quite difficult, I think. Going forward, I now offer a quick-and-dirty intro to digital preservation concepts webinar before moving into the practical aspects, which also involve a lot of handholding during an onboarding period. It’s not a full education in the subject, but it’s a start. My hope is that as users develop more confidence, they’ll start being able to share knowledge amongst themselves.

Another related thing I’ve noticed is the necessity of explaining the limits of what we can do as a service provider. I have to make it very clear to subscribers where their responsibilities lie versus those of Scholars Portal. The expectation among some individuals I have spoken to, particularly since we do this for shared content I think, is that we will take care of all aspects of preservation for them. But I cannot be the digital archivist for 21 academic libraries in Ontario. For example, policy-related decisions are not something I can make for materials that are not under my stewardship. I can advise on what kinds of metadata to include, or what file formats to normalize to, but at the end of the day that decision is up to the institution. They know their collections and their users in a way that I cannot, and this information has a daily impact on the decisions made when preserving digital materials. The issue, then, is in having a) staff who can do the work, and b) staff who are knowledgeable enough to make decisions. Shared services can, and should, complement local ones. But unless they are willing to give up the mandate to keep these materials at all, local institutions need to invest in local resources to do the work that shared services complement, and staff who are knowledgeable enough to participate. I’m very much interested in how digital preservation work can scale, but I also believe that many activities that make up this work need a dialogue with the individuals who care for, and use, the materials in question. Digital preservation work is not a blunt on or off switch that turns unpreserved things into preserved ones: the process should always be contextualized by specific needs. Having ongoing learning in place allows this work to progress and mature.

Experiences as a Teacher

I have also now taught four workshops on digital preservation for a number of different audiences. I didn’t really set out to pursue work on the training side, but a few “asks” later and I now have something resembling an in-progress curriculum that serves as a decent one-day introduction to the topic. Teaching gives me a good excuse to pursue my own learning too, as different audiences may have interests that are outside my knowledge base. During these sessions, I’ve witnessed an incredible range of knowledge levels when it comes to digital preservation, and computer systems in general. At my last Archives Association of Ontario workshop in May 2018, at least three attendees out of 30 thought they were taking a course on digitization and were looking for scanning standards and so on. I was sorry to disappoint them, but I hope they got something out of the workshop anyway. There was a second, larger middle group of individuals who knew what digital preservation was as a concept, but did not know its components or functions. This was the group I generally focused my efforts towards, since this was my (reasonably correct) assumption of who the audience would be. A third, smaller group knew a few things about a certain tool, or some specifics regarding file formats. But they didn’t know how to string these together into a workflow. What they wanted was a hands-on demonstration of this, which is difficult to provide when it takes a full day to introduce the basic components.

In aiming for the middle ground while teaching these workshops, I’ve gradually developed a day-long curriculum that consists of:

What is a file?
Introduction to main standards/practices: OAIS, TDR, METS/PREMIS
Introduction to steps in a workflow with example tools (checksums, file IDs, normalization, etc.) and demonstrations of each tool’s output
Introduction to workflow tool: Archivematica (which links all the tools demo’d in the prior step).
Several conversation or question-based activities* throughout the day that speak to these things

*At some point it’d be interesting to try and introduce more active tool use using something like PythonAnywhere. I’ve also had some sessions with activities using Archivematica.
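To give a flavour of what the checksum and file identification steps in the workflow above involve, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a description of how Archivematica or any other preservation tool actually works: the function names and the tiny signature table are hypothetical, and real identification tools draw on much larger registries of format signatures (PRONOM being the best known).

```python
import hashlib

# Hypothetical, deliberately tiny signature table for illustration only.
# Real identification tools rely on far larger registries such as PRONOM.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP container (also used by DOCX, ODT, EPUB)",
}


def sha256_checksum(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_format(path):
    """Guess a file's format from its first bytes, ignoring the extension."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


def fixity_ok(path, recorded_checksum):
    """Check that a file still matches the checksum recorded for it earlier."""
    return sha256_checksum(path) == recorded_checksum
```

In a workshop setting, a learner could record a checksum for a file, change a single character in it, and run `fixity_ok` again to see the check fail, which tends to make the idea of fixity more concrete than a slide does.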

This “curriculum” has been largely sufficient for that middle group, but it has made it difficult to satisfy the outliers. Ideally, there would be a base level of knowledge to work from that was shared by the community to enable more targeted, progressive training, but this is currently not the case. We may have to start thinking of more consistent approaches to move learners along from one knowledge position to another. I think that for those looking to see more hands-on digital preservation, a special interest group or discussion forum or the like might actually be of more benefit. Edith’s blog post mentioned above refers to the idea of “collaborative learning” as an antidote to the traditional, lecture-based approach to digital preservation that tends to be the most prominent.

Notes from the CARL Digital Preservation Working Group Survey

Some of these observations are also borne out by the data I have been collecting with the Canadian Association of Research Libraries’ Digital Preservation Working Group survey on digital preservation capacity and needs in Canada. This survey was initiated to gather information on the state of digital preservation activities, and the current gaps, at Canadian institutions. We had 51 respondents, half of whom were CARL members, and half of whom were other Canadian memory institutions large and small. The results show some interesting (and concerning) trends in staffing overall. The average level of staffing was 1.11 FTE (Full Time Equivalent) in total per institution, that is, the equivalent of just over one person. While 4 of the 48 respondents who responded with staffing information had 5 or more full-time staff for this work (and therefore boosted the average FTE overall), 65% of the rest (31 respondents) had less than 1.0 FTE for digital preservation in total across all roles. Furthermore, of the 145 roles listed by all of the respondents, 54% had 20% or less of their time assigned to digital preservation.

You can read a whole lot more about these figures in the interim survey reports for phase 1 and phase 2 of the survey as well as my summary slides here. A final report is coming in early 2019. But the most interesting figures for my purposes in this post are the expectations for changing roles. 47% of respondents said they expected to expand staffing for digital preservation, of which 75% said they would do so via reassignment, and 63% said they would hire new staff (respondents could pick one or both options). How, then, will existing staff be trained to fulfill these new responsibilities? What skills and knowledge should (and will) new hires bring, and will they meet institutional needs?

Scan: Graduate and Technical/College Training

First up: what does graduate and college-level training offer? I did a brief, unscientific study of the course offerings of information school programs across Canada, of which there are seven. Four of these had classes that specifically mentioned digital curation/archives/preservation in the title or description, though none of these were required, so it would be very possible to complete a degree without ever finding out about the subject in much detail, unless it is taught in some limited way within a larger required class. Some students may also be getting a practical education with internships or co-ops but these are of course dependent on their interests and the abilities of host institutions to support these interests. As far as technical programs for archives and records management go in Ontario, I couldn’t find any evidence of digital preservation training from the three program sites I looked at, though curriculum details were slim. I’d be curious about what kinds of discussions are happening around the condition of digital skills in graduate and technical programs and whether these programs are doing any assessment of their current offerings in this light.

Scan: Professional Development

On the professional development side, there is a loose network of workshops and other programs out there. One of the better known is the Digital Preservation Management workshop. The workshops are hosted internationally, mostly at universities, and who gets to attend a particular workshop depends on who the host is. They last 3-5 days and typically focus more on the standards and policy side of digital preservation. Students get a deep grounding in OAIS, for example. There is a focus on assessing programs as well. There is a lot of discussion of approaches to digital preservation, but the workshop does not really focus on particular tools or workflows in great detail, at least from my experience taking the course in 2016. However, it gave me a really solid grounding in OAIS and other important concepts in digital preservation program development.

Digital POWRR institutes have been offered a handful of times in the USA based on IMLS funding. POWRR stands for “Preserving digital Objects With Restricted Resources.” They seem to focus more on actionable practices and the hands-on application of tools. Participants work with virtual machines and the command line, for example. These workshops have been hosted in the US only. They also have a set of webinars online. POWRR and DPM workshops together would constitute a pretty good intro to the field, in my mind.

The Society of American Archivists offers a Digital Archives Specialist certification program. My fellow panelist Heather Ryckman at the AAO institutional forum is a graduate of the program and had good things to say about it. The DAS certification requires nine courses, two of which must be taken in person, over two years. Four courses must be taken every five years to continue to maintain the certification. While I’m sure that this program is very good based on the curriculum outlined, I expect the ongoing cost of these courses might make formal certification difficult for most, though I could see a justification for someone looking to move into the ‘expert’ level at their jobs. The individual courses provide a good model for a baseline approach to fulfilling needs for competencies and skills, though. I am also unsure about the need for formal certification in this area. I’d rather see archivists learning at least some of these things as part of their (already expensive) graduate education, and then having more modular, cost-effective (and employer-supported) opportunities to increase their knowledge based on what is needed for their professional growth later on. Nevertheless, programs like this are definitely proof that graduate education in this area is lacking, and that there is a need for a more comprehensive approach to training.

Various tool-specific workshops are offered on a semi-regular basis around the world for things like BitCurator, Islandora and Archivematica, but these are generally for people who already know the basics of digital preservation.

Some other miscellaneous training opportunities come from online resources at the Digital Preservation Coalition. Professional associations sometimes put on one-off workshops or may offer asynchronous webinars or the like, such as this upcoming course hosted by the Library Juice Academy.

There are a few promising initiatives that have recently come online. The Digital Preservation at Oxford and Cambridge project recently released some really nice training materials as part of a training pilot it conducted. Another is a project based at the University of Melbourne called digital preservation carpentry that builds off of the software/library carpentry type workshops out there. Not much has happened on this front beyond the planning stage as far as I can tell, but the idea is to develop a core technical curriculum that could be repeatable across different contexts.

Bringing it Together

As part of my original presentation, I sketched out a bunch of competencies I thought were useful to think about based on my experience to date. I hesitate to list them here only because there are better sources for this kind of thing like DigCurV. What is clear is that the training gap ought to be addressed in a more coordinated way across different levels of education. I’d love to see a system of more regularly available baseline training come online in Canada, to be later supported by more collaborative approaches to learning as a more knowledgeable community emerges from these initial efforts.

So, what kind of system should be in place to support this varied set of needs? I see the following as at least a start:

Evaluating existing frameworks for skills/competencies against current needs: are they still accurate/representative? What competencies, or levels of competency, are applicable to different levels of education, from graduate to continued learning?
More clearly assessing where potential learners are at, particularly among the working population, as well as assessing existing graduate programs for evidence of these competencies.
Developing potential shared training resources at different levels that fulfill certain competencies. At the very least, it’d be ideal to not have someone have to reinvent the wheel all the time when they want to run a course. Open educational resources, anyone? The DPOC materials certainly push in this direction.

Of course, all of this raises the question of who will perform such work. There are already a great number of organizations with a stake in this domain, either from the skills/education assessment side or the digital preservation support-and-coordination side. Professional associations of various kinds also have a role, since they are often concerned with the professional development of their members. There’s no need for a new organization to sprout up for such a single purpose, so it’d be great to see some kind of collaborative work happen on this front. Skills training, to my mind, is one of the most important things (especially alongside better resourcing of the work overall) required to start making digital preservation work a lived reality in archives. And maybe just a bit of training will decrease the anxiety and increase the joy that comes with preserving digital things so young Julia Stiles’ words in Ghostwriter no longer carry such a sting.
