Review of Preserving Data Journalism: A Systematic Literature Review
By Cindy Royal, Professor
June 22, 2021
I have long studied the emergence of data journalism (Royal, 2012; Royal & Blasingame, 2015) and the proliferation of new roles to support technology in media organizations (Royal, 2017; Royal, et al., 2020). That is why I read with interest the Journalism Practice study on Preserving Data Journalism: A Systematic Literature Review (Heravi, et al., published online March 31, 2021). In this study, Bahareh Heravi of the School of Information and Communication Studies at University College Dublin; Kathryn Cassidy, Digital Repository of Ireland, Trinity College; Edie Davis, Trinity College; and Natalie Harrower, Digital Repository of Ireland, Royal Irish Academy, discuss the challenges associated with archiving data journalism projects. This holds relevance for scholars in terms of our ability to study this topic over time and how we address it in curriculum, but it holds significant importance for the professional organizations producing this type of work.
As the study notes, “without intervention, we will lose a crucial part of reporting and news history” (p. 2). The problem has mostly to do with the varying technologies used to create data visualizations. “The issue for sustained access is that the visualisations rely on digital infrastructures (code, apps, platforms) that are not being systematically preserved and sustained” (p. 2). These visualizations can be in the form of static images, but more likely, they are dynamically generated, interactive projects that rely on various technologies. The projects are more akin to software (or “complex digital objects”) than traditional news stories (“flat” or “simple” objects). “Dynamic data visualisations and news apps are essentially software, and their preservation, therefore, should follow methods for software preservation” (p. 2). This type of preservation requires funding and specific competencies that may not be available to media organizations. It is this integration of software development approaches in journalism that is the key to this study, and provides further validation for incorporating product management techniques in digital development.
The study identified several ways in which data visualizations come to be: third-party services, in-house tools, content management systems and myriad other tools. Many visualizations use third-party services like Datawrapper, Flourish or Tableau for their creation; thus the party who created the visualization is removed from the organization that has the responsibility for its sustainability. The tool may not be consistently maintained by the provider, and it is likely to be updated in the future in ways that can render past projects unworkable. This can happen with little-to-no notice to the media organization.
The authors used a systematic literature review to help identify best practices for preservation of these projects as software objects, as well as preservation of the datasets behind these projects. Using peer-reviewed academic sources and supported by articles in professional publications, the study identified challenges and methods of preservation, and made recommendations.
The most commonly discussed challenges had to do with obsolescence and dependency, followed by lack of knowledge and complexity of objects. Other challenges included difficulty of preservation tasks, lack of collaboration, lack of guidance, legal issues, lack of support and cultural issues. The challenge of culture was particularly interesting, as it relates to power and positionality in media organizations in general. “The preservation of software is frequently seen as a secondary activity and one with limited usefulness” (Matthews et al. 2010, 92).
There were numerous preservation methods identified in the literature, including emulation, migration, documentation, taking snapshots, use of specific tools, archiving and maintenance of projects, preserving metadata and the use of checksums to validate changes to the data.
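To make the last of these methods concrete, here is a minimal sketch of how checksums can flag changes to the data files behind a project. It is an illustration under my own assumptions, not a procedure described in the study: the function names (sha256_of_file, build_manifest, changed_files) and the choice of SHA-256 are hypothetical, and it uses only Python's standard library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file under `folder` (by relative path) to its checksum."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed or altered between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(
        name for name in names
        if old_manifest.get(name) != new_manifest.get(name)
    )
```

A newsroom could store the manifest alongside the archived project at publication time, rebuild it later and compare the two. Any file name returned by changed_files would then signal that the archived data no longer matches what was originally published.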
There were also numerous recommendations identified, which are presented in two categories. “The first approach is to keep a working version of the visualisation available via methods such as emulation, migration, and virtual machines. The second approach attempts to capture a ‘flat’ or simplified version of the visualisation via methods such as snapshots, documentation, and metadata” (p. 12). The approach depends on the “significant properties of the object,” or “those properties of digital objects that affect their quality, usability, rendering, and behaviour” (p. 12).
More specifically, the authors recommended an immediate solution for journalists to use a snapshot approach in which images are stored of the visualization demonstrating the significant properties. If an image or series of images were not satisfactory in capturing the story, they then proposed developing a series of images into an animated GIF or video to represent the functionality of the visualization.
Beyond these suggestions for journalists to preserve their work, there were broader recommendations associated with process. “Our recommendations below fall into several categories, and many of these require changes to workflows and investment into new policies, practices and technical solutions, and will therefore take time to pursue” (p. 15). These include guidance and education; infrastructures and tools; collaborations with third parties; and resources and funding. The article further identified legal issues that may come into play, including copyright and licensing conditions.
Professionals have already recognized the urgency of data preservation techniques. Around the time I was reading this article, I saw a related post on the Twitter account of Scott Klein, deputy managing editor at ProPublica. He pointed out that many of the excellent data projects that were Online Journalism Award nominees in 2013 were no longer available or accessible as intended. “Now, many award winners are either completely gone from the Internet or hopelessly borked” (May 19, 2020). As these projects become more prevalent over time, it will be important to understand the conditions under which they are maintained and can be archived, which is the focus of this study.
In an email interview, lead author Heravi said her inspiration for the study was due to her own professional work. “The disappearance of data journalism output, particularly dynamic data visualisations, was one of the big reasons I started this work,” Heravi said. “These included content that I myself, or my colleagues, had created for the Irish news media, as well as some of the early examples of data journalism from the likes of Simon Rogers at the Guardian (now data editor at Google), that I have been using as a teaching resource in the historical context of data journalism.” She was further inspired by collaborations with her coauthors at the Digital Repository of Ireland, whose primary focus is on the preservation of digital content.
Heravi identified that part of the cultural challenge is the deadline-driven nature of journalism. “Unlike practices in the software community where backward compatibility is something you can’t not think about, as a practice, journalists do not tend to go back to their already published stories and maintain them as the technology changes,” Heravi said.
These findings also have implications for media education. “While teaching students about the latest technologies relevant to their programme, it is important to also teach them how to be prepared for the unknown changes that they will face in the future,” Heravi said. “In addition to teaching them the latest, we need to teach the students how to be the designers of future technological solutions in their fields, on an ongoing basis.”
Heravi advises journalists to be patient and give thought to preserving their data projects, hopefully encouraging it as a regular habit. “Don’t forget that the work you are producing today is part of the history being written,” Heravi said. “If it is lost, then so is part of the history.”
The full article has more detail on recommendations and is well worth your time. The article did not mention recommendations for future research, but I can see much potential in working directly with news organizations to better understand their preservation processes and challenges, as well as utilizing services like the Internet Archive to assess the extent to which certain projects changed or became unusable over time. There seems to also be commercial potential for a company to work with media organizations to improve archiving and preservation.
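As a starting point for the kind of Internet Archive assessment suggested above, a researcher might first check whether a project URL has any archived capture at all. The sketch below is my own illustration, not something proposed in the study. It assumes the Internet Archive's public Wayback Machine availability endpoint (https://archive.org/wayback/available) and the JSON shape it returns; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; `timestamp` (YYYYMMDD) asks for the nearest capture."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed reply, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the service over the network and summarize the closest capture."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

A capture existing is only a first signal. An archived page can load while its interactive elements fail, so judging whether a project still works as intended would require inspecting the capture itself.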
Heravi, B., Cassidy, K., Davis, E., & Harrower, N. (2021). Preserving Data Journalism: A Systematic Literature Review. Journalism Practice, 1-23.
Matthews, B., Shaon, A., Bicarregui, J., & Jones, C. (2010). A Framework for Software Preservation. International Journal of Digital Curation, 5(1), 91-105.
Royal, C. (2012). The journalist as programmer: A case study of the New York Times interactive news technology department. In International Symposium on Online Journalism (Vol. 2, No. 1, pp. 5-24).
Royal, C., & Blasingame, D. (2015). Data journalism: An explication. In International Symposium on Online Journalism (Vol. 5, No. 1, pp. 24-46).
Royal, C. (2017). Managing digital products in a newsroom context. ISOJ Journal (Vol. 7, pp. 45-66).
Royal, C., Bright, A., Pellizzaro, K., Belair-Gagnon, V., Holton, A. E., Vincent, S., Heider, D., Zielina, A., & Kiesow, D. (2020). Product Management in Journalism and Academia. Journalism & Mass Communication Quarterly, 97(3), 597-616.