Piloting Data Science Learning Platforms through the Development of Cloud-based interactive Digital Computational Notebooks

Gnanasekaran, Rajesh Kumar; Marciano, Richard

doi:10.22323/1.378.0018

Abstract

Physical learning, communication, and collaboration took a colossal hit in 2019 and 2020, with cascading lockdowns resulting from the spread of the COVID-19 pandemic. Virtual access has become the need of the hour, and cloud-based course content delivery, distance learning, and document collaboration are becoming increasingly ubiquitous. This paper introduces a novel method that allows students and faculty in the Humanities, Arts, and Social Sciences (HASS) to collaborate and interact through data analytics technologies using "interactive Digital Computational Notebooks" (iDCNs). We demonstrate this approach using a digitized Legacy of Slavery (LoS) archival dataset collection from the Maryland State Archives (MSA) and illustrate the socio-technical challenges of establishing this learning environment. We describe the step-by-step process of accessing, developing, and integrating the different infrastructure elements. The LoS in Maryland is a major initiative of the MSA. The program seeks to preserve and promote the vast universe of experiences that have shaped the lives of Maryland’s African American population. Over the last 18 years, some 420,000 individuals have been identified, and the data have been assembled into 16 major databases. These databases contain information unique to enslaved people’s lives, such as manumission records, certificates of freedom, census data, and penitentiary records. One of this paper’s primary objectives is to enable digital representations of these culturally rich and sensitive collections that are ready to be analyzed and studied through the lenses of contemporary scholars. The project aims to achieve this by making the databases available and accessible so that users can generate individual stories, glean insights, and possibly recover “erased” memories of enslaved people.
To achieve this goal, as a first step, unique dataset collections were prepared by downloading the databases and putting them through a rigorous process of exploration, cleaning, and visualization, in coordination with an interdisciplinary team of archivists, historians, computer scientists, and technology analysts. This project also illustrates the importance of a multidisciplinary approach to a unique set of digitized archival data, with a specific focus on contextual aspects because of the data’s historical value and sensitivity. The collaborative process used open-source and readily accessible tools to create meaningful visualizations, arranged in a coherent sequence that educators can teach from. The visualizations use the spatial and temporal characteristics of the datasets to produce graphs and charts that give a graphical view of the data. They are responsive: they connect to the datasets dynamically, so they present the data instantly. The digital artifacts obtained from each dataset were integrated through Jupyter Notebooks (JNs). These iDCNs are unlike traditional digital notebooks, which provide a space for students to take notes and collect clippings of text. Instead, the iDCNs developed in this project are a novel set of educational tools in which text and software code co-exist and are rendered coherently in a single document, so that instructors and students can follow the text and its visual representations back-to-back. The iDCNs also include live examples of basic natural language processing applied to certain text-rich features of these dataset collections. The open-source nature of this project’s setup and the cloud-based distribution of these digital artifacts pave the way for students from under-served communities to take advantage of a unique way of learning and to perform hands-on work with marketable software tools, preparing them for successful careers.
The contribution of this paper to HASS and other non-STEM (Science, Technology, Engineering, and Mathematics) fields lies in providing an “always-on” cloud-based pedagogical environment in which aspiring students and researchers worldwide can analyze, learn, and unearth stories through a data science-driven approach to a cultural dataset, in our case the LoS dataset collection.

Keywords: Computational Thinking, Interactive Digital Computational Notebooks, Computational Archival Science, Cloud-based Digital Learning