Volume 293 - International Symposium on Grids and Clouds (ISGC) 2017 (ISGC2017) - Humanities, arts & social sciences applications
Data Science as a Foundation towards Open Data and Open Science: The Case of Taiwan Indigenous Peoples Open Research Data (TIPD)
J.P. Lin
Published on: 2017 December 06
Abstract
This research is an outcome of a joint research program conducted by Academia Sinica and the Council of Indigenous Affairs from 2013 to 2017. The aims of this paper are threefold: (1) to demonstrate the data science methods used in constructing the Taiwan Indigenous Peoples (TIPs) Open Research Data database (TIPD; see http://TIPD.sinica.edu.tw and https://osf.io/e4rvz/; identifiers: DOI 10.17605/OSF.IO/E4RVZ, ARK c7605/osf.io/e4rvz) from Taiwan Household Registration (THR) administrative data; (2) to illustrate automated and semi-automated data processing as methods for constructing effective open data; and (3) to demonstrate that “old-school” data formats such as multi-dimensional tables, used appropriately, are an effective means of overcoming legal and ethical issues. The research extracts valuable information embedded in the THR micro data and enriches it by cleaning, cleansing, crunching, reorganizing, and reshaping the source data. These data enrichment processes produce a number of data sets that contain no individual information but retain most of the information in the source data. The enriched data sets can therefore be released to the public as open data. The open data are constructed systematically, mainly in an automated and partly in a semi-automated way, through the integration of optimized hardware, compiled and scripting programming languages, computing software, and system scripting languages. The major outputs of TIPD comprise 31,000 files totaling around 79 GB. TIPD consists of three categories of open research data: (1) categorical data, (2) household structure and characteristics data, and (3) population dynamics data. The potential contributions of TIPD are shifts in research from “closed” to “open”, from “the elite” to “the ordinary”, from “local” to “global”, and from “macro and static” to “micro and dynamic”.
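The idea of turning micro data into multi-dimensional tables that carry no individual information can be illustrated with a minimal sketch. It is not the TIPD pipeline: the records, the field names (`person_id`, `township`, `sex`, `age`), and the helpers `age_group` and `build_table` are all hypothetical, chosen only to show how identifiers can be dropped while counts per combination of categories are retained.

```python
from collections import Counter

# Hypothetical micro records; the field names and values are illustrative
# assumptions, not the actual THR schema.
records = [
    {"person_id": 1, "township": "A", "sex": "F", "age": 34},
    {"person_id": 2, "township": "A", "sex": "M", "age": 37},
    {"person_id": 3, "township": "A", "sex": "F", "age": 31},
    {"person_id": 4, "township": "B", "sex": "M", "age": 8},
    {"person_id": 5, "township": "B", "sex": "F", "age": 62},
]


def age_group(age, width=10):
    """Map an exact age to a band label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def build_table(rows, dimensions):
    """Count rows per combination of dimension values.

    Only the listed dimensions survive, so person-level identifiers
    and exact ages do not appear in the output.
    """
    table = Counter()
    for row in rows:
        derived = dict(row, age_group=age_group(row["age"]))
        key = tuple(derived[d] for d in dimensions)
        table[key] += 1
    return table


# A three-dimensional table: township x sex x age band.
table = build_table(records, ("township", "sex", "age_group"))
for key in sorted(table):
    print(key, table[key])
```

Each output cell is a count for one combination of categories, so the table can be sliced along any dimension without exposing a single record. Whether a given cell is safe to publish is a separate question that this sketch does not address.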
Keywords: big data, data science, open data, open science, TIPD, TIPs
DOI: https://doi.org/10.22323/1.293.0004
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.