Sarah Elaine Bratt
- Assistant Professor, School of Information
- Member of the Graduate Faculty
Contact
- Richard P. Harvill Building, Rm. 409
- Tucson, AZ 85721
- sebratt@arizona.edu
Awards
- METSTI 2023 Best Paper Presentation Award
- The ASIS&T Special Interest Group for Metrics (SIG/MET) and the ASIS&T Special Interest Group for Scientific and Technical Information (SIG STI) sponsored by the International Center for the Study of Research (ICSR), Fall 2023
- George H. Davis Faculty Travel Fellowship
- University of Arizona Research Innovation & Impact (RII) George H. Davis Faculty Travel Fellowship, Spring 2023
- LSSTC Catalyst Fellowship, Social Science Fellow (declined)
- LSST Corporation funded by the John Templeton Foundation, Spring 2023
Interests
Research
Research data management, data science, library and information science, science of science and innovation, data curation, long-term data preservation, social studies of science
Teaching
information organization, feminist methodologies, research methods
Courses
2024-25 Courses
- Directed Research, INFO 692 (Spring 2025)
- Honors Thesis, ECOL 498H (Spring 2025)
- Statistic Foundations Info Age, ISTA 116 (Spring 2025)
- Directed Research, INFO 692 (Fall 2024)
- Honors Thesis, ECOL 498H (Fall 2024)
- Organization/Information, LIS 515 (Fall 2024)
- Statistic Foundations Info Age, ISTA 116 (Fall 2024)
2023-24 Courses
- Directed Research, INFO 692 (Spring 2024)
- Statistic Foundations Info Age, ISTA 116 (Spring 2024)
- Directed Research, INFO 692 (Fall 2023)
- Organization/Information, LIS 515 (Fall 2023)
- Statistic Foundations Info Age, ISTA 116 (Fall 2023)
2022-23 Courses
- Capstone, INFO 698 (Spring 2023)
- Statistic Foundations Info Age, ISTA 116 (Spring 2023)
- Organization/Information, INFO 515 (Fall 2022)
- Organization/Information, LIS 515 (Fall 2022)
- Statistic Foundations Info Age, ISTA 116 (Fall 2022)
Scholarly Contributions
Journals/Publications
- Arora, R., Beattie, K., Bernholdt, D. E., Bratt, S. E., Godoy, W. F., Katz, D. S., Laguna, I., Maji, A. K., Mudafort, R. M., Rouson, D., Rubio-Gonzalez, C., Sukhija, N., Thakur, A. M., & Vahi, K. (2023). Giving RSEs a Larger Stage through the Better Scientific Software Fellowship. Computing in Science & Engineering, 24(5), 1-10. doi:10.1109/mcse.2023.3253847
- Bratt, S. (2023). ‘Routine Infrastructuring’: How Social Scientists Appropriate Resources to Deposit Qualitative Data to ICPSR and Implications for FAIR and CARE. Proceedings of the Association for Information Science and Technology, 60(1), 61-72. doi:10.1002/pra2.769
- Bratt, S., Langalia, M., & Nanoti, A. (2023). North-south scientific collaborations on research datasets: a longitudinal analysis of the division of labor on genomic datasets (1992-2021). Frontiers in Big Data, 6, 1054655.
  Collaborations between scientists from the global north and global south (N-S collaborations) are a key driver of the ‘fourth paradigm of science’ and have proven crucial to addressing global crises like COVID-19 and climate change. However, despite their critical role, N-S collaborations on datasets are little understood. Science of science studies tend to rely on publications and patents to examine N-S collaboration patterns. To this end, the rise of global crises requiring N-S collaborations to produce and share data presents an urgent need to understand the prevalence, dynamics, and political economy of N-S collaborations on research datasets. In this paper, we employ a mixed methods case study research approach to analyze the frequency of and division of labor in N-S collaborations on datasets submitted to GenBank over 29 years (1992-2021). We find: (1) there is a low representation of N-S collaborations over the 29-year period. When they do occur, N-S collaborations display “burstiness” patterns, suggesting that N-S collaborations on datasets are formed and maintained reactively in the wake of global health crises such as infectious disease outbreaks; (2) The division of labor between datasets and publications is disproportionate to the global south in the early years, but becomes more overlapping after 2003.
- Godoy, W. F., Arora, R., Beattie, K., Bernholdt, D. E., Bratt, S. E., Katz, D. S., Laguna, I., Maji, A. K., Thakur, A. M., & Mudafort, R. M. (2023). Giving RSEs a Larger Stage through the Better Scientific Software Fellowship. Computing in Science & Engineering.
  The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.
- Qin, J., Bratt, S., Hemsley, J., Smith, A., & Liu, Q. (2023). A FAIR Data Ecosystem for Science of Science. Proceedings of the Association for Information Science and Technology, 60(1), 1107-1109. doi:10.1002/pra2.960
Proceedings Publications
- Bratt, S. E. (2023, October). "Routine Infrastructuring": How Social Scientists Appropriate Resources to Deposit Qualitative Data to ICPSR and Implications for FAIR and CARE. In Proceedings of the Association for Information Science and Technology, 60, 61-72.
  This study develops a grounded theory of how social scientists facilitate qualitative data deposit and the impacts on making data FAIR and CARE. Drawing from 15 semi‐structured interviews with U.S. academic social science faculty who deposited data to ICPSR, I take a resource‐centric perspective to address the need for theorizing scientists' use of resources to bridge the gap between underspecified, heterogeneous data practices and repository requirements. The two primary contributions of the study are: First, the identification of three types of resources that social science faculty use to structure data deposit routines, namely: 1) bottom‐up, 2) top‐down, and 3) borrowed resources. Second, I import a theory from crisis informatics, ‘routine infrastructuring,’ to explain how social scientists deposit data to ICPSR. Results reveal that the resources social scientists use function as ostensive routines. I argue routine infrastructuring is not only a way to enact routines but also creates routines. Findings also show ‘in‐house’ resources have a mix of beneficial and negative impacts for data FAIR‐ and CARE‐ness. This study advances the small but growing body of literature that examines routine dynamics in research groups from a resource‐centric perspective to explain qualitative data deposit to research data repositories.
- Qin, J., Bratt, S. E., Hemsley, J., & Smith, A. O. (2023, July 3). Metadata Analytics: A Methodological Discussion. In International Society of Scientometrics and Informetrics (ISSI) 2023 Conference.
  Metadata Analytics is a term used to describe a research field that utilizes quantitative methods and metadata for publications, patents, datasets, and other research entities to study science of science. Metadata analytics inherits the bibliometric and scientometric tradition while infusing novel data sources (metadata for datasets) to extend the traditional bibliometric and scientometric research. The large scale of metadata from scientific data repositories offers both opportunities and challenges in the quantitative study of science. This paper discusses the problems and opportunities that metadata analytics contends with from a methodological perspective. Using the authors’ experiences over the course of a multi-year metadata analytics project, the paper focuses on the subtle differences between methods and science (or means and end) that arise when conducting research in metadata analytics and, for the same reason, bibliometrics and scientometrics. Metadata analytics is both a methodology and a research field. The intertwining of methods and science in metadata analytics can create pitfalls for researchers. Steering clearly between the means and ends in metadata analytics is essential to produce good science.
- Bratt, S., & Smith, A. O. (2022). Evolutionary Archives: The Unlikely Comparison of GenBank and Know Your Meme. In IEEE Big Data.
Presentations
- Bratt, S. E. (2023, July). Detecting Invisible Labor in Scientific Communities: The Case of GenBank. NetSci 2023 Satellite Workshop. Vienna, Austria.
- Bratt, S. E. (2023, November). Making Qualitative Data ‘Machine Readable’: How Scientists “Fit” Data to Research Data Repositories in U.S. Social Sciences Research [and Impacts on Perceived Validity]. Society for the Social Studies of Science (4S) Annual Meeting. Honolulu, Hawaii: Society for the Social Studies of Science (4S).
  This article examines whether and how scientists “fit” their data to open research data repositories and consequences for the perceived validity of qualitative data. We highlight the knowledge performances and infrastructures around qualitative data that stabilize notions of its "usefulness" across different contexts. Drawing from interviews with social scientists and genetics research faculty in U.S. academic institutions, we argue that scientists engage in “data fitting” practices to conform their data to repository requirements. When fitting practices are awkward or unnatural to scientists' routines, they can be called “contortions”: the ungainly or inappropriate actions scientists take to align datasets with repository requirements. Although the methodological, epistemic, and ethical implications of data fitting strategies can be substantial, fitting decisions are largely field-specific. Fields with low consensus, such as social sciences, lack widespread best practices on how to fit data to repository requirements (and whether to fit data to repositories at all) to make data Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al., 2016), in short “machine readable”, such that they can be reused or integrated into a ‘big data collection’ (Leonelli, 2019).
- Bratt, S. E., Gomez, C. J., Lee, J., Langalia, M., Nanoti, A., & Leahey, E. E. (2023, June). Division of Labor in Data-Intensive Science: Implications for Innovation and Equity. 2nd International Conference of Science of Science & Innovation (ICSSI). Kellogg Global HUB, Northwestern University, Evanston, IL, USA: Digital Science.
  In this paper, we systematically analyze the international division of labor on 1.2 million datasets submitted to GenBank over 29 years (1992-2021). GenBank [1] is an international open research data repository for the genomics community hosted by NCBI (and through which the Human Genome Project was conducted and COVID-19 sequences submitted), making it an ideal site to analyze the global distribution of labor on datasets. To classify countries, we use the World Bank Income Classification [2] and a newer measure, the Scientific and Technical Capacity Index (STCI) [7], nuancing the binary of N-S. We analyzed the yearly structures and dynamics of the N-S division of labor on genomic datasets by calculating the ratio of overlap of scientists appearing as (co)contributors to the dataset and on the dataset’s associated publication(s), inferring that a higher overlap is indicative of “coreness” in flat teams [8]. Coreness indicates that the dataset submitter is more ‘core’ to the project, meaning the technical labor on a project is drawn into the intellectual center of the study. We find: (1) Scientists from the global south tend to be listed as dataset contributors more often than global north researchers. Overlap increases overall, but there remain distinct functional roles; that is, 40 percent of scientists are only dataset contributors. This finding is surprising given prior studies reporting the lack of infrastructures to produce and curate data in low income or scientifically developing countries. However, it could be that contribution is explained by the high frequency of N-S collaborations in genomics research on infectious diseases [5], leading to southern scientists being equipped to collect and submit datasets. (2) We identify a positive relationship between the “flatness” of a team and southern scientists being lead or last author on the publication.
Poster Presentations
- Bratt, S. E., Buchanan, S., Honick, B., & Gala, B. (2023, June). Invisible Data Communities: Detecting Scientific Communities Based on Dataset Affinity Networks. 2nd International Conference of Science of Science & Innovation (ICSSI). Kellogg Global HUB, Northwestern University, Evanston, IL, USA: Digital Science, Alfred P. Sloan Foundation, AFOSR, Northwestern Kellogg School of Management.
  In this paper, we analyzed patterns of communities defined outside of conventional community detection using an affinity network approach. We identify a tripartite network of links between (1) scientists co-authoring datasets, (2) taxonomic classifications, and (3) journals to surface often invisible affinity networks based on dataset properties. We use GenBank datasets’ bibliographic metadata (e.g., author names, journal name, publication title, year published) and link them to the NCBI Taxonomy database which connects the bibliographic metadata to biological metadata about the dataset. The biological metadata describes attributes of the sequence (e.g., mRNA/DNA) with information about the organism from which the sample was taken, and the taxonomic classification of the organism. For instance, a mouse genome sequence used for an experiment on influenza would have taxonomic tags for Mus musculus and influenza. We demonstrate three novel ways to computationally reimagine scientific communities with a novel data source for studying the data-intensive scientific enterprise: GenBank repository metadata. We define communities according to the taxonomic lineage of the dataset the scientist submits to GenBank, the collaboration network on datasets, and the journal + taxon combination of the dataset submitted. We compare these novel ways to model communities to conventional theoretical and computational approaches to community detection, and reflect on the implications for how they can inform collaboration recommendation systems, academic library collection development, and science policy.
- Bratt, S. E., Gomez, C. J., Devitt, W., Langalia, M., Lee, J., & Leahey, E. E. (2023, June). North-South Collaborations on Scientific Datasets: A Longitudinal Exploration (1992-2021). 2nd International Conference of Science of Science & Innovation. Kellogg Global HUB, Northwestern University, Evanston, IL, USA: Digital Science.
  In this paper, we systematically analyze the frequency of N-S collaborations on approximately 1.2 million sequences submitted to GenBank over 29 years (1992-2021). GenBank [2] is an international open research data repository for the genomics community hosted by NCBI, and in which the Human Genome Project sequences were shared and infectious disease sequences submitted (including COVID-19), making GenBank an ideal site to analyze N-S collaborations on datasets. To classify countries we use the World Bank Income Classification [4] and the Scientific and Technical Capacity Index (STCI) [11]. We find: (1) datasets are disproportionately produced by the global north, but there is a higher rate of collaborations between nations with discrepant S&T capacity on datasets over time. The preponderance of the datasets submitted are domestic collaborations, but where there are international collaborations, over 89 percent are collaborations among scientifically advanced countries. The N-S collaboration networks demonstrate “burstiness” in their formation and dissolution [5], suggesting scientific reactivity to outbreaks of infectious disease (e.g. HIV/AIDS) and ad hoc influx of resources to build capacity in southern scientists’ institutions (see Figure 1). (2) The classification indices commonly used to characterize the global north and south at a national level are incompatible, revealing a need for composite measures to nuance the N-S binary. The S&T capacity index [11] points to the need for measures that capture the multi-faceted nature of the N-S political economy [1, 7], where S&T capacity and income measures are not interchangeable. For instance, United Arab Emirates is classified as a High Income Country (HIC) by the World Bank income classification, but as a Scientifically Lagging Country (SLC) by the parameters of the S&T index.
- Bratt, S. E., Kingsley, S., Thomas, E., & Flores, J. (2023). Speculative Design Thinking in iSchool Education: Comparing Borges' Library of Babel and Bush's Memex to Surface Values in the Design of Organizing Systems. iConference 2023 Proceedings.
  Design thinking is critical in information science practice and education. However, we lack applied approaches to implement design thinking in iSchool graduate courses to elicit the values implicit in technologies. To address this gap, this poster presents a preliminary speculative design study with students that compares Vannevar Bush’s memex and Jorge Luis Borges’ Library of Babel as “visions of organizing systems” as an applied approach to implementing design thinking in iSchool education. Drawing on experiences across two iSchool courses, we describe a speculative design approach for identifying values in organizing systems. Second, we describe an analytic schema developed from design sessions that used the scenario of a “modern memex” to surface the values encoded in modern information landscapes (e.g., misinformation). We argue that analyzing the “visions of organizing systems” articulated in speculative texts enables students to identify values implicit in information technologies. We conclude with recommendations for using speculative texts in iSchool education and practice.
- Qin, J., Bratt, S. E., Hemsley, J., Smith, A., & Liu, Q. (2023). A FAIR Data Ecosystem for Science of Science. Proceedings of the Association for Information Science and Technology.
  This poster discusses Automated Research Workflows (ARWs) in the context of a FAIR data ecosystem for the science of science research. We offer a conceptual discussion from the point of view of information science and characteristics and expectations for designers and developers of a FAIR data ecosystem. Drawing from a 10-year data science project developing GenBank metadata workflows, we incorporate the ideas of ARWs into the FAIR data ecosystem discussion to set a broader context and increase generalizability. Researchers can use these as a guide for their data science projects to automate research workflows in the science of science domain and beyond.