Bryan Heidorn
- Associate Dean, Research and Graduate Academic Affairs
- Professor, School of Information
- Member of the Graduate Faculty
Contact
- (520) 621-3565
- Richard P. Harvill Building, Rm. 453E
- Tucson, AZ 85721
- heidorn@arizona.edu
Degrees
- Ph.D. Information Science
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Dissertation: Natural Language Understanding for Image Retrieval: Botanical texts
Awards
- MBL Speaker Award March 2010
- Awarding organization: George Frederick Jewett Foundation (details in Presentation section), Fall 2010
Interests
Research
Biodiversity informatics, science cyberinfrastructure
Teaching
Information retrieval, Natural language processing, Information research methods
Courses
2023-24 Courses
- Capstone, INFO 698 (Fall 2023)
- Directed Research, INFO 692 (Fall 2023)
- Foundations of Information, INFO 505 (Fall 2023)
2022-23 Courses
- Capstone, INFO 698 (Summer I 2023)
- Capstone, INFO 698 (Spring 2023)
- Independent Study, INFO 699 (Spring 2023)
- Foundations of Information, INFO 505 (Fall 2022)
- Independent Study, INFO 699 (Fall 2022)
- Rsrch Mth/Libr+Info Prof, LIS 506 (Fall 2022)
2021-22 Courses
- Capstone, INFO 698 (Spring 2022)
- Graduate Seminar, INFO 696E (Spring 2022)
- Internship, INFO 493 (Spring 2022)
- Independent Study, INFO 499 (Winter 2021)
- Foundations of Information, INFO 505 (Fall 2021)
- Internship, INFO 493 (Fall 2021)
- Rsrch Mth/Libr+Info Prof, LIS 506 (Fall 2021)
2020-21 Courses
- Rsrch Mth/Libr+Info Prof, LIS 506 (Summer I 2021)
2019-20 Courses
- Dissertation, INFO 920 (Spring 2020)
- Information Research Methods, INFO 507 (Spring 2020)
- Dissertation, INFO 920 (Fall 2019)
2018-19 Courses
- Dissertation, INFO 920 (Summer I 2019)
- Directed Research, INFO 692 (Spring 2019)
- Dissertation, INFO 920 (Spring 2019)
- Rsrch Mth/Libr+Info Prof, LIS 506 (Spring 2019)
- Dissertation, INFO 920 (Fall 2018)
- Foundations of Information, INFO 505 (Fall 2018)
2017-18 Courses
- Dissertation, INFO 920 (Spring 2018)
- Rsrch Mth/Libr+Info Prof, LIS 506 (Spring 2018)
- Dissertation, LIS 920 (Fall 2017)
- Foundations of Information, INFO 505 (Fall 2017)
- Research, LIS 900 (Fall 2017)
2016-17 Courses
- Dissertation, LIS 920 (Spring 2017)
- Information Research Methods, INFO 507 (Spring 2017)
- Research, LIS 900 (Spring 2017)
- Rsrch Mth/Libr+Info Prof, LIS 506 (Spring 2017)
- Dissertation, LIS 920 (Fall 2016)
- Foundations of Information, INFO 505 (Fall 2016)
2015-16 Courses
- Capstone, LIS 698 (Summer I 2016)
- Collaborating: Online Commun, ESOC 211 (Summer I 2016)
- Dissertation, LIS 920 (Summer I 2016)
- Internship, LIS 693 (Summer I 2016)
- Preservation, LIS 541 (Summer I 2016)
- Dissertation, LIS 920 (Spring 2016)
- Internship, ISTA 493 (Spring 2016)
- Internship, LIS 693 (Spring 2016)
- Research, LIS 900 (Spring 2016)
Scholarly Contributions
Books
- Sandore, B., & Heidorn, P. B. (Eds.). (1997). Digital Image Access & Retrieval. Proceedings of the Clinic on Library Applications of Data Processing. Publications Office, Graduate School of Library and Information Science, The University of Illinois at Urbana-Champaign, 501 E. Daniel St., Champaign, IL 61820-6211.
Journals/Publications
- Stahlman, G. R., & Heidorn, P. B. (2020). Mapping the "long tail" of research funding: A topic analysis of NSF grant proposals in the Division of Astronomical Sciences. "Long tail" data are considered to be smaller, heterogeneous, researcher-held data, which present unique data management and scholarly communication challenges. These data are presumably concentrated within relatively lower-funded projects due to insufficient resources for curation. To better understand the nature and distribution of long tail data, we examine National Science Foundation (NSF) funding patterns using Latent Dirichlet Analysis (LDA) and bibliographic data. We also introduce the concept of "Topic Investment" to capture differences in topics across funding levels and to illuminate the distribution of funding across topics. This study uses the discipline of astronomy as a case study, overall exploring possible associations between topic, funding level and research output, with implications for research policy and practice. We find that while different topics demonstrate different funding levels and publication patterns, dynamics predicted by the "long tail" theoretical framework presented here can be observed within NSF-funded topics in astronomy.
- Stahlman, G. R., Heidorn, P. B., & Steffen, J. (2020). The Astrolabe Project: Identifying and Curating Astronomical Dark Data through Development of Cyberinfrastructure Resources. EPJ Web Conf. As research datasets and analyses grow in complexity, data that could be valuable to other researchers and to support the integrity of published work remain uncurated across disciplines. These data are especially concentrated in the Long Tail of funded research, where curation resources and related expertise are often inaccessible. In the domain of astronomy, it is undisputed that uncurated dark data exist, but the scope of the problem remains uncertain. The Astrolabe Project is a collaboration between University of Arizona researchers, the CyVerse cyberinfrastructure environment, and American Astronomical Society, with a mission to identify and ingest previously-uncurated astronomical data, and to provide a robust computational environment for analysis and sharing of data, as well as services for authors wishing to deposit data associated with publications. Following expert feedback obtained through two workshops held in 2015 and 2016, Astrolabe is funded in part by National Science Foundation. The system is being actively developed within CyVerse, and Astrolabe collaborators are soliciting heterogeneous datasets and potential users for the prototype system. Astrolabe team members are currently working to characterize the properties of uncurated astronomical data, and to develop automated methods for locating potentially-useful data to be targeted for ingest into Astrolabe, while cultivating a user community for the new data management system. [Journal ref: EPJ Web Conf. 186 03003 (2018)]
- Heidorn, P. B., Stahlman, G. R., & Steffen, J. (2018). Astrolabe: Curating, Linking and Computing Astronomy's Dark Data. The Astrophysical Journal Supplement Series, 236(1), 3. Where appropriate repositories are not available to support all relevant astronomical data products, data can fall into darkness: unseen and unavailable for future reference and re-use. Some data in this category are legacy or old data, but newer datasets are also often uncurated and could remain "dark". This paper provides a description of the design motivation and development of Astrolabe, a cyberinfrastructure project that addresses a set of community recommendations for locating and ensuring the long-term curation of dark or otherwise at-risk data and integrated computing. This paper also describes the outcomes of the series of community workshops that informed creation of Astrolabe. According to participants in these workshops, much astronomical dark data currently exist that are not curated elsewhere, as well as software that can only be executed by a few individuals and therefore becomes unusable because of changes in computing platforms. Astronomical research questions and challenges would be better addressed with integrated data and computational resources that fall outside the scope of existing observatory and space mission projects. As a solution, the design of the Astrolabe system is aimed at developing new resources for management of astronomical data. The project is based in CyVerse cyberinfrastructure technology and is a collaboration between the University of Arizona and the American Astronomical Society. Overall the project aims to support open access to research data by leveraging existing cyberinfrastructure resources and promoting scientific discovery by making potentially-useful data in a computable format broadly available to the astronomical community.
- Brooks, C. F., Heidorn, P. B., Stahlman, G. R., & Chong, S. S. (2016). Working beyond the confines of academic discipline to resolve a real-world problem: A community of scientists discussing long-tail data in the cloud. First Monday, 21(2).
- Anglin, R., Best, J., Figueiredo, R., Gilbert, E., Gnanasambandam, N., Gottschalk, S., Haston, E., Heidorn, P. B., Lafferty, D., Lang, P., & others. (2013). Improving the Character of Optical Character Recognition (OCR): iDigBio Augmenting OCR Working Group Seeks Collaborators and Strategies to Improve OCR Output and Parsing of OCR Output.
- Heidorn, P. B., & Zhang, Q. (2013). Label Annotation through Biodiversity Enhanced Learning.
- Paul, D. L., & Heidorn, P. B. (2013). Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections.
- Paul, D. L., Heidorn, P. B., Best, J., Gilbert, E., Neill, A., Nelson, G., & Ulate, W. (2013). Help iDigBio reveal hidden data: iDigBio Augmenting OCR working group needs you.
- Heidorn, P. B. (2011). Biodiversity informatics. Bulletin of the American Society for Information Science and Technology, 37(6), 38-44. Overview of the field. http://www.asis.org/Bulletin/Aug-11/AugSep11_Heidorn.html
- Heidorn, P. B. (2011). The emerging role of libraries in data curation and e-science. Journal of Library Administration, 51(7-8), 662-672. In edited volume by Carla Stoffle. http://www.tandfonline.com/doi/abs/10.1080/01930826.2011.601269
- Cheng, J., Hu, X., & Heidorn, P. B. (2010). New measures for the evaluation of interactive information retrieval systems: Normalized task completion time and normalized user effectiveness. Proceedings of the American Society for Information Science and Technology, 47(1), 1-9.
- Heidorn, P. B., & Olson, A. (2010). The National Biological Information Infrastructure. In M. J. Bates & M. N. Maack (Eds.), The Encyclopedia of Library and Information Science. Taylor and Francis Group. Encyclopedia article, prepared in collaboration with NBII staff.
- Wei, Q., Heidorn, P. B., & Freeland, C. (2010). Name Matters: Taxonomic Name Recognition (TNR) in Biodiversity Heritage Library (BHL).
- Cragin, M. H., Smith, L. C., Palmer, C. L., & Heidorn, P. B. (2009). Extending the data curation curriculum to practicing LIS professionals. Proceedings of DigCCurr2009 Digital Curation: Practice, Promise and Prospects, 92.
- Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2), 280-299.
- Smith, L. C., Cragin, M. H., Palmer, C. L., MacMullen, W. J., & Heidorn, P. B. (2008). Data Curation Education.
- Tang, X., & Heidorn, P. B. (2008). Improving information access to digital botanical collection by allowing users to search with domain knowledge. Proceedings of the American Society for Information Science and Technology, 45(1), 1-13.
- Tang, X., Heidorn, P. B., & Heidorn, B. (2008). The loss of domain knowledge in user search queries: A query log analysis of a botanical retrieval system. Proceedings of the American Society for Information Science and Technology, 44(1), 1-5. doi:10.1002/meet.14504403111. This study analyzed the user search queries of an experiment conducted on a full-text botanical retrieval system. Search terms were extracted and categorized. Term frequencies in each category were calculated. Search terms were compared with target documents. The results reveal preferences for the kinds of information used in search queries, that search queries and document representations use very different format and vocabulary, and that the domain knowledge used in user search queries is lost when such queries are interpreted by the system.
- Wei, Q., & Heidorn, P. B. (2008). Automatic Metadata Extraction Using Machine Learning.
- Wei, Q., & Heidorn, P. B. (2008). Interactive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study.
- Wei, Q., Freeland, C., & Heidorn, P. B. (2008). Taxonomic Name Recognition in Biodiversity Heritage Library.
- Bertram, B., Bishop, A. P., Heidorn, P. B., & Lunsford, K. J. (2007). The Inquiry Page. Collaboration, 45.
- Cragin, M. H., Heidorn, P. B., Palmer, C. L., & Smith, L. C. (2007). An educational program on data curation.
- Cui, H., & Heidorn, P. B. (2007). The reusability of induced knowledge for the automatic semantic markup of taxonomic descriptions. Journal of the American Society for Information Science and Technology, 58(1), 133-149.
- Heidorn, P. B., Palmer, C. L., & Wright, D. (2007). Biological information specialists for biological informatics. Journal of biomedical discovery and collaboration, 2(1), 1.
- Heidorn, P. B., Palmer, C. L., Cragin, M. H., & Smith, L. C. (2007). Data curation education and biological information specialists.
- Heidorn, P. B., Tibbo, H. R., Choudhury, G. S., Greer, C., & Marciano, R. (2007). Identifying best practices and skills for workforce development in data curation. Proceedings of the American Society for Information Science and Technology, 44(1), 1-3.
- Palmer, C. L., Cragin, M. H., Heidorn, P. B., & Smith, L. C. (2007). Data curation for the long tail of science: The case of environmental sciences. Third International Digital Curation Conference, Washington, DC.
- Palmer, C. L., Cragin, M. H., Heidorn, P. B., & Smith, L. C. (2007). Studies of data curation for the long tail of science. 3rd International Digital Curation Conference, Washington, DC, Digital Curation Center. Retrieved from http://www.dcc.ac.uk/events/dcc-2007/. Poster.
- Palmer, C., Cragin, M., Heidorn, P., & Smith, L. (2007). Data curation for the long tail of science: The case of environmental studies. 3rd International Digital Curation Conference, Washington, DC. Retrieved from https://apps.lis.uiuc.edu/wiki/download/attachments/32666/Palmer_DCC2007.rtf
- Tang, X., & Heidorn, P. (2007). Using automatically extracted information in species page retrieval. Proceedings of TDWG 2007.
- Bishop, A. P., Bruce, B. C., Lunsford, K. J., Jones, M. C., Nazarova, M., Linderman, D., Won, M., Heidorn, P. B., Ramprakash, R., & Brock, A. (2006). Supporting community inquiry with digital resources. Journal of Digital Information, 5(3).
- Catapano, T., Agosti, D., Sautter, G., Koning, D., Boehm, K., Johnson, N. F., Heidorn, P. B., Moritz, T. D., Sarkar, I. N., & Stephenson, C. (2006). TaxonX: A lightweight and flexible xml schema for mark-up of taxonomic treatments. Proceedings of Annual Meeting of Taxonomic Data Working Group.
- Palmer, C. L., Cragin, M. H., Heidorn, P. B., & Wright, D. T. (2006). Supporting biological information work: research and education for digital resources and long-lived data.
- Greenberg, J., Heidorn, P. B., Seiberling, S., & Weakley, A. S. (2005). Growing vocabularies for plant identification and scientific learning. International Conference on Dublin Core and Metadata Applications, pp--99.
- Hagedorn, G., Thiele, K., Morris, R., & Heidorn, P. B. (2005). The Structured Descriptive Data (SDD) w3c-xml-schema, version 1.0. Biodiversity Information Standards (TDWG), http://www.tdwg.org/standards/116
- Bishop, A. P., Brock, A., Bruce, B. C., Crump, E., Heidorn, P. B., Jones, M. C., Linderman, D., Lunsford, K. J., Nazarova, M., Palmer, B., & others. (2004). Community Inquiry Labs as Effective Community Informatics Tools.
- Heidorn, P. (2004). A comparison of biodiversity informatics and neuroinformatics, Part 2. Bulletin of the American Society of Information Science & Technology, 30(2).
- Bruce, B. C., Bishop, A. P., Heidorn, P. B., Lunsford, K. J., Poulakos, S., & Won, M. (2003). The inquiry page: Bridging digital libraries to learners. Knowledge Quest, 31(3), 15-17.
- Heidorn, P. B., & Deem, L. (2003). OpenKey.
- Heidorn, P., & Deem, L. (2003). OpenKey: Illinois-North Carolina Collaborative Environment for Botanical Resources. First Monday, 8(5).
- Bruce, B. C., Lunsford, K. J., Bishop, A. P., Won, M., Heidorn, P. B., & Poulakos, S. (2002). The inquiry page: Learning with digital libraries. world, 1, 2.
- Heidorn, P. B., Mehra, B., & Lokhaiser, M. F. (2002). Complementary user-centered methodologies for information seeking and use: System's design in the Biological Information Browsing Environment (BIBE). Journal of the American Society for Information Science and Technology, 53(14), 1251-1258.
- Heidorn, P. B. (2001). A tool for multipurpose use of online flora and fauna: The Biological Information Browsing Environment, BIBE. First Monday, 6(2).
- Heidorn, P. B., & Cui, H. (2000). The Interaction of Result Set Display Dimensionality and Cognitive Factors in Information Retrieval Systems. Proceedings of the Annual Meeting of the American Society for Information Science, 37, 258-270.
- Heidorn, P. B., Sandore, B., Calarco, P. V., & Self, P. C. (2000). Reviews: Digital Image Access & Retrieval. Library Quarterly, 70(4), 510-512.
- Heidorn, P. B. (1999). Image retrieval as linguistic and nonlinguistic visual model matching. Library Trends, 48(2), 303-325.
- Heidorn, P. B. (1999). The Identification of Index Terms in Natural Language Object Description. Proceedings of the Annual Meeting of the American Society for Information Science, 36, 472-481.
- Heidorn, P. B. (1998). Book Review: The Perception of Visual Information (2nd ed.) by W. Hendee and P. Wells. Information Processing and Management, 34, 498-499.
- Heidorn, P. B. (1998). Prototypes and Idealizations in Natural Language Shape Descriptions. Proceedings of the ASIS Annual Meeting, 35, 549-58.
- Heidorn, P. B. (1998). University of Illinois, Urbana-Champaign. Representation and Processing of Spatial Expressions, 112.
- Heidorn, P. B., Zhang, J., & Sun, H. (1998). Evaluation of Thesauri for automatic query expansion and searching within document structure.
- Hurt, C., Aoe, J., Aoe, J., Arampatzis, A., Barry, C., Bateman, J., Brooks, T., Buckland, M., Byrne, J., Cantero, P., & others. (1998). Huber, JC, 471.
- Lavagnino, M. B., Bowker, G. C., Heidorn, P. B., & BASI, M. M. (1998). Incorporating social informatics into the curriculum for library and information science professionals. Libri, 48(1), 13-25.
- Heidorn, P. B. (1997). Natural language processing of visual language for image storage and retrieval.
- Heidorn, P. B., & Sandore, B. (1997). Digital image access & retrieval.
- Rayward, W. B., Burke, C., Hahn, T. B., Bowker, G. C., Buckland, M., Richards, P. S., Gluck, M., O'Kane, K. C., Mosley, P. A., Krikelas, J., & others. (1997). List of Contents.
- Heidorn, P. B. (1996). 33d annual UIUC clinic highlights digital image storage and retrieval: overview. Library Hi Tech News, 1-2.
- Heidorn, P. B. (1994). Automatic Content Indexing of Image Data Bases.
- Heidorn, P. B., & Hirtle, S. C. (1993). Is spatial information imprecise or just coarsely coded? Behavioral and Brain Sciences, 16(2), 246-247.
- Hirtle, S. C., & Heidorn, P. B. (1993). The structure of cognitive maps: Representations and processes. Advances in Psychology, 96, 170-192.
- Lewis, C. M., & Heidorn, P. B. (1991). Identifying tacit strategies in aircraft maneuvers. IEEE Transactions on Systems, Man and Cybernetics, 21(6), 1560-1571.
Proceedings Publications
- Heidorn, P. B., Cragin, M. H., Smith, L. C., & Palmer, C. L. (2009). Extending the data curation curriculum to practicing LIS professionals. Co-authored and presented with faculty members at the University of Illinois.
- Stahlman, G. R., Heidorn, P. B., & Steffen, J. (2018, Spring). The Astrolabe Project: Identifying and curating astronomical ‘dark data’ through development of cyberinfrastructure resources. In Proceedings of Library and Information Services in Astronomy (LISA) VIII.
- Brooks, C. F., & Heidorn, P. B. (2016, Summer). The language of biodiversity informatics: Students identifying with science. In Convention of the International Association of Language and Social Psychology, June 22-25, Bangkok, Thailand. Paper submitted to the Science Communication Task Force (theme: Using the science of language to identify and address conflicts in the language of science).
- Brooks, C. F., Heidorn, P. B., Stahlman, G. R., & Chong, S. (2015, Fall). Discourses, a community of scientists, and long-tail data in the cloud. In annual meeting of the Association of Internet Researchers.
- Heidorn, P. B., Stahlman, G. R., & Chong, S. S. (2015, March 24-27). Datasphere at the Biosphere II: Computation and data in the wild. In iConference 2015, Newport Beach, CA. Biological Field Stations provide a unique set of opportunities and challenges for digital curation. The stations serve as the center of short-term and long-term biological research, from biomolecular-scale to ecosystems-scale research. They represent some of the last remaining “natural” areas in certain regions. Stations provide unique information about local biotic and abiotic conditions. Data shared among the stations support continental scale and global research initiatives. The stations themselves support a large number of researchers who often come from multiple universities and other research and teaching institutions around the world. Because of this decentralized user base, it is particularly difficult for stations to capture data and other research products generated by research at the stations. The authors, part of a larger NSF-funded project, conducted a survey of field station researchers and then held a two-day workshop to identify challenges and opportunities for “grand challenge” research questions that could be enabled through development of cyberinfrastructure. We were particularly interested in “long-tail” data (Heidorn, 2008), which refers to large numbers of smaller datasets rather than only the large collections of homogeneous data frequently associated with “big data”. The information gathered through this study will inform future proposals for cyberinfrastructure development.
- Anglin, R., Best, J., Figueiredo, R., Gilbert, E., Gnanasambandam, N., Gottschalk, S., Haston, E., Heidorn, P. B., Lafferty, D., Lang, P., Nelson, G., Paul, D. L., Ulate, W., Watson, K., & Zhang, Q. (2013, February). Improving the Character of Optical Character Recognition (OCR): iDigBio Augmenting OCR Working Group Seeks Collaborators and Strategies to Improve OCR Output and Parsing of OCR Output. In iConference 2013 Proceedings (pp. 957-964). doi:10.9776/13493. Poster and short paper. http://hdl.handle.net/2142/42089
- Heidorn, P. B., & Zhang, Q. (2013, February). Label Annotation through Biodiversity Enhanced Learning. In iConference 2013 Proceedings (pp. 882-884). doi:10.9776/13450. Poster and short paper. http://hdl.handle.net/2142/42056
- Paul, D. L., & Heidorn, P. B. (2013). Augmenting Optical Character Recognition (OCR) for Improved Digitization: Strategies to Access Scientific Data in Natural History Collections. In iConference 2013 Proceedings (pp. 514-518). doi:10.9776/13266. http://hdl.handle.net/2142/39427
- Paul, D. L., Heidorn, P. B., Best, J., Gilbert, E., Neill, A., & Ulate, W. (2013). Help iDigBio reveal hidden data: iDigBio Augmenting OCR working group needs you - Part II. In iConference 2013 Proceedings (pp. 1066-1068). doi:10.9776/13517. http://hdl.handle.net/2142/42515
- Paul, D. L., Heidorn, P. B., Best, J., Gilbert, E., Neill, A., Nelson, G., & Ulate, W. (2013). Help iDigBio reveal hidden data: iDigBio Augmenting OCR working group needs you. In iConference 2013 Proceedings (pp. 1019-1021). doi:10.9776/13471. http://hdl.handle.net/2142/42502
- Wei, Q., & Heidorn, P. B. (2008, September). Automatic metadata extraction from museum specimen labels. In Proceedings of the International Conference on Dublin Core and Metadata Applications, 57-68. This paper describes the information properties of museum specimen labels and machine learning tools to automatically extract Darwin Core (DwC) and other metadata from these labels processed through Optical Character Recognition (OCR). The DwC is a metadata profile describing the core set of access points for search and retrieval of natural history collections and observation databases. Using the HERBIS Learning System (HLS) we extract 74 independent elements from these labels. The automated text extraction tools are provided as a web service so that users can reference digital images of specimens and receive back an extended Darwin Core XML representation of the content of the label. This automated extraction task is made more difficult by the high variability of museum label formats, OCR errors and the open class nature of some elements. In this paper we introduce our overall system architecture and variability-robust solutions, including the application of Hidden Markov and Naive Bayes machine learning models, data cleaning, use of field element identifiers, and specialist learning models. The techniques developed here could be adapted to any metadata extraction situation with noisy text and weakly ordered elements.
- Cui, H., Heidorn, P. B., & Zhang, H. (2002). An approach to automatic classification of text for information retrieval. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 96--97.
- Heidorn, P. B. (2002). Biodiversity and biocomplexity informatics: Policy and implementation science versus citizen science. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 362--364.
- Heidorn, P. B. (2002). Reprocessing paper-based reference materials for the digital environment. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 377--377.
- Heidorn, P. B. (1998). Shapes from natural language in Verbal Image. In Representation and processing of spatial expressions, 119--132.
- Heidorn, P. B. (1995). Shape language processing and visual feedback for image indexing and retrieval. In Proceedings of the ASIS Annual Meeting, 32, 231--231.
Presentations
- Heidorn, P. B. (2020, January). Expert Stakeholders Evaluate the UN Report on Digital Interdependence - Report to the UN Secretary General. Konrad-Adenauer-Stiftung’s (KAS) workshop "The Age of Digital Interdependence". Washington, D.C.: Konrad-Adenauer-Stiftung Foundation (https://www.kas.de). After the optimism that characterized the advent of the Internet and the beginning of the digital age, downsides have become increasingly clear, and urgent questions of regulation and the active shaping of the digital space and age have arisen. Especially due to the nature of the digital space (cross-border, multi-stakeholder, dynamic disruption), the strong connectivity (digital interdependence) and the resulting complexity, the question today is no longer whether the digital space and the digital age will be actively shaped, but what an adequate regulatory approach can look like. What are the important areas to shape the digital future and what are the central challenges, actors, and political processes to consider in developing smart regulations? Against this backdrop the Konrad-Adenauer-Stiftung’s (KAS) workshop will offer the opportunity to discuss and evaluate with fellow stakeholders the report "The Age of Digital Interdependence" by the High-Level Panel on Digital Cooperation appointed by the UN Secretary-General. The report identifies a number of significant effects of digitalization and shows that the complexity and dynamic nature of the digital age require various skills and competences of all actors and thus a new mode of digital cooperation in which multilateral and multi-stakeholder approaches complement each other meaningfully.
This approach includes a number of centrally important values, such as inclusiveness, respect, human-centeredness, human flourishing, transparency, collaboration, accessibility and sustainability. This KAS multi-stakeholder event focused on the recommendations of the UN Commission and examined how realistic the recommendations are, which preconditions are necessary in order to implement the recommendations, and whether certain perspectives are underrepresented. KAS will produce a report on the discussion results. Participants were able to engage in depth with questions around barriers to access to the digital world and perspectives of digital governance. Participants also had the opportunity to network and develop new or foster existing relationships.
- Heidorn, P. B. (2019, November 6-8). Navigating Astronomical “Dark Data” through Advanced Cyberinfrastructure: ASTROLABE and other CyVerse Projects. Data inclusion revolution: A community workshop on enabling petabytes to science in the 2020s. Boston, MA: Kavli Data Foundation. The third meeting in a series on this topic, this community workshop is designed to gather a broad collection of astronomers, educators, and engineers to discuss the technical, institutional, and sociological barriers preventing any astronomer from leveraging the scientific potential of these petabyte-scale datasets, and to develop a community roadmap designed to address these challenges and enable a data inclusion revolution in astronomy (article made available courtesy of ASP).
- Heidorn, P. B. (2019, November). Navigating Astronomical and Ecology Data through Advanced Cyberinfrastructure: ASTROLABE and other CyVerse Projects. National Earth-space Research Education and Innovation with Data. Green Bank Observatory, Green Bank, West Virginia: National Science Foundation.
- Heidorn, P. B., & Stahlman, G. R. (2018, Winter). Astrolabe - WorldWide Telescope Cyberinfrastructure Workshop. American Astronomical Society 231. Washington, DC: American Astronomical Society. We ran the workshop on the use of cyberinfrastructure in astronomy.
- Heidorn, P. B. (2017, Summer). Biodiversity Informatics 2020: Computers and Data add new Tools. Special Libraries Association. Phoenix, AZ: Special Libraries Association.
- Heidorn, P. B., & Stahlman, G. R. (2017, Winter). Navigating Astronomical “Dark Data” through Advanced Cyberinfrastructure Workshop. American Astronomical Society 229. Grapevine, TX: American Astronomical Society. We ran the workshop on the use of cyberinfrastructure in astronomy.
- Heidorn, P. B., & Stahlman, G. R. (2017, Winter). Arizona Astronomical Data Hub. AAS 227: Dark/Orphaned Data. American Astronomical Society 227. Grapevine, TX: American Astronomical Society. Presentation.
- Heidorn, P. B. (2015, Summer). Astrolabe. Physics-Astronomy-Mathematics roundtable at Special Libraries Association. Boston, MA: Special Libraries Association.
- Heidorn, P. B., Fox, P., Ahalt, S., & Jones, M. (2013, July). Empowering Long Tail Research: Panel: “Envisioning a Software Institute to Accelerate Environmental Science.” Annual Meeting of Federation of Earth Science Information Partners (ESIP). Chapel Hill, North Carolina: Earth Science Information Partners. Panel: The National Science Foundation created the Software Institutes for Sustained Innovation (S2I2) program to conceptualize a series of new institutes that can accelerate science and engineering through advances in software. Scientific advances ranging from modeling climate change to the sequencing of the human genome have been rendered possible in the last few decades due to the massive improvements in the capabilities of computers to process data through software. This pivotal role of software in science is broadly acknowledged, while simultaneously being systematically undervalued through minimal investments in maintenance and innovation. Scientists rely upon software that is often cobbled together by a string of graduate students with little understanding of software design principles, or upon commercial software that represents an algorithmic black box. As a community, we need to embrace the creation, use, and maintenance of software within science, and address problems such as code complexity, openness, reproducibility, and accessibility. We also need to fully develop new skills and practices in software engineering as a core competency in our earth science disciplines, starting with undergraduate and graduate education and extending into university and agency professional positions. Panelists will present three strategic planning projects that envision the role of a software institute in enabling science.
This will be followed by a moderated discussion to elicit opinions and feedback about these visions from the ESIP community, and to assess the roles and functions that are most important for a software institute in the earth and environmental sciences.
- Heidorn, P. B. (2011, 2011-01-01). The Path to Enlightened Solutions for Biodiversity's Dark Data. Keynote speaker at Scripting Life: the science behind ViBRANT. Paris, France. Keynote speaker for the kickoff meeting of the European Biodiversity Informatics Initiative. Funding by the EU. Invited: Yes. Type of Presentation: Invited/Plenary Speaker.
- Heidorn, P. B. (2011, 2011-08-01). Biodiversity Informatics: An Interdisciplinary Challenge. 75th Años. Bogota, Colombia. Keynote for the 75th Anniversary of the National Natural History Museum of Colombia. Funded by the Museum and University. Invited: Yes. Type of Presentation: Invited/Plenary Speaker.
- Heidorn, P. B. (2011, 2011-11-01). Repository as App: Functionality to attract Dark Data. eResearch Australasia 2011. Melbourne, Australia. Expenses paid by conference organizers (http://conference.eresearch.edu.au/eres2011/featured-speakers/). Invited: Yes. Type of Presentation: Invited/Plenary Speaker.
- Heidorn, P. B. (2010, 2009-06-01). Societal Need for Digital Curation Specialists in the Library Setting. Special Library Association Annual Conference. Washinton, D.C..More infoComputer Science Roundtablehttp://units.sla.org/division/dpam/conferences/2009/;Invited: Yes;Type of Presentation: Panel Discussant (Reporting Research);
- Heidorn, P. B. (2010, 2009-09-01). Dark Data In the Long Tail of Science:. National Institute of STandards and Technology. Gaithersburg, MD.More infohttp://www.slideshare.net/pbheidorn/dark-data-in-the-long-tail-of-science-examples-in-biologyThe humanities and sciences are moving from a period of data scarcity to a period of data abundance because of advances in instrumentation and digitization. At the same time many of the key questions of our time such as climate change, ecosystem collapse, and energy require a faster pace of scientific discovery as well as improved data analysis, synthesis and reuse. Much scientific data is currently being lost or underutilized because of a lack of viable institutions for its organization and maintenance, because of a lack of education and because of cultural biases among the data generators that prevent proper data curation. This data, dark data, is essentially invisible to the scientific community. Like dark matter, it cannot easily be directly observed and its existence can only be inferred. Federal government bodies such as the Interagency Working Group on Digital Data, the National Institutes of Health and the National Science Foundation are calling for a major restructuring of the national management of science information including the underlying data. This presentation will outline the impacts these changes will have on scientists, information and computer scientists, librarians and the traditional repositories of scholarly knowledge: libraries and museums. It will draw on examples from my own research in automatic metadata extraction from museum specimens and literature, biodiversity informatics projects, biodiversity data standards development, education programs in biological informatics and program management at the National Science Foundation.;Invited: Yes;Type of Presentation: Invited/Plenary Speaker;
- Heidorn, P. B. (2010, 2009-11-01). Biodiversity Data Abundance and Scarcity. American Society for Information Science and Technology. Vancouver, British Columbia.More infoEach panelist gave a talk ending with a discussion. This panel aims to discuss the importance of creating Biodiversity and natural history collections, the state of the art in terms of standards, best practices and the challenges that natural history museums and herbaria face when trying to digitize their collections. My PowerPoint is attached.;Your Role: Other Panel Members: Miguel E. Ruiz, Jacob Kramer-Duffield, Jane Greenberg, Nathan Hall,;Type of Presentation: Panel Discussant (Reporting Research);
- Heidorn, P. B. (2010, 2010-01-01). Filtered Push and Biological Science Collections Tracker coordination. Biological Annotation Working Group. Harvard Natural History Museum, Boston, MA. Meeting of working groups to plan coordination of research projects. BiSciCol Tracker was eventually funded by NSF. Type of Presentation: Panel Discussant (Reporting Research).
- Heidorn, P. B. (2010, 2010-03-01). Biodiversity Informatics: Mining Untapped Resources. First Lecture of George Frederick Jewett Foundation. Marine Biological Laboratory in Woods Hole, MA. Type of Presentation: Invited/Plenary Speaker.
- Heidorn, P. B. (2010, 2010-10-01). Library Curation of Long-tail Science Data. 22nd Inter CODATA Conference: Scientific Data and Sustainable Development. Cape Town, South Africa. Paper presentation. Type of Presentation: Government/Policy Audiences.
- Heidorn, P. B. (2010, 2010-12-01). Workshop in Scientific Data. International Digital Curation Conference. Chicago, Illinois. Presented and helped organize. Type of Presentation: Academic Conference/Workshop.
- Heidorn, P. B. (2010, 2011-04-01). The Path to Enlightened Solutions for Biodiversity's Dark Data. Scripting Life: the science behind ViBRANT. Paris, France. Kickoff meeting for EU Biodiversity Informatics Initiative. Type of Presentation: Academic Conference/Workshop.
- Wei, Q., Heidorn, P. B., & Freeland, C. (2010, 2010-01-01). Name Matters: Taxonomic Name Recognition (TNR) in Biodiversity Heritage Library (BHL). iConference. Champaign, Illinois. http://hdl.handle.net/2142/14919 Your Role: Technical supervision, design of research and editing document. Collaborative with graduate student: Yes. Other collaborative: Yes. Specify other collaborative: Chris Freeland is a BHL Developer. Type of Presentation: Professional Organization.
- Heidorn, P. B. (2009, 2009-06-01). Societal Need for Digital Curation Specialists in the Library Setting. Special Libraries Association Annual Conference. Washington, D.C. Computer Science Roundtable (http://units.sla.org/division/dpam/conferences/2009/). Invited: Yes. Type of Presentation: Panel Discussant (Reporting Research).
- Heidorn, P. B. (2009, 2009-09-01). Dark Data in the Long Tail of Science. National Institute of Standards and Technology. Gaithersburg, MD. http://www.slideshare.net/pbheidorn/dark-data-in-the-long-tail-of-science-examples-in-biology The humanities and sciences are moving from a period of data scarcity to a period of data abundance because of advances in instrumentation and digitization. At the same time, many of the key questions of our time, such as climate change, ecosystem collapse, and energy, require a faster pace of scientific discovery as well as improved data analysis, synthesis and reuse. Much scientific data is currently being lost or underutilized because of a lack of viable institutions for its organization and maintenance, because of a lack of education, and because of cultural biases among the data generators that prevent proper data curation. This data, dark data, is essentially invisible to the scientific community. Like dark matter, it cannot easily be directly observed and its existence can only be inferred. Federal government bodies such as the Interagency Working Group on Digital Data, the National Institutes of Health and the National Science Foundation are calling for a major restructuring of the national management of science information, including the underlying data. This presentation will outline the impacts these changes will have on scientists, information and computer scientists, librarians and the traditional repositories of scholarly knowledge: libraries and museums. It will draw on examples from my own research in automatic metadata extraction from museum specimens and literature, biodiversity informatics projects, biodiversity data standards development, education programs in biological informatics and program management at the National Science Foundation. Invited: Yes. Type of Presentation: Invited/Plenary Speaker.
- Heidorn, P. B. (2009, 2009-11-01). Biodiversity Data Abundance and Scarcity. American Society for Information Science and Technology. Vancouver, British Columbia. Each panelist gave a talk ending with a discussion. This panel aims to discuss the importance of creating Biodiversity and natural history collections, the state of the art in terms of standards, best practices and the challenges that natural history museums and herbaria face when trying to digitize their collections. My PowerPoint is attached. Your Role: Other Panel Members: Miguel E. Ruiz, Jacob Kramer-Duffield, Jane Greenberg, Nathan Hall. Type of Presentation: Panel Discussant (Reporting Research).
- Heidorn, P. B., Cragin, M. H., Smith, L. C., & Palmer, C. L. (2009, 2009-04-01). Extending the data curation curriculum to practicing LIS professionals. DigCCurr 2009: Digital Curation Practice, Promise and Prospects. Chapel Hill, NC. In this panel we will present an overview of and outcomes from the inaugural Summer Institute on Data Curation held at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. The Institute addresses a growing need for continuing professional development in data curation. Panelists will present their experiences attending the Institute, and discuss these in relation to the current and ongoing data curation activities at their own universities. Your Role: Content development. Refereed: Yes. Other collaborative: Yes. Specify other collaborative: University of Illinois Collaborators. Type of Presentation: Panel Discussant (Reporting Research).
Poster Presentations
- Heidorn, P. B., Zhang, Q., & Chong, S. (2013, February). The LABELX (Label Annotation Through Biodiversity Enhanced Learning). iConference. Fort Worth, TX: iSchool.
Others
- Heidorn, P. B. (2019, November). Network for Earth-space Research Education and Innovation with Data (NEREID) NSF Workshop Report. National Science Foundation.
- Heidorn, P. B. (2012). DataNet Grants Advisory Committees: The last Board meeting was in Ann Arbor about three weeks ago. SEAD (Sustainable Environment-Actionable Data), http://www.si.umich.edu/node/2464 SI Lead Investigator: Margaret Hedstrom. Research Team: Ann Zimmerman, Charles Severance, Margaret Hedstrom, Jude Yew, Karen Woollams. Funding Partner: National Science Foundation. Amount Awarded: $8,000,000. Start Date: 09/01/2011. End Date: 09/30/2016. DataNet Project II Advisory Board (I am on several of the awards). Next Board meeting in Chapel Hill, NC, March 12-13, 2013. http://datafed.org/ National Science Foundation Cooperative Agreement: OCI 0940841. Collaboration Environments for Data Driven Science. Major science and engineering initiatives are dependent upon massive data collections that comprise observational data, experimental data, simulation data, and engineering data. To support science and engineering collaborations, a policy-driven national data management infrastructure is being implemented. The prototype addresses both the life cycle of science and engineering data and the sustainability of data collections and repositories over time, across changes in technology and usage.
- Heidorn, P. (1998). The perception of visual information.
- Heidorn, P. B. (1996). Natural language processing: Edited by CN PEREIRA and BJ GROSZ. The MIT Press (a Bradford Book), Cambridge, Mass. (1994). vi + 531 pp., $35.00, ISBN 0-262-66092-X.