- Professor, Management Information Systems
- Professor, BIO5 Institute
- Award for Thesis - Undergraduate Honor's Student
- Spring 2019
- Distinguished Poster Award
- AMIA, Fall 2016
- Amazon Web Services - Research Credit
- Amazon, Summer 2014
Dissertation, MIS 920 (Fall 2020)
Healthcare Information Systems, MIS 506 (Fall 2020)
Topics in Data and Web Mining, MIS 611D (Fall 2020)
Web Computing And Mining, MIS 510 (Fall 2020)
Dissertation, MIS 920 (Spring 2020)
Dissertation, MIS 920 (Fall 2019)
Healthcare Information Systems, MIS 506 (Fall 2019)
Dissertation, MIS 920 (Spring 2019)
Honors Thesis, MIS 498H (Spring 2019)
Web Computing And Mining, MIS 510 (Spring 2019)
Dissertation, MIS 920 (Fall 2018)
Healthcare Information Systems, MIS 506 (Fall 2018)
Honors Thesis, MIS 498H (Fall 2018)
Independent Study, MIS 599 (Fall 2018)
Independent Study, MIS 599 (Spring 2018)
Master's Report Projects, MIS 696H (Spring 2018)
Topics in Data and Web Mining, MIS 611D (Spring 2018)
Web Computing And Mining, MIS 510 (Spring 2018)
Independent Study, MIS 599 (Fall 2017)
Independent Study, MIS 599 (Spring 2017)
Topics in Data and Web Mining, MIS 611D (Spring 2017)
Web Computing And Mining, MIS 510 (Spring 2017)
Data Mining Bus Intell, MIS 545 (Fall 2016)
Independent Study, MIS 599 (Fall 2016)
Web Computing And Mining, MIS 510 (Spring 2016)
- Harwell, J., Pentoney, C., & Leroy, G. A. (2014). Finding and Understanding Medical Information Online. In Information Technologies for Patient Empowerment in Healthcare. De Gruyter.
- Harber, P. I., & Leroy, G. A. (2018). Insights from Twitter about Public Perceptions of Asthma, COPD, and Exposures. J Occup Environ Med.
- Harber, P., & Leroy, G. (2019). Insights from Twitter About Public Perceptions of Asthma, COPD, and Exposures. Journal of occupational and environmental medicine, 61(6), 484-490.
  The aim of this study was to analyze tweets concerning asthma and chronic obstructive pulmonary disease (COPD).
- Leroy, G. A., Kauchak, D., & Hogue, A. (2015). Effects of Text Simplification: Evaluation of Splitting up Noun Phrases. Journal of Health Communication.
- Leroy, G., & Kauchak, D. (2019). A comparison of text versus audio for information comprehension with future uses for smart speakers. JAMIA open, 2(2), 254-260.
  Audio is increasingly used to access information on the Internet through virtual assistants and smart speakers. Our objective is to evaluate the distribution of health information through audio.
- Mukherjee, P., Leroy, G. A., & Kauchak, D. (2018). Using Lexical Chains to Identify Text Difficulty: A Corpus Statistics and Classification Study. IEEE Journal of Biomedical and Health Informatics.
- Mukherjee, P., Leroy, G., & Kauchak, D. (2019). Using Lexical Chains to Identify Text Difficulty: A Corpus Statistics and Classification Study. IEEE journal of biomedical and health informatics, 23(5), 2164-2173.
  Our goal is data-driven discovery of features for text simplification. In this paper, we investigate three types of lexical chains: exact, synonymous, and semantic. A lexical chain links semantically related words in a document. We examine their potential with a document-level corpus statistics study (914 texts) to estimate their overall capacity to differentiate between easy and difficult text and a classification task (11 000 sentences) to determine usefulness of features at sentence-level for simplification. For the corpus statistics study we tested five document-level features for each chain type: total number of chains, average chain length, average chain span, number of crossing chains, and the number of chains longer than half the document length. We found significant differences between easy and difficult text for average chain length and the average number of cross chains. For the sentence classification study, we compared the lexical chain features to standard bag-of-words features on a range of classifiers: logistic regression, naïve Bayes, decision trees, linear and RBF kernel SVM, and random forest. The lexical chain features performed significantly better than the bag-of-words baseline across all classifiers with the best classifier achieving an accuracy of ∼90% (compared to 78% for bag-of-words). Overall, we find several lexical chain features provide specific information useful for identifying difficult sentences of text, beyond what is available from standard lexical features.
- Szep, A., Szep, M., Leroy, G., Kauchak, D., Kloehn, N., Revere, D., & Just, M. (2019). Algorithmic Generation of Grammar Simplification Rules Using Large Corpora. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2019, 72-81.
  There is often a discontinuity between patients' literacy level and educational materials. In response, we are developing an online medical text simplification editor. In this paper, we describe generating grammar simplification rules from a large parallel corpus (N=141,500) containing original sentences and their simplified variants. We algorithmically identified grammatical transformations between sentences (N=26,600) and used distributional characteristics in two corpora to select transformations with the broadest application and the least ambiguity. This resulted in a top set of 146 rules. Two experts evaluated 20 representative rules reflecting 4 characteristics (long/short and weak/strong) each with 5 example sentences. Generally, we found that the rules are helpful for guiding simplification. Using a 5-point Likert scale (5=best), stronger rules scored higher for ease of applying (4.11), overall helpfulness (4.40) and usefulness of examples (4.05). Rule length did not affect the expert scores. The grammar simplification rules are being integrated in our text editor.
- Kloehn, N., Leroy, G. A., Kauchak, D., Gu, Y., Colina, S., Yuan, N. P., & Revere, D. (2018). SubSimplify – Automatically generating term explanations in English and Spanish when expert and big data dictionaries are insufficient. Journal of Medical Internet Research (JMIR), 8, e10779.
- Leroy, G. A., & Wimble, M. (2018). Health Information Technology: Promise and Progress. Health Systems.
- Leroy, G. A., Gu, Y., Pettygrove, S. D., Kelly Galindo, M., Aurora, A., & Kurzius-Spencer, M. (2018). Automated Extraction of Diagnostic Criteria from Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation and Case Study. Journal of Medical Internet Research (JMIR).
- Colina, S., Pritchard, T. G., Yuan, N. P., Diaz, D., Rajnarayanan, S., Kauchak, D., Leroy, G. A., & Mukherjee, P. (2017). A New Parser for Medical Text Simplification Using Morphological, Sentential and Double Negation. Journal of Biomedical Informatics.
- Harber, P. I., & Leroy, G. A. (2017). Feasibility and Utility of Lexical Analysis in Occupational Health Free Text. J Occup Env Med.
  Development of natural language processing algorithm with subsequent application to 85,000 Mine Safety and Health Administration records.
- Harber, P. I., & Leroy, G. A. (2017). Social media use for occupational lung disease. Curr Opin Allergy Clin Immunol, 17 (2017 Jan 30). doi:10.1097/ACI.0000000000000345
  Social media have great impact on all aspects of life throughout the world. The utilization of social media for occupational lung disease, however, has been much more limited. This article summarizes recent literature concerning social media for occupational lung disease and identifies areas for additional use. RECENT FINDINGS: Social media are used in six relevant areas: information dissemination, peer-to-peer communication, survey research data collection, participatory research and exposome data acquisition, assessing public concerns, and knowledge generation. There are very clear advantages for information dissemination from experts to workers and on a peer-to-peer basis, although variable credibility and accuracy concerns persist. For research, social media have been used for acquiring data posted for nonresearch purposes and for efficiently collecting information specifically for research. The benefits of efficiency, democracy, and very large data sources may counterbalance concerns about inadequate specification of recruitment strategies and limited control over data quality. SUMMARY: The potential benefits of using social media for lung health-workplace interactions are much greater than the very limited current utilization.
- Kauchak, D., Leroy, G. A., & Hogue, A. (2017). Measuring Text Difficulty Using Parse-Tree Frequency. Journal of the Association for Information Science and Technology, (Minor Revisions).
- Leroy, G. A., Gu, Y., Pettygrove, S. D., & Kurzius-Spencer, M. (2017). Automated Pattern Extraction for Recognizing DSM Diagnostic Criteria for Autism Spectrum Disorder in Mental Health EHR. In: Frasincar F., Ittoo A., Nguyen L., Métais E. (eds). Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, 10260.
- Kauchak, D., & Leroy, G. (2015). Moving Beyond Readability Metrics for Simplifying Health-related Text. IEEE IT Professional.
- Leroy, G. A., & Kauchak, D. (2016). Moving Beyond Readability Metrics for Simplifying Health-related Text. IEEE IT Professional, 18(3), 45-51. doi:10.1109/MITP.2016.50
- Leroy, G., Kauchak, D., & Hogue, A. (2016). Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. Journal of health communication, 21 Suppl 1, 18-26.
  To help increase health literacy, we are developing a text simplification tool that creates more accessible patient education materials. Tool development is guided by a data-driven feature analysis comparing simple and difficult text. In the present study, we focus on the common advice to split long noun phrases. Our previous corpus analysis showed that easier texts contained shorter noun phrases. Subsequently, we conducted a user study to measure the difficulty of sentences containing noun phrases of different lengths (2-gram, 3-gram, and 4-gram); noun phrases of different conditions (split or not); and, to simulate unknown terms, pseudowords (present or not). We gathered 35 evaluations for 30 sentences in each condition (3 × 2 × 2 conditions) on Amazon's Mechanical Turk (N = 12,600). We conducted a 3-way analysis of variance for perceived and actual difficulty. Splitting noun phrases had a positive effect on perceived difficulty but a negative effect on actual difficulty. The presence of pseudowords increased perceived and actual difficulty. Without pseudowords, longer noun phrases led to increased perceived and actual difficulty. A follow-up study using the phrases (N = 1,350) showed that measuring awkwardness may indicate when to split noun phrases. We conclude that splitting noun phrases benefits perceived difficulty but hurts actual difficulty when the phrasing becomes less natural.
- Harber, P. I., & Leroy, G. A. (2015). Assessing Work–Asthma Interaction With Amazon Mechanical Turk. J Occup Envir Med.
  Objectives: To illustrate the utility of crowdsourcing for occupational health surveillance. Methods: Amazon Mechanical Turk was used to recruit and obtain information from employed persons with asthma, who answered questions about work–asthma interactions. Results: Data collection from 60 subjects required only a few hours. Participants spent on average 7 minutes responding to seven questions (one optional) and used an average of 708 words. Work exacerbation, interference of asthma with work, and suggested workplace accommodation are frequent (83% reported at least one interaction). Conclusions: The full spectrum of work–asthma interactions should be considered. Modern crowdsourcing methods have considerable potential as occupational health surveillance tools because of their effectiveness; efficiency and financial viability are additional important advantages.
- Keselman, A., Logan, R., Smith, C. A., Leroy, G., & Zeng-Treitler, Q. (2008). Developing informatics tools and strategies for consumer-centered health communication. Journal of the American Medical Informatics Association : JAMIA, 15(4), 473-83.
  As the emphasis on individuals' active partnership in health care grows, so does the public's need for effective, comprehensible consumer health resources. Consumer health informatics has the potential to provide frameworks and strategies for designing effective health communication tools that empower users and improve their health decisions. This article presents an overview of the consumer health informatics field, discusses promising approaches to supporting health communication, and identifies challenges plus direction for future research and development. The authors' recommendations emphasize the need for drawing upon communication and social science theories of information behavior, reaching out to consumers via a range of traditional and novel formats, gaining better understanding of the public's health information needs, and developing informatics solutions for tailoring resources to users' needs and competencies. This article was written as a scholarly outreach and leadership project by members of the American Medical Informatics Association's Consumer Health Informatics Working Group.
- Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., & Zeng-Treitler, Q. (2008). Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the American Medical Informatics Association : JAMIA, 15(4), 496-505.
  This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007 AB); second, to describe the procedure for creating new concepts in the process of building a consumer health vocabulary. How do the unmapped consumer health concepts relate to the existing UMLS concepts? What is the place of these new concepts in professional medical discourse?
- Leroy, G., & Miller, T. (2010). Perils of providing visual health information overviews for consumers with low health literacy or high stress. Journal of the American Medical Informatics Association : JAMIA, 17(2), 220-3.
  This pilot study explores the impact of a health topics overview (HTO) on reading comprehension. The HTO is generated automatically based on the presence of Unified Medical Language System terms. In a controlled setting, we presented health texts and posed 15 questions for each. We compared performance with and without the HTO. The answers were available in the text, but not always in the HTO. Our study (n=48) showed that consumers with low health literacy or high stress performed poorly when the HTO was available without linking directly to the answer. They performed better with direct links in the HTO or when the HTO was not available at all. Consumers with high health literacy or low stress performed better regardless of the availability of the HTO. Our data suggests that vulnerable consumers relied solely on the HTO when it was available and were misled when it did not provide the answer.
- Leroy, G., Xu, J., Chung, W., Eggers, S., & Chen, H. (2007). An end user evaluation of query formulation and results review tools in three medical meta-search engines. International journal of medical informatics, 76(11-12), 780-9.
  Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools.
- Mukherjee, P., Leroy, G. A., Kauchak, D., Rajnarayanan, S., Diaz, D., Yuan, N. P., Pritchard, T. G., & Colina, S. (2017). NegAIT: A New Parser for Medical Text Simplification Using Morphological, Sentential and Double Negation. Journal of Biomedical Informatics.
- Ku, C., & Leroy, G. (2014). A Decision Support System: Automated Crime Report Analysis and Classification for e-Government. Government Information Quarterly, 31(4), 534–544.
- Leroy, G. A., Chen, H., & Rindflesch, T. C. (2014). Smart and Connected Health (Guest Editor Introduction). IEEE Intelligent Systems, 29(3).
- Leroy, G., & Kauchak, D. (2014). The effect of word familiarity on actual and perceived text difficulty. Journal of the American Medical Informatics Association : JAMIA, 21(e1), e169-72.
  There is little evidence that readability formula outcomes relate to text understanding. The potential cause may lie in their strong reliance on word and sentence length. We evaluated word familiarity rather than word length as a stand-in for word difficulty. Word familiarity represents how well known a word is, and is estimated using word frequency in a large text corpus, in this work the Google web corpus. We conducted a study with 239 people, who provided 50 evaluations for each of 275 words. Our study is the first study to focus on actual difficulty, measured with a multiple-choice task, in addition to perceived difficulty, measured with a Likert scale. Actual difficulty was correlated with word familiarity (r=0.219, p
- Pentoney, C., Harwell, J., & Leroy, G. (2014). Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2014, 976-83.
  Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
- Kwak, M., Leroy, G., Martinez, J. D., & Harwell, J. (2013). Development and evaluation of a biomedical search engine using a predicate-based vector space model. Journal of biomedical informatics, 46(5), 929-39.
  Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex datastructures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p
- Leroy, G., Endicott, J. E., Kauchak, D., Mouradi, O., & Just, M. (2013). User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention. Journal of medical Internet research, 15(7), e144.
  Adequate health literacy is important for people to maintain good health and manage diseases and injuries. Educational text, either retrieved from the Internet or provided by a doctor's office, is a popular method to communicate health-related information. Unfortunately, it is difficult to write text that is easy to understand, and existing approaches, mostly the application of readability formulas, have not convincingly been shown to reduce the difficulty of text.
- Leroy, G., Kauchak, D., & Mouradi, O. (2013). A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty. International journal of medical informatics, 82(8), 717-30.
  Low patient health literacy has been associated with cost increases in medicine because it contributes to inadequate care. Providing explanatory text is a convenient approach to distribute medical information and increase health literacy. Unfortunately, writing text that is easily understood is challenging. This work tests two text features for their impact on understanding: lexical simplification and coherence enhancement.
- Leroy, G., Endicott, J. E., Mouradi, O., Kauchak, D., & Just, M. L. (2012). Improving perceived and actual text difficulty for health information consumers using semi-automated methods. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2012, 522-31.
  We are developing algorithms for semi-automated simplification of medical text. Based on lexical and grammatical corpus analysis, we identified a new metric, term familiarity, to help estimate text difficulty. We developed an algorithm that uses term familiarity to identify difficult text and select easier alternatives from lexical resources such as WordNet, UMLS and Wiktionary. Twelve sentences were simplified to measure perceived difficulty using a 5-point Likert scale. Two documents were simplified to measure actual difficulty by posing questions with and without the text present (information understanding and retention). We conducted a user study by inviting participants (N=84) via Amazon Mechanical Turk. There was a significant effect of simplification on perceived difficulty (p
- De Leo, G., Gonzales, C. H., Battagiri, P., & Leroy, G. (2011). A smart-phone application and a companion website for the improvement of the communication skills of children with autism: clinical rationale, technical development and preliminary results. Journal of medical systems, 35(4), 703-11.
  Autism is a complex neurobiological disorder that is part of a group of disorders known as autism spectrum disorders (ASD). Today, one in 150 individuals is diagnosed with autism. Lack of social interaction and problems with communication are the main characteristics displayed by children with ASD. The Picture Exchange Communication System (PECS) is a communication system where children exchange visual symbols as a form of communication. The visual symbols are laminated pictures stored in a binder. We have designed, developed and are currently testing a software application, called PixTalk which works on any Windows Mobile Smart-phone. Teachers and caregivers can access a web site and select from an online library the images to be downloaded on to the Smart-phone. Children can browse and select images to express their intentions, desires, and emotions using PixTalk. Case study results indicate that PixTalk can be used as part of ongoing therapy.
- Leroy, G., Helmreich, S., & Cowie, J. R. (2010). The influence of text characteristics on perceived and actual difficulty of health information. International journal of medical informatics, 79(6), 438-49.
  Willingness and ability to learn from health information in text are crucial for people to be informed and make better medical decisions. These two user characteristics are influenced by the perceived and actual difficulty of text. Our goal is to find text features that are indicative of perceived and actual difficulty so that barriers to reading can be lowered and understanding of information increased.
- Leroy, G. (2009). Persuading consumers to form precise search engine queries. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2009, 354-8.
  Today's search engines provide a single textbox for searching. This input method has not changed in decades and, as a result, consumer search behaviour has not changed either: few and imprecise keywords are used. Especially with health information, where incorrect information may lead to unwise decisions, it would be beneficial if consumers could search more precisely. We evaluated a new user interface that supports more precise searching by using query diagrams. In a controlled user study, using paper-based prototypes, we compared searching with a Google interface with drawing new or modifying template diagrams. We evaluated consumer willingness and ability to use diagrams and the impact on query formulation. Users had no trouble understanding the new search method. Moreover, they used more keywords and relationships between keywords with search diagrams. In comparison to drawing their own diagrams, modifying existing templates led to more searches being conducted and higher creativity in searching.
- De Leo, G., & Leroy, G. (2008). An online community for teachers of children with autism to support, observe, and evaluate communication enabled with smartphones. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 924.
  We are developing an online community for teachers of children diagnosed with autism spectrum disorder that will provide tools to share, analyze, and evaluate assisted communication. The data will be collected from software on smartphones that allows children to communicate with teachers using images. Since this is the first approach towards systematic data collection for children with ASD, we expect a significant impact on current teaching methods.
- Goryachev, S., Zeng-Treitler, Q., Smith, C. A., Browne, A. C., Divita, G., Keselman, A., Leroy, G., & Figueroa, R. (2008). Making primarily professional terms more comprehensible to the lay audience. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 956.
  Certain texts, such as clinical reports and clinical trial records, are written by professionals for professionals while being increasingly accessed by lay people. To improve the comprehensibility of such documents to the lay audience, we conducted a pilot study to analyze terms used primarily by health professionals, and explore ways to make them more comprehensible to lay people.
- Leroy, G., Helmreich, S., Cowie, J. R., Miller, T., & Zheng, W. (2008). Evaluating online health information: beyond readability formulas. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 394-8.
  Although understanding health information is important, the texts provided are often difficult to understand. There are formulas to measure readability levels, but there is little understanding of how linguistic structures contribute to these difficulties. We are developing a toolkit of linguistic metrics that are validated with representative users and can be measured automatically. In this study, we provide an overview of our corpus and how readability differs by topic and source. We compare two documents for three groups of linguistic metrics. We report on a user study evaluating one of the differentiating metrics: the percentage of function words in a sentence. Our results show that this percentage correlates significantly with ease of understanding as indicated by users but not with the readability formula levels commonly used. Our study is the first to propose a user validated metric, different from readability formulas.
- Miller, T., & Leroy, G. (2008). Visualization of health information with predications extracted using natural language processing and filtered using the UMLS. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 1057.
  Increased availability of and reliance on written health information can tax the abilities of unskilled readers. We are developing a system that uses natural language processing to extract phrases, identify medical terms using the UMLS, and visualize the propositions. This system substantially reduces the amount of information a consumer must read, while providing an alternative to traditional prose based text.
- Leroy, G., Xu, J., Chung, W., Eggers, S., & Chen, H. (2007). An end user evaluation of query formulation and results review tools in three medical meta-search engines. International Journal of Medical Informatics, 76(11-12), 780-789.
  PMID: 16996298. Purpose: Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. Methods: We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. Conclusions: We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred. © 2006 Elsevier Ireland Ltd. All rights reserved.
- Leroy, G., Eryilmaz, E., & Laroya, B. T. (2006). Health information text characteristics. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 479-83.
  Millions of people search online for medical text, but these texts are often too complicated to understand. Readability evaluations are mostly based on surface metrics such as character or words counts and sentence syntax, but content is ignored. We compared four types of documents, easy and difficult WebMD documents, patient blogs, and patient educational material, for surface and content-based metrics. The documents differed significantly in reading grade levels and vocabulary used. WebMD pages with high readability also used terminology that was more consumer-friendly. Moreover, difficult documents are harder to understand due to their grammar and word choice and because they discuss more difficult topics. This indicates that we can simplify many documents by focusing on word choice in addition to sentence structure, however, for difficult documents this may be insufficient.
- Miller, T., Leroy, G., & Wood, E. (2006). Dynamic generation of a table of contents with consumer-friendly labels. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 559-63. Consumers increasingly look to the Internet for health information, but available resources are too difficult for the majority to understand. Interactive tables of contents (TOCs) can help consumers access health information by providing an easy-to-understand structure. Using natural language processing and the Unified Medical Language System (UMLS), we have automatically generated TOCs for consumer health information. The TOCs are categorized according to consumer-friendly labels for the UMLS semantic types and semantic groups. Categorizing phrases by semantic types is significantly more correct and relevant. Greater correctness and relevance were achieved with documents that are difficult to read than with those at an easier reading level. Pruning TOCs to use categories that consumers favor further increases relevancy and correctness while reducing structural complexity.
- Leroy, G., & Rindflesch, T. C. (2005). Effects of information and machine learning algorithms on word sense disambiguation with small datasets. International Journal of Medical Informatics, 74(7-8), 573-85. Current approaches to word sense disambiguation use (and often combine) various machine learning techniques. Most refer to characteristics of the ambiguity and its surrounding words and are based on thousands of examples. Unfortunately, developing large training sets is burdensome, and in response to this challenge, we investigate the use of symbolic knowledge for small datasets. A naïve Bayes classifier was trained for 15 words with 100 examples for each. Unified Medical Language System (UMLS) semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in nine experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. To investigate this large variance, we performed several follow-up evaluations, testing additional algorithms (decision tree and neural network), and gold standards (per expert), but the results did not significantly differ. However, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators. We conclude that neither algorithm nor individual human behavior causes these large differences, but that the structure of the UMLS Metathesaurus (used to represent senses of ambiguous words) contributes to inaccuracies in the gold standard, leading to varied performance of word sense disambiguation techniques.
- Leroy, G., Huang, J., Chuang, S., & Charlop-Christy, M. H. (2005). Communication software using pictures for use with Pocket PCs. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 1024. Autism spectrum disorder has become one of the most prevalent developmental disorders. A difficulty with communication is one of the main impairments. We are developing a digital library of images that will be used to help children with autism communicate without the need for reading or writing skills. Images will be displayed on Pocket PCs to convey messages. We are currently developing and evaluating the first prototype.
- Leroy, G., & Rindflesch, T. C. (2004). Using symbolic knowledge in the UMLS to disambiguate words in small datasets with a naïve Bayes classifier. Studies in Health Technology and Informatics, 107(Pt 1), 381-5. Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators.
- Leroy, G., Chen, H., & Martinez, J. D. (2003). A shallow parser based on closed-class words to capture relations in biomedical text. Journal of Biomedical Informatics, 36(3), 145-58. Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. There were 296 relations correctly extracted from the abstracts, resulting in 90% precision of the relations and an average of 11 correct relations per abstract.
- Gu, Y., & Leroy, G. A. (2020, January). Use of Conventional Machine Learning to Optimize Deep Learning Hyper-parameters for NLP Labeling Tasks. In Hawaii International Conference on System Sciences (HICSS).
- Kauchak, D., & Leroy, G. A. (2020, January). A Web-Based Medical Text Simplification Tool. In Hawaii International Conference on System Sciences (HICSS).
- Cheng, Y., Moharty, A. F., Ogunyemi, O., Smith, C., Leroy, G. A., & Zeng, Q. (2019, November). 2018 Salary Survey of AMIA Members: Factors Associated with Higher Salaries. In AMIA.
- Gu, Y., & Leroy, G. A. (2019, December). Mechanisms for Automatic Training Data Labeling for Machine Learning. In International Conference on Information Systems (ICIS).
- Gu, Y., & Leroy, G. A. (2019, December). Understanding Limitations of Deep Learning for Text in a Low-Resource Domain: A Case Study with Autism Diagnostic Criteria Detection from EHR. In HITS Workshop.
- Gu, Y., & Leroy, G. A. (2019, November). Classification with Feature and Algorithm Machine Learning Ensembles for Autism Spectrum Disorders. In Conference on Health IT and Analytics (CHITA).
- Gu, Y., & Leroy, G. A. (2019, October). Automated Training Data Discovery and Labelling for Machine Learning in a Low Resource Domain. In INFORMS.
- Kauchak, D., Leroy, G. A., Pei, M., & Colina, S. (2019, November). Predicting Transition Words between Sentences for English and Spanish Medical Text. In AMIA.
- Pei, M., Leroy, G. A., Kauchak, D., Szep, M., & Szep, A. (2018, March). Splitting Sentences for Text Simplification: A Machine Learning Approach. In AMIA Summit.
- Szep, A., Szep, M., Leroy, G. A., Kauchak, D., Kloehn, N., Revere, D., & Just, M. (2018, March). Algorithmic Generation of Grammar Simplification Rules Using Large Corpora. In AMIA Summit.
- Gu, Y., Leroy, G. A., Pettygrove, S. D., Kelly Galindo, M., & Kurzius-Spencer, M. (2018, November). Optimizing Corpus Creation for Training Word Embedding in Low Resource Domains: A Case Study in Autism Spectrum Disorder (ASD). In AMIA Fall Symposium.
- Gu, Y., Leroy, G. A., & Kauchak, D. (2017, November). When synonyms are not enough: Optimal parenthetical insertion for text simplification. In AMIA Fall Symposium.
- Mukherjee, P., Leroy, G. A., Kauchak, D., Navarrete, B., Diaz, D., & Colina, S. (2017, November). The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study. In AMIA Fall Symposium.
- Leroy, G. A., Harber, P. I., & Revere, D. (2015, December 1-2). Public sharing of medical advice using social media: an analysis of Twitter. In Seventeenth International Conference on Grey Literature - A new wave of textual and non-textual grey literature.
- Alsudais, K., Corso, A., & Leroy, G. A. (2014, June). We Know Where You Are Tweeting from: Assigning a Type of Place to Tweets using Natural Language Processing and Random Forests. In IEEE Proceedings of the 3rd International Congress on Big Data.
- Harwell, J., Pentoney, C., & Leroy, G. A. (2014, February-March). Big Data For Query Expansion: A Comparison of Content-based versus Social-based Keywords. In 2014 Winter Conference on Business Intelligence.
- Kauchak, D., Mouradi, O., Pentoney, C., & Leroy, G. (2014, January). Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text. In 47th Hawaii International Conference on System Sciences (HICSS).
- Kelly, L., Goeuriot, L., Schreck, T., Leroy, G., Mowery, D. L., Zuccon, G., & Palotti, J. (2014, September). Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. In 5th International Conference of the CLEF Initiative: Information Access Evaluation. Multilinguality, Multimodality, and Interaction, CLEF 2014.
- Pentoney, C., Harwell, J., & Leroy, G. A. (2014, November). Does Query Expansion Limit Our Learning? A Comparison of Social-Based Expansion to Content-Based Expansion for Medical Queries on the Internet. In AMIA Annual Symposium.
- Suominen, H., Schreck, T., Leroy, G., Hochheiser, H., Goeuriot, L., Kelly, L., Mowery, D. L., Nualart, J., Ferraro, G., & Keim, D. (2014, September). Task 1 of the CLEF eHealth Evaluation Lab 2014: Visual-Interactive Search and Exploration of eHealth Data. In 5th International Conference of the CLEF Initiative: Information Access Evaluation. Multilinguality, Multimodality, and Interaction, CLEF 2014.
- Leroy, G. A., Li, M., Rains, S. A., Kloehn, N. D., Kauchak, D., & Revere, D. (2017, November). Writing for Impact – The Influence of Features in Medical Texts on Comprehension, Perceived Severity and Perceived Susceptibility of Developing the Condition. Conference on Health IT and Analytics (CHITA). Washington DC.
- Harber, P. I., & Leroy, G. A. (2016, Spring). Social Media (Twitter) for Assessing Concerns about Obstructive Airway Disease. American Thoracic Society Conference. San Francisco: American Thoracic Society. RESULTS: 43805 tweets were obtained, of which 6119 exact duplicates were removed before analysis. Overall, institutional (both commercial and noncommercial) and re-tweets outnumbered original personal tweets (39117, 1776, and 2912 respectively). The frequency of terms linked to a domain was estimated by assigning the most frequent specific terms to domains: Disease (49235); Symptom Effect (5656); Non-Drug Treatment (5004); Children (2767); Drug Treatment (2190); Prevention (2087); Science (1863); Trigger NOS (1722); ..
- Harber, P. I., & Leroy, G. A. (2015, February). Crowd sourcing population data: Using Amazon Mechanical Turk and Natural Language Processing for Work-Related Asthma. Pulmonary Research Conference.
- Ellingson, K., Ramadan, F., Pu, J., Donovan, F., Galgiani, J. N., & Leroy, G. A. (2019, November). Valley Fever Phenotypic Presentation: Exploration of a Precision Medicine Approach to Clinical Decision Support. Reimagine Health: Is my fate in my genes?
- Pei, M., Leroy, G. A., Kauchak, D., Szep, M., & Szep, A. (2019, March). Splitting Sentences for Text Simplification: A Machine Learning Approach. AMIA Summit.
- Gu, Y., & Leroy, G. A. (2018, December). A Classification Artifact to Support Mental Health Surveillance: A Comparison of Feature and Classifier Ensembles. Workshop on Information Technology and Systems (WITS).
- Leroy, G. A., Navarrete, B., Colina, S., & Kauchak, D. (2017, November). Spanish Text Simplification Using Term Familiarity: Applying Principles from English Text Simplification. AMIA Fall Symposium. Washington DC.
- Revere, D., Mukherjee, P., Kauchak, D., & Leroy, G. A. (2017, November). Creating a Corpus Resource for Text Simplification Research and Development. AMIA Fall Symposium. Washington DC.
- Kauchak, D., Leroy, G. A., & Just, M. (2016, Fall). Grammar Frequency and Simplification: When Intuition Fails. AMIA Fall Symposium. Chicago.
- Leroy, G. A., Koolippurackal, J., Swami, S., & Harber, P. I. (2016, Fall). Reviewing Asthma-related Grey Literature and Personal Opinions on Twitter using LDA and CTM Clustering. AMIA Fall Symposium. Chicago: AMIA.
- Revere, D., Leroy, G. A., & Harber, P. I. (2015, December). What's the message? A NLP analysis of public health information sent by SMS. Seventeenth International Conference on Grey Literature - A new wave of textual and non-textual grey literature. Amsterdam, Netherlands.
- Leroy, G. A., Kurzius-Spencer, M., & Pettygrove, S. D. (2014, November). Using Natural Language Processing for Autism Trigger Extraction. AMIA Annual Symposium. Washington DC.
- Kauchak, D., Leroy, G. A., & Grueter, M. (2018, December). Demo: An Online Evidence-based Text Simplification Editor for Medical Text. Workshop on Information Technology and Systems (WITS).