
Hsinchun Chen

  • Professor, Management Information Systems
  • Regents Professor
  • Professor, BIO5 Institute
Contact
  • (520) 621-4153
  • McClelland Hall, Rm. 430Z
  • Tucson, AZ 85721
  • hchen@eller.arizona.edu

Biography

Dr. Hsinchun Chen is University of Arizona Regents' Professor and Thomas R. Brown Chair in Management and Technology in the Management Information Systems (MIS) Department and Professor of Entrepreneurship & Innovation in the McGuire Center for Entrepreneurship at the College of Management of the University of Arizona. He received the B.S. degree from National Chiao-Tung University in Taiwan, the M.B.A. degree from SUNY Buffalo, and the Ph.D. degree in Information Systems from New York University. Dr. Chen is director of the Artificial Intelligence Lab and has served on the faculty of the UA MIS department (ranked #3 in MIS) since 1989. In 2014-2015, he also served as the lead program director for the National Science Foundation's program in Smart and Connected Health. He has also served as a Scientific Counselor/Advisor to the National Library of Medicine (USA), Academia Sinica (Taiwan), and the National Library of China (China).

Dr. Chen is a Fellow of IEEE, ACM, and AAAS. He received the IEEE Computer Society 2006 Technical Achievement Award, the 2008 INFORMS Design Science Award, the MIS Quarterly 2010 Best Paper Award, the IEEE 2011 Research Achievement and Leadership Award in Intelligence and Security Informatics, and the UA 2013 Technology Innovation Award. He was also a finalist for the AZ Tech Council's Governor's Innovation of the Year Award in 2011 and was named among the Arizona Centennial Top 100 Scientists in 2012. He is the author/editor of 20 books, 25 book chapters, 280 SCI journal articles, and 150 refereed conference articles covering Web computing, search engines, digital libraries, intelligence analysis, biomedical informatics, data/text/web mining, and knowledge management. His recent books include: Dark Web (2012); Sports Data Mining (2010); Infectious Disease Informatics (2010); Terrorism Informatics (2008); Mapping Nanotechnology Knowledge and Innovation (2008); Digital Government (2007); Intelligence and Security Informatics for International Security (2006); and Medical Informatics (2005), all published by Springer.

Dr. Chen's publication productivity in Information Systems was ranked #8 in a bibliometric study (CAIS 2005) and #9 in another (EJIS 2007). He was ranked #1 in Digital Library research (IP&M 2005), #1 in JASIST publications for 1998-2007 (JASIST 2008), and #5 by h-index among IEEE Intelligent Systems authors for 1986-2010 (IEEE IS 2010). His overall h-index in 2012 was 67, placing him in the top 10 among information retrieval researchers in computer science and #4 among all University of Arizona faculty according to Microsoft Academic Search, and #1 among all MIS faculty (tied with Andy Whinston) according to Google Scholar.

He is the founding editor of the ACM Transactions on Management Information Systems (ACM TMIS) and served as its first Editor-in-Chief (EIC); he also founded and serves as EIC of Springer's open access journal Security Informatics (SI). He serves on numerous editorial boards, including: IEEE Intelligent Systems, ACM Transactions on Information Systems, IEEE Transactions on Systems, Man, and Cybernetics, Journal of the American Society for Information Science and Technology, Decision Support Systems, and International Journal on Digital Libraries.

He has been an advisor for major NSF, DOJ, NLM, DOD, DHS, and other international research programs in digital library, digital government, medical informatics, and national security research. Dr. Chen founded the Artificial Intelligence Lab, which has received more than $35M in research funding from NSF, NIH, NLM, DOD, DOJ, CIA, DHS, and other agencies (90 grants, 40 from NSF). Dr. Chen has also produced 30 Ph.D. students, who have been placed in major academic institutions around the world. Dr. Chen was conference co-chair of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2004 and has served as conference/program co-chair for eight International Conferences of Asian Digital Libraries (ICADL), the premier digital library meeting in Asia, which he helped develop. He served as Program Chair of the International Conference on Information Systems (ICIS) 2009, the premier MIS conference. Dr. Chen is also the founding conference co-chair of the IEEE International Conferences on Intelligence and Security Informatics (ISI), 2003-present. The ISI conference, which has been sponsored by NSF, CIA, DHS, and NIJ, has become the premier meeting for international and homeland security IT research.

Dr. Chen's COPLINK system, which has been cited as a national model for public safety information sharing and analysis, has been adopted by more than 3,500 law enforcement and intelligence agencies. COPLINK research has been featured in the New York Times, Newsweek, Los Angeles Times, Washington Post, Boston Globe, and ABC News, among others. The COPLINK project was selected as a finalist for the prestigious International Association of Chiefs of Police (IACP)/Motorola Weaver Seavey Award for Quality in Law Enforcement in 2003. COPLINK research has since been expanded to border protection (BorderSafe), disease and bioagent surveillance (BioPortal), and terrorism informatics research (Dark Web), funded by NSF, DOD, CIA, and DHS. In collaboration with selected international terrorism research centers and intelligence agencies, the Dark Web project has generated one of the world's largest databases of extremist/terrorist-generated Internet content (web sites, forums, blogs, and multimedia documents). Dark Web research supports link analysis, content analysis, web metrics analysis, multimedia analysis, sentiment analysis, and authorship analysis of international terrorism content. The project has received significant international press coverage, including: Associated Press, USA Today, The Economist, NSF Press, Washington Post, BBC, PBS, Business Week, WIRED magazine, and Arizona Daily Star, among others. Dr. Chen recently received additional major NSF Secure and Trustworthy Cyberspace (SaTC) program funding ($5.4M) for his Hacker Web research and Cybersecurity Analytics fellowship program.

Dr. Chen is also a successful entrepreneur. He is the founder of the Knowledge Computing Corporation (KCC), a university spin-off IT company and a market leader in law enforcement and intelligence information sharing and data mining. In 2009, KCC merged with i2, the industry leader in intelligence analytics and fraud detection. The combined i2/KCC company was acquired by IBM in 2011 for $500M. Dr. Chen founded Caduceus Intelligence Corporation, another UA spin-off company, in 2010. Caduceus is developing web-based systems for healthcare informatics and patient support. Its first product, DiabeticLink, will provide diabetes patient intelligence and services in the US, Taiwan, China, and Denmark in 2013 and 2014.

Dr. Chen has also received numerous awards in information technology and knowledge management education and research, including: the AT&T Foundation Award, the SAP Award, the Andersen Consulting Professor of the Year Award, the University of Arizona Technology Innovation Award, and the National Chiao-Tung University Distinguished Alumnus Award. Dr. Chen has served as a keynote or invited speaker at major international security informatics, health informatics, information systems, knowledge management, and digital library conferences and major international government meetings (NATO, UN, EU, FBI, CIA, DOD, DHS). He has served as Distinguished/Honorary Professor at several major universities in Taiwan and China (including the Chinese Academy of Sciences and Shanghai Jiao Tong University). He was named Distinguished University Chair Professor of National Taiwan University in 2010 and was elected a China National 1000-Elite Chair Professor with Tsinghua University in 2013.

Degrees

  • Ph.D. Information Systems
    • New York University, New York, New York
    • An Artificial Intelligence Approach to the Design of Online Information Retrieval Systems (advisor: Vasant Dhar)
  • M.S. Information Systems
    • New York University, New York, New York
  • M.B.A. Management Information Systems, Management Science, Finance
    • State University of New York at Buffalo, Buffalo, New York
  • B.S. Management Science
    • National Chiao-Tung University, Hsinchu City, Taiwan

Awards

  • AIS Impact Award
    • AIS, Fall 2020
  • Best Paper Award
    • IEEE Intelligence and Security Informatics Conference (2019), Fall 2020
  • ACM Fellow
    • Association for Computing Machinery, Fall 2015
  • National Science Foundation Discovery News
    • National Science Foundation, Fall 2015
  • University of Arizona 2013 Technology Innovation Award
    • University of Arizona, Fall 2013
  • Innovator of the Year, Technology Innovation Award
    • University of Arizona, Spring 2013
  • MISQ Best Paper of 2010 (Co-Author)
    • Management Information Systems Quarterly, Spring 2011
  • AAAS Fellow
    • American Association for the Advancement of Science, Fall 2006
  • IEEE Fellow
    • Institute of Electrical and Electronics Engineers, Fall 2006

Interests

Teaching

Data mining, text mining, and web mining; cybersecurity

Research

Areas of expertise include:

  • Security informatics, security big data; smart and connected health, health analytics; data, text, and web mining.
  • Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.
  • Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.
  • Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.
  • Cognitive modeling, human-computer interaction, IR behaviors, human problem-solving processes.

Courses

2020-21 Courses

  • Cyber Warfare Capstone
    MIS 689 (Spring 2021)
  • Dissertation
    MIS 920 (Spring 2021)
  • Cyber Warfare Capstone
    MIS 689 (Fall 2020)
  • Dissertation
    MIS 920 (Fall 2020)

2019-20 Courses

  • Cyber Warfare Capstone
    MIS 689 (Spring 2020)
  • Data Analytics
    MIS 464 (Spring 2020)
  • Dissertation
    MIS 920 (Spring 2020)
  • Cyber Warfare Capstone
    MIS 689 (Fall 2019)
  • Dissertation
    MIS 920 (Fall 2019)

2018-19 Courses

  • Cyber Warfare Capstone
    MIS 689 (Spring 2019)
  • Data Analytics
    MIS 464 (Spring 2019)
  • Dissertation
    MIS 920 (Spring 2019)
  • Topics in Data and Web Mining
    MIS 611D (Spring 2019)
  • Dissertation
    MIS 920 (Fall 2018)

2017-18 Courses

  • Dissertation
    MIS 920 (Spring 2018)
  • Dissertation
    MIS 920 (Fall 2017)
  • Master's Report Projects
    MIS 696H (Fall 2017)

2016-17 Courses

  • Dissertation
    MIS 920 (Spring 2017)
  • Master's Report Projects
    MIS 696H (Spring 2017)
  • Dissertation
    MIS 920 (Fall 2016)

2015-16 Courses

  • Dissertation
    MIS 920 (Summer I 2016)
  • Dissertation
    MIS 920 (Spring 2016)
  • Special Topics in Management Information Systems
    MIS 496A (Spring 2016)
  • Topics in Data and Web Mining
    MIS 611D (Spring 2016)

Related Links

UA Course Catalog

Scholarly Contributions

Books

  • Zeng, D., Chen, H., Zheng, X., & Leischow, S. (2016). Proceedings of 2015 International Conference for Smart Health (edited; post-conference). Phoenix, Arizona: Springer, Lecture Notes in Computer Science No. 9545.
  • Chau, M., Wang, G. A., & Chen, H. (2015). Intelligence and Security Informatics - Pacific Asia Workshop, PAISI 2015, Proceedings. Ho Chi Minh City, Vietnam: Springer, Lecture Notes in Computer Science 9074.

Journals/Publications

  • Ahmad, F., Abbasi, A. F., Li, J., Dobolyi, D., Netemeyer, R., & Chen, H. (2020). A Deep Learning Architecture for Psychometric Natural Language Processing. ACM Transactions on Information Systems, 33(1), 6:1-6:29.
  • Bardhan, I., Chen, H., & Karahanna, E. (2020). Connecting Systems, Data, and People: A Multidisciplinary Research Roadmap for Chronic Disease Management. MIS Quarterly, 44(1).
  • Chau, M., Li, T., Xu, J., Yip, P., & Chen, H. (2019). Finding People with Emotional Distress in Online Social Media: A Design Combining Machine Learning and Rule-based Classification. MIS Quarterly.
  • Dang, Y., Zhang, Y., & Chen, H. (2019). An Exploratory Study on the Virtual World: Investigating the Avatar Gender and Avatar Age Differences in Their Social Interactions for Help-Seeking. Information Systems Frontiers.
  • Dang, Y., Zhang, Y., Brown, S. A., & Chen, H. (2020). Examining the Impacts of Mental Workload and Task-Technology Fit on User Acceptance of the Social Media Search System. Information Systems Frontiers, 22(3), 697-718.
  • Ebrahimi, M., Nunamaker, J. F., & Chen, H. (2019). Semi-Supervised Cyber Threat identification in Dark Net Market: A Transductive and Deep Learning Approach. Journal of Management Information Systems.
  • Samtani, S. S., Kantarcioglu, M., & Chen, H. (2020). Trailblazing the Artificial Intelligence for Cybersecurity Discipline: A Multi-Disciplinary Research Roadmap. ACM Transactions on Management Information Systems, 11(4).
  • Samtani, S., Zhu, H., & Chen, H. (2020). Proactively Identifying Emerging Threats from the Dark Web: A Diachronic Graph Embedding Framework (D-GEF). ACM Transactions on Privacy and Security, 23(4), 1-33.
  • Yu, S., Zhu, H., & Chen, H. (2019). Emoticon Analysis for Chinese Social Media and E-Commerce: The AZEmo System. ACM Transactions on Management Information Systems.
  • Zhu, H., Samtani, S. S., Chen, H., & Nunamaker, J. F. (2020). Human Identification for Activities of Daily Living: A Deep Transfer Learning Approach. Journal of Management Information Systems, 37(2).
  • Benjamin, V., Valacich, J. S., & Chen, H. (2019). DICE-E: A Framework for Conducting Darknet Identification, Collection, Evaluation, with Ethics. MIS Quarterly, 43(1).
  • Jiang, S., & Chen, H. (2019). Examining Patterns of Scientific Knowledge Diffusion Based on Knowledge Cyber Infrastructure: A Multi-dimensional Network Approach. Scientometrics, 121(3), 1599-1617.
  • Lin, Y., Lin, M., & Chen, H. (2019). Do Electronic Health Records Affect Quality of Care? Evidence from the HITECH Act. Information Systems Research, 30(1), 306-318.
  • Wu, L., Zhu, H., Chen, H., & Roco, M. C. (2019). Comparing Nanotechnology Landscapes in the US and China: A Patent Analysis Perspective. Journal of Nanoparticle Research, 21.
  • Chen, H. (2018). A Sequence-to-Sequence Model-Based Deep Learning Approach for Recognizing Activity of Daily Living for Senior Care. Journal of Biomedical Informatics, 84.
  • Chen, H. (2018). Hidden Markov Model Based Fall Detection with Motion Sensor Orientation Calibration: A Case for Real-Life Home Monitoring. IEEE Journal of Biomedical and Health Informatics, 22, 1847-1853.
  • Chen, H. (2018). Identifying SCADA Systems and Their Vulnerabilities on the Internet of Things (IoT): A Text Mining Approach. IEEE Intelligent Systems.
  • Chen, H. (2018). The State-of-the-Art in Twitter Sentiment Analysis: A Review and Benchmark Evaluation. ACM Transactions on Management Information Systems.
  • Chen, H. (2018). Web Media and Stock Markets: A Survey and Future Directions from a Big Data Perspective. IEEE Transactions on Knowledge and Data Engineering, 30.
  • Li, W., Yin, J., & Chen, H. (2018). Supervised Topic Modeling using Hierarchical Dirichlet Process-based Inverse Regression: Experiments on E-Commerce Applications. IEEE Transactions on Knowledge and Data Engineering, 30.
  • Chen, H. (2017). Adverse Drug Reaction Early Warning Using User Search Data. Online Information Review.
  • Chen, H. (2017). International perspective on nanotechnology papers, patents, and NSF awards (2000–2016). Journal of Nanoparticle Research.
  • Li, W., Chen, H., & Nunamaker, J. F. (2016). Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System. Journal of Management Information Systems, 33(4), 1059-1086.
  • Lin, Y., Chen, H., Brown, R., Li, S., & Yang, H. (2017). Healthcare Predictive Analytics for Risk Profiling in Chronic Care: A Bayesian Multi-Task Learning Approach. MIS Quarterly, 41(2), 473-495.
  • Samtani, S., Chinn, R., Chen, H., & Nunamaker, J. F. (2017). Exploring Emerging Hacker Assets and Key Hackers for Proactive Cyber Threat Intelligence. Journal of Management Information Systems, 34(4), 1023-1053.
  • Zhang, Y. G., Dang, Y. M., Brown, S. A., & Chen, H. (2017). Investigating the Impacts of Avatar Gender, Avatar Age, and Region Theme on Avatar Activity in the Virtual World. Computers in Human Behavior, 68, 378-387.
  • Benjamin, V., Zhang, B., Chen, H., & Nunamaker, J. F. (2016). Examining Hacker Participation Length within Cybercriminal IRC Communities. Journal of Management Information Systems, 33(2), 482-510.
  • Jiang, S., & Chen, H. (2016). NATERGM: A Model for Examining the Role of Nodal Attributes in Dynamic Social Media Networks. IEEE Transactions on Knowledge and Data Engineering, 28(3), 729-740.
  • Li, Q., Chen, Y., Jiang, L. L., Li, P., & Chen, H. (2016). A Tensor-based Information Framework for Predicting the Stock Market. ACM Transactions on Information Systems.
  • Li, X., Zhang, T., Song, L., Zhang, Y., Zhang, G., Xing, C., & Chen, H. (2016). Effects of Heart Rate Variability Biofeedback Therapy on Patients with Poststroke Depression: A Case Study. Chinese Medical Journal, 128(18), 2542-2545.
  • Qiao, J., Meng, Y., Chen, H., Huang, H., & Li, G. (2016). Modeling One-Mode Projection of Bipartite Networks by Tagging Vertex Information. Physica A.
  • Woo, J., & Chen, H. (2016). Epidemic Model for Information Diffusion in Web Forums: Experiments in Marketing Exchange and Political Dialog. SpringerPlus.
  • Woo, J., Ha, S. H., & Chen, H. (2016). Tracing Topic Discussion with Event-driven SIR Model in Online Forums. Journal of Electronic Commerce Research.
  • Abbasi, A., Zahedi, F. M., Zeng, D., Chen, Y., Chen, H., & Nunamaker, J. F. (2015). Enhancing Predictive Analytics for Anti-Phishing by Exploiting Website Genre Information. Journal of Management Information Systems, 31(4), 109-157.
  • Jiang, S., Gao, Q., Chen, H., & Roco, M. C. (2015). The Roles of Sharing, Transfer, and Public Funding in Nanotechnology Knowledge-Diffusion Networks. Journal of the Association for Information Science and Technology, 66(5), 1017-1029.
  • Liu, X., & Chen, H. (2015). A Research Framework for Pharmacovigilance in Health Social Media: Identification and Evaluation of Patient Adverse Drug Event Reports. Journal of Biomedical Informatics, 58, 268-279.
  • Liu, X., & Chen, H. (2015). Identifying Adverse Drug Events from Patient Social Media: A Case Study for Diabetes. IEEE Intelligent Systems, 30(3), 44-51.
  • Liu, X., Jiang, S., Chen, H., Larson, C. A., & Roco, M. C. (2015). Modeling Knowledge Diffusion in Scientific Innovation Networks: An Institutional Comparison Between China and U.S. with Illustration for Nanotechnology. Scientometrics, 105(3), 1953-1984.
  • Woo, J., Lee, M. J., Ku, Y., & Chen, H. (2015). Modeling the dynamics of medical information through web forums in the medical industry. Technological Forecasting and Social Change, 97, 77-90.
  • Wu, B., Jiang, S., & Chen, H. (2015). The impact of individual attributes on knowledge diffusion in web forums. Quality & Quantity, 49(6), 2221-2236.
  • Zimbra, D., Chen, H., & Lusch, R. F. (2015). Stakeholder Analyses of Firm-Related Web Forums: Applications in Stock Return Prediction. ACM Transactions on Management Information Systems, 6(1), 2:1-2:38.
  • Benjamin, V., Chen, H., & Zimbra, D. (2014). Bridging the Virtual and Real: The Relationship Between Web Content, Linkage, and Geographical Proximity of Social Movements. Journal of the Association for Information Science and Technology, 65(11), 2210-2222.
    More info
    As the Internet becomes ubiquitous, it has advanced to more closely represent aspects of the real world. Due to this trend, researchers in various disciplines have become interested in studying relationships between real-world phenomena and their virtual representations. One such area of emerging research seeks to study relationships between real-world and virtual activism of social movement organizations (SMOs). In particular, SMOs holding extreme social perspectives are often studied due to their tendency to have robust virtual presences to circumvent real-world social barriers preventing information dissemination. However, many previous studies have been limited in scope because they utilize manual data-collection and analysis methods. They also have often failed to consider the real-world aspects of groups that partake in virtual activism. We utilize automated data-collection and analysis methods to identify significant relationships between aspects of SMO virtual communities and their respective real-world locations and ideological perspectives. Our results also demonstrate that the interconnectedness of SMO virtual communities is affected specifically by aspects of the real world. These observations provide insight into the behaviors of SMOs within virtual environments, suggesting that the virtual communities of SMOs are strongly affected by aspects of the real world.
  • Campbell, J. L., Chen, H., Dhaliwal, D. S., Lu, H., & Steele, L. B. (2014). The information content of mandatory risk factor disclosures in corporate filings. Review of Accounting Studies, 19(1), 396-455.
    More info
    Abstract: Beginning in 2005, the Securities and Exchange Commission (SEC) mandated firms to include a "risk factor" section in their Form 10-K to discuss "the most significant factors that make the company speculative or risky." In this study, we examine the information content of this newly created section and offer two main results. First, we find that firms facing greater risk disclose more risk factors, and that the type of risk the firm faces determines whether it devotes a greater portion of its disclosures towards describing that risk type. That is, managers provide risk factor disclosures that meaningfully reflect the risks they face. Second, we find that the information conveyed by risk factor disclosures is reflected in systematic risk, idiosyncratic risk, information asymmetry, and firm value. Overall, our evidence supports the SEC's decision to mandate risk factor disclosures, as the disclosures appear to be firm-specific and useful to investors. © 2013 Springer Science+Business Media New York.
  • Dang, Y., Zhang, Y., Hu, P. J., Brown, S. A., Ku, Y., Wang, J., & Chen, H. (2014). An integrated framework for analyzing multilingual content in Web 2.0 social media. Decision Support Systems, 61(1), 126-135.
    More info
    Abstract: The growth of Web 2.0 has produced enormous amounts of user-generated content that contains important information about individuals' attitudes, perceptions, and opinions toward products, social events, and political issues. The volume of such content is increasing exponentially, making its search, analysis, and use more difficult and thus favoring advanced tools that aid in information search and processing. We propose an integrated framework that offers an infrastructure necessary for accessing, integrating, and analyzing multilingual user-generated content from different social media sites. Building on this framework, we develop the Dark Web Forum Portal (DWFP) that supports the gathering and analyses of social media content concerning security. Our evaluation results show that users supported by DWFP complete tasks better and faster than those using the benchmark forum. Participants consider DWFP to be better in terms of system quality, usefulness, ease of use, satisfaction and intention to use. © 2014 Elsevier B.V. All rights reserved.
  • Jiang, C., Liang, K., Chen, H., & Ding, Y. (2014). Analyzing market performance via social media: a case study of a banking industry crisis. Science China Information Sciences, 57(5).
    More info
    Analyzing market performance via social media has attracted a great deal of attention in the finance and machine-learning disciplines. However, the vast majority of research does not consider the enormous influence a crisis has on social media that further affects the relationship between social media and the stock market. This article aims to address these challenges by proposing a multistage dynamic analysis framework. In this framework, we use an authorship analysis technique and topic model method to identify stakeholder groups and topics related to a special firm. We analyze the activities of stakeholder groups and topics in different periods of a crisis to evaluate the crisis's influence on various social media parameters. Then, we construct a stock regression model in each stage of crisis to analyze the relationships of changes among stakeholder groups/topics and stock behavior during a crisis. Finally, we discuss some interesting and significant results, which show that a crisis affects social media discussion topics and that different stakeholder groups/topics have distinct effects on stock market predictions during each stage of a crisis.
  • Jiang, S., Chen, H., Nunamaker, J. F., & Zimbra, D. (2014). Analyzing firm-specific social media and market: A stakeholder-based event analysis framework. DECISION SUPPORT SYSTEMS, 67, 30-39.
    More info
    Discussion content in firm-specific social media helps managers understand stakeholders' concerns and make informed decisions. Despite such benefits, the over-abundance of information online makes it difficult to identify and focus on the most important stakeholder groups. In this study, we propose a novel stakeholder-based event analysis framework that uses online stylometric analysis to segment the forum participants by stakeholder groups, and partitions their messages into different time periods of major firm events to examine how important stakeholders evolve over time. With this approach, we identified stakeholder groups from a sample of six companies in the petrochemical and banking industries, using more than 500,000 online message postings. To evaluate the proposed system, we conducted market prediction within the identified groups, and compared the prediction performance with traditional approaches that did not account for stakeholder groups or events. Results showed that some stakeholder groups identified by our system had stronger relationships with firms' market performance, compared to the entire set of web forum participants. Incorporating event-induced temporal dynamics further improved the prediction performance. (C) 2014 Elsevier B.V. All rights reserved.
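    One step of such a pipeline, partitioning messages into pre/during/post-event windows and aggregating activity and tone per stakeholder group, might be sketched as follows. This is a hedged illustration in pandas; the column names, event date, and window widths are hypothetical, and the stakeholder labels are assumed to come from an upstream stylometric clustering step:

      import pandas as pd

      # Hypothetical message-level data; "group" would come from
      # stylometric segmentation of forum authors, as in the paper
      msgs = pd.DataFrame({
          "timestamp": pd.to_datetime(["2010-04-10", "2010-04-25", "2010-05-20"]),
          "group": ["investor", "employee", "investor"],
          "sentiment": [0.4, -0.2, -0.6],
      })
      event = pd.Timestamp("2010-04-20")  # a major firm event (assumed)

      msgs["window"] = pd.cut(
          (msgs["timestamp"] - event).dt.days,
          bins=[-10**6, -1, 30, 10**6],
          labels=["pre-event", "event", "post-event"],
      )
      # Per-group, per-window message counts and mean tone, usable as
      # inputs to a market prediction model
      print(msgs.groupby(["group", "window"], observed=True)["sentiment"]
                .agg(["count", "mean"]))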
  • Ku, Y., Chiu, C., Zhang, Y., Chen, H., & Su, H. (2014). Text Mining Self-Disclosing Health Information for Public Health Service. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 65(5), 928-947.
    More info
    Understanding specific patterns or knowledge of self-disclosing health information could support public health surveillance and healthcare. This study aimed to develop an analytical framework to identify self-disclosing health information with unusual messages on web forums by leveraging advanced text-mining techniques. To demonstrate the performance of the proposed analytical framework, we conducted an experimental study on 2 major human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS) forums in Taiwan. The experimental results show that the classification accuracy increased significantly (up to 83.83%) when using features selected by the information gain technique. The results also show the importance of adopting domain-specific features in analyzing unusual messages on web forums. This study has practical implications for the prevention and support of HIV/AIDS healthcare. For example, public health agencies can re-allocate resources and deliver services to people who need help via social media sites. In addition, individuals can also join a social media site to get better suggestions and support from each other.
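    A minimal sketch of this kind of text-classification pipeline, assuming scikit-learn, with mutual information standing in for the paper's information gain ranking; the posts, labels, and settings below are hypothetical, not the study's data:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.feature_selection import SelectKBest, mutual_info_classif
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import Pipeline

      # Hypothetical stand-ins for labeled forum messages
      posts = ["worried about my recent diagnosis", "selling concert tickets",
               "started a new medication last week", "looking for a roommate"]
      labels = [1, 0, 1, 0]  # 1 = self-disclosing health information

      pipeline = Pipeline([
          ("vectorize", CountVectorizer(ngram_range=(1, 2))),
          # k="all" for this toy corpus; a real study keeps the top-k
          # features by the chosen information-theoretic score
          ("select", SelectKBest(mutual_info_classif, k="all")),
          ("classify", MultinomialNB()),
      ])
      pipeline.fit(posts, labels)
      print(pipeline.predict(["new medication side effects"]))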
  • Leroy, G. A., Chen, H., & Rindflesch, T. C. (2014). Smart and Connected Health (Guest Editor Introduction). IEEE Intelligent Systems, 29(3).
  • Lin, Y., Chen, H., Brown, R. A., Li, S., & Yang, H. (2014). Time-to-Event Predictive Modeling for Chronic Conditions Using Electronic Health Records. IEEE Intelligent Systems, 29(3), 14-20.
  • Benjamin, V. A., & Chen, H. (2013). Machine learning for attack vector identification in malicious source code. IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics, 21-23.
    More info
    Abstract: As computers and information technologies become ubiquitous throughout society, the security of our networks and information technologies is a growing concern. As a result, many researchers have become interested in the security domain. Among them, there is growing interest in observing hacker communities for early detection of developing security threats and trends. Research in this area has often reported hackers openly sharing cybercriminal assets and knowledge with one another. In particular, the sharing of raw malware source code files has been documented in past work. Unfortunately, malware code documentation often appears to be missing, incomplete, or written in a language foreign to researchers. Thus, analysis of such source files embedded within hacker communities has been limited. Here we utilize a subset of popular machine learning methodologies for the automated analysis of malware source code files. Specifically, we explore genetic algorithms to resolve questions related to feature selection within the context of malware analysis. Next, we utilize two common classification algorithms to test selected features for identification of malware attack vectors. Results suggest a promising direction in utilizing such techniques to help with the automated analysis of malware source code. © 2013 IEEE.
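    A loose illustration of genetic-algorithm feature selection in this spirit; synthetic data stands in for malware source code features, and the population size, generations, and mutation rate are toy choices, not the paper's:

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      X, y = make_classification(n_samples=200, n_features=30, random_state=0)

      def fitness(mask):
          # Fitness of a feature subset = cross-validated accuracy
          if not mask.any():
              return 0.0
          return cross_val_score(LinearSVC(dual=False), X[:, mask], y, cv=3).mean()

      pop = rng.random((20, X.shape[1])) < 0.5            # random feature masks
      for generation in range(10):
          scores = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(scores)[-10:]]          # keep the fittest half
          children = []
          for _ in range(10):
              a, b = parents[rng.integers(10, size=2)]
              cut = rng.integers(1, X.shape[1])
              child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
              flip = rng.random(X.shape[1]) < 0.05         # random mutation
              children.append(np.where(flip, ~child, child))
          pop = np.vstack([parents, children])

      best = pop[np.argmax([fitness(m) for m in pop])]
      print("selected features:", np.flatnonzero(best))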
  • Benjamin, V., Chung, W., Abbasi, A., Chuang, J., Larson, C. A., & Chen, H. (2013). Evaluating text visualization: An experiment in authorship analysis. IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics, 16-20.
    More info
    Abstract: Analyzing authorship of online texts is an important analysis task in security-related areas such as cybercrime investigation and counter-terrorism, and in any field of endeavor in which authorship may be uncertain or obfuscated. This paper presents an automated approach for authorship analysis using machine learning methods, a robust stylometric feature set, and a series of visualizations designed to facilitate analysis at the feature, author, and message levels. A testbed consisting of 506,554 forum messages, in English and Arabic, from 14,901 authors was first constructed. A prototype portal system was then developed to support feasibility analysis of the approach. A preliminary evaluation to assess the efficacy of the text visualizations was conducted. The evaluation showed that task performance with the visualization functions was more accurate and more efficient than task performance without the visualizations. © 2013 IEEE.
  • Chen, H., Compton, S., & Hsiao, O. (2013). DiabeticLink: A health big data system for patient empowerment and personalized healthcare. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8040 LNCS, 71-83.
    More info
    Abstract: Ever-increasing rates of diabetes and healthcare costs have focused our attention on this chronic disease to provide a health social media system to serve multi-national markets. Our DiabeticLink system has been developed in both the US and Taiwan markets, addressing the needs of patients, caretakers, nurse educators, physicians, pharmaceutical companies and researchers alike to provide features that encourage social connection, data sharing and assimilation, and educational opportunities. Some important features DiabeticLink offers include diabetic health indicator tracking, electronic health record (EHR) search, social discussion and Q&A forums, health information resources, diabetic medication side effect reporting, healthy eating recipes and restaurant recommendations. We utilize advanced data, text and web mining algorithms and other computational techniques that are relevant to healthcare decision support and cyber-enabled patient empowerment. © 2013 Springer-Verlag.
  • Chen, H., Denning, D., Roberts, N., Larson, C. A., Yu, X., & Huang, C. (2013). Revealing the Hidden World of the Dark Web: Social Media Forums and Videos. Intelligent Systems for Security Informatics, 1-28.
  • Chen, H., Roco, M. C., & Son, J. (2013). Nanotechnology public funding and impact analysis: A tale of two decades (1991-2010). IEEE Nanotechnology Magazine, 7(1), 9-14.
    More info
    Abstract: Nanotechnology's economic and societal benefits have continued to attract significant research and development (R&D) attention from governments and industries worldwide. Over the past two decades, nanotechnology has seen quasi-exponential growth in the numbers of scientific papers and patent publications produced. New research topics and application areas are continually emerging, and investment from government, industry, and academia [1], [2] has expanded at substantial levels. But what is the impact of public funding on nanotechnology? How important is its role in driving innovation, invention, and knowledge transfer? © 2007-2011 IEEE.
  • Chen, H., Roco, M. C., Son, J., Jiang, S., Larson, C. A., & Gao, Q. (2013). Global nanotechnology development from 1991 to 2012: Patents, scientific publications, and effect of NSF funding. Journal of Nanoparticle Research, 15(9).
    More info
    Abstract: In a relatively short interval for an emerging technology, nanotechnology has made a significant economic impact in numerous sectors including semiconductor manufacturing, catalysts, medicine, agriculture, and energy production. A part of the United States (US) government investment in basic research has been realized in the last two decades through the National Science Foundation (NSF), beginning with the nanoparticle research initiative in 1991 and continuing with support from the National Nanotechnology Initiative after fiscal year 2001. This paper has two main goals: (a) present a longitudinal analysis of the global nanotechnology development as reflected in the United States Patent and Trade Office (USPTO) patents and Web of Science (WoS) publications in nanoscale science and engineering (NSE) for the interval 1991-2012; and (b) identify the effect of basic research funded by NSF on both indicators. The interval has been separated into three parts for comparison purposes: 1991-2000, 2001-2010, and 2011-2012. The global trends of patents and scientific publications are presented. Bibliometric analysis, topic analysis, and citation network analysis methods are used to rank countries, institutions, technology subfields, and inventors contributing to nanotechnology development. We then examined how these entities were affected by NSF funding and how they evolved over the past two decades. Results show that dedicated NSF funding used to support nanotechnology R&D was followed by an increased number of relevant patents and scientific publications, a greater diversity of technology topics, and a significant increase of citations. The NSF played important roles in the inventor community and served as a major contributor to numerous nanotechnology subfields. © 2013 Springer Science+Business Media.
  • Fan, L., Zhang, Y., Dang, Y., & Chen, H. (2013). Analyzing sentiments in Web 2.0 social media data in Chinese: Experiments on business and marketing related Chinese Web forums. Information Technology and Management, 14(3), 231-242.
    More info
    Abstract: Web 2.0 has brought a huge amount of user-generated, social media data that contains rich information about people's opinions and ideas towards various products, services, and ongoing social and political events. Nowadays, many companies have started to look into and try to leverage this new type of data to understand their customers in order to make better business strategies and services. As a nation with rapid economic growth in recent years, China has become visible and started to play an important role in the global business and economy. Also, with the large number of Chinese Internet users, a considerable number of opinions about Chinese business and markets have been expressed in social media sites. Thus, it will be of interest to explore and understand that user-generated content in Chinese. In this study, we develop an integrated framework to analyze user sentiments from Chinese social media sites by leveraging sentiment analysis techniques. Based on the framework, we conduct experiments on two popular Chinese Web forums, both related to business and marketing. By utilizing Elastic Net together with a rich body of feature representations, we achieve the highest F-measures of 84.4% and 86.7% for the two data sets, respectively. We also demonstrate the interpretability of Elastic Net by discussing the top-ranked features with positive or negative sentiments. © 2013 Springer Science+Business Media New York.
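    A small sketch of Elastic Net-style sentiment classification using scikit-learn; the documents here are English stand-ins (the paper's Chinese posts would first be word-segmented), and all hyperparameters are illustrative:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import SGDClassifier
      from sklearn.pipeline import Pipeline

      docs = ["the service was excellent", "terrible product, never again",
              "really happy with this brand", "worst experience of my life"]
      sentiment = [1, 0, 1, 0]  # 1 = positive, 0 = negative

      model = Pipeline([
          ("tfidf", TfidfVectorizer()),
          # The elasticnet penalty mixes L1 and L2, yielding the sparse,
          # interpretable feature weights the paper discusses
          ("clf", SGDClassifier(loss="log_loss", penalty="elasticnet",
                                l1_ratio=0.5, alpha=1e-4, random_state=0)),
      ])
      model.fit(docs, sentiment)
      print(model.predict(["excellent brand"]))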
  • Jiang, S., & Chen, H. (2013). A computational approach to detecting and assessing sustainability-related communities in social media. International Conference on Information Systems (ICIS 2013): Reshaping Society Through Information Systems Design, 1, 48-58.
    More info
    Abstract: The concept of corporate sustainability suggests that firms need to maintain sustainability principles and practices by addressing stakeholders' economic, ecological, and social concerns. Social media has become a knowledge depository where managers can evaluate stakeholders' concerns about the firm's sustainability-related issues. This study proposes a computational approach that utilizes natural language processing techniques to detect sustainability-related communities within online web forums. The validity of the detected communities was assessed based on their impacts on relevant firms' market performance when the firms' social responsibility was challenged. Experiments on three datasets showed that our system is effective in detecting sustainability-related communities. Also, a strong correlation was found between the activities of the identified sustainability-related communities and the firms' market performance during events that challenged the firms' social responsibilities. Our research contributes to the practice of managing corporate sustainability by facilitating managers in evaluating sustainability-related concerns of stakeholders and making effective managerial responses. © (2013) by the AIS/ICIS Administrative Office All rights reserved.
  • Jiang, S., Gao, Q., & Chen, H. (2013). Statistical modeling of nanotechnology knowledge diffusion networks. International Conference on Information Systems (ICIS 2013): Reshaping Society Through Information Systems Design, 4, 3552-3571.
    More info
    Abstract: Nanotechnology is crucial for industrial and scientific advancement, with millions of dollars being invested each year in nanotechnology-related research. Recent developments in information technology enable modeling the knowledge diffusion process via online depositories of nanotechnology-related scientific publication records. Understanding the mechanism may help funding agencies use their funding effectively. This study uses Exponential Random Graph Models (ERGMs), a family of theory-grounded statistical models, to explore the knowledge diffusion patterns among nanotechnology researchers. We systematically evaluate how various attributes of researchers and public funding affect the knowledge diffusion processes. Results show that the impact of public funding on nanotechnology knowledge transfer has been increasing in recent years. Funding all kinds of researchers can stimulate knowledge transfer. Also, funding senior researchers helps stimulate knowledge sharing. Our analysis framework of knowledge diffusion networks is effective in studying the knowledge diffusion patterns in nanotechnology, and can be easily applied to other fields. © (2013) by the AIS/ICIS Administrative Office All rights reserved.
  • Lim, E., Chen, H., & Chen, G. (2013). Business intelligence and analytics: Research directions. ACM Transactions on Management Information Systems, 3(4).
    More info
    Abstract: Business intelligence and analytics (BIA) is about the development of technologies, systems, practices, and applications to analyze critical business data so as to gain new insights about business and markets. The new insights can be used for improving products and services, achieving better operational efficiency, and fostering customer relationships. In this article, we will categorize BIA research activities into three broad research directions: (a) big data analytics, (b) text analytics, and (c) network analytics. The article aims to review the state-of-the-art techniques and models and to summarize their use in BIA applications. For each research direction, we will also determine a few important questions to be addressed in future research. © 2013 ACM.
  • Lin, Y., Chen, H., & Brown, R. A. (2013). MedTime: A temporal information extraction system for clinical narratives. Journal of Biomedical Informatics, 46(SUPPL.), S20-S28.
    More info
    Abstract: Temporal information extraction from clinical narratives is of critical importance to many clinical applications. We participated in the EVENT/TIMEX3 track of the 2012 i2b2 clinical temporal relations challenge, and presented our temporal information extraction system, MedTime. MedTime comprises a cascade of rule-based and machine-learning pattern recognition procedures. It achieved a micro-averaged f-measure of 0.88 in the recognition of both clinical events and temporal expressions. We proposed and evaluated three time normalization strategies to normalize relative time expressions in clinical texts. The accuracy was 0.68 in normalizing temporal expressions of dates, times, durations, and frequencies. This study demonstrates and evaluates the integration of rule-based and machine-learning-based approaches for high performance temporal information extraction from clinical narratives. © 2013 Elsevier Inc.
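    To make the time normalization step concrete, here is a tiny hedged sketch; these patterns and the anchor date are illustrative, not MedTime's actual rule set:

      import re
      from datetime import date, timedelta

      ANCHOR = date(2012, 6, 15)  # e.g., the note's admission date (assumed)

      def normalize(expression, anchor=ANCHOR):
          """Map a relative temporal expression to an ISO date."""
          m = re.match(r"(\d+) (day|week)s? ago", expression)
          if m:
              n, unit = int(m.group(1)), m.group(2)
              days = n * (7 if unit == "week" else 1)
              return (anchor - timedelta(days=days)).isoformat()
          if expression == "yesterday":
              return (anchor - timedelta(days=1)).isoformat()
          return None  # unhandled; a real system cascades many such rules

      print(normalize("2 weeks ago"))  # -> 2012-06-01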
  • Liu, X., & Chen, H. (2013). AZDrugMiner: An information extraction system for mining patient-reported adverse drug events in online patient forums. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8040 LNCS, 134-150.
    More info
    Abstract: Post-marketing drug surveillance is a critical component of drug safety. Drug regulatory agencies such as the U.S. Food and Drug Administration (FDA) rely on voluntary reports from health professionals and consumers contributed to its FDA Adverse Event Reporting System (FAERS) to identify adverse drug events (ADEs). However, it is widely known that FAERS underestimates the prevalence of certain adverse events. Popular patient social media sites such as DailyStrength and PatientsLikeMe provide new information sources from which patient-reported ADEs may be extracted. In this study, we propose an analytical framework for extracting patient-reported adverse drug events from online patient forums. We develop a novel approach - the AZDrugMiner system - based on statistical learning to extract adverse drug events in patient discussions and identify reports from patient experiences. We evaluate our system using a set of manually annotated forum posts which show promising performance. We also examine correlations and differences between patient ADE reports extracted by our system and reports from FAERS. We conclude that patient social media ADE reports can be extracted effectively using our proposed framework. Those patient reports can reflect unique perspectives in treatment and be used to improve patient care and drug safety. © 2013 Springer-Verlag.
  • Li, X., & Chen, H. (2013). Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach. Decision Support Systems, 54(2), 880-890.
    More info
    Abstract: Recommender systems have been widely adopted in online applications to suggest products, services, and contents to potential users. Collaborative filtering (CF) is a successful recommendation paradigm that employs transaction information to enrich user and item features for recommendation. By mapping transactions to a bipartite user-item interaction graph, a recommendation problem is converted into a link prediction problem, where the graph structure captures subtle information on relations between users and items. To take advantage of the structure of this graph, we propose a kernel-based recommendation approach and design a novel graph kernel that inspects customers and items (indirectly) related to the focal user-item pair as its context to predict whether there may be a link. In the graph kernel, we generate random walk paths starting from a focal user-item pair and define similarities between user-item pairs based on the random walk paths. We prove the validity of the kernel and apply it in a one-class classification framework for recommendation. We evaluate the proposed approach with three real-world datasets. Our proposed method outperforms state-of-the-art benchmark algorithms, particularly when recommending a large number of items. The experiments show the necessity of capturing user-item graph structure in recommendation. © 2012 Elsevier B.V.
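    A toy sketch loosely inspired by this idea, using a small hand-built bipartite graph; the score below is a simple Jaccard overlap of random-walk contexts, a crude stand-in for the paper's actual graph kernel:

      import random

      # Bipartite adjacency: users link to items and vice versa (hypothetical)
      edges = {"u1": ["i1", "i2"], "u2": ["i2", "i3"],
               "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2"]}

      def walks_from(node, n_walks=200, length=3, seed=0):
          rng = random.Random(seed)
          paths = []
          for _ in range(n_walks):
              path, cur = [node], node
              for _ in range(length):
                  cur = rng.choice(edges[cur])
                  path.append(cur)
              paths.append(tuple(path))
          return paths

      def pair_score(user, item):
          # Overlap of the nodes visited around each endpoint: a higher
          # score suggests the user-item link is more plausible
          a = {n for p in walks_from(user) for n in p}
          b = {n for p in walks_from(item, seed=1) for n in p}
          return len(a & b) / len(a | b)

      print(pair_score("u1", "i3"))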
  • Zhang, Y., Dang, Y., & Chen, H. (2013). Research note: Examining gender emotional differences in Web forum communication. Decision Support Systems, 55(3), 851-860.
    More info
    Abstract: Web 2.0 has enabled and fostered Internet users to share and discuss their opinions and ideas online. Thus, a large amount of opinion-rich content has been generated. With more and more women starting to participate in online communications, questions regarding gender emotional differences in Web 2.0 communication platforms have been raised. However, few studies have systematically examined such differences. Motivated to address this gap, we have developed an advanced and generic framework to automatically analyze gender emotional differences in social media. Algorithms are developed and embedded in the framework to conduct analyses in different granularity levels, including sentence level, phrase level, and word level. To demonstrate the proposed research framework, an empirical experiment is conducted on a large Web forum. The analysis results indicate that women are more likely to express their opinions subjectively than men (based on sentence-level analysis), and they are more likely to express both positive and negative emotions (based on phrase-level and word-level analyses). © 2013 Elsevier B.V.
  • Benjamin, V., & Chen, H. (2012). Securing cyberspace: Identifying key actors in hacker communities. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 24-29.
    More info
    Abstract: As the computer becomes more ubiquitous throughout society, the security of networks and information technologies is a growing concern. Recent research has found hackers making use of social media platforms to form communities where sharing of knowledge and tools that enable cybercriminal activity is common. However, past studies often report only generalized community behaviors and do not scrutinize individual members; in particular, current research has yet to explore the mechanisms in which some hackers become key actors within their communities. Here we explore two major hacker communities from the United States and China in order to identify potential cues for determining key actors. The relationships between various hacker posting behaviors and reputation are observed through the use of ordinary least squares regression. Results suggest that the hackers who contribute to the cognitive advance of their community are generally considered the most reputable and trustworthy among their peers. Conversely, the tenure of hackers and their discussion quality were not significantly correlated with reputation. Results are consistent across both forums, indicating the presence of a common hacker culture that spans multiple geopolitical regions. © 2012 IEEE.
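    The regression setup can be sketched as follows with statsmodels; the simulated posting-behavior variables stand in for the paper's measures of volume, tenure, and knowledge contribution:

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      df = pd.DataFrame({
          "posts": rng.poisson(50, 300),             # posting volume
          "tenure_days": rng.integers(1, 2000, 300),
          "tool_shares": rng.poisson(3, 300),        # cognitive contributions
      })
      # Simulated outcome in which only contributions drive reputation
      df["reputation"] = 2.0 * df["tool_shares"] + rng.normal(0, 1, 300)

      X = sm.add_constant(df[["posts", "tenure_days", "tool_shares"]])
      print(sm.OLS(df["reputation"], X).fit().summary())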
  • Chau, M., Wang, A. G., Yue, W. T., & Chen, H. (2012). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7299 LNCS, IV.
  • Dang, Y., Zhang, Y., Chen, H., Brown, S. A., Hu, P. J., & Nunamaker Jr., J. F. (2012). Theory-informed design and evaluation of an advanced search and knowledge mapping system in nanotechnology. Journal of Management Information Systems, 28(4), 99-128.
    More info
    Abstract: Effective search support is an important tool for helping individuals deal with the problem of information overload. This is particularly true in the field of nanotechnology, where information from patents, grants, and research papers is growing rapidly. Guided by cognitive fit and cognitive load theories, we develop an advanced Web-based system, Nano Mapper, to support users' search and analysis of nanotechnology developments. We perform controlled experiments to evaluate the functions of Nano Mapper. We examine users' search effectiveness, efficiency, and evaluations of system usefulness, ease of use, and satisfaction. Our results demonstrate that Nano Mapper enables more effective and efficient searching, and users consider it to be more useful and easier to use than the benchmark systems. Users are also more satisfied with Nano Mapper and have higher intention to use it in the future. User evaluations of the analysis functions are equally positive. © 2012 M.E. Sharpe, Inc.
  • Hu, P. J., Wan, X., Dang, Y., Larson, C. A., & Chen, H. (2012). Evaluating an integrated forum portal for terrorist surveillance and analysis. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 168-170.
    More info
    Abstract: We experimentally evaluated the Dark Web Forum Portal by focusing on user task performance, usability, cognitive processing requirements, and societal benefits. Our results show that the portal performs well when compared with a benchmark forum. © 2012 IEEE.
  • Lu, H., Tsai, F., Chen, H., Hung, M., & Li, S. (2012). Credit rating change modeling using news and financial ratios. ACM Transactions on Management Information Systems, 3(3).
    More info
    Abstract: Credit ratings convey credit risk information to participants in financial markets, including investors, issuers, intermediaries, and regulators. Accurate credit rating information plays a crucial role in supporting sound financial decision-making processes. Most previous studies on credit rating modeling are based on accounting and market information. Text data are largely ignored despite the potential benefit of conveying timely information regarding a firm's outlook. To leverage the additional information in news full-text for credit rating prediction, we designed and implemented a news full-text analysis system that provides firm-level coverage, topic, and sentiment variables. The novel topic-specific sentiment variables contain a large fraction of missing values because of uneven news coverage. The missing value problem creates a new challenge for credit rating prediction approaches. We address this issue by developing a missing-tolerant multinomial probit (MT-MNP) model, which imputes missing values based on the Bayesian theoretical framework. Our experiments using seven and a half years of real-world credit ratings and news full-text data show that (1) the overall news coverage can explain future credit rating changes while the aggregated news sentiment cannot; (2) topic-specific news coverage and sentiment have statistically significant impact on future credit rating changes; (3) topic-specific negative sentiment has a more salient impact on future credit rating changes compared to topic-specific positive sentiment; (4) MT-MNP performs better in predicting future credit rating changes compared to support vector machines (SVM). The performance gap as measured by macro-averaging F-measure is small but consistent. © 2012 ACM.
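    The paper's missing-tolerant multinomial probit (MT-MNP) is a Bayesian model; as a much simpler stand-in that only shows the shape of the task (impute missing topic-sentiment variables, then classify rating changes), one might sketch:

      import numpy as np
      from sklearn.impute import SimpleImputer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.normal(size=(500, 8))            # topic coverage/sentiment vars
      X[rng.random(X.shape) < 0.4] = np.nan    # uneven news coverage -> missing
      y = rng.integers(0, 3, 500)              # downgrade / unchanged / upgrade

      # Mean imputation + multinomial logistic regression; MT-MNP instead
      # imputes within a Bayesian probit framework
      model = make_pipeline(SimpleImputer(strategy="mean"),
                            LogisticRegression(max_iter=1000))
      model.fit(X, y)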
  • Schumaker, R. P., Zhang, Y., Huang, C., & Chen, H. (2012). Evaluating sentiment in financial news articles. Decision Support Systems, 53(3), 458-464.
    More info
    Abstract: Can the choice of words and tone used by the authors of financial news articles correlate to measurable stock price movements? If so, can the magnitude of price movement be predicted using these same variables? We investigate these questions using the Arizona Financial Text (AZFinText) system, a financial news article prediction system, and pair it with a sentiment analysis tool. Through our analysis, we found that subjective news articles were easier to predict in price direction (59.0% versus 50.0% of chance alone) and using a simple trading engine, subjective articles garnered a 3.30% return. Looking further into the role of author tone in financial news articles, we found that articles with a negative sentiment were easiest to predict in price direction (50.9% versus 50.0% of chance alone) and a 3.04% trading return. Investigating negative sentiment further, we found that our system was able to predict price decreases in articles of a positive sentiment 53.5% of the time, and price increases in articles of a negative sentiment 52.4% of the time. We believe this result may be attributable to market traders behaving in a contrarian manner, e.g., see good news, sell; see bad news, buy. © 2012 Elsevier B.V. All rights reserved.
  • Woo, J., & Chen, H. (2012). An event-driven SIR model for topic diffusion in web forums. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 108-113.
    More info
    Abstract: Social media is being increasingly used as a communication channel. Among social media, web forums, where people in online communities disseminate and receive information by interaction, provide a good environment to examine information diffusion. In this research, we aim to understand the mechanisms and properties of information diffusion in web forums. To that end, we model topic-level information diffusion in web forums using the baseline epidemic model, the SIR (Susceptible, Infective, and Recovered) model, frequently used in previous research to analyze disease outbreaks and knowledge diffusion. In addition, we propose an event-driven SIR model that reflects the event effect on information diffusion in the web forum. The proposed model incorporates the effect of news postings on the web forum. We evaluate the two models using a large longitudinal dataset from the web forum of a major company. The event-SIR model outperforms the SIR model in fitting major spiky topics that exhibit peaks of author participation. © 2012 IEEE.
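    Note: A minimal discrete-time sketch of the event-driven SIR idea above: susceptible forum members become infective (posting on the topic) at rate beta, recover at rate gamma, and a one-step exogenous shock stands in for the effect of a news posting. All parameter values are illustrative, not fitted from the paper's data.

      # Discrete-time SIR with a single hypothetical "news event" shock.
      N, beta, gamma, T = 1000, 0.4, 0.1, 100
      event_time, event_boost = 30, 0.15   # illustrative news-posting shock

      S, I, R = N - 1.0, 1.0, 0.0
      infective = []
      for t in range(T):
          new_inf = beta * S * I / N        # standard SIR contagion term
          if t == event_time:               # event effect: one-step surge in adoption
              new_inf += event_boost * S
          new_rec = gamma * I               # authors losing interest in the topic
          S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
          infective.append(I)

      print("peak authors:", round(max(infective), 1),
            "at step", infective.index(max(infective)))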
  • Yang, C. C., Chen, H., Wactlar, H., Combi, C. K., & Tang, X. (2012). SHB 2012: International workshop on smart health and wellbeing. ACM International Conference Proceeding Series, 2762-2763.
    More info
    Abstract: The Smart Health and Wellbeing workshop is organized to develop a platform for authors to discuss fundamental principles, algorithms or applications of intelligent data acquisition, processing and analysis of healthcare data. We are particularly interested in information and knowledge management papers, in which the approaches are accompanied by an in-depth experimental evaluation with real world data. This paper provides an overview of the workshop and the accepted contributions. © 2012 Authors.
  • Yang, C. C., Chen, H., Wactlar, H., Combi, C. K., & Tang, X. (2012). SHB chairs' welcome. International Conference on Information and Knowledge Management, Proceedings, iii.
  • Yang, M., & Chen, H. (2012). Partially supervised learning for radical opinion identification in hate group web forums. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 96-101.
    More info
    Abstract: Web forums are frequently used as platforms for the exchange of information and opinions, as well as propaganda dissemination. But online content can be misused when the information being distributed, such as radical opinions, is unsolicited or inappropriate. However, radical opinion is highly hidden and distributed in Web forums, while non-radical content is unspecific and topically more diverse. It is costly and time consuming to label a large amount of radical content (positive examples) and non-radical content (negative examples) for training classification systems. Nevertheless, it is easy to obtain large volumes of unlabeled content in Web forums. In this paper, we propose and develop a topic-sensitive partially supervised learning approach to address the difficulties in radical opinion identification in hate group Web forums. Specifically, we design a labeling heuristic to extract high quality positive examples and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group Web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance than its counterparts. © 2012 IEEE.
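    Note: The sketch below illustrates the flavor of such a labeling heuristic (a simplification, not the authors' exact procedure): unlabeled posts most similar to a small seed set become pseudo-positives, the least similar become pseudo-negatives, and an ordinary classifier is trained on the result. All texts are toy examples.

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      seed_positive = ["call to violent action", "join the armed struggle"]
      unlabeled = ["the struggle needs fighters", "great recipe for bread",
                   "take up arms now", "match highlights and scores",
                   "armed resistance is a duty", "cheap flights to Rome"]

      vec = TfidfVectorizer().fit(seed_positive + unlabeled)
      # TF-IDF rows are L2-normalized, so dot products are cosine similarities
      sims = (vec.transform(unlabeled) @ vec.transform(seed_positive).T).toarray().max(axis=1)

      order = np.argsort(sims)
      pseudo_neg = [unlabeled[i] for i in order[:2]]    # least similar -> negatives
      pseudo_pos = [unlabeled[i] for i in order[-2:]]   # most similar  -> positives

      X = vec.transform(pseudo_pos + pseudo_neg)
      y = [1] * len(pseudo_pos) + [0] * len(pseudo_neg)
      clf = LogisticRegression().fit(X, y)
      print(clf.predict(vec.transform(["armed struggle now", "baking tips"])))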
  • Yang, M., Kiang, M., Chen, H., & Li, Y. (2012). Artificial immune system for illicit content identification in social media. Journal of the American Society for Information Science and Technology, 63(2), 256-269.
    More info
    Abstract: Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties of illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance. © 2011 ASIS&T.
  • Zhang, Y., Dang, Y., Brown, S. A., & Chen, H. (2012). Understanding avatar sentiments using verbal and non- verbal cues. 18th Americas Conference on Information Systems 2012, AMCIS 2012, 5, 4030-4035.
    More info
    Abstract: With the increased popularity of virtual worlds, hundreds of thousands of people from different physical locations can join virtual worlds. In this computer-based simulated 3D environment, avatars can interact both with each other and with the environment. This new type of world has important implications for business, education, and society at large. In order to fully realize the benefits of virtual worlds, it is important to know how the residents (i.e., avatars) behave, such as how they express sentiments. This research in progress seeks to study avatar sentiments in virtual worlds to examine whether and how sentiments are conveyed by avatars. Both verbal and non-verbal cues will be utilized in the sentiment analysis. To conduct the study, an advanced data collection method is leveraged to obtain various types of avatar data from a large number of real virtual world residents in Second Life in an effective and efficient way. © (2012) by the AIS/ICIS Administrative Office. All rights reserved.
  • Zimbra, D., & Chen, H. (2012). Scalable sentiment classification across multiple dark web forums. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 78-83.
    More info
    Abstract: This study examines several approaches to sentiment classification in the Dark Web Forum Portal, and opportunities to transfer classifiers and text features across multiple forums to improve scalability and performance. Although sentiment classifiers typically perform poorly when transferred across domains, experimentation reveals that the devised approaches offer performance equivalent to the traditional forum-specific approach when classifying in an unknown domain. Furthermore, incorporating the text features identified as significant indicators of sentiment in other forums can greatly improve the classification accuracy of the traditional forum-specific approach. © 2012 IEEE.
  • Abbasi, A., France, S., Zhang, Z., & Chen, H. (2011). Selecting attributes for sentiment classification using feature relation networks. IEEE Transactions on Knowledge and Data Engineering, 23(3), 447-462.
    More info
    Abstract: A major concern when incorporating large sets of diverse n-gram features for sentiment classification is the presence of noisy, irrelevant, and redundant attributes. These concerns can often make it difficult to harness the augmented discriminatory potential of extended feature sets. We propose a rule-based multivariate text feature selection method called Feature Relation Network (FRN) that considers semantic information and also leverages the syntactic relationships between n-gram features. FRN is intended to efficiently enable the inclusion of extended sets of heterogeneous n-gram features for enhanced sentiment classification. Experiments were conducted on three online review testbeds in comparison with methods used in prior sentiment classification research. FRN outperformed the comparison univariate, multivariate, and hybrid feature selection methods; it was able to select attributes resulting in significantly better classification accuracy irrespective of the feature subset sizes. Furthermore, by incorporating syntactic information about n-gram relations, FRN is able to select features in a more computationally efficient manner than many multivariate and hybrid techniques. © 2006 IEEE.
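    Note: One ingredient of FRN is the subsumption relation between n-grams. The toy sketch below applies a simplified version of that idea, dropping a bigram that occurs in exactly the same documents as one of its component unigrams; the full rule-based FRN is considerably richer than this.

      from sklearn.feature_extraction.text import CountVectorizer

      docs = ["the movie was great", "the movie was dull",
              "great acting in the movie", "dull plot but great acting"]

      vec = CountVectorizer(ngram_range=(1, 2), binary=True)
      X = vec.fit_transform(docs).toarray()
      vocab = vec.get_feature_names_out()
      # document set for each n-gram (which documents it occurs in)
      doc_sets = {g: frozenset(X[:, i].nonzero()[0]) for i, g in enumerate(vocab)}

      kept = []
      for gram in vocab:
          parts = gram.split()
          # a bigram is redundant if it matches a contained unigram's documents exactly
          redundant = len(parts) == 2 and any(
              doc_sets[gram] == doc_sets[p] for p in parts if p in doc_sets)
          if not redundant:
              kept.append(gram)
      print(f"{len(vocab)} features reduced to {len(kept)}")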
  • Chen, H. (2011). Editorial: Design science, grand challenges, and societal impacts. ACM Transactions on Management Information Systems, 2(1).
  • Chen, H. (2011). Smart health and wellbeing. IEEE Intelligent Systems, 26(5), 78-90.
    More info
    Abstract: In light of such overwhelming interest from governments and academia in adopting and advancing IT for effective healthcare, there are great opportunities for researchers and practitioners alike to invest efforts in conducting innovative and high-impact healthcare IT research. This IEEE Intelligent Systems Trends and Controversies (T&C) Department hopes to raise awareness and highlight selected recent research that helps move us toward such goals. This T&C department includes three articles on Smart Health and Wellbeing from distinguished experts in computer science, information systems, and medicine. Each article presents unique perspectives, advanced computational methods, and selected results and examples. © 2011 IEEE.
  • Chen, H. (2011). Smart market and money. IEEE Intelligent Systems, 26(6), 82-96.
    More info
    Abstract: With the widespread availability of Business Big Data and the recent advancement in text and Web mining, tremendous opportunities exist for computational and finance researchers to advance research relating to smart market and money. This T&C Department includes three articles on smart market and money from distinguished experts in information systems and business. Each article presents unique perspectives, advanced computational methods, and selected results and examples. © 2006 IEEE.
  • Chen, H. (2011). Social intelligence and cultural awareness. IEEE Intelligent Systems, 26(4), 80-84.
    More info
    Abstract: The Board on Human-Systems Integration of the US National Research Council (NRC) held a workshop on Unifying Social Frameworks: Sociocultural Data to Accomplish Department of Defense Missions from 16-17 August 2010. Presenters and discussants addressed the variables and complex influences on human behavior, focusing on potential applications to the full spectrum of military operations. Major General Michael T. Flynn of the US Army delivered the keynote address, providing vital information about the cultural situation and needs of the military operating in Afghanistan. Two themes emerged from the workshop, including one focusing on data: its collection, its use in models, and the value of analyzing large collections of sociocultural data to identify the groups and individuals that are expected to pose risks in a particular environment.
  • Chen, H., & Yang, C. C. (2011). Special issue on social media analytics: Understanding the pulse of the society. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41(5), 826.
  • Chen, H., & Zhang, Y. (2011). Trends and controversies. IEEE Intelligent Systems, 26(1), 80-89.
    More info
    Abstract: The rich social media data generated in virtual worlds has important implications for business, education, social science, and society at large. Similarly, massively multiplayer online games (MMOGs) have become increasingly popular and have online communities comprising tens of millions of players. They serve as unprecedented tools for theorizing about and empirically modeling the social and behavioral dynamics of individuals, groups, and networks within large communities. Some technologists consider virtual worlds and MMOGs to be likely candidates to become the Web 3.0. AI can play a significant role, from multiagent avatar research and immersive virtual interface design to virtual world and MMOG Web mining and computational social science modeling. This issue includes articles with research examples from distinguished experts in social science and computer science. Each article presents a unique research framework, computational methods, and selected results. © 2011 IEEE.
  • Chen, H., Chau, M., & Li, S. (2011). Enterprise risk and security management: Data, text and Web mining. Decision Support Systems, 50(4), 649-650.
  • Chen, H., Denning, D., Roberts, N., Larson, C. A., Yu, X., & Huang, C. (2011). The dark web forum portal: From multi-lingual to video. Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, 7-14.
    More info
    Abstract: Counter-terrorism and intelligence analysts and other investigators continue to analyze the Internet presence of terrorists, hate groups, and other extremists through the study of primary sources including terrorists' own websites, videos, chat sites, and Internet forums. Forums and videos are both particularly rich sources of information. Forums, discussion sites supporting online conversations, capture each conversation in a thread, and the ensuing postings are usually time-stamped and attributable to a particular online poster (author). With careful analysis, they can reveal trends in topics and discussions, the sequencing of ideas, and the relationships between posters. Videos gain a global audience when posted to YouTube, but identifying and finding videos relating to a specific interest or topic can be difficult among the tens of millions of available items. The Dark Web Forum Portal was originally constructed to allow the examination, from a broad perspective, of the use of Web forums by terrorist and extremist groups. The Video Portal module has been added to facilitate the study of video as it is used by these groups. Both portals are available to researchers on a request basis. In this paper, we examine the evolution of the Dark Web Forum Portal's system design, share the results of a user evaluation, and provide an overview of the development of the new video portal. © 2011 IEEE.
  • Chen, H., Larson, C. A., Elhourani, T., Zimbra, D., & Ware, D. (2011). The Geopolitical Web: Assessing societal risk in an uncertain world. Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, 60-64.
    More info
    Abstract: Country risk - the likelihood that a state will weaken or fail - and the methods of assessing it continue to be of serious concern to the international community. Country risk has traditionally been assessed by monitoring economic and financial indicators. However, social media (such as forums, blogs, and websites) are now important transporters of citizens' daily conversations and opinions, and as such may carry discernible indicators of risk, but they have been as yet little-used for this task. The Geopolitical Web project is a research effort with the ultimate goal of developing computational approaches for monitoring public opinion in regions of conflict, assessing country risk indicators in the social media of fragile or weakening states, and correlating these risk signals with commonly accepted quantitative geopolitical risk assessments. This paper presents the initial motivation for this data-driven project, collection procedures adopted, preliminary results of an automated topical analysis of the collection's content, and expected future work. By catching and deciphering possible signals of country risk in social discourse we hope to offer the international community an additional means of assessing the need for intervention in or support for fragile or weakening states. © 2011 IEEE.
  • Chen, H., Zhou, Y., Reid, E. F., & Larson, C. A. (2011). Introduction to special issue on terrorism informatics. Information Systems Frontiers, 13(1), 1-3.
  • Chen, K., Lu, H., Chen, T., Li, S., Lian, J., & Chen, H. (2011). Giving context to accounting numbers: The role of news coverage. Decision Support Systems, 50(4), 673-679.
    More info
    Abstract: Accounting numbers such as earnings per share are an important information source that conveys the value of firms. Previous studies on the return-earnings relation have confirmed that stock prices react to the information content in accounting numbers. However, other information sources such as financial news may also contain value-relevant information and affect investors' reaction to earnings announcements. We quantify news coverage about S&P 500 companies in the Wall Street Journal (WSJ) before earnings announcements and model its interaction with the return-earnings relation. Our empirical results show that news coverage decreases the information content of unexpected earnings and thus leads to a lower earnings response coefficient (ERC) for firms with higher news coverage. Statistically significant interaction between news coverage and unexpected earnings was observed. News coverage does not impact cumulative abnormal returns directly. We further document that this finding is not driven by firm size. The results suggest that financial news may play an important role in conveying value-related information to the markets. © 2010 Elsevier B.V. All rights reserved.
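    Note: The interaction effect described above can be expressed as a regression of cumulative abnormal returns on unexpected earnings, news coverage, and their product; a negative interaction coefficient corresponds to a lower ERC for high-coverage firms. The sketch below fits such a model on simulated data (all coefficients and variable names are illustrative).

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n = 400
      ue = rng.normal(size=n)              # unexpected earnings
      coverage = rng.poisson(3, size=n)    # pre-announcement article count
      # simulated returns with a negative ERC-dampening interaction
      car = 0.8 * ue - 0.15 * ue * coverage + rng.normal(scale=0.5, size=n)

      X = sm.add_constant(np.column_stack([ue, coverage, ue * coverage]))
      model = sm.OLS(car, X).fit()
      print(model.params)  # const, ue, coverage, ue x coverage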
  • Chen, Y., Brown, S. A., Hu, P. J., King, C., & Chen, H. (2011). Managing emerging infectious diseases with information systems: Reconceptualizing outbreak management through the lens of loose coupling. Information Systems Research, 22(3), 447-468.
    More info
    Abstract: Increasing global connectivity makes emerging infectious diseases (EID) more threatening than ever before. Various information systems (IS) projects have been undertaken to enhance public health capacity for detecting EID in a timely manner and disseminating important public health information to concerned parties. While those initiatives seemed to offer promising solutions, public health researchers and practitioners raised concerns about their overall effectiveness. In this paper, we argue that the concerns about current public health IS projects are partially rooted in the lack of a comprehensive framework that captures the complexity of EID management to inform and evaluate the development of public health IS. We leverage loose coupling to analyze news coverage and contact tracing data from 479 patients associated with the severe acute respiratory syndrome (SARS) outbreak in Taiwan. From this analysis, we develop a framework for outbreak management. Our proposed framework identifies two types of causal circles (coupling and decoupling circles) between the central public health administration and the local capacity for detecting unusual patient cases. These two circles are triggered by important information-centric activities in public health practices and can have significant influence on the effectiveness of EID management. We derive seven design guidelines from the framework and our analysis of the SARS outbreak in Taiwan to inform the development of public health IS. We leverage the guidelines to evaluate current public health initiatives. By doing so, we identify limitations of existing public health IS, highlight the direction future development should consider, and discuss implications for research and public health policy. © 2011 INFORMS.
  • Dang, Y., Zhang, Y., Hu, P. J., Brown, S. A., & Chen, H. (2011). Knowledge mapping for rapidly evolving domains: A design science approach. Decision Support Systems, 50(2), 415-427.
    More info
    Abstract: Knowledge mapping can provide comprehensive depictions of rapidly evolving scientific domains. Taking the design science approach, we developed a Web-based knowledge mapping system (i.e., Nano Mapper) that provides interactive search and analysis on various scientific document sources in nanotechnology. We conducted multiple studies to evaluate Nano Mapper's search and analysis functionality respectively. The search functionality appears more effective than that of the benchmark systems. Subjects exhibit favorable satisfaction with the analysis functionality. Our study addresses several gaps in knowledge mapping for nanotechnology and illustrates desirability of using the design science approach to design, implement, and evaluate an advanced information system. © 2010 Elsevier B.V. All rights reserved.
  • Hu, P. J., & Chen, H. (2011). Analyzing information systems researchers' productivity and impacts: A perspective on the H index. ACM Transactions on Management Information Systems, 2(2).
    More info
    Abstract: Quantitative assessments of researchers' productivity and impacts are crucial for the information systems (IS) discipline. Motivated by its growing popularity and expanding use, we offer a perspective on the h index, which refers to the number of papers a researcher has coauthored with at least h citations each. We studied a partial list of 232 top IS researchers who received doctoral degrees between 1957 and 2003 and chose Google Scholar as the source for our analyses. At the individual level, we attempted to identify some of the most productive, high-impact researchers, as well as those who exhibited impressive paces of productivity. At the institution level, we revealed some institutions with relatively more productive researchers, as well as institutions that had produced more productive researchers. We also analyzed the overall IS community by examining the primary research areas of productive scholars identified by our analyses. We then compared their h index scores with those of top scholars in several related disciplines. © 2011 ACM.
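    Note: The h index as defined above is simple to compute from a researcher's citation counts; a minimal sketch:

      # h index: the largest h such that at least h papers have >= h citations each
      def h_index(citations):
          cites = sorted(citations, reverse=True)
          h = 0
          for i, c in enumerate(cites, start=1):
              if c >= i:
                  h = i
              else:
                  break
          return h

      print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each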
  • Hu, P. J., Chen, H., Hu, H., Larson, C., & Butierez, C. (2011). Law enforcement officers' acceptance of advanced e-government technology: A survey study of COPLINK Mobile. Electronic Commerce Research and Applications, 10(1), 6-16.
    More info
    Abstract: Timely information access and knowledge support is critical for law enforcement, because officers require convenient and timely access to accurate data, relevant information, and integrated knowledge in their crime investigation and fighting activities. As an integrated system that provides such support, COPLINK can improve collaboration within and across agency boundaries. This study examines field officers' acceptance and actual use of COPLINK Mobile, a critical technology that offers COPLINK core query functionalities through a lightweight, handheld device or mobile applications running on limited bandwidth. We propose and empirically test a factor model explaining the focal technology acceptance with survey data collected from 40 field officers. The data support our model and most of the hypotheses, which can reasonably explain an officer's acceptance and actual use of COPLINK Mobile. Among the determinants investigated, perceived usefulness has the greatest impact and depends on both efficiency gain and social influence. Our findings have important implications for both research and practice. © 2010 Elsevier B.V. All rights reserved.
  • Li, J., Wang, G. A., & Chen, H. (2011). Identity matching using personal and social identity features. Information Systems Frontiers, 13(1), 101-113.
    More info
    Abstract: Identity verification is essential in our mission to identify potential terrorists and criminals. It is not a trivial task because terrorists reportedly assume multiple identities using either fraudulent or legitimate means. A national identification card and biometrics technologies have been proposed as solutions to the identity problem. However, several studies show their inability to tackle the complex problem. We aim to develop data mining alternatives that can match identities referring to the same individual. Existing identity matching techniques based on data mining primarily rely on personal identity features. In this research, we propose a new identity matching technique that considers both personal identity features and social identity features. We define two groups of social identity features including social activities and social relations. The proposed technique is built upon a probabilistic relational model that utilizes a relational database structure to extract social identity features. Experiments show that the social activity features significantly improve the matching performance while the social relation features effectively reduce false positive and false negative decisions. © 2010 Springer Science+Business Media, LLC.
  • Limayem, M., Niederman, F., Slaughter, S. A., Chen, H., Gregor, S., & Winter, S. J. (2011). What are the grand challenges in information systems research? A debate and discussion. International Conference on Information Systems 2011, ICIS 2011, 5, 4421-4425.
  • Liu, X., Kaza, S., Zhang, P., & Chen, H. (2011). Determining inventor status and its effect on knowledge diffusion: A study on nanotechnology literature from China, Russia, and India. Journal of the American Society for Information Science and Technology, 62(6), 1166-1176.
    More info
    Abstract: In an increasingly global research landscape, it is important to identify the most prolific researchers in various institutions and their influence on the diffusion of knowledge. Knowledge diffusion within institutions is influenced by not just the status of individual researchers but also the collaborative culture that determines status. There are various methods to measure individual status, but few studies have compared them or explored the possible effects of different cultures on the status measures. In this article, we examine knowledge diffusion within science and technology-oriented research organizations. Using social network analysis metrics to measure individual status in large-scale coauthorship networks, we studied an individual's impact on the recombination of knowledge to produce innovation in nanotechnology. Data from the most productive and high-impact institutions in China (Chinese Academy of Sciences), Russia (Russian Academy of Sciences), and India (Indian Institutes of Technology) were used. We found that boundary-spanning individuals influenced knowledge diffusion in all countries. However, our results also indicate that cultural and institutional differences may influence knowledge diffusion. © 2011 ASIS&T.
  • Qin, J., Zhou, Y., & Chen, H. (2011). A multi-region empirical study on the internet presence of global extremist organizations. Information Systems Frontiers, 13(1), 75-88.
    More info
    Abstract: Extremist organizations are heavily utilizing Internet technologies to increase their abilities to influence the world. Studying those global extremist organizations' Internet presence would allow us to better understand extremist organizations' technical sophistication and their propaganda plans. In this work, we explore an integrated approach for collecting and analyzing extremist Internet presence. We employed automatic Web crawling techniques to build a comprehensive international extremist Web collection. We then used a systematic content analysis tool called the Dark Web Attribute System to analyze and compare these extremist organizations' Internet usage from three perspectives: technical sophistication, content richness, and Web interactivity. By studying 1.7 million multimedia Web documents from around 224 Web sites of extremist organizations, we found that while all extremist organizations covered in this study demonstrate a high level of technical sophistication in their Web presence, Middle Eastern extremists are among the most sophisticated groups in both technical sophistication and media richness. US groups are the most active in supporting Internet communications. Our analysis results will help domain experts deepen their understanding of the global extremism movements and devise better counter-extremism measures on the Internet. © 2010 Springer Science+Business Media, LLC.
  • Suakkaphong, N., Zhang, Z., & Chen, H. (2011). Disease named entity recognition using semisupervised learning and conditional random fields. Journal of the American Society for Information Science and Technology, 62(4), 727-737.
    More info
    Abstract: Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005. © 2011 ASIS&T.
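    Note: The sketch below shows a generic bootstrapping (self-training) loop of the kind described above, with a plain logistic-regression classifier standing in for the CRF sequence labeler and unlabeled partitions processed sequentially; all data are synthetic.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(2)
      X_lab = rng.normal(size=(40, 5))            # small labeled training set
      y_lab = (X_lab[:, 0] > 0).astype(int)
      X_unl = rng.normal(size=(300, 5))           # abundant unlabeled data

      clf = LogisticRegression()
      for part in np.array_split(X_unl, 3):       # sequential partitions
          clf.fit(X_lab, y_lab)
          proba = clf.predict_proba(part)
          confident = proba.max(axis=1) > 0.9     # self-label confident examples
          X_lab = np.vstack([X_lab, part[confident]])
          y_lab = np.concatenate([y_lab, proba.argmax(axis=1)[confident]])
      print("training set grew to", len(y_lab), "examples")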
  • Wang, G. A., Atabakhsh, H., & Chen, H. (2011). A hierarchical Naïve Bayes model for approximate identity matching. Decision Support Systems, 51(3), 413-423.
    More info
    Abstract: Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. The task of searching for a specific identity is difficult because disparate identity information may exist due to the issues related to unintentional errors and intentional deception. In this paper we propose a hierarchical Naïve Bayes model that improves existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based matching technique. With 50% training instances labeled, the proposed semi-supervised learning achieves a performance comparable to the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the efforts of manually labeling training instances without significant performance degradation. © 2011 Elsevier B.V. All rights reserved.
  • Woo, J., Son, J., & Chen, H. (2011). An SIR model for violent topic diffusion in social media. Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, 15-19.
    More info
    Abstract: Social media is being increasingly used as a political communication channel. The web makes it easy to spread extreme opinions or ideologies that were once restricted to small groups. Terrorists and extremists use the web to deliver their extreme ideology to people and encourage them to get involved in fanatic behaviors. In this research, we aim to understand the mechanisms and properties of the exposure process to extreme opinions through these new publication methods, especially web forums. We propose the topic diffusion model for web forums, based on the SIR (Susceptible, Infective, and Recovered) model frequently used in previous research to analyze disease outbreaks and knowledge diffusion. The logistic growth of possible authors, the interaction between possible authors and current authors, and the influence decay of past authors are incorporated in a novel topic-based SIR model. From the proposed model we can estimate the maximum number of authors on a topic, the degree of infectiousness of a topic, and the rate describing how fast past authors lose influence over others. We apply the proposed model to a major international Jihadi forum where extreme ideology is expounded and evaluate the model on the diffusion of major violent topics. The fitting results show that it is plausible to describe the mechanism of violent topic diffusion in web forums with the SIR epidemic model. © 2011 IEEE.
  • Zeng, S., Lin, M., & Chen, H. (2011). Dynamic user-level affect analysis in social media: Modeling violence in the dark web. Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, 1-6.
    More info
    Abstract: Affect represents a person's emotions toward objects, issues or other persons. Recent years have witnessed a surge in studies of users' affect in social media, as marketing literature has shown that users' affect influences decision making. The current literature in this area, however, has largely focused on the message level, using text-based features and various classification approaches. Such analyses not only overlook valuable information about the user who posts the messages, but also fail to consider that users' affect may change over time. To overcome these limitations, we propose a new research design for social media affect analysis by specifically incorporating users' characteristics and the time dimension. We illustrate our research design by applying it to a major Dark Web forum of international Jihadists. Empirical results show that our research design allows us to draw on theories from other disciplines, such as social psychology, to provide useful insights on the dynamic change of users' affect in social media. © 2011 IEEE.
  • Zhang, Y., Dang, Y., & Chen, H. (2011). Gender classification for web forums. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41(4), 668-677.
    More info
    Abstract: More and more women are participating in and exchanging opinions through community-based online social media. Questions concerning gender differences in the new media have been raised. This paper proposes a feature-based text classification framework to examine online gender differences between Web forum posters by analyzing writing styles and topics of interest. Our experiment on an Islamic women's political forum shows that feature sets containing both content-free and content-specific features perform significantly better than those consisting of only content-free features, feature selection can improve the classification results significantly, and female and male participants have significantly different topics of interest. © 2011 IEEE.
  • Abbasi, A., Zhang, Z., Zimbra, D., Chen, H., & Nunamaker Jr., J. F. (2010). Detecting fake websites: The contribution of statistical learning theory. MIS Quarterly: Management Information Systems, 34(3), 435-461.
    More info
    Abstract: Fake websites have become increasingly pervasive, generating billions of dollars in fraudulent revenue at the expense of unsuspecting Internet users. The design and appearance of these websites makes it difficult for users to manually identify them as fake. Automated detection systems have emerged as a mechanism for combating fake websites; however, most are fairly simplistic in terms of their fraud cues and detection methods employed. Consequently, existing systems are susceptible to the myriad of obfuscation tactics used by fraudsters, resulting in highly ineffective fake website detection performance. In light of these deficiencies, we propose the development of a new class of fake website detection systems that are based on statistical learning theory (SLT). Using a design science approach, a prototype system was developed to demonstrate the potential utility of this class of systems. We conducted a series of experiments, comparing the proposed system against several existing fake website detection systems on a test bed encompassing 900 websites. The results indicate that systems grounded in SLT can more accurately detect various categories of fake websites by utilizing richer sets of fraud cues in combination with problem-specific knowledge. Given the hefty cost exacted by fake websites, the results have important implications for E-commerce and online security.
  • Chau, M., Wong, C. H., Zhou, Y., Qin, J., & Chen, H. (2010). Evaluating the use of search engine development tools in IT education. Journal of the American Society for Information Science and Technology, 61(2), 288-299.
    More info
    Abstract: It is important for education in computer science and information systems to keep up to date with the latest development in technology. With the rapid development of the Internet and the Web, many schools have included Internet-related technologies, such as Web search engines and e-commerce, as part of their curricula. Previous research has shown that it is effective to use search engine development tools to facilitate students' learning. However, the effectiveness of these tools in the classroom has not been evaluated. In this article, we review the design of three search engine development tools, SpidersRUs, Greenstone, and Alkaline, followed by an evaluation study that compared the three tools in the classroom. In the study, 33 students were divided into 13 groups and each group used the three tools to develop three independent search engines in a class project. Our evaluation results showed that SpidersRUs performed better than the other two tools in overall satisfaction and in the level of knowledge gained when the tools were used for a class project on Internet applications development. © 2009 ASIS & T.
  • Chen, H. (2010). AI and security informatics. IEEE Intelligent Systems, 25(5), 82-83.
    More info
    Abstract: Based on the available crime and intelligence knowledge, federal, state, and local authorities can make timely and accurate decisions to select effective strategies and tactics as well as allocate the appropriate amount of resources to detect, prevent, and respond to future attacks. Facing the critical mission of international security and various data and technical challenges, there is a pressing need to develop the science of security informatics. The main objective is the development of advanced information technologies, systems, algorithms, and databases for security-related applications using an integrated technological, organizational, and policy-based approach. Intelligent systems have much to contribute for this emerging field. © 2010 IEEE.
  • Chen, H. (2010). Business and market intelligence 2.0. IEEE Intelligent Systems, 25(1), 68-71.
    More info
    Abstract: Some articles on Business and Market Intelligence 2.0 from distinguished experts in marketing science, finance, accounting, and computer science are presented. 'The Phase Transition of Markets and Organizations: The New Intelligence and Entrepreneurial Frontier' characterizes phase transition in markets and organizations as a move from individuals and resources being separate to being together. 'User-Generated Content on Social Media: Predicting New Product Market Success from Online Word of Mouth' explores the predictive validity of various text and sentiment measures of online word of mouth (WOM) for the market success of new products. 'On Data-Driven Analysis of User-Generated Content' discusses data-driven approaches, including content and network analysis, that can be used to derive insights and characterize user-generated content from companies and other organizations.
  • Chen, H. (2010). Editorial: Welcome to the first issue of ACM TMIS. ACM Transactions on Management Information Systems, 1(1).
  • Chen, H., & Zimbra, D. (2010). AI and opinion mining. IEEE Intelligent Systems, 25(3), 74-76.
    More info
    Abstract: Opinion mining, a subdiscipline within data mining and computational linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online news sources, social media comments, and other user-generated content. Frameworks and methods for integrating sentiments and opinions with other computational representations, such as interesting topics or product features extracted from user-generated text, participant reply networks, and spikes and outbreaks of ideas or events, are also critically needed. Disagreement and subjectivity also held significant relationships with volatility, where less disagreement and high levels of subjectivity predicted periods of high stock volatility. Positive sentiment reduces trading volume, perhaps because satisfied shareholders hold their stock, while negative sentiment induces trading activity as shareholders defect.
  • Chen, H., Chau, M., Li, S., Urs, S. R., Srinivasa, S., & Wang, G. A. (2010). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6122 LNCS.
  • Dang, Y., Zhang, Y., & Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. IEEE Intelligent Systems, 25(4), 46-53.
    More info
    Abstract: A proposed lexicon-enhanced method for sentiment classification combines machine-learning and semantic-orientation approaches into one framework that significantly improves sentiment-classification performance. © 2010 IEEE.
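    Note: A minimal sketch of the lexicon-enhanced idea, assuming a tiny hand-made polarity lexicon (not the paper's resources): a lexicon-based semantic-orientation score is appended to the bag-of-words features before training the machine-learning classifier.

      import numpy as np
      from scipy.sparse import hstack, csr_matrix
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression

      lexicon = {"great": 1, "excellent": 1, "poor": -1, "terrible": -1}
      reviews = ["great battery excellent screen", "terrible support poor build",
                 "excellent value", "poor camera terrible lens"]
      labels = [1, 0, 1, 0]

      def polarity(text):
          # lexicon-based semantic-orientation score for one review
          return sum(lexicon.get(w, 0) for w in text.split())

      vec = CountVectorizer()
      X_bow = vec.fit_transform(reviews)
      X_lex = csr_matrix(np.array([[polarity(r)] for r in reviews]))
      X = hstack([X_bow, X_lex])               # lexicon score appended as a feature

      clf = LogisticRegression().fit(X, labels)
      test = ["excellent phone", "terrible battery"]
      X_test = hstack([vec.transform(test),
                       csr_matrix([[polarity(t)] for t in test])])
      print(clf.predict(X_test))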
  • Dang, Y., Zhang, Y., Fan, L., Chen, H., & Roco, M. C. (2010). Trends in worldwide nanotechnology patent applications: 1991 to 2008. Journal of Nanoparticle Research, 12(3), 687-706.
    More info
    Abstract: Nanotechnology patent applications published during 1991-2008 have been examined using the "title-abstract" keyword search on esp@cenet "worldwide" database. The longitudinal evolution of the number of patent applications, their topics, and their respective patent families have been evaluated for 15 national patent offices covering 98% of the total global activity. The patent offices of the United States (USA), People's Republic of China (PRC), Japan, and South Korea have published the largest number of nanotechnology patent applications, and experienced significant but different growth rates after 2000. In most repositories, the largest numbers of nanotechnology patent applications originated from their own countries/regions, indicating a significant "home advantage." The top applicant institutions are from different sectors in different countries (e.g., from industry in the US and Canada patent offices, and from academe or government agencies at the PRC office). As compared to 2000, the year before the establishment of the US National Nanotechnology Initiative (NNI), numerous new invention topics appeared in 2008, in all 15 patent repositories. This is more pronounced in the USA and PRC. Patent families have increased among the 15 patent offices, particularly after 2005. Overlapping patent applications increased from none in 1991 to about 4% in 2000 and to about 27% in 2008. The largest share of equivalent nanotechnology patent applications (1,258) between two repositories was identified between the US and Japan patent offices.
  • Huang, C., Fu, T., & Chen, H. (2010). Text-based video content classification for online video-sharing sites. Journal of the American Society for Information Science and Technology, 61(5), 891-906.
    More info
    Abstract: With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with other online users in Web 2.0 communities have become daily routines for online users. User-generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be utilized to obtain insight about cyber communities and their social networks. Many studies have focused on leveraging user-generated information to analyze blogs and forums, but few studies have applied this approach to video-sharing Web sites. In this study, we propose a text-based framework for video content classification of online video-sharing Web sites. Different types of user-generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content-specific features) were extracted. Three feature-based classification techniques (C4.5, Naïve Bayes, and Support Vector Machine) were used to classify videos. To evaluate the proposed framework, user-generated data from candidate videos, which were identified by searching user-given keywords on YouTube, were first collected. Then, a subset of the collected data was randomly selected and manually tagged by users as our experiment data. The experimental results showed that the proposed approach was able to classify online videos based on users' interests with accuracy rates up to 87.2%, and all three types of text features contributed to discriminating videos. Support Vector Machine outperformed C4.5 and Naïve Bayes techniques in our experiments. In addition, our case study further demonstrated that accurate video-classification results are very useful for identifying implicit cyber communities on video-sharing Web sites. © 2010 ASIS&T.
  • Liu, Y., Chen, Y., Lusch, R. F., Chen, H., Zimbra, D., & Zeng, S. (2010). User-generated content on social media: Predicting market success with online word-of-mouth. IEEE Intelligent Systems, 25(1), 75-78.
    More info
    Abstract: Online social media (user-generated content or online word of mouth, WOM) allows consumers to share their product opinions and experiences and has the potential to influence product sales and firm strategy; it is studied here in the context of the Hollywood movie industry. Online WOM information was collected from the message board of Yahoo Movies for a total of 257 movies released from 2005 to 2006. SentiWordNet and OpinionFinder, two lexical packages of computational linguistics, were used to construct the sentiment measures for the WOM data. Results show that WOM communication starts early in the preproduction period, becomes highly active before movie release, and diminishes as the movie is shown for more weeks in theaters. A movie that receives more active WOM communication tends to receive higher evaluations from movie critics, suggesting the number of messages could work as a signal for product quality.
  • Lu, H., Chen, H., Chen, T., Hung, M., & Li, S. (2010). Financial text mining: Supporting decision making using web 2.0 content. IEEE Intelligent Systems, 25(2), 78-82.
    More info
    Abstract: The significant use of online technologies has facilitated the creation of large amounts of textual data. This continuous stream of textual data requires the development of a surveillance system that can collect, filter, extract, quantify, and analyze relevant information from the Internet. Finance-related textual content can be divided into three categories: the first includes forums, blogs, and wikis; the second includes news and research reports; and the third involves finance-related content generated by firms. Several firms maintain their own Web sites as a communication channel with consumers and investors. Public companies are required to submit their filings to the Edgar system, which is publicly accessible on the Web. The growing body of Web 2.0 content can facilitate the implementation of a near real-time monitoring system and allow financial institutions to benefit from the continuous stream of textual data.
  • Schumaker, R. P., & Chen, H. (2010). A discrete stock price prediction engine based on financial news. Computer, 43(1), 51-56.
    More info
    Abstract: The Arizona Financial Text system leverages statistical learning to make trading decisions based on numeric price predictions. Research demonstrates that AZFinText outperforms the market average and performs well against existing quant funds. © 2006 IEEE.
  • Schumaker, R. P., & Chen, H. (2010). Interaction analysis of the ALICE chatterbot: A two-study investigation of dialog and domain questioning. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 40(1), 40-51.
    More info
    Abstract: This paper analyzes and compares the data gathered from two previously conducted Artificial Linguistic Internet Chat Entity (ALICE) chatterbot studies that were focused on response accuracy and user satisfaction measures for six chatterbots. These chatterbots were further loaded with varying degrees of conversational, telecommunications, and terrorism knowledge. From our prior experiments using 347 participants, we obtained 33,446 human/chatterbot interactions. It was found that asking the ALICE chatterbots "are" and "where" questions resulted in higher response satisfaction levels, as compared to other interrogative-style inputs, because such questions are amenable to vague, binary, or clichéd chatterbot responses. We also found a relationship between the length of a query and the user's perceived satisfaction with the chatterbot response, where shorter queries led to more satisfying responses. © 2009 IEEE.
  • Schumaker, R. P., Solieman, O. K., & Chen, H. (2010). Sports knowledge management and data mining. Annual Review of Information Science and Technology, 44, 115-157.
  • Fu, T., Abbasi, A., & Chen, H. (2010). A focused crawler for dark web forums. Journal of the American Society for Information Science and Technology, 61(6), 1213-1231.
    More info
    Abstract: The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities. © 2010 ASIS&T.
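    Note: A bare-bones sketch of the URL-ordering idea: candidate links are scored by simple relevance features and fetched best-first from a priority queue. The fetch function is a stub, and the scoring features are hypothetical; the human-assisted access and recall-improvement mechanisms the paper describes are not modeled here.

      import heapq

      def score(url):
          # hypothetical URL-ordering features: keyword hits, shallow paths first
          s = sum(k in url for k in ("forum", "thread", "board"))
          return s - 0.1 * url.count("/")

      def fetch_links(url):
          # stub standing in for an HTTP fetch plus link extraction
          return {f"{url}/thread{i}" for i in range(2)} if "forum" in url else set()

      frontier = [(-score(u), u) for u in ("http://example.org/forum",
                                           "http://example.org/news")]
      heapq.heapify(frontier)
      seen = set(u for _, u in frontier)
      crawled = []
      while frontier and len(crawled) < 5:
          _, url = heapq.heappop(frontier)    # best-first by URL score
          crawled.append(url)
          for link in fetch_links(url) - seen:
              seen.add(link)
              heapq.heappush(frontier, (-score(link), link))
      print(crawled)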
  • Li, X., Chen, H., & Li, S. (2010). Exploiting emotions in social interactions to detect online social communities. PACIS 2010 - 14th Pacific Asia Conference on Information Systems, 1426-1437.
    More info
    Abstract: The rapid development of Web 2.0 allows people to be involved in online interactions more easily than before and facilitates the formation of virtual communities. Online communities exert influence on their members' online and offline behaviors. Therefore, they are of increasing interest to researchers and business managers. Most virtual community studies consider subjects in the same Web application belong to one community. This boundary-defining method neglects subtle opinion differences among participants with similar interests. It is necessary to unveil the community structure of online participants to overcome this limitation. Previous community detection studies usually account for the structural factor of social networks to build their models. Based on the affect theory of social exchange, this research argues that emotions involved in social interactions should be considered in the community detection process. We propose a framework to extract social interactions and interaction emotions from user-generated content and a GN-H co-training algorithm to utilize the two types of information in community detection. We show the benefit of including emotion information in community detection using simulated data. We also conduct a case study on a real-world Web forum dataset to exemplify the utility of the framework in identifying communities to support further analysis.
  • Li, X., Chen, H., Li, J., & Zhang, Z. (2010). Gene function prediction with gene interaction networks: A context graph kernel approach. IEEE Transactions on Information Technology in Biomedicine, 14(1), 119-128.
    More info
    PMID: 19789115. Abstract: Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels. © 2009 IEEE.
  • Ku, Y., Chiu, C., Zhang, Y., Fan, L., & Chen, H. (2010). Global disease surveillance using social media: HIV/AIDS content intervention in web forums. ISI 2010 - 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, 170.
    More info
    Abstract: Collecting potential data sources for use in proactively analyzing and evaluating strategies for syndromic surveillance and bio-defense has become a critical issue and challenge in infectious disease informatics [1]. Web forums link highly relevant information about patients' needs, disease pain, health conditions, and concerns for medical practice [2]. Health departments or medical service groups can use the information in the forums to identify disease sources and scope and to detect outbreaks while the possibility for intervention remains. © 2010 IEEE.
  • Zhang, Y., Yu, X., Dang, Y., & Chen, H. (2010). An integrated framework for avatar data collection from the virtual world. IEEE Intelligent Systems, 25(6), 17-23.
    More info
    Abstract: To mine the rich social media data produced in virtual worlds, an integrated framework combines bot- and spider-based approaches to collect avatar behavioral and profile data. © 2010 IEEE.
  • Zhang, Y., Zeng, S., Huang, C., Fan, L., Yu, X., Dang, Y., Larson, C. A., Denning, D., Roberts, N., & Chen, H. (2010). Developing a Dark Web collection and infrastructure for computational and social sciences. ISI 2010 - 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, 59-64.
    More info
    Abstract: In recent years, there have been numerous studies from a variety of perspectives analyzing the Internet presence of hate and extremist groups. Yet the websites and forums of extremist and terrorist groups have long remained an underutilized resource for terrorism researchers due to their ephemeral nature and access and analysis problems. The purpose of the Dark Web archive is to provide a research infrastructure for use by social scientists, computer and information scientists, policy and security analysts, and others studying a wide range of social and organizational phenomena and computational problems. The Dark Web Forum Portal provides Web-enabled access to critical international jihadist and other extremist web forums. The focus of this paper is on the significant extensions to previous work, including: increasing the scope of data collection; adding an incremental spidering component for regular data updates; enhancing the searching and browsing functions; enhancing multilingual machine translation for Arabic, French, German, and Russian; and advanced social network analysis. A case study on identifying active participants is presented at the end. © 2010 IEEE.
  • Zhu, B., Watts, S., & Chen, H. (2010). Visualizing social network concepts. Decision Support Systems, 49(2), 151-161.
    More info
    Abstract: Social network concepts are invaluable for understanding social network phenomena, but they are difficult to comprehend without computerized visualization. However, most existing network visualization techniques provide limited support for the comprehension of network concepts. This research proposes an approach called concept visualization to facilitate the understanding of social network concepts. The paper describes an implementation of the approach. Results from a controlled laboratory experiment indicate that, compared with the benchmark system, the NetVizer system facilitated better understanding of the concepts of betweenness centrality, gatekeepers of subgroups, and structural similarity. It also supported faster comprehension of subgroup identification. © 2010.
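    The three concepts the experiment tested can be computed directly with networkx; the snippet below is a minimal plain-code baseline for what NetVizer visualizes. The betweenness threshold used to flag gatekeepers is an arbitrary illustration, not a definition from the paper.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
betweenness = nx.betweenness_centrality(G)       # brokers sitting on many shortest paths
groups = list(greedy_modularity_communities(G))  # subgroup identification

def structural_similarity(u, v):
    """Jaccard overlap of neighborhoods as a simple structural-similarity measure."""
    nu, nv = set(G[u]), set(G[v])
    return len(nu & nv) / len(nu | nv)

# gatekeepers: high-betweenness nodes whose neighbors span more than one subgroup
member = {n: i for i, grp in enumerate(groups) for n in grp}
gatekeepers = [n for n in G
               if len({member[m] for m in G[n]}) > 1 and betweenness[n] > 0.05]
```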
  • Zimbra, D., & Chen, H. (2010). Comparing the virtual linkage intensity and real world proximity of social movements. ISI 2010 - 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, 144-146.
    More info
    Abstract: The relationships between phenomena observed in the real world and their representations in virtual contexts have generated interest among researchers. In particular, the manifestations of social movements in virtual environments have been examined, with many studies dedicated to the analysis of the virtual linkages between groups. In this research, a form of link analysis was performed to examine the relationship between virtual linkage intensity and real world physical proximity among the social movement groups identified in the Southern Poverty Law Center Spring 2009 Intelligence Report. Findings indicate the existence of significant relationships between virtual linkage intensity and physical proximity, distinctive to various ideological categorizations. The results provide valuable insights into the behaviors of social movements in virtual environments. © 2010 IEEE.
  • Zimbra, D., Abbasi, A., & Chen, H. (2010). A Cyber-archaeology Approach to Social Movement Research: Framework and Case Study. Journal of Computer-Mediated Communication, 16(1), 48-70.
    More info
    Abstract: This paper presents a cyber-archaeology approach to social movement research. The approach overcomes many of the issues of scale and complexity facing social research in the Internet, enabling broad and longitudinal study of the virtual communities supporting social movements. Cultural cyber-artifacts of significance to the social movement are collected and classified using automated techniques, enabling analysis across multiple related virtual communities. Approaches to the analysis of cyber-artifacts are guided by perspectives of social movement theory. A case study on a broad group of related social movement virtual communities is presented to demonstrate the efficacy of the framework, and provide a detailed instantiation of the proposed approach for evaluation. © 2010 International Communication Association.
  • Abbasi, A., & Chen, H. (2009). A comparison of fraud cues and classification methods for fake escrow website detection. Information Technology and Management, 10(2-3 SPEC. ISS.), 83-101.
    More info
    Abstract: The ability to automatically detect fraudulent escrow websites is important in order to alleviate online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, making systems for detecting such websites an important endeavor. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machine ensemble classifier enabled accuracies of over 90% and 96% for page- and site-level classification, respectively, when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary due to the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods for detecting fake escrow websites. The results may also be useful for informing existing online escrow fraud resources and communities of practice about the plethora of fraud cues pervasive in fake websites. © Springer Science+Business Media, LLC 2009.
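    A hedged sketch of the page-level setup described above: TF-IDF text cues plus crude link/image counts feeding a bagged linear-SVM ensemble. The feature choices and ensemble size are illustrative, not the study's exact fraud cues or classifier configuration.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def link_image_features(pages):
    """Crude structural cues: anchor and image counts per raw HTML page."""
    return csr_matrix([[p.count("<a "), p.count("<img ")] for p in pages],
                      dtype=float)

def train_page_classifier(pages, labels):
    vec = TfidfVectorizer(max_features=5000)
    X = hstack([vec.fit_transform(pages), link_image_features(pages)])
    clf = BaggingClassifier(LinearSVC(), n_estimators=11)  # simple SVM ensemble
    clf.fit(X, labels)
    return vec, clf
```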
  • Abbasi, A., & Chen, H. (2009). A comparison of tools for detecting fake websites. Computer, 42(10), 78-86.
    More info
    Abstract: As fake website developers become more innovative, so too must the tools used to protect Internet users. A proposed system combines a support vector machine classifier and a rich feature set derived from website text, linkage, and images to better detect fraudulent sites. © 2009 IEEE.
  • Chen, H. (2009). AI and global science and technology assessment. IEEE Intelligent Systems, 24(4), 68-71.
    More info
    Abstract: Five essays on global science and technology (S&T) assessment from distinguished experts in knowledge mapping, scientometrics, information visualization, digital libraries, and multilingual knowledge management are discussed. The first essay, 'China S&T Assessment,' proposes three fundamental S&T assessment metrics and shows the Chinese emphasis on the physical and engineering sciences and its significant research productivity gains. Another essay, 'Open Data and Open Code for S&T Assessment,' introduces science maps to help humans mentally organize, access, and manage complex digital library collections. The essay 'Global S&T Assessment by Analysis of Large ETD Collections' introduces the highly successful Networked Digital Library of Theses and Dissertations (NDLTD) project. The final essay, 'Managing Multilingual S&T Knowledge,' describes a research framework for cross-lingual and polylingual text categorization and category integration.
  • Chen, H. (2009). IEDs in the dark web: Lexicon expansion and genre classification. 2009 IEEE International Conference on Intelligence and Security Informatics, ISI 2009, 173-175.
    More info
    Abstract: Improvised explosive device web pages represent a significant source of knowledge for security organizations. In this paper, we present significant improvements to our approach to the discovery and classification of IED related web pages in the Dark Web. We present a statistical feature ranking approach to the expansion of the keyword lexicon used to discover IED related web pages, which identified new relevant terms for inclusion. Additionally, we present an improved web page feature representation designed to better capture the structural and stylistic cues revealing of genres of communication, and a series of experiments comparing the classification performance of the new representation with our existing approach. ©2009 IEEE.
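    A small sketch of the statistical lexicon-expansion step described above: candidate terms are ranked by an information-gain-style score against page relevance labels, and the top scorers are promoted into the keyword lexicon. Mutual information is used here as a stand-in for the paper's feature-ranking measure, and all names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

def expand_lexicon(docs, labels, lexicon, top_k=20):
    """Promote the highest information-gain terms into the keyword lexicon."""
    vec = CountVectorizer(binary=True)
    X = vec.fit_transform(docs)
    ig = mutual_info_classif(X, labels, discrete_features=True)
    ranked = sorted(zip(vec.get_feature_names_out(), ig), key=lambda t: -t[1])
    return set(lexicon) | {term for term, _ in ranked[:top_k]}
```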
  • Chen, H., Dacier, M., Moens, M., Paass, G., & Yang, C. C. (2009). Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics (CSI-KDD): Preface. Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics, CSI-KDD in Conjunction with SIGKDD'09, iii-vi.
  • Chen, H., Hu, P. J., Hu, H., Chu, E. L., & Hsu, F. (2009). AI, e-government, and politics 2.0. IEEE Intelligent Systems, 24(5), 64-86.
  • Chen, H., Li, X., Chau, M., Ho, Y., & Tseng, C. (2009). Using open web APIs in teaching web mining. IEEE Transactions on Education, 52(4), 482-490.
    More info
    Abstract: With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems curricula. This paper reports on an experience using open Web Application Programming Interfaces (APIs) that have been made available by major Internet companies (e.g., Google, Amazon, and eBay) in a class project to teach Web mining applications. The instructor's observations of the students' performance and a survey of the students' opinions show that the class project achieved its objectives and students acquired valuable experience in leveraging the APIs to build interesting Web mining applications. © 2006 IEEE.
  • Chen, H., Yang, C. C., Chau, M., & Li, S. (2009). Lecture Notes in Computer Science: Preface. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5477, vi.
  • Chung, W., & Chen, H. (2009). Browsing the underdeveloped web: An experiment on the arabic medical web directory. Journal of the American Society for Information Science and Technology, 60(3), 595-607.
    More info
    Abstract: While the Web has grown significantly in recent years, some portions of the Web remain largely underdeveloped, as shown in a lack of high-quality content and functionality. An example is the Arabic Web, in which a lack of well-structured Web directories limits users' ability to browse for Arabic resources. In this research, we proposed an approach to building Web directories for the underdeveloped Web and developed a proof-of-concept prototype called the Arabic Medical Web Directory (AMedDir) that supports browsing of over 5,000 Arabic medical Web sites and pages organized in a hierarchical structure. We conducted an experiment involving Arab participants and found that the AMedDir significantly outperformed two benchmark Arabic Web directories in terms of browsing effectiveness, efficiency, information quality, and user satisfaction. Participants expressed strong preference for the AMedDir and provided many positive comments. This research thus contributes to developing a useful Web directory for organizing the information in the Arabic medical domain and to a better understanding of how to support browsing on the underdeveloped Web.
  • Chung, W., Chen, H., & Reid, E. (2009). Business stakeholder analyzer: An experiment of classifying stakeholders on the web. Journal of the American Society for Information Science and Technology, 60(1), 59-74.
    More info
    Abstract: As the Web is used increasingly to share and disseminate information, business analysts and managers are challenged to understand stakeholder relationships. Traditional stakeholder theories and frameworks employ a manual approach to analysis and do not scale up to accommodate the rapid growth of the Web. Unfortunately, existing business intelligence (BI) tools lack analysis capability, and research on BI systems is sparse. This research proposes a framework for designing BI systems to identify and to classify stakeholders on the Web, incorporating human knowledge and machine-learned information from Web pages. Based on the framework, we have developed a prototype called Business Stakeholder Analyzer (BSA) that helps managers and analysts to identify and to classify their stakeholders on the Web. Results from our experiment involving algorithm comparison, feature comparison, and a user study showed that the system achieved better within-class accuracies in widespread stakeholder types such as partner/sponsor/supplier and media/reviewer, and was more efficient than human classification. The student and practitioner subjects in our user study strongly agreed that such a system would save analysts' time and help to identify and classify stakeholders. This research contributes to a better understanding of how to integrate information technology with stakeholder theory, and enriches the knowledge base of BI system design. © 2008 ASIS&T.
  • Dang, Y., Chen, H., Zhang, Y., & Roco, M. C. (2009). Knowledge sharing and diffusion patterns. IEEE Nanotechnology Magazine, 3(3), 16-21.
    More info
    Abstract: Due to nanotechnology's potential to shape a country's future earning power in globally competitive markets, more than 60 countries have adopted national projects or programs to stimulate research and innovation in technology [1]. Both industrialized and developing countries have intensified their nanotechnology R & D efforts of late [2]. Patent analysis can reveal the scope and direction of nanotechnology R & D trends and has been used to assess the development of different research communities and technology fields [3], [4]; to study nanotechnology patents published by the U.S. Patent and Trademark Office (USPTO), the European Patent Office (EPO), and the Japan Patent Office (JPO) [5], [6]; and to examine the impact of U.S. National Science Foundation grants on USPTO nanotechnology patents [7]. © 2009 IEEE.
  • Dang, Y., Zhang, Y., Chen, H., Hu, P. J., Brown, S. A., & Larson, C. (2009). Arizona literature mapper: An integrated approach to monitor and analyze global bioterrorism research literature. Journal of the American Society for Information Science and Technology, 60(7), 1466-1485.
    More info
    Abstract: Biomedical research is critical to biodefense, which is drawing increasing attention from governments globally as well as from various research communities. The U.S. government has been closely monitoring and regulating biomedical research activities, particularly those studying or involving bioterrorism agents or diseases. Effective surveillance requires comprehensive understanding of extant biomedical research and timely detection of new developments or emerging trends. The rapid knowledge expansion, technical breakthroughs, and spiraling collaboration networks demand greater support for literature search and sharing, which cannot be effectively supported by conventional literature search mechanisms or systems. In this study, we propose an integrated approach that integrates advanced techniques for content analysis, network analysis, and information visualization. We design and implement Arizona Literature Mapper, a Web-based portal that allows users to gain timely, comprehensive understanding of bioterrorism research, including leading scientists, research groups, institutions as well as insights about current mainstream interests or emerging trends. We conduct two user studies to evaluate Arizona Literature Mapper and include a well-known system for benchmarking purposes. According to our results, Arizona Literature Mapper is significantly more effective for supporting users' search of bioterrorism publications than PubMed. Users consider Arizona Literature Mapper more useful and easier to use than PubMed. Users are also more satisfied with Arizona Literature Mapper and show stronger intentions to use it in the future. Assessments of Arizona Literature Mapper's analysis functions are also positive, as our subjects consider them useful, easy to use, and satisfactory. Our results have important implications that are also discussed in the article.
  • Hu, D., Kaza, S., & Chen, H. (2009). Identifying significant facilitators of dark network evolution. Journal of the American Society for Information Science and Technology, 60(4), 655-665.
    More info
    Abstract: Social networks evolve over time with the addition and removal of nodes and links to survive and thrive in their environments. Previous studies have shown that the link-formation process in such networks is influenced by a set of facilitators. However, there have been few empirical evaluations to determine the important facilitators. In a research partnership with law enforcement agencies, we used dynamic social-network analysis methods to examine several plausible facilitators of co-offending relationships in a large-scale narcotics network consisting of individuals and vehicles. Multivariate Cox regression and a two-proportion z-test on cyclic and focal closures of the network showed that mutual acquaintance and vehicle affiliations were significant facilitators for the network under study. We also found that homophily with respect to age, race, and gender were not good predictors of future link formation in these networks. Moreover, we examined the social causes and policy implications of the significance and insignificance of various facilitators, including common jails, on future co-offending. These findings provide important insights into the link-formation processes and the resilience of social networks. In addition, they can be used to aid in the prediction of future links. The methods described can also help in understanding the driving forces behind the formation and evolution of social networks facilitated by mobile and Web technologies.
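    Both tests named in the abstract are available in standard Python libraries. The sketch below, using lifelines and statsmodels with invented column names and toy counts, shows the shape of such an analysis; it is not the authors' code or data, and a real fit needs the full dyad table.

```python
import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.stats.proportion import proportions_ztest

# toy dyad rows: observation time, whether a co-offending link formed, covariates
df = pd.DataFrame({
    "duration":            [12, 30, 7, 24, 18, 9],
    "link_formed":         [1, 0, 1, 0, 1, 0],
    "mutual_acquaintance": [1, 0, 1, 1, 0, 0],
    "shared_vehicle":      [1, 0, 0, 1, 1, 0],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="link_formed")
cph.print_summary()  # hazard ratios for each candidate facilitator

# two-proportion z-test comparing closure rates in two dyad groups (dummy counts)
stat, p_value = proportions_ztest(count=[45, 20], nobs=[200, 210])
```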
  • Hsu, F., Hu, P. J., Chen, H., & Hu, H. (2009). Examining agencies' satisfaction with electronic record management systems in e-government: A large-scale survey study. Lecture Notes in Business Information Processing, 22 LNBIP, 25-36.
    More info
    Abstract: While e-government is propelling and maturing steadily, advanced technological capabilities alone cannot guarantee agencies' realizing the full benefits of the enabling computer-based systems. This study analyzes information systems in e-government settings by examining agencies' satisfaction with an electronic record management system (ERMS). Specifically, we investigate key satisfaction determinants that include regulatory compliance, job relevance, and satisfaction with support services for using the ERMS. We test our model and the hypotheses in it, using a large-scale survey that involves a total of 1,652 government agencies in Taiwan. Our results show significant effects of regulatory compliance on job relevance and satisfaction with support services, which in turn determine government agencies' satisfaction with an ERMS. Our data exhibit a reasonably good fit to our model, which can explain a significant portion of the variance in agencies' satisfaction with an ERMS. Our findings have several important implications to research and practice, which are also discussed. © 2009 Springer Berlin Heidelberg.
  • Hsu, F., Hu, P. J., Chen, H., & Yu, C. (2009). The strategic co-alignment for implementing information systems in E-government. PACIS 2009 - 13th Pacific Asia Conference on Information Systems: IT Services in a Global Environment.
    More info
    Abstract: A regulating agency in a government, i.e., the regulator, must co-align its information systems (IS) planning strategy with executing agencies, i.e., the executors, for better e-government performance. Using an established strategic co-alignment model, we analyze the mutual participating strategies between the regulator and executors and examine the outcomes and performance associated with that co-alignment choice. Drawing on a large-scale survey study of government agencies in Taiwan, we examine the co-alignment relationship between the e-government IS policy regulator and executors. Based on the findings, we discuss their implications for e-government research and practice.
  • Hu, P. J., Chen, H., & Hu, H. (2009). Law enforcement officers' acceptance of advanced e-government technology: A survey study of COPLINK Mobile. ACM International Conference Proceeding Series, 160-168.
    More info
    Abstract: Timely information access and effective knowledge support are crucial to law enforcement officers' crime fighting and investigations. An expanding array of e-government initiatives target the development of advanced information technologies and their deployment in law enforcement agencies. A case in point is COPLINK, an integrated system that provides law enforcement officers with timely data access, effective information support, integrated knowledge sharing, and improved collaboration within or beyond agency boundaries. In this study, we examine law enforcement officers' acceptance of COPLINK Mobile by proposing and testing a factor model premised on established theoretical foundations. According to our results, the model is capable of explaining or predicting officers' intentions to use the technology. Our survey data support the proposed model and the hypotheses it suggests. Among the acceptance determinants we investigated, perceived usefulness appears to have the most significant influence on individual officers' intention to use COPLINK Mobile. Copyright © 2009 ACM.
  • Xu, J., & Chen, H. (2009). Xu responds. Communications of the ACM, 52(4), 9.
  • Kaza, S., & Chen, H. (2009). Effect of inventor status on intra-organizational innovation evolution. Proceedings of the 42nd Annual Hawaii International Conference on System Sciences, HICSS.
    More info
    Abstract: Innovation is one of the primary characteristics that separates successful from unsuccessful organizations. Organizations have a choice in selecting knowledge that is recombined to produce new innovations. The selection of knowledge is influenced by the status of inventors in an organization's internal knowledge network. In this study, we model knowledge flow within an organization and contend that it exhibits unique characteristics not incorporated in most social network measures. Using the model, we also propose a new measure based on random walks and team identification and use it to examine innovation selection in a large organization. Using empirical methods, we find that inventor status determined by the new measure had a significant positive relationship with the likelihood that his/her knowledge would be selected for recombination. We believe that the new measure in addition to modeling knowledge flow in a scientific collaboration network helps better understand how innovation evolves within organizations. © 2009 IEEE.
  • Kaza, S., Xu, J., Marshall, B., & Chen, H. (2009). Topological analysis of criminal activity networks: Enhancing transportation security. IEEE Transactions on Intelligent Transportation Systems, 10(1), 83-91.
    More info
    Abstract: The security of border and transportation systems is a critical component of the national strategy for homeland security. The security concerns at the border are not independent of law enforcement in border-area jurisdictions because the information known by local law enforcement agencies may provide valuable leads that are useful for securing the border and transportation infrastructure. The combined analysis of law enforcement information and data generated by vehicle license plate readers at international borders can be used to identify suspicious vehicles and people at ports of entry. This not only generates better quality leads for border protection agents but may also serve to reduce wait times for commerce, vehicles, and people as they cross the border. This paper explores the use of criminal activity networks (CANs) to analyze information from law enforcement and other sources to provide value for transportation and border security. We analyze the topological characteristics of CANs of individuals and vehicles in a multiple-jurisdiction scenario. The advantages of exploring the relationships of individuals and vehicles are shown. We find that large narcotic networks are small-world networks with short average path lengths ranging from 4.5 to 8.5 and have scale-free degree distributions with power-law exponents of 0.85 to 1.3. In addition, we find that utilizing information from multiple jurisdictions provides higher quality leads by reducing the average shortest-path lengths. The inclusion of vehicular relationships and border-crossing information generates more investigative leads that can aid in securing the border and transportation infrastructure. © 2006 IEEE.
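    The two topological measurements reported above (average shortest-path length and a power-law degree-distribution exponent) can be reproduced on any graph with networkx and a log-log fit, as in the brief sketch below; the synthetic graph stands in for a real criminal activity network.

```python
import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(1000, 2)  # scale-free stand-in for a criminal network
giant = G.subgraph(max(nx.connected_components(G), key=len))
print("average shortest path:", nx.average_shortest_path_length(giant))

degrees = np.array([d for _, d in G.degree() if d > 0])
ks, counts = np.unique(degrees, return_counts=True)
slope, _ = np.polyfit(np.log(ks), np.log(counts), 1)  # log-log linear fit
print("estimated power-law exponent:", -slope)
```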
  • Liu, X., Zhang, P., Li, X., Chen, H., Dang, Y., Larson, C., Roco, M. C., & Wang, X. (2009). Trends for nanotechnology development in China, Russia, and India. Journal of Nanoparticle Research, 11(8), 1845-1866.
    More info
    Abstract: China, Russia, and India are playing an increasingly important role in global nanotechnology research and development (R&D). This paper comparatively inspects the paper and patent publications by these three countries in the Thomson Science Citation Index Expanded (SCI) database and United States Patent and Trademark Office (USPTO) database (1976-2007). Bibliographic, content map, and citation network analyses are used to evaluate country productivity, dominant research topics, and knowledge diffusion patterns. Significant and consistent growth in nanotechnology papers is noted in the three countries. Between 2000 and 2007, the average annual growth rate was 31.43% in China, 11.88% in Russia, and 33.51% in India. During the same time, the growth patterns were less consistent in patent publications: the corresponding average rates are 31.13, 10.41, and 5.96%. The three countries' paper impact measured by the average number of citations has been lower than the world average. However, from 2000 to 2007, it experienced rapid increases of about 12.8 times in China, 8 times in India, and 1.6 times in Russia. The Chinese Academy of Sciences (CAS), the Russian Academy of Sciences (RAS), and the Indian Institutes of Technology (IIT) were the most productive institutions in paper publication, with 12,334, 6,773, and 1,831 papers, respectively. The three countries emphasized some common research topics such as "Quantum dots," "Carbon nanotubes," "Atomic force microscopy," and "Scanning electron microscopy," while Russia and India reported more research on nano-devices as compared with China. CAS, RAS, and IIT played key roles in the respective domestic knowledge diffusion. © 2009 Springer Science+Business Media B.V.
  • Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems, 27(2).
    More info
    Abstract: Our research examines a predictive machine learning approach for financial news article analysis using several different textual representations: bag of words, noun phrases, and named entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five-week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a support vector machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), the same direction of price movement as the future price (57.1% directional accuracy) and the highest return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
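    A schematic version of this setup, assuming hypothetical data columns: article terms plus the release-time price are regressed onto the price twenty minutes later with epsilon-SVR. This sketches the general shape of the approach, not the AZFinText system's actual SVM derivative or feature scheme.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVR

def fit_price_model(articles, price_at_release, price_20min_later):
    """Regress the +20-minute price on article terms plus the release-time price."""
    vec = TfidfVectorizer(max_features=10000)
    X = hstack([vec.fit_transform(articles),
                csr_matrix([[p] for p in price_at_release])])
    model = SVR(kernel="linear", epsilon=0.01)  # epsilon-SVR for numeric prediction
    model.fit(X, price_20min_later)
    return vec, model
```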
  • Schumaker, R. P., & Chen, H. (2009). A quantitative stock prediction system based on financial news. Information Processing and Management, 45(5), 571-583.
    More info
    Abstract: We examine the problem of discrete stock price prediction using a synthesis of linguistic, financial and statistical techniques to create the Arizona Financial Text System (AZFinText). The research within this paper seeks to contribute to the AZFinText system by comparing AZFinText's predictions against existing quantitative funds and human stock pricing experts. We approach this line of research using textual representation and statistical machine learning methods on financial news articles partitioned by similar industry and sector groupings. Through our research, we discovered that stocks partitioned by Sectors were most predictable in measures of Closeness, Mean Squared Error (MSE) score of 0.1954, predicted Directional Accuracy of 71.18% and a Simulated Trading return of 8.50% (compared to 5.62% for the S&P 500 index). In direct comparisons to existing market experts and quantitative mutual funds, our system's trading return of 8.50% outperformed well-known trading experts. Our system also performed well against the top 10 quantitative mutual funds of 2005, where our system would have placed fifth. When comparing AZFinText against only those quantitative funds that monitor the same securities, AZFinText had a 2% higher return than the best performing quant fund. © 2009 Elsevier Ltd. All rights reserved.
  • Thuraisingham, B., & Chen, H. (2009). IEEE ISI 2009 welcome message from conference co-chairs. 2009 IEEE International Conference on Intelligence and Security Informatics, ISI 2009.
  • Fu, T., Huang, C., & Chen, H. (2009). Identification of extremist videos in online video sharing sites. 2009 IEEE International Conference on Intelligence and Security Informatics, ISI 2009, 179-181.
    More info
    Abstract: Web 2.0 has become an effective grassroots communication platform for extremists to promote their ideas, share resources, and communicate among each other. As an important component of Web 2.0, online video sharing sites such as YouTube and Google Video have also been utilized by extremist groups to distribute videos. This study presents a framework for identifying extremist videos in online video sharing sites by using user-generated text content such as comments, video descriptions, and titles, without downloading the videos. Text features, including lexical features, syntactic features, and content-specific features, were first extracted. Information Gain was then used for feature selection, and a Support Vector Machine was deployed for classification. The exploratory experiment showed that our proposed framework is effective for identifying online extremist videos, with an F-measure as high as 82%. ©2009 IEEE.
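    A compact pipeline mirroring the framework's three stages (text features, feature selection, SVM), scored by F-measure. The chi-square selector is used as a cheap stand-in for Information Gain, and the loader name is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([
    ("features", TfidfVectorizer(sublinear_tf=True)),  # lexical features from comments/titles
    ("select", SelectKBest(chi2, k=2000)),             # stand-in for Information Gain
    ("svm", LinearSVC()),
])
# texts, labels = load_video_metadata()                # hypothetical loader
# print(cross_val_score(pipe, texts, labels, scoring="f1", cv=5).mean())
```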
  • Li, X., & Chen, H. (2009). Recommendation as link prediction: A graph kernel-based machine learning approach. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 213-216.
    More info
    Abstract: Recommender systems have demonstrated commercial success in multiple industries. In digital libraries they have the potential to be used as a support tool for traditional information retrieval functions. Among the major recommendation algorithms, the successful collaborative filtering (CF) methods explore the use of user-item interactions to infer user interests. Based on the finding that transitive user-item associations can alleviate the data sparsity problem in CF, multiple heuristic algorithms were designed to take advantage of the user-item interaction networks with both direct and indirect interactions. However, the use of such graph representation was still limited in learning-based algorithms. In this paper, we propose a graph kernel-based recommendation framework. For each user-item pair, we inspect its associative interaction graph (AIG) that contains the users, items, and interactions n steps away from the pair. We design a novel graph kernel to capture the AIG structures and use them to predict possible user-item interactions. The framework demonstrates improved performance on an online bookstore dataset, especially when a large number of suggestions are needed. Copyright 2009 ACM.
  • Li, X., Chen, H., Zhang, Z., Li, J., & Nunamaker, J. (2009). Managing knowledge in light of its evolution process: An empirical study on citation network-based patent classification. Journal of Management Information Systems, 26(1), 129-153.
    More info
    Abstract: Knowledge management is essential to modern organizations. Due to the information overload problem, managers are facing critical challenges in utilizing the data in organizations. Although several automated tools have been applied, previous applications often deem knowledge items independent and use solely contents, which may limit their analysis abilities. This study focuses on the process of knowledge evolution and proposes to incorporate this perspective into knowledge management tasks. Using a patent classification task as an example, we represent knowledge evolution processes with patent citations and introduce a labeled citation graph kernel to classify patents under a kernel-based machine learning framework. In the experimental study, our proposed approach shows more than 30 percent improvement in classification accuracy compared to traditional content-based methods. The approach can potentially affect the existing patent management procedures. Moreover, this research lends strong support to considering knowledge evolution processes in other knowledge management tasks. © 2009 M.E. Sharpe, Inc.
  • Li, X., Hu, D., Dang, Y., Chen, H., Roco, M. C., Larson, C. A., & Chan, J. (2009). Nano Mapper: An Internet knowledge mapping system for nanotechnology development. Journal of Nanoparticle Research, 11(3), 529-552.
    More info
    Abstract: Nanotechnology research has experienced rapid growth in recent years. Advances in information technology enable efficient investigation of publications, their contents, and relationships for large sets of nanotechnology-related documents in order to assess the status of the field. This paper presents the development of a new knowledge mapping system, called Nano Mapper (http://nanomapper.eller.arizona.edu), which integrates the analysis of nanotechnology patents and research grants into a Web-based platform. The Nano Mapper system currently contains nanotechnology-related patents for 1976-2006 from the United States Patent and Trademark Office (USPTO), European Patent Office (EPO), and Japan Patent Office (JPO), as well as grant documents from the U.S. National Science Foundation (NSF) for the same time period. The system provides complex search functionalities, and makes available a set of analysis and visualization tools (statistics, trend graphs, citation networks, and content maps) that can be applied to different levels of analytical units (countries, institutions, technical fields) and for different time intervals. The paper shows important nanotechnology patenting activities at USPTO for 2005-2006 identified through the Nano Mapper system. © 2008 Springer Science+Business Media B.V.
  • Zhang, Y., Zeng, S., Fan, L., Dang, Y., Larson, C. A., & Chen, H. (2009). Dark web forums portal: Searching and analyzing Jihadist forums. 2009 IEEE International Conference on Intelligence and Security Informatics, ISI 2009, 71-76.
    More info
    Abstract: With the advent of Web 2.0, the Web is acting as a platform which enables end-user content generation. As a major type of social media in Web 2.0, Web forums facilitate intensive interactions among participants. International Jihadist groups often use Web forums to promote violence and distribute propaganda materials. These Dark Web forums are heterogeneous and widely distributed. Therefore, how to access and analyze the forum messages and interactions among participants is becoming an issue. This paper presents a general framework for Web forum data integration. Specifically, a Web-based knowledge portal, the Dark Web Forums Portal, is built based on the framework. The portal incorporates the data collected from different international Jihadist forums and provides several important analysis functions, including forum browsing and searching (in single forum and across multiple forums), forum statistics analysis, multilingual translation, and social network visualization. Preliminary results of our user study show that the Dark Web Forums Portal helps users locate information quickly and effectively. Users found the forum statistics analysis, multilingual translation, and social network visualization functions of the portal to be particularly valuable. ©2009 IEEE.
  • Zhang, Y., Dang, Y., & Chen, H. (2009). Gender difference analysis of political web forums: An experiment on an international Islamic women's forum. 2009 IEEE International Conference on Intelligence and Security Informatics, ISI 2009, 61-64.
    More info
    Abstract: As an important type of social media, the political Web forum has become a major communication channel for people to discuss and debate political, cultural and social issues. Although the Internet has a male-dominated history, more and more women have started to share their concerns and express opinions through online discussion boards and Web forums. This paper presents an automated approach to gender difference analysis of political Web forums. The approach uses rich textual feature representation and machine learning techniques to examine the online gender differences between female and male participants on political Web forums by analyzing writing styles and topics of interest. The results of gender difference analysis performed on a large and long-standing international Islamic women's political forum are presented, showing that female and male participants have significantly different topics of interest. ©2009 IEEE.
  • Zhang, Y., Dang, Y., Chen, H., Thurmond, M., & Larson, C. (2009). Automatic online news monitoring and classification for syndromic surveillance. Decision Support Systems, 47(4), 508-517.
    More info
    Abstract: Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique, and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases. In recent classification experiments, we compared the performance of different feature subsets on different machine learning algorithms. The results showed that the combined feature subsets including Bag of Words, Noun Phrases, and Named Entities features outperformed the Bag of Words feature subsets. Furthermore, feature selection improved the performance of feature subsets in online news classification. The highest classification performance was achieved when using SVM on the selected combined feature subset. © 2009 Elsevier B.V. All rights reserved.
  • Abbasi, A., & Chen, H. (2008). Cybergate: A design framework and system for text analysis of computer-mediated communication. MIS Quarterly: Management Information Systems, 32(4), 811-837.
    More info
    Abstract: Content analysis of computer-mediated communication (CMC) is important for evaluating the effectiveness of electronic communication in various organizational settings. CMC text analysis relies on systems capable of providing suitable navigation and knowledge discovery functionalities. However, existing CMC systems focus on structural features, with little support for features derived from message text. This deficiency is attributable to the informational richness and representational complexities associated with CMC text. In order to address this shortcoming, we propose a design framework for CMC text analysis systems. Grounded in systemic functional linguistic theory, the proposed framework advocates the development of systems capable of representing the rich array of information types inherent in CMC text. It also provides guidelines regarding the choice of features, feature selection, and visualization techniques that CMC text analysis systems should employ. The CyberGate system was developed as an instantiation of the design framework. CyberGate incorporates a rich feature set and complementary feature selection and visualization methods, including the writeprints and ink blots techniques. An application example was used to illustrate the system's ability to discern important patterns in CMC text. Furthermore, results from numerous experiments conducted in comparison with benchmark methods confirmed the viability of CyberGate's features and techniques. The results revealed that the CyberGate system and its underlying design framework can dramatically improve CMC text analysis capabilities over those provided by existing systems.
  • Abbasi, A., & Chen, H. (2008). Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems, 26(2).
    More info
    Abstract: One of the problems often associated with online anonymity is that it hinders social accountability, as substantiated by the high levels of cybercrime. Although identity cues are scarce in cyberspace, individuals often leave behind textual identity traces. In this study we proposed the use of stylometric analysis techniques to help identify individuals based on writing style. We incorporated a rich set of stylistic features, including lexical, syntactic, structural, content-specific, and idiosyncratic attributes. We also developed the Writeprints technique for identification and similarity detection of anonymous identities. Writeprints is a Karhunen-Loeve transforms-based technique that uses a sliding window and pattern disruption algorithm with individual author-level feature sets. The Writeprints technique and extended feature set were evaluated on a testbed encompassing four online datasets spanning different domains: email, instant messaging, feedback comments, and program code. Writeprints outperformed benchmark techniques, including SVM, Ensemble SVM, PCA, and standard Karhunen-Loeve transforms, on the identification and similarity detection tasks with accuracy as high as 94% when differentiating between 100 authors. The extended feature set also significantly outperformed a baseline set of features commonly used in previous research. Furthermore, individual-author-level feature sets generally outperformed use of a single group of attributes. © 2008 ACM.
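    An illustrative reduction of the Writeprints idea: slide a window over an author's text, extract style features per window, and project the windows into a Karhunen-Loeve (PCA) eigenspace so authors can be compared by the shape of their point clouds. The window features below are toy choices, not the paper's rich feature set, and the text must yield at least n_components windows for PCA to fit.

```python
import numpy as np
from sklearn.decomposition import PCA

def window_features(text, size=500, step=250):
    """Toy style features (word density, mean word length, comma rate) per window."""
    rows = []
    for i in range(0, max(len(text) - size, 1), step):
        w = text[i:i + size]
        words = w.split() or [""]
        rows.append([
            len(words) / max(len(w), 1),
            sum(len(t) for t in words) / len(words),
            w.count(",") / len(words),
        ])
    return np.array(rows)

def writeprint(text, n_components=2):
    """Project an author's windows into a PCA (Karhunen-Loeve) eigenspace."""
    return PCA(n_components=n_components).fit_transform(window_features(text))
```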
  • Abbasi, A., Chen, H., & Nunamaker Jr., J. F. (2008). Stylometric identification in electronic markets: Scalability and robustness. Journal of Management Information Systems, 25(1), 49-78.
    More info
    Abstract: Online reputation systems are intended to facilitate the propagation of word of mouth as a credibility scoring mechanism for improved trust in electronic marketplaces. However, they experience two problems attributable to anonymity abuse - easy identity changes and reputation manipulation. In this study, we propose the use of stylometric analysis to help identify online traders based on the writing style traces inherent in their posted feedback comments. We incorporated a rich stylistic feature set and developed the Writeprint technique for detection of anonymous trader identities. The technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 200 eBay traders. Experiments conducted to assess the scalability (number of traders) and robustness (against intentional obfuscation) of the proposed approach found it to significantly outperform benchmark stylometric techniques. The results indicate that the proposed method may help mitigate easy identity changes and reputation manipulation in electronic markets. © 2008 M.E. Sharpe, Inc.
  • Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. ACM Transactions on Information Systems, 26(3).
    More info
    Abstract: The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments. © 2008 ACM.
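    A bare-bones sketch in the spirit of a hybridized genetic algorithm like EWGA: bit-mask chromosomes seeded by information gain, fitness from SVM cross-validation, truncation selection, one-point crossover, and bit-flip mutation. Population sizes and rates are arbitrary illustrations, and this is not the authors' exact algorithm.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def ewga_like(X, y, pop=20, gens=10, seed=0):
    """X: dense feature matrix. Returns the best boolean feature mask found."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    ig = mutual_info_classif(X, y)
    p_on = 0.2 + 0.6 * ig / (ig.max() + 1e-9)   # information-gain-biased seeding
    masks = rng.random((pop, n)) < p_on

    def fitness(m):
        return cross_val_score(LinearSVC(), X[:, m], y, cv=3).mean() if m.any() else 0.0

    for _ in range(gens):
        scores = np.array([fitness(m) for m in masks])
        parents = masks[np.argsort(scores)][-(pop // 2):]   # keep the fitter half
        cuts = rng.integers(1, n, pop // 2)
        children = np.array([
            np.concatenate([parents[i % len(parents)][:c],
                            parents[(i + 1) % len(parents)][c:]])
            for i, c in enumerate(cuts)
        ])
        children ^= rng.random(children.shape) < 0.01       # bit-flip mutation
        masks = np.vstack([parents, children])
    return masks[np.argmax([fitness(m) for m in masks])]
```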
  • Abbasi, A., Chen, H., Thoms, S., & Fu, T. (2008). Affect analysis of web forums and blogs using correlation ensembles. IEEE Transactions on Knowledge and Data Engineering, 20(9), 1168-1180.
    More info
    Abstract: Analysis of affective intensities in computer-mediated communication is important in order to allow a better understanding of online users' emotions and preferences. Despite considerable research on textual affect classification, it is unclear which features and techniques are most effective. In this study, we compared several feature representations for affect analysis, including learned n-grams and various automatically and manually crafted affect lexicons. We also proposed the support vector regression correlation ensemble (SVRCE) method for enhanced classification of affect intensities. SVRCE uses an ensemble of classifiers each trained using a feature subset tailored toward classifying a single affect class. The ensemble is combined with affect correlation information to enable better prediction of emotive intensities. Experiments were conducted on four test beds encompassing web forums, blogs, and online stories. The results revealed that learned n-grams were more effective than lexicon-based affect representations. The findings also indicated that SVRCE outperformed comparison techniques, including Pace regression, semantic orientation, and WordNet models. Ablation testing showed that the improved performance of SVRCE was attributable to its use of feature ensembles as well as affect correlation information. A brief case study was conducted to illustrate the utility of the features and techniques for affect analysis of large archives of online discourse. © 2008 IEEE.
  • Chau, M., & Chen, H. (2008). A machine learning approach to web page filtering using content and structure analysis. Decision Support Systems, 44(2), 482-494.
    More info
    Abstract: As the Web continues to grow, it has become increasingly difficult to search for relevant information using traditional search engines. Topic-specific search engines provide an alternative way to support efficient information retrieval on the Web by providing more precise and customized searching in various domains. However, developers of topic-specific search engines need to address two issues: how to locate relevant documents (URLs) on the Web and how to filter out irrelevant documents from a set of documents collected from the Web. This paper reports our research in addressing the second issue. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. We represent each Web page by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. Two experiments were designed and conducted to compare the proposed Web-feature approach with two existing Web page filtering methods - a keyword-based approach and a lexicon-based approach. The experimental results showed that the proposed approach in general performed better than the benchmark approaches, especially when the number of training documents was small. The proposed approaches can be applied in topic-specific search engine development and other Web applications such as Web content management. © 2007 Elsevier B.V. All rights reserved.
  • Chau, M., Qin, J., Zhou, Y., Tseng, C., & Chen, H. (2008). SpidersRUs: Creating specialized search engines in multiple languages. Decision Support Systems, 45(3), 621-640.
    More info
    Abstract: While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic. © 2007 Elsevier B.V. All rights reserved.
  • Chen, H. (2008). Discovery of improvised explosive device content in the Dark Web. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 88-93.
    More info
    Abstract: Improvised explosive device related web content offers a wealth of knowledge to members of the security and intelligence communities. However, acquiring the desired topical information remains a challenge for analysts due to issues including site identification, accessibility, and language. This paper presents a focused crawling approach for the discovery and collection of improvised explosive device content from the Dark Web. Results and examples from an exploratory collection effort are described. Site map and link analyses were also performed, offering insight into the communication dynamics and publication of improvised explosive device web content. ©2008 IEEE.
  • Chen, H. (2008). IEDs in the Dark Web: Genre classification of improvised explosive device web pages. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 94-97.
    More info
    Abstract: Improvised explosive device web pages represent a significant source of knowledge for security organizations. These web pages exist in distinctive genres of communication, providing different types and levels of information for the intelligence community. This paper presents a framework for the classification of improvised explosive device web pages by genre. The approach uses a complex feature extractor, extended feature representation, and support vector machine learning algorithms. Improvised explosive device web pages were collected from the Dark Web and two classification models were examined, one using feature selection. Classification accuracy exceeded 88%. ©2008 IEEE.
  • Chen, H. (2008). Nuclear threat detection via the nuclear web and dark web: Framework and preliminary study. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5376 LNCS, 85-96.
    More info
    Abstract: We believe the science of Intelligence and Security Informatics (ISI) can help with nuclear forensics and attribution. ISI research can help advance the intelligence collection, analytical techniques and instrumentation used in determining the origin, capability, intent, and transit route of nuclear materials by selected hostile countries and (terrorist) groups. We propose a research framework that aims to investigate the Capability, Accessibility, and Intent of critical high-risk countries, institutions, researchers, and extremist or terrorist groups. We propose to develop a knowledge base of the Nuclear Web that will collect, analyze, and pinpoint significant actors in the high-risk international nuclear physics and weapon community. We also identify potential extremist or terrorist groups from our Dark Web testbed who might pose WMD threats to the US and the international community. Selected knowledge mapping and focused web crawling techniques and findings from a preliminary study are presented in this paper. © 2008 Springer Berlin Heidelberg.
  • Chen, H. (2008). Sentiment and affect analysis of Dark Web forums: measuring radicalization on the internet. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 104-109.
    More info
    Abstract: Dark Web forums are heavily used by extremist and terrorist groups for communication, recruiting, ideology sharing, and radicalization. These forums often have relevance to the Iraqi insurgency or Al-Qaeda and are of interest to security and intelligence organizations. This paper presents an automated approach to sentiment and affect analysis of selected radical international Jihadist Dark Web forums. The approach incorporates a rich textual feature representation and machine learning techniques to identify and measure the sentiment polarities and affect intensities expressed in forum communications. The results of sentiment and affect analysis performed on two large-scale Dark Web forums are presented, offering insight into the communities and participants. ©2008 IEEE.
  • Chen, H., Lu, H., Zeng, D., Trujillo, L., & Komatsu, K. (2008). Ontology-enhanced automatic chief complaint classification for syndromic surveillance. Journal of biomedical informatics, 41(2).
    More info
    Emergency department free-text chief complaints (CCs) are a major data source for syndromic surveillance. CCs need to be classified into syndromic categories for subsequent automatic analysis. However, the lack of a standard vocabulary and high-quality encodings of CCs hinders effective classification. This paper presents a new ontology-enhanced automatic CC classification approach. Exploiting semantic relations in a medical ontology, the approach addresses the general problem of CC vocabulary variation and meets the specific need for a classifier capable of handling multiple sets of syndromic categories. We report an experimental study comparing our approach with two popular CC classification methods using a real-world dataset. This study indicates that our ontology-enhanced approach performs significantly better than the benchmark methods in terms of sensitivity, F measure, and F2 measure.
  • Chen, H., Chung, W., Qin, J., Reid, E., Sageman, M., & Weimann, G. (2008). Uncovering the Dark Web: A case study of Jihad on the Web. Journal of the American Society for Information Science and Technology, 59(8), 1347-1359.
    More info
    Abstract: While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the "Dark Web" - the reverse side of the Web used by terrorists. Currently, information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collect and analyze information from 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, with high potential to assist in the investigation and understanding of terrorist activities by producing results that could help guide both policymaking and intelligence research.
  • Chen, H., Roco, M. C., Li, X., & Lin, Y. (2008). Trends in nanotechnology patents. Nature Nanotechnology, 3(3), 123-125.
    More info
    PMID: 18654475;Abstract: An analysis of 30 years of data on patent publications from the US Patent and Trademark Office, the European Patent Office and the Japan Patent Office confirms the dominance of companies and selected academic institutions from the US, Europe and Japan in the commercialization of nanotechnology. © 2008 Nature Publishing Group.
  • Chen, H., Thoms, S., & Fu, T. (2008). Cyber extremism in Web 2.0: An exploratory study of international Jihadist groups. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 98-103.
    More info
    Abstract: As part of the NSF-funded Dark Web research project, this paper presents an exploratory study of cyber extremism on the Web 2.0 media: blogs, YouTube, and Second Life. We examine international Jihadist extremist groups that use each of these media. We observe that these new, interactive, multimedia-rich forms of communication provide effective means for extremists to promote their ideas, share resources, and communicate among each other. The development of automated collection and analysis tools for Web 2.0 can help policy makers, intelligence analysts, and researchers to better understand extremists' ideas and communication patterns, which may lead to strategies that can counter the threats posed by extremists in the second-generation Web. ©2008 IEEE.
  • Chen, Y., Abbasi, A., & Chen, H. (2008). Developing ideological networks using social network analysis and writeprints: A case study of the international Falun Gong movement. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 7-12.
    More info
    Abstract: The convenience of the Internet has made it possible for activist groups to easily form alliances through their websites to appeal to a wider audience and increase their impact. In this study, we investigate the potential of using Social Network Analysis (SNA) and Writeprints to discover the fusion of activist ideas on the Internet, focusing on the Falun Gong movement. We find that network visualization is very useful to reveal how different types of websites or ideas are associated and, in some cases, mixed together. Furthermore, the measures of centrality in SNA help to reveal which websites most prominently link to other websites. We find that Writeprints can be used to identify the ideas which an author gradually introduces and combines through a series of messages. ©2008 IEEE.
  • Chung, W., Lai, G., Bonillas, A., Xi, W., & Chen, H. (2008). Organizing domain-specific information on the Web: An experiment on the Spanish business Web directory. International Journal of Human Computer Studies, 66(2), 51-66.
    More info
    Abstract: Web directories organize voluminous information into hierarchical structures, helping users to quickly locate relevant information and to support decision-making. The development of existing ontologies and Web directories either relies on expert participation that may not be available or uses automatic approaches that lack precision. As more users access the Web in their native languages, better approaches to organizing and developing non-English Web directories are needed. In this paper, we have proposed a semi-automatic framework, which consists of anchor directory boosting, meta-searching, and heuristic filtering, to construct domain-specific Web directories. Using the framework, we have built a Web directory in the Spanish business (SBiz) domain. Experimental results show that the SBiz Web directory achieved significantly better recall, F-value, efficiency, and satisfaction rating than the benchmark directory. Subjects provided favorable comments on the SBiz Web directory. This research thus contributes to developing a useful framework for organizing domain-specific information on the Web and to providing empirical findings and useful insights for end-users, system developers, and researchers of Web information seeking and knowledge management. © 2007 Elsevier Ltd. All rights reserved.
  • Dang, Y., Zhang, Y., Suakkaphong, N., Larson, C., & Chen, H. (2008). An integrated approach to mapping worldwide bioterrorism research capabilities. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 212-214.
    More info
    Abstract: Biomedical research used for defense purposes may also be applied to biological weapons development. To mitigate risk, the U.S. Government has attempted to monitor and regulate biomedical research labs, especially those that study bioterrorism agents/diseases. However, monitoring worldwide biomedical researchers and their work is still an issue. In this study, we developed an integrated approach to mapping worldwide bioterrorism research literature. By utilizing knowledge mapping techniques, we analyzed the productivity status, collaboration status, and emerging topics in the bioterrorism domain. The analysis results provide insights into the research status of bioterrorism agents/diseases and thus allow a more comprehensive view of bioterrorism researchers and ongoing work. ©2008 IEEE.
  • J., C., & Chen, H. (2008). Botnets, and the cybercriminal underground. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 206-211.
    More info
    Abstract: An underground community of cyber criminals has grown in recent years with powerful technologies capable of inflicting serious economic and infrastructural harm in the digital age. This paper serves as an introduction to the world of botnets and to the efforts of the nonprofit group "The ShadowServer Foundation" to track them. A data mining exploration is performed on ShadowServer's datasets to investigate possible classification mechanisms for threat assessment. ©2008 IEEE.
  • Xu, J., & Chen, H. (2008). The topology of dark networks. Communications of the ACM, 51(10), 58-65.
    More info
    Abstract: A study was conducted to understand the network structure and activities of criminal and terrorist networks, or "dark networks." It examined the topological properties of these networks in order to characterize their structure, using topological analysis to determine the statistical characteristics of large networks, which generally fall into three classes: random, small-world, and scale-free. The dark networks studied were built from open-source data, including transcripts of court proceedings, press reports, and web articles. The study observed that the networks have many isolated components alongside a single giant component, and that criminals and terrorists are able to reach any other member of a network through only a few mediators.
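The underlying network data are not public, but the kinds of topological measurements named above (giant component, small-world path lengths and clustering, scale-free degree distribution) can be illustrated with networkx on a synthetic graph; the generator and sizes below are arbitrary stand-ins.

```python
"""Sketch of the topological measurements described above, run on a
synthetic scale-free graph in place of the (non-public) dark-network data."""
import networkx as nx

G = nx.barabasi_albert_graph(1000, 2, seed=42)  # scale-free stand-in network

# Isolated components vs. a single giant component
components = sorted(nx.connected_components(G), key=len, reverse=True)
giant = G.subgraph(components[0])
print("components:", len(components), "giant size:", giant.number_of_nodes())

# Small-world indicators: short average paths plus clustering
print("avg shortest path (giant):", nx.average_shortest_path_length(giant))
print("avg clustering:", nx.average_clustering(G))

# A heavy-tailed degree distribution suggests scale-free structure
degrees = sorted((d for _, d in G.degree()), reverse=True)
print("top degrees:", degrees[:5])
```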
  • Xu, J., & Chen, H. (2008). Understanding the nexus of terrorist web sites. Studies in Computational Intelligence, 135, 65-78.
    More info
    Abstract: In recent years terrorist groups have been using the World-Wide Web to spread their ideologies, disseminate propaganda, and recruit members. Studying the terrorist Web sites may help us understand the characteristics of these Web sites and predict terrorist activities. In this chapter, we propose to apply network topological analysis methods to systematically collected terrorist Web site data and to study the structural characteristics at the Web page level. We conducted a case study using the methods on three collections of terrorist Web sites: Middle-Eastern, US domestic, and Latin-American. We found that the Web page networks from these three collections have small-world and scale-free characteristics. We also found that smaller Web sites that share similar interests tend to form stronger inter-site linkages, which help them form the giant component in the networks. © 2008 Springer-Verlag Berlin Heidelberg.
  • Li, J., Wang, G. A., & Chen, H. (2008). PRM-based identity matching using social context. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 150-155.
    More info
    Abstract: Identity management is critical for many intelligence and security applications. Identity information is often unreliable due to unintentional errors and intentional deception by criminals. Most existing identity matching techniques consider personal identity features only. In this article we propose a PRM-based identity matching technique that takes both personal identity features and social contexts into account. We identify two groups of social context features, namely social activity and social relation features. Experiments show that the social activity features significantly improve matching performance, while the social relation features effectively reduce false positives and false negatives. ©2008 IEEE.
  • Li, J., Zhang, Z., Li, X., & Chen, H. (2008). Kernel-based learning for biomedical relation extraction. Journal of the American Society for Information Science and Technology, 59(5), 756-769.
    More info
    Abstract: Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
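As a loose sketch of the composite-kernel idea in the abstract above (combining evidence from multiple kernels in a single SVM), one can pass a precomputed convex combination of kernel matrices to scikit-learn. The vectors below are random stand-ins; the paper's actual tree and trace kernels operate on parse trees, not feature vectors.

```python
"""Sketch of a composite kernel for SVM relation classification.
K_word and K_tree below are toy proxies computed on random vectors."""
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))      # hypothetical instance representations
y = rng.integers(0, 2, size=40)   # hypothetical relation labels

K_word = linear_kernel(X)         # proxy for a word/sequence kernel
K_tree = rbf_kernel(X)            # proxy for a tree/trace kernel
K = 0.5 * K_word + 0.5 * K_tree   # a convex combination is still a valid kernel

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:3]))         # rows = kernel values of test vs. training items
```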
  • Kaza, S., & Chen, H. (2008). Evaluating ontology mapping techniques: An experiment in public safety information sharing. Decision Support Systems, 45(4), 714-728.
    More info
    Abstract: The public safety community in the United States consists of thousands of local, state, and federal agencies, each with its own information system. In the past few years, there has been a thrust on the seamless interoperability of systems in these agencies. Ontology-based interoperability approaches in the public safety domain need to rely on mapping between ontologies as each agency has its own representation of information. However, there has been little study of ontology mapping techniques in this domain. We evaluate current mapping techniques with real-world data representations from law-enforcement and public safety data sources. In addition, we implement an information theory based tool called MIMapper that uses WordNet and mutual information between data instances to map ontologies. We find that three tools (PROMPT, Chimaera, and LOM) have average F-measures of 0.46, 0.49, and 0.68, respectively, when matching pairs of ontologies with 13 to 73 classes. MIMapper performs better, with an average F-measure of 0.84 on the same task. We conclude that tools that use secondary sources (like WordNet) and data instances to establish mappings between ontologies are likely to perform better in this application domain. © 2007 Elsevier B.V. All rights reserved.
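A small, hypothetical sketch of the WordNet-based evidence such mapping tools draw on: score candidate class alignments by synset similarity. This uses NLTK (the WordNet corpus must be downloaded first, e.g. nltk.download('wordnet')); the class labels are invented, and this is only one ingredient of a tool like MIMapper, which also uses mutual information over data instances.

```python
"""Sketch of WordNet-based label similarity for ontology class matching."""
from nltk.corpus import wordnet as wn

def label_similarity(a, b):
    """Max path similarity over noun synsets of two class labels."""
    syns_a, syns_b = wn.synsets(a, pos=wn.NOUN), wn.synsets(b, pos=wn.NOUN)
    scores = [s1.path_similarity(s2) for s1 in syns_a for s2 in syns_b]
    return max((s for s in scores if s is not None), default=0.0)

print(label_similarity("vehicle", "car"))      # high: good mapping candidate
print(label_similarity("vehicle", "address"))  # low: unlikely mapping
```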
  • Kaza, S., & Chen, H. (2008). Suspect vehicle identification for border safety. Studies in Computational Intelligence, 135, 305-318.
    More info
    Abstract: Border safety is a critical part of national and international security. The U.S. Department of Homeland Security searches vehicles entering the country at land borders for drugs and other contraband. Customs and Border Protection (CBP) agents believe that such vehicles operate in groups and if the criminal links of one vehicle are known then their border crossing patterns can be used to identify other partner vehicles. We perform this association analysis by using mutual information (MI) to identify vehicles that may be involved in criminal activity. CBP agents also suggest that criminal vehicles may cross at certain times or ports to try and evade inspection. In a partnership with border-area law enforcement agencies and CBP, we include these heuristics in the MI formulation and identify suspect vehicles using large-scale, real-world data collections. Statistical tests and selected cases judged by domain experts show that the heuristic-enhanced MI performs significantly better than classical MI in identifying pairs of potentially criminal vehicles. The techniques described can be used to assist CBP agents in performing their functions both efficiently and effectively. © 2008 Springer-Verlag Berlin Heidelberg.
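As a rough sketch of the pairwise mutual-information idea described above, the toy computation below scores vehicle pairs by how unexpectedly often they appear in the same crossing events. The crossing records are invented, and the `weight` parameter is only a placeholder for the paper's time- and port-based heuristics.

```python
"""Sketch of pairwise (pointwise) MI over co-occurring border crossings."""
import math
from collections import Counter
from itertools import combinations

# hypothetical crossing records: (vehicle, crossing_event_id)
records = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("C", 2), ("A", 3), ("C", 4)]

n_events = len({e for _, e in records})
single = Counter(v for v, _ in records)
by_event = {}
for v, e in records:
    by_event.setdefault(e, set()).add(v)
pair = Counter()
for vehicles in by_event.values():
    for a, b in combinations(sorted(vehicles), 2):
        pair[(a, b)] += 1

def mi(a, b, weight=1.0):
    """Pointwise MI of two vehicles co-crossing; `weight` stands in for
    the paper's heuristics (e.g., boosting off-peak or odd-port crossings)."""
    key = tuple(sorted((a, b)))
    p_ab = weight * pair[key] / n_events
    p_a, p_b = single[a] / n_events, single[b] / n_events
    return math.log(p_ab / (p_a * p_b)) if p_ab > 0 else float("-inf")

print(mi("A", "B"), mi("A", "C"))  # higher score = stronger association
```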
  • Marshall, B., Chen, H., & Kaza, S. (2008). Using importance flooding to identify interesting networks of criminal activity. Journal of the American Society for Information Science and Technology, 59(13), 2099-2114.
    More info
    Abstract: Effectively harnessing available data to support homeland-security-related applications is a major focus in the emerging science of intelligence and security informatics (ISI). Many studies have focused on criminal-network analysis as a major challenge within the ISI domain. Though various methodologies have been proposed, none have been tested for usefulness in creating link charts. This study compares manually created link charts to suggestions made by the proposed importance-flooding algorithm. Mirroring manual investigational processes, our iterative computation employs association-strength metrics, incorporates path-based node importance heuristics, allows for case-specific notions of importance, and adjusts based on the accuracy of previous suggestions. Interesting items are identified by leveraging both node attributes and network structure in a single computation. Our data set was systematically constructed from heterogeneous sources and omits many privacy-sensitive data elements such as case narratives and phone numbers. The flooding algorithm improved on both manual and link-weight-only computations, and our results suggest that the approach is robust across different interpretations of the user-provided heuristics. This study demonstrates an interesting methodology for including user-provided heuristics in network-based analysis, and can help guide the development of ISI-related analysis tools.
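A minimal sketch of the importance-flooding style of computation described above: analyst-provided seed importance spreads iteratively along weighted associations. The graph, weights, and damping constant are invented, and the real algorithm additionally uses path-based heuristics and feedback from previous suggestions.

```python
"""Sketch of iterative importance spreading over an association network."""
graph = {  # association strengths between crime entities (hypothetical)
    "suspect1": {"suspect2": 0.9, "vehicle1": 0.4},
    "suspect2": {"suspect1": 0.9, "address1": 0.7},
    "vehicle1": {"suspect1": 0.4},
    "address1": {"suspect2": 0.7},
}
seeds = {"suspect1": 1.0}  # case-specific notion of importance
score = {n: seeds.get(n, 0.0) for n in graph}

alpha = 0.5  # balance between seed importance and flooded importance
for _ in range(20):  # iterate toward a fixed point
    new = {}
    for n in graph:
        flooded = sum(w * score[m] for m, w in graph[n].items())
        new[n] = alpha * seeds.get(n, 0.0) + (1 - alpha) * flooded
    score = new

print(sorted(score.items(), key=lambda kv: -kv[1]))  # "interesting" nodes first
```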
  • Mielke, C., & Chen, H. (2008). Mapping dark web geolocation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5376 LNCS, 97-107.
    More info
    Abstract: In this paper we first provide a brief review of the Dark Web project of the University of Arizona Artificial Intelligence Lab. We then report our research design and case study that aim to identify the geolocation of the countries, cities, and ISPs that host selected international Jihadist web sites. We provide an overview of key relevant Internet functionality and architecture and present techniques for exploiting networking technologies for locating servers and resources. Significant findings from our case study and suggestions for future research are also presented. © 2008 Springer Berlin Heidelberg.
  • Salem, A., Reid, E., & Chen, H. (2008). Multimedia content coding and analysis: Unraveling the content of jihadi extremist groups' videos. Studies in Conflict and Terrorism, 31(7), 605-626.
    More info
    Abstract: This article presents an exploratory study of jihadi extremist groups' videos using content analysis and a multimedia coding tool to explore the types of video, groups' modus operandi, and production features that lend support to extremist groups. The videos convey messages powerful enough to mobilize members, sympathizers, and even new recruits to launch attacks that are captured (on video) and disseminated globally through the Internet. They communicate the effectiveness of the campaigns and have a much wider impact because the messages are media rich with nonverbal cues and have vivid images of events that can evoke not only a multitude of psychological and emotional responses but also violent reactions. The videos are important for jihadi extremist groups' learning, training, and recruitment. In addition, the content collection and analysis of extremist groups' videos can help policymakers, intelligence analysts, and researchers better understand the extremist groups' terror campaigns and modus operandi, and help suggest counterintelligence strategies and tactics for troop training.
  • Schumaker, R. P., & Chen, H. (2008). Evaluating a news-aware quantitative trader: The effect of momentum and contrarian stock selection strategies. Journal of the American Society for Information Science and Technology, 59(2), 247-255.
    More info
    Abstract: We study the coupling of basic quantitative portfolio selection strategies with a financial news article prediction system, AZFinText. By varying the degrees of portfolio formation time, we found that a hybrid system using both quantitative strategy and a full set of financial news articles performed the best. With a 1-week portfolio formation period, we achieved a 20.79% trading return using a Momentum strategy and a 4.54% return using a Contrarian strategy over a 5-week holding period. We also found that trader overreaction to these events led AZFinText to capitalize on these short-term surges in price.
  • Shieh, I., Chen, S., Lee, D., Lee, S., & Chen, H. (2008). Welcome message from conference co-chairs. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, ix-x.
  • Fu, T., & Chen, H. (2008). Analysis of cyberactivism: A case study of online free Tibet activities. IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, 1-6.
    More info
    Abstract: Cyberactivism refers to the use of the Internet to advocate vigorous or intentional actions to bring about social or political change. Cyberactivism analysis aims to improve the understanding of cyber activists and their online communities. In this paper, we present a case study of online Free Tibet activities. For web site analysis, we use the inlink and outlink information of five selected seed URLs to construct the network of Free Tibet web sites. The network shows the close relationships between our five seed sites. Centrality measures reveal that tibet.org is probably an information hub site in the network. Further content analysis tells us that common hub site words are most popular in tibet.org whereas dalailama.com focuses mostly on religious words. For forum analysis, descriptive statistics such as the number of posts each month and the post distribution of forum users illustrate that the two large forums FreeTibetAndYou and RFAnews-Tibbs have experienced significant reduction in activities in recent years and that a small percentage of their users contribute the majority of posts. Important phrases of several long threads and active forum users are identified by using mutual information and TF-IDF scores. Such topical analyses help us understand the topics discussed in the forums and the ideas and interest of those forum users. Finally, social network analyses of the forum users are conducted to reflect their interactions and the social structure of their online communities. ©2008 IEEE.
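As an illustration of the TF-IDF scoring mentioned above for surfacing important terms in forum threads, a minimal sketch with invented posts follows (the study also used mutual information; only TF-IDF is shown here).

```python
"""Sketch of scoring thread terms by TF-IDF to surface important phrases."""
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [  # hypothetical forum posts
    "free tibet rally announced for next month",
    "discussion of religious teaching and practice",
    "rally route and meeting point for the march",
]
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vec.fit_transform(posts)
terms = vec.get_feature_names_out()

# top-weighted terms in the first post
row = tfidf[0].toarray().ravel()
print(sorted(zip(terms, row), key=lambda t: -t[1])[:5])
```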
  • Fu, T., Abbasi, A., & Chen, H. (2008). A hybrid approach to web forum interactional coherence analysis. Journal of the American Society for Information Science and Technology, 59(8), 1195-1209.
    More info
    Abstract: Despite the rapid growth of text-based computer-mediated communication (CMC), its limitations have rendered the media highly incoherent. This poses problems for content analysis of online discourse archives. Interactional coherence analysis (ICA) attempts to accurately identify and construct CMC interaction networks. In this study, we propose the Hybrid Interactional Coherence (HIC) algorithm for identification of web forum interaction. HIC utilizes a bevy of system and linguistic features, including message header information, quotations, direct address, and lexical relations. Furthermore, several similarity-based methods including a Lexical Match Algorithm (LMA) and a sliding window method are utilized to account for interactional idiosyncrasies. Experimental results on two web forums revealed that the proposed HIC algorithm significantly outperformed comparison techniques in terms of precision, recall, and F-measure at both the forum and thread levels. Additionally, an example was used to illustrate how the improved ICA results can facilitate enhanced social network and role analysis capabilities.
  • Wang, J., Fu, T., Lin, H., & Chen, H. (2008). Exploring gray web forums: Analysis and investigation of forum-based communities in Taiwan. Studies in Computational Intelligence, 135, 121-134.
    More info
    Abstract: Our society is in a state of transformation toward a "virtual society." However, because of the anonymity and low observability of Internet activities, they have become more diverse and obscure. As a result, unscrupulous individuals or criminals may exploit the internet as a channel for their illegal activities to avoid apprehension by law enforcement officials. This paper examines the "Gray Web Forums" in Taiwan. We study their characteristics and develop an analysis framework for assisting investigations on forum communities. Based on statistical data collected from online forums, we found that the relationship between a posting and its responses is highly correlated with the nature of the forum. In addition, hot threads extracted based on posting activity and our proposed metric can be used to assist analysts in identifying illegal or inappropriate contents. Furthermore, a member's role and his/her activities in a virtual community can be identified by member-level analysis. Two schemes based on content analysis were also developed to search for illegal information items in gray forums. The experiment results show that hot threads are correlated with illegal information items, but retrieval effectiveness can be significantly improved by search schemes based on content analysis. © 2008 Springer-Verlag Berlin Heidelberg.
  • Li, X., Chen, H., Dang, Y., Lin, Y., Larson, C. A., & Roco, M. C. (2008). A longitudinal analysis of nanotechnology literature: 1976-2004. Journal of Nanoparticle Research, 10(SUPPL. 1), 3-22.
    More info
    Abstract: Nanotechnology research and applications have experienced rapid growth in recent years. We assessed the status of nanotechnology research worldwide by applying bibliographic, content map, and citation network analysis to a data set of about 200,000 nanotechnology papers published in the Thomson Science Citation Index Expanded database (SCI) from 1976 to 2004. This longitudinal study shows a quasi-exponential growth of nanotechnology articles with an average annual growth rate of 20.7% after 1991. The United States had the largest contribution of nanotechnology research and China and Korea had the fastest growth rates. The largest institutional contributions were from the Chinese Academy of Sciences and the Russian Academy of Sciences. The high-impact papers generally described tools, theories, technologies, perspectives, and overviews of nanotechnology. Of the top 20 institutions, based on the average number of paper citations in 1976-2004, 17 were in the United States, 2 in France, and 1 in Germany. Content map analysis identified the evolution of the major topics researched from 1976 to 2004, including investigative tools, physical phenomena, and experiment environments. Both the country citation network and the institution citation network had relatively high clustering, indicating the existence of citation communities in the two networks, and specific patterns in forming citation communities. The United States, Germany, Japan, and China were major citation centers in nanotechnology research with close inter-citation relationships. © 2008 Springer Science+Business Media B.V.
  • Yang, C. C., Wei, C., & Chen, H. (2008). Editors' introduction special issue on multilingual knowledge management. Decision Support Systems, 45(3), 551-553.
  • Zhou, Y., Huang, F., & Chen, H. (2008). Combining probability models and web mining models: A framework for proper name transliteration. Information Technology and Management, 9(2), 91-103.
    More info
    Abstract: The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically which is referred to as transliteration. In this research we propose a generic transliteration framework, which incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved the traditional statistical-based transliteration in three areas: (1) incorporated a simple phonetic transliteration knowledge base; (2) incorporated a bigram and a trigram HMM; (3) incorporated a Web mining model that uses word frequency of occurrence information from the Web. We evaluated the framework on an English-Arabic back transliteration. Experiments showed that when using HMM alone, a combination of the bigram and trigram HMM approach performed the best for English-Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval. © Springer Science+Business Media, LLC 2007.
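A heavily simplified sketch of the two ideas combined above: score candidate transliterations with a character-bigram model, then re-rank by a web-frequency prior. All probabilities, candidates, and frequencies below are invented toy values; the paper's framework uses a full HMM with a phonetic knowledge base rather than this plain bigram scorer.

```python
"""Sketch: bigram scoring of transliteration candidates plus web re-ranking."""
import math

bigram_logp = {  # log P(next_char | char), toy values
    ("m", "o"): -0.5, ("o", "h"): -1.0, ("h", "a"): -0.7,
    ("m", "u"): -1.2, ("u", "h"): -1.5, ("a", "m"): -0.6,
    ("m", "m"): -1.8, ("m", "e"): -0.9, ("e", "d"): -0.4,
    ("a", "d"): -1.1,
}

def bigram_score(word, floor=-5.0):
    return sum(bigram_logp.get(p, floor) for p in zip(word, word[1:]))

def rerank(candidates, web_freq):
    """Blend bigram score with a web-frequency prior (stand-in for the
    paper's web mining model)."""
    return sorted(candidates,
                  key=lambda w: bigram_score(w) + math.log1p(web_freq.get(w, 0)),
                  reverse=True)

cands = ["mohammed", "muhammad", "mohamad"]
print(rerank(cands, web_freq={"muhammad": 120000, "mohammed": 90000}))
```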
  • Zhu, B., & Chen, H. (2008). Communication-Garden System: Visualizing a computer-mediated communication process. Decision Support Systems, 45(4), 778-794.
    More info
    Abstract: Archives of computer-mediated communication (CMC) could be valuable organizational resources. Most CMC archive systems focus on presenting one of the three aspects of a CMC community: discussion content, participants' behavior, or social networks among participants. Very few CMC archive systems support the easy integration of these three aspects. This paper thus describes two-phase research to propose an automatic approach that facilitates users' integrated understanding of discussion content and behavior of CMC participants. We validated the approach through the development and evaluation of a prototype system, the Communication-Garden system. © 2008 Elsevier B.V. All rights reserved.
  • Abbasi, A., & Chen, H. (2007). A framework for stylometric similarity detection in online settings. Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights, 2, 1442-1451.
    More info
    Abstract: Online marketplaces and communication media such as email, web sites, forums, and chat rooms have been ubiquitously integrated into our everyday lives. Unfortunately, the anonymous nature of these channels makes them an ideal avenue for online fraud, hackers, and cybercrime. Anonymity and the sheer volume of online content make cyber identity tracing an essential yet strenuous endeavor for Internet users and human analysts. In order to address these challenges, we propose a framework for online stylometric analysis to assist in distinguishing authorship in online communities based on writing style. Our framework includes the use of a scalable identity-level similarity detection technique coupled with an extensive stylistic feature set and an identity database. The framework is intended to support stylometric authentication for Internet users as well as provide support for forensic investigations. The proposed technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 100 electronic market traders. The method outperformed benchmark stylometric techniques with an accuracy of approximately 95% when differentiating between 200 trader identities. The results indicate that the proposed stylometric analysis approach may help mitigate the effects of online anonymity abuse.
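A rough sketch of the identity-level similarity detection idea above: represent each identity's texts as style-feature vectors and compare an unknown sample by cosine similarity. The character n-gram features are a tiny stand-in for the paper's extensive stylistic feature set, and the comments are invented.

```python
"""Sketch of stylometric similarity between trader identities (toy data)."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

identities = {
    "trader_a": "great seller!! fast ship, A+++ would buy again",
    "trader_b": "Item arrived as described. Prompt shipping. Recommended.",
    "unknown":  "awesome seller!! quick ship, A+++ buying again",
}
vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))  # style-ish features
X = vec.fit_transform(identities.values())
names = list(identities)

sims = cosine_similarity(X[names.index("unknown")], X)
for name, s in zip(names, sims.ravel()):
    print(name, round(float(s), 3))  # "unknown" should score closest to trader_a
```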
  • Abbasi, A., & Chen, H. (2007). Affect intensity analysis of dark web forums. ISI 2007: 2007 IEEE Intelligence and Security Informatics, 282-288.
    More info
    Abstract: Affects play an important role in influencing people's perceptions and decision making. Affect analysis is useful for measuring the presence of hate, violence, and the resulting propaganda dissemination across extremist groups. In this study we performed affect analysis of U.S. and Middle Eastern extremist group forum postings. We constructed an affect lexicon using a probabilistic disambiguation technique to measure the usage of violence and hate affects. These techniques facilitate in-depth analysis of multilingual content. The proposed approach was evaluated by applying it across 16 U.S. supremacist and Middle Eastern extremist group forums. Analysis across regions reveals that the Middle Eastern test bed forums have considerably greater violence intensity than the U.S. groups. There is also a strong linear relationship between the usage of hate and violence across the Middle Eastern messages. © 2007 IEEE.
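The core mechanism above, lexicon-based intensity scoring, can be sketched in a few lines: each lexicon entry carries an intensity weight, and a message's affect score is the weighted sum over matched terms. The lexicon entries, weights, and example message are all invented; the paper builds its lexicon with probabilistic disambiguation rather than by hand.

```python
"""Sketch of lexicon-based affect intensity scoring (toy lexicon)."""
affect_lexicon = {
    "violence": {"attack": 0.9, "destroy": 0.8, "fight": 0.5},
    "hate": {"enemy": 0.7, "traitor": 0.8},
}

def affect_intensity(text, lexicon):
    tokens = text.lower().split()
    return {
        affect: sum(weights.get(t, 0.0) for t in tokens)
        for affect, weights in lexicon.items()
    }

print(affect_intensity("they vowed to attack and destroy the enemy",
                       affect_lexicon))
```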
  • Abbasi, A., & Chen, H. (2007). Categorization and analysis of text in computer mediated communication archives using visualization. Proceedings of the ACM International Conference on Digital Libraries, 11-18.
    More info
    Abstract: Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to facilitate improved CMC archive analysis and navigation primarily focus on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set comprised of stylistic, topical, and sentiment related features to enable richer content representation. The system also includes the Ink Blot technique which utilizes decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and also highlighted the effectiveness of the extended feature set for improved text categorization. Copyright 2007 ACM.
  • Chau, M., & Chen, H. (2007). Incorporating web analysis into neural networks: An example in Hopfield net searching. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 37(3), 352-358.
    More info
    Abstract: Neural networks have been used in various applications on the World Wide Web, but most of them only rely on the available input-output examples without incorporating Web-specific knowledge, such as Web link analysis, into the network design. In this paper, we propose a new approach in which the Web is modeled as an asymmetric Hopfield Net. Each neuron in the network represents a Web page, and the connections between neurons represent the hyperlinks between Web pages. Web content analysis and Web link analysis are also incorporated into the model by adding a page content score function and a link score function into the weights of the neurons and the synapses, respectively. A simulation study was conducted to compare the proposed model with traditional Web search algorithms, namely, a breadth-first search and a best-first search using PageRank as the heuristic. The results showed that the proposed model performed more efficiently and effectively in searching for domain-specific Web pages. We believe that the model can also be useful in other Web applications such as Web page clustering and search result ranking. © 2007 IEEE.
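A minimal sketch of the spreading-activation idea behind the Hopfield Net model described above: page activation flows along weighted links, is modulated by a content score, and passes through a threshold. The pages, scores, and constants below are all invented.

```python
"""Sketch of spreading activation over a tiny web graph (toy values)."""
links = {  # hyperlinks with link scores (synaptic weights)
    "p1": {"p2": 0.8, "p3": 0.3},
    "p2": {"p3": 0.6},
    "p3": {"p1": 0.2},
}
content = {"p1": 0.9, "p2": 0.4, "p3": 0.7}  # page content scores
seed = "p1"

act = {p: 0.0 for p in links}
act[seed] = 1.0
for _ in range(10):
    new = {}
    for page in links:
        incoming = sum(ws.get(page, 0.0) * act[src] for src, ws in links.items())
        x = content[page] * incoming
        new[page] = min(1.0, x) if x > 0.1 else 0.0  # threshold activation
    new[seed] = 1.0  # the seed page keeps its external input
    act = new

print(sorted(act.items(), key=lambda kv: -kv[1]))  # most promising pages first
```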
  • Chau, M., Shiu, B., Chan, I., & Chen, H. (2007). Redips: Backlink search and analysis on the web for business intelligence analysis. Journal of the American Society for Information Science and Technology, 58(3), 351-365.
    More info
    Abstract: The World Wide Web presents significant opportunities for business intelligence analysis as it can provide information about a company's external environment and its stakeholders. Traditional business intelligence analysis on the Web has focused on simple keyword searching. Recently, it has been suggested that the incoming links, or backlinks, of a company's Web site (i.e., other Web pages that have a hyperlink pointing to the company of interest) can provide important insights about the company's "online communities." Although analysis of these communities can provide useful signals for a company and information about its stakeholder groups, the manual analysis process can be very time-consuming for business analysts and consultants. In this article, we present a tool called Redips that automatically integrates backlink meta-searching and text-mining techniques to facilitate users in performing such business intelligence analysis on the Web. The architectural design and implementation of the tool are presented in the article. To evaluate the effectiveness, efficiency, and user satisfaction of Redips, an experiment was conducted to compare the tool with two popular business intelligence analysis methods - using backlink search engines and manual browsing. The experiment results showed that Redips was statistically more effective than both benchmark methods (in terms of Recall and F-measure) but required more time in search tasks. In terms of user satisfaction, Redips scored statistically higher than backlink search engines in all five measures used, and also statistically higher than manual browsing in three measures. © 2006 Wiley Periodicals, Inc.
  • Chen, H., Kantor, P., & Roberts, F. (2007). ISI 2007 Preface. ISI 2007: 2007 IEEE Intelligence and Security Informatics, iii-iv.
  • Chen, Y., Tseng, C., King, C., Wu, T. J., & Chen, H. (2007). Incorporating geographical contacts into social network analysis for contact tracing in epidemiology: A study on Taiwan SARS data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4506 LNCS, 23-36.
    More info
    Abstract: In epidemiology, contact tracing is a process to control the spread of an infectious disease and identify individuals who were previously exposed to patients with the disease. After the emergence of AIDS, Social Network Analysis (SNA) was demonstrated to be a good supplementary tool for contact tracing. Traditionally, social networks for disease investigations are constructed only with personal contacts. However, for diseases which transmit not only through personal contacts, incorporating geographical contacts into SNA has been demonstrated to reveal potential contacts among patients. In this research, we use Taiwan SARS data to investigate the differences in connectivity between personal and geographical contacts in the construction of social networks for these diseases. According to our results, geographical contacts, which increase the average degree of nodes from 0 to 108.62 and decrease the number of components from 961 to 82, provide much higher connectivity than personal contacts. Therefore, including geographical contacts is important to understand the underlying context of the transmission of these diseases. We further explore the differences in network topology between one-mode networks with only patients and multi-mode networks with patients and geographical locations for disease investigation. We find that including geographical locations as nodes in a social network provides a good way to see the role that those locations play in the disease transmission and reveal potential bridges among those geographical locations and households. © Springer-Verlag Berlin Heidelberg 2007.
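The multi-mode network described above (patients plus geographic locations) can be sketched as a bipartite graph whose projection links patients who share locations; the visit records below are invented.

```python
"""Sketch of a patient-location contact network and its patient projection."""
import networkx as nx
from networkx.algorithms import bipartite

visits = [("patient1", "hospitalA"), ("patient2", "hospitalA"),
          ("patient2", "marketB"), ("patient3", "marketB")]

B = nx.Graph()
B.add_nodes_from({p for p, _ in visits}, bipartite=0)  # patients
B.add_nodes_from({l for _, l in visits}, bipartite=1)  # locations
B.add_edges_from(visits)

patients = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
P = bipartite.projected_graph(B, patients)  # patients linked via shared places
print(P.edges())  # geographic contacts connect otherwise separate patients
```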
  • Chung, W., & Chen, H. (2007). Building a directory for the underdeveloped web: An experiment on the arabic medical web directory. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4822 LNCS, 468-477.
    More info
    Abstract: Despite significant growth of the Web in recent years, some portions of the Web remain largely underdeveloped, as shown in a lack of high quality content and functionality. An example is the Arabic Web, in which a lack of well-structured Web directories has limited users' ability to browse for Arabic resources. In this research, we proposed an approach to building Web directories for the underdeveloped Web and developed a proof-of-concept prototype called Arabic Medical (AMed) Web Directory that supports browsing of over 5,000 Arabic medical Web sites and pages organized in a hierarchical structure. We conducted an experiment involving Arab subjects and found that AMed directory significantly outperformed a benchmark Arabic Web directory in terms of browsing effectiveness and user ratings. This research thus contributes to developing a useful Web directory for organizing information of the Arabic medical domain and to better understanding of supporting browsing on the underdeveloped Web. © Springer-Verlag Berlin Heidelberg 2007.
  • Hu, D., Chen, H., Huang, Z., & Roco, M. C. (2007). Longitudinal study on patent citations to academic research articles in nanotechnology (1976-2004). Journal of Nanoparticle Research, 9(4), 529-542.
    More info
    Abstract: Academic nanoscale science and engineering (NSE) research provides a foundation for nanotechnology innovation reflected in patents. About 60%, or about 50,000, of the NSE-related patents identified by "full-text" keyword searching between 1976 and 2004 at the United States Patent and Trademark Office (USPTO) have an average of approximately 18 academic citations. The most cited academic journals, individual researchers, and research articles have been evaluated as sources of technology innovation in the NSE area over the 28-year period. Each of the most influential articles was cited about 90 times on average, while the most influential author was cited more than 700 times by the NSE-related patents. Thirteen mainstream journals accounted for about 20% of all citations. Science, Nature and Proceedings of the National Academy of Sciences (PNAS) have consistently been the top three most cited journals, with each article being cited three times on average. Another kind of influential journal, represented by Biosystems and Origin of Life, has few cited articles, but those articles are cited with exceptionally high frequency. The number of academic citations per year from the ten most cited journals increased by over 17 times in 1990-1999 as compared to 1976-1989, and again by over 3 times in 2000-2004 as compared to 1990-1999. This is an indication of the increased use of academic knowledge creation in NSE-related patents. © 2007 Springer Science+Business Media, Inc.
  • Chen, H. (2007). Exploring extremism and terrorism on the web: The Dark Web project. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4430 LNCS, 1-20.
    More info
    Abstract: In this paper we discuss technical issues regarding intelligence and security informatics (ISI) research to accomplish the critical missions of international security and counter-terrorism. We propose a research framework addressing the technical challenges facing counter-terrorism and crime-fighting applications with a primary focus on the knowledge discovery from databases (KDD) perspective. We also present several Dark Web related case studies for open-source terrorism information collection, analysis, and visualization. Using a web spidering approach, we have developed a large-scale, longitudinal collection of extremist-generated Internet-based multimedia and multilingual contents. We have also developed selected computational link analysis, content analysis, and authorship analysis techniques to analyze the Dark Web collection. © Springer-Verlag Berlin Heidelberg 2007.
  • Huang, Z., Li, J., Su, H., Watts, G. S., & Chen, H. (2007). Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Systems, 43(4), 1207-1225.
    More info
    Abstract: We present two algorithms for learning large-scale gene regulatory networks from microarray data: a modified information-theory-based Bayesian network algorithm and a modified association rule algorithm. Simulation-based evaluation using six datasets indicated that both algorithms outperformed their unmodified counterparts, especially when analyzing large numbers of genes. Both algorithms learned about 20% (50% if directionality and relation type were not considered) of the relations in the actual models. In our empirical evaluation based on two real datasets, domain experts evaluated subsets of learned relations with high confidence and identified 20-30% to be "interesting" or "maybe interesting" as potential experiment hypotheses. © 2006 Elsevier B.V. All rights reserved.
  • Li, J., Su, H., Chen, H., & Futscher, B. W. (2007). Optimal search-based gene subset selection for gene array cancer classification. IEEE Transactions on Information Technology in Biomedicine, 11(4), 398-405.
    More info
    PMID: 17674622;Abstract: High dimensionality has been a major problem for gene array-based cancer classification. It is critical to identify marker genes for cancer diagnoses. We developed a framework of gene selection methods based on previous studies. This paper focuses on optimal search-based subset selection methods because they evaluate the group performance of genes and help to pinpoint global optimal set of marker genes. Notably, this paper is the first to introduce tabu search (TS) to gene selection from high-dimensional gene array data. Our comparative study of gene selection methods demonstrated the effectiveness of optimal search-based gene subset selection to identify cancer marker genes. TS was shown to be a promising tool for gene subset selection. © 2007 IEEE.
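A greatly simplified sketch of tabu search for gene subset selection, the method introduced above: swap one gene in or out per move, forbid recently used moves, and keep the best cross-validated score seen. The dataset and classifier are toy stand-ins, and a full tabu search would evaluate a whole neighborhood of moves and may accept non-improving ones.

```python
"""Sketch of tabu-search gene subset selection (toy data and scorer)."""
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=80, n_features=30, n_informative=5,
                           random_state=0)

def score(genes):
    return cross_val_score(KNeighborsClassifier(), X[:, sorted(genes)], y,
                           cv=3).mean()

random.seed(0)
current = set(random.sample(range(30), 5))
best, best_s = set(current), score(current)
tabu = []

for _ in range(30):
    g_out = random.choice(sorted(current))
    g_in = random.choice([g for g in range(30) if g not in current])
    move = (g_out, g_in)
    if move in tabu:
        continue  # short-term memory forbids recent moves
    cand = (current - {g_out}) | {g_in}
    if (s := score(cand)) >= score(current):
        current = cand
        tabu = (tabu + [move])[-7:]  # keep a bounded tabu list
        if s > best_s:
            best, best_s = set(cand), s

print(sorted(best), round(best_s, 3))
```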
  • Li, J., Su, H., & Chen, H. (2007). Identification of Marker Genes from High-Dimensional Microarray Data for Cancer Classification. Knowledge Discovery in Bioinformatics: Techniques, Methods, and Applications, 71-87.
  • Kaza, S., Hu, D., & Chen, H. (2007). Dynamic social network analysis of a dark network: Identifying significant facilitators. ISI 2007: 2007 IEEE Intelligence and Security Informatics, 40-46.
    More info
    Abstract: "Dark Networks" refer to various illegal and covert social networks like criminal and terrorist networks. These networks evolve over time with the formation and dissolution of links to survive control efforts by authorities. Previous studies have shown that the link formation process in such networks is influenced by a set of facilitators. However, there have been few empirical evaluations to determine the significant facilitators. In this study, we used dynamic social network analysis methods to examine several plausible link formation facilitators in a large-scale real-world narcotics network. Multivariate Cox regression showed that mutual acquaintance and vehicle affiliations were significant facilitators in the network under study. These findings provide insights into the link formation processes and the resilience of dark networks. They also can be used to help authorities predict co-offending in future crimes. © 2007 IEEE.
  • Kaza, S., Wang, Y., & Chen, H. (2007). Enhancing border security: Mutual information analysis to identify suspect vehicles. Decision Support Systems, 43(1), 199-210.
    More info
    Abstract: In recent years border safety has been identified as a critical part of homeland security. The Department of Homeland Security searches vehicles entering the country for drugs and other contraband. Customs and Border Protection (CBP) agents believe that such vehicles operate in groups and if the criminal links of one vehicle are known then their border crossing patterns can be used to identify other partner vehicles. We perform this association analysis by using mutual information (MI) to identify pairs of vehicles that may be involved in criminal activity. CBP agents also suggest that criminal vehicles may cross at certain times or ports to try and evade inspection. We propose to modify the MI formulation to include these heuristics by using law enforcement data from border-area jurisdictions. Statistical tests and selected cases judged by domain experts show that modified MI performs significantly better than classical MI in identifying potentially criminal vehicles. © 2006 Elsevier B.V. All rights reserved.
  • Leroy, G., & Chen, H. (2007). Introduction to the special issue on decision support in medicine. Decision Support Systems, 43(4), 1203-1206.
    More info
    Abstract: Information technology plays an important role in medicine because of the advanced decision support systems (DSS) it can provide. We provide an overview of the building blocks necessary for a medical decision support system and introduce seven research articles in this special issue that describe the development and evaluation of individual medical DSS building blocks or complete medical DSS. © 2006 Elsevier B.V. All rights reserved.
  • Leroy, G., Xu, J., Chung, W., Eggers, S., & Chen, H. (2007). An end user evaluation of query formulation and results review tools in three medical meta-search engines. International Journal of Medical Informatics, 76(11-12), 780-789.
    More info
    PMID: 16996298;Abstract: Purpose: Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. Methods: We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. Conclusions: We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred. © 2006 Elsevier Ireland Ltd. All rights reserved.
  • Qin, J., Zhou, Y., Reid, E., Lai, G., & Chen, H. (2007). Analyzing terror campaigns on the internet: Technical sophistication, content richness, and Web interactivity. International Journal of Human Computer Studies, 65(1), 71-84.
    More info
    Abstract: Terrorists and extremists are increasingly utilizing Internet technology to enhance their ability to influence the outside world. Due to the lack of multi-lingual and multimedia terrorist/extremist collections and advanced analytical methodologies, our empirical understanding of their Internet usage is still very limited. To address this research gap, we explore an integrated approach for identifying and collecting terrorist/extremist Web contents. We also propose a Dark Web Attribute System (DWAS) to enable quantitative Dark Web content analysis from three perspectives: technical sophistication, content richness, and Web interactivity. Using the proposed methodology, we identified and examined the Internet usage of major Middle Eastern terrorist/extremist groups. More than 200,000 multimedia Web documents were collected from 86 Middle Eastern multi-lingual terrorist/extremist Web sites. In our comparison of terrorist/extremist Web sites to US government Web sites, we found that terrorists/extremist groups exhibited similar levels of Web knowledge as US government agencies. Moreover, terrorists/extremists had a strong emphasis on multimedia usage and their Web sites employed significantly more sophisticated multimedia technologies than government Web sites. We also found that the terrorists/extremist groups are as effective as the US government agencies in terms of supporting communications and interaction using Web technologies. Advanced Internet-based communication tools such as online forums and chat rooms are used much more frequently in terrorist/extremist Web sites than government Web sites. Based on our case study results, we believe that the DWAS is an effective tool to analyse the technical sophistication of terrorist/extremist groups' Internet usage and could contribute to an evidence-based understanding of the applications of Web technologies in the global terrorism phenomenon. © 2006 Elsevier Ltd. All rights reserved.
  • Quiñones, K. D., Su, H., Marshall, B., Eggers, S., & Chen, H. (2007). User-centered evaluation of Arizona BioPathway: An information extraction, integration, and visualization system. IEEE Transactions on Information Technology in Biomedicine, 11(5), 527-536.
    More info
    PMID: 17912969;Abstract: Explosive growth in biomedical research has made automated information extraction, knowledge integration, and visualization increasingly important and critically needed. The Arizona BioPathway (ABP) system extracts and displays biological regulatory pathway information from the abstracts of journal articles. This study uses relations extracted from more than 200 PubMed abstracts presented in a tabular and graphical user interface with built-in search and aggregation functionality. This paper presents a task-centered assessment of the usefulness and usability of the ABP system focusing on its relation aggregation and visualization functionalities. Results suggest that our graph-based visualization is more efficient in supporting pathway analysis tasks and is perceived as more useful and easier to use as compared to a text-based literature-viewing method. Relation aggregation significantly contributes to knowledge-acquisition efficiency. Together, the graphic and tabular views in the ABP Visualizer provide a flexible and effective interface for pathway relation browsing and analysis. Our study contributes to pathway-related research and biological information extraction by assessing the value of a multiview, relation-based interface that supports user-controlled exploration of pathway information across multiple granularities. © 2007 IEEE.
  • Raghu, T. S., & Chen, H. (2007). Cyberinfrastructure for homeland security: Advances in information sharing, data mining, and collaboration systems. Decision Support Systems, 43(4), 1321-1323.
    More info
    Abstract: In summary, the special issue papers address very interesting and relevant issues related to cyberinfrastructure for homeland security. It has been a privilege to guest edit this issue and be involved in the intellectual endeavors of researchers at the forefront of these efforts. We especially thank Professor Andrew Whinston, Editor-in-Chief of Decision Support Systems, for giving us this opportunity, and thank all the reviewers for their diligent effort in ensuring the quality of the papers. We thank all the authors for contributing their work to the special issue and bearing with us on some delays in the review process. We hope the readers share our enthusiasm for the papers published in this issue and for their relevance in advancing novel innovations in information systems specifically targeted to counterterrorism efforts. © 2006 Elsevier B.V. All rights reserved.
  • Reid, E. F., & Chen, H. (2007). Mapping the contemporary terrorism research domain. International Journal of Human Computer Studies, 65(1), 42-56.
    More info
    Abstract: A systematic view of terrorism research to reveal the intellectual structure of the field and empirically discern the distinct set of core researchers, institutional affiliations, publications, and conceptual areas can help us gain a deeper understanding of approaches to terrorism. This paper responds to this need by using an integrated knowledge-mapping framework that we developed to identify the core researchers and knowledge creation approaches in terrorism. The framework uses three types of analysis: (a) basic analysis of scientific output using citation, bibliometric, and social network analyses, (b) content map analysis of large corpora of literature, and (c) co-citation analysis to analyse linkages among pairs of researchers. We applied domain visualization techniques such as content map analysis, block-modeling, and co-citation analysis to the literature and author citation data from the years 1965 to 2003. The data were gathered from ten databases such as the ISI Web of Science. The results reveal: (1) the names of the top 42 core terrorism researchers (e.g., Brian Jenkins, Bruce Hoffman, and Paul Wilkinson) as well as their institutional affiliations; (2) their influential publications; (3) clusters of terrorism researchers who work in similar areas; and (4) that the research focus has shifted from terrorism as a low-intensity conflict to a strategic threat to world powers with increased focus on Osama Bin Laden. © 2006 Elsevier Ltd. All rights reserved.
  • Schroeder, J., Xu, J., Chen, H., & Chau, M. (2007). Automated criminal link analysis based on domain knowledge. Journal of the American Society for Information Science and Technology, 58(6), 842-855.
    More info
    Abstract: Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis. These techniques include the co-occurrence analysis, the shortest path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link analysis tools. Moreover, subjects believed that association paths found based on the heuristic approach were more accurate than those found based solely on the co-occurrence analysis and that the automated link analysis system would be of great help in crime investigations.
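    The co-occurrence-plus-shortest-path idea described above lends itself to a compact illustration. Below is a minimal sketch, not the CrimeLink Explorer implementation: the entity names and counts are invented, and association strength is mapped to edge length by taking the inverse of the co-occurrence count so that strong links form short paths.

      # Minimal sketch of co-occurrence-weighted link analysis with shortest
      # paths (hypothetical data; not the CrimeLink Explorer system).
      import networkx as nx

      # Co-occurrence counts between crime entities, e.g., from incident reports.
      cooccurrence = {
          ("Smith", "Jones"): 12,
          ("Jones", "Acme Pawn"): 5,
          ("Smith", "Lee"): 1,
          ("Lee", "Acme Pawn"): 7,
      }

      G = nx.Graph()
      for (a, b), count in cooccurrence.items():
          # Strong associations (high counts) become short edges.
          G.add_edge(a, b, weight=1.0 / count)

      # The strongest association path between two entities of interest.
      print(nx.shortest_path(G, "Smith", "Acme Pawn", weight="weight"))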
  • Schumaker, R. P., & Chen, H. (2007). Leveraging Question Answer technology to address terrorism inquiry. Decision Support Systems, 43(4), 1419-1430.
    More info
    Abstract: This paper investigates the potential use of dialog-based ALICEbots in disseminating terrorism information to the general public. In particular, we study the acceptance and response satisfaction of ALICEbot responses in both the general conversation and terrorism domains. From our analysis of three different knowledge sets (general conversation, terrorism, and combined), we found that users were more favorable to the systems that exhibited conversational flow. We also found that the system that incorporated both conversation and terrorism knowledge performed better than systems with only conversation or terrorism knowledge alone. Lastly, we were interested in which types of questions were most prevalent and discovered that questions beginning with 'wh*' words were the most popular way to start an interrogative sentence. However, 'wh*' sentence starters surprisingly held only a very narrow majority. © 2006 Elsevier B.V. All rights reserved.
  • Schumaker, R. P., Ginsburg, M., Chen, H., & Liu, Y. (2007). An evaluation of the chat and knowledge delivery components of a low-level dialog system: The AZ-ALICE experiment. Decision Support Systems, 42(4), 2236-2246.
    More info
    Abstract: An effective networked knowledge delivery platform is one of the Holy Grails of Web computing. Knowledge delivery approaches range from the heavy and narrow to the light and broad. This paper explores a lightweight and flexible dialog framework based on the ALICE system, and evaluates its performance in chat and knowledge delivery using both a conversational setting and a specific telecommunications knowledge domain. Metrics for evaluation are presented, and the evaluations of three experimental systems (a pure dialog system, a domain knowledge system, and a hybrid system combining dialog and domain knowledge) are presented and discussed. Our study of 257 subjects shows approximately a 20% user correction rate on system responses. Certain error classes (such as nonsense replies) were particular to the dialog system, while others (such as mistaking opinion questions for definition questions) were particular to the domain system. A third type of error, wordy and awkward responses, is a basic system property and spans all three experimental systems. We also show that the highest response satisfaction results are obtained when coupling domain-specific knowledge together with conversational dialog. © 2006 Elsevier B.V. All rights reserved.
  • Schumaker, R. P., Liu, Y., Ginsburg, M., & Chen, H. (2007). Evaluating the efficacy of a terrorism question/answer system. Communications of the ACM, 50(7), 74-80.
    More info
    Abstract: The TARA Project examined how a trio of modified chatterbots could be used to disseminate terrorism-related information to the general public. © 2007 ACM.
  • Fu, T., Abbasi, A., & Chen, H. (2007). Interaction coherence analysis for dark web forums. ISI 2007: 2007 IEEE Intelligence and Security Informatics, 343-350.
    More info
    Abstract: Interaction coherence analysis (ICA) attempts to accurately identify and construct interaction networks by using various features and techniques. It is useful for identifying user roles, users' social and information value, and the social network structure of Dark Web communities. In this study, we applied interaction coherence analysis to Dark Web forums using the Hybrid Interaction Coherence (HIC) algorithm. Our algorithm utilizes both system features, such as header information and quotations, and linguistic features, such as direct address and lexical relation. Furthermore, several similarity-based methods, for example the Vector Space Model, the Dice equation, and a sliding window, are used to address various types of noise. Two experiments were conducted to compare our HIC algorithm with a traditional linkage-based method, a similarity-based method, and a simplified HIC method that does not address noise issues. The results demonstrate the effectiveness of our HIC algorithm for identifying interactions in Dark Web forums. © 2007 IEEE.
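    One of the similarity measures named above, the Dice equation, can be sketched in a few lines: a quotation is linked back to the earlier post whose word set it overlaps most. This is a toy illustration with invented posts, not the HIC algorithm itself.

      # Minimal sketch: Dice-coefficient matching of a quotation to candidate
      # source posts (toy data; not the HIC implementation).

      def dice(a: str, b: str) -> float:
          """Dice coefficient over word sets: 2|A∩B| / (|A| + |B|)."""
          wa, wb = set(a.lower().split()), set(b.lower().split())
          if not wa or not wb:
              return 0.0
          return 2 * len(wa & wb) / (len(wa) + len(wb))

      quote = "the meeting is moved to friday"
      posts = {
          101: "As I said, the meeting is moved to Friday evening.",
          102: "A completely unrelated post about something else.",
      }
      # Link the quote to the most similar earlier post.
      print(max(posts, key=lambda pid: dice(quote, posts[pid])))  # 101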
  • Wang, G. A., Kaza, S., Joshi, S., Chang, K., Atabakhsh, H., & Chen, H. (2007). The Arizona IDMatcher: A probabilistic identity matching system. ISI 2007: 2007 IEEE Intelligence and Security Informatics, 229-235.
    More info
    Abstract: Various law enforcement and intelligence tasks require managing identity information in an effective and efficient way. However, the quality issues of identity information make this task non-trivial. Various heuristic based systems have been developed to tackle the identity matching problem. However, deploying such systems may require special expertise in system configuration and customization for optimal system performance. In this paper, we propose an alternative system called the Arizona IDMatcher. The system relies on a machine learning algorithm to automatically generate a decision model for identity matching. Such a system requires minimal human configuration effort. Experiments show that the Arizona IDMatcher is very efficient in detecting matching identity records. Compared to IBM Identity Resolution (a commercial, heuristic-based system), the Arizona IDMatcher achieves better recall and overall F-measures in identifying matching identities in two large-scale real-world datasets. © 2007 IEEE.
  • Li, X., Chen, H., Huang, Z., & Roco, M. C. (2007). Patent citation network in nanotechnology (1976-2004). Journal of Nanoparticle Research, 9(3), 337-352.
    More info
    Abstract: The patent citation networks are described using critical node, core network, and network topological analysis. The main objective is to understand the knowledge transfer processes between technical fields, institutions, and countries. This includes identifying key influential players and subfields, the knowledge transfer patterns among them, and the overall knowledge transfer efficiency. The proposed framework is applied to the field of nanoscale science and engineering (NSE), including the citation networks of patent documents, submitting institutions, technology fields, and countries. The NSE patents were identified by keyword searching of the full text of patents at the United States Patent and Trademark Office (USPTO). The analysis shows that the United States is the most important citation center in NSE research. The institution citation network illustrates a more efficient knowledge transfer between institutions than a random network. The country citation network displays a knowledge transfer capability as efficient as a random network. The technology field citation network and the patent document citation network exhibit a less efficient knowledge diffusion capability than a random network. All four citation networks show a tendency to form local citation clusters. © 2007 Springer Science+Business Media, Inc.
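    The topological measures behind claims like "more efficient knowledge transfer than a random network" and "local citation clusters" can be illustrated on a toy citation graph. The sketch below uses invented edges, not USPTO data; average clustering proxies local cluster formation, and average shortest path length proxies transfer efficiency.

      # Minimal sketch of citation-network topology analysis on toy data.
      import networkx as nx

      G = nx.DiGraph([
          ("patentA", "patentB"),  # A cites B
          ("patentA", "patentC"),
          ("patentB", "patentC"),
          ("patentD", "patentC"),
      ])

      # Key influential nodes: the most-cited patents (highest in-degree).
      print(sorted(G.in_degree(), key=lambda x: -x[1])[:2])

      # Local clustering and path-length proxies for transfer efficiency.
      U = G.to_undirected()
      print(nx.average_clustering(U))
      print(nx.average_shortest_path_length(U))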
  • Li, X., Chen, H., Huang, Z., Su, H., & Martinez, J. D. (2007). Global mapping of gene/protein interactions in PubMed abstracts: A framework and an experiment with P53 interactions. Journal of Biomedical Informatics, 40(5), 453-464.
    More info
    PMID: 17317333;PMCID: PMC2047827;Abstract: Gene/protein interactions provide critical information for a thorough understanding of cellular processes. Recently, considerable interest and effort has been focused on the construction and analysis of genome-wide gene networks. The large body of biomedical literature is an important source of gene/protein interaction information. Recent advances in text mining tools have made it possible to automatically extract such documented interactions from free-text literature. In this paper, we propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on the gene/protein interactions extracted from biomedical literature repositories using text mining tools. Our proposed framework consists of analyses of the network topology, network topology-gene function relationship, and temporal network evolution to distill valuable information embedded in the gene functional interactions in the literature. We demonstrate the application of the proposed framework using a testbed of P53-related PubMed abstracts, which shows that the literature-based P53 networks exhibit small-world and scale-free properties. We also found that high degree genes in the literature-based networks have a high probability of appearing in the manually curated database and genes in the same pathway tend to form local clusters in our literature-based networks. Temporal analysis showed that genes interacting with many other genes tend to be involved in a large number of newly discovered interactions. © 2007 Elsevier Inc. All rights reserved.
  • Li, X., Chen, H., Zhang, Z., & Li, J. (2007). Automatic patent classification using citation network information: An experimental study in nanotechnology. Proceedings of the ACM International Conference on Digital Libraries, 419-427.
    More info
    Abstract: Classifying and organizing documents in repositories is an active research topic in digital library studies. Manually classifying the large volume of patents and patent applications managed by patent offices is a labor-intensive task. Many previous studies have employed patent contents for patent classification with the aim of automating this process. In this research we propose to use patent citation information, especially the citation network structure information, to address the patent classification problem. We adopt a kernel-based approach and design kernel functions to capture content information and various citation-related information in patents. These kernels' performances are evaluated on a testbed of patents related to nanotechnology. Evaluation results show that our proposed labeled citation graph kernel, which utilizes citation network structures, significantly outperforms the kernels that use no citation information or only direct citation information. Copyright 2007 ACM.
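    The kernel-based approach above combines content information with citation-related information. Below is a minimal sketch of that general idea, not the authors' labeled citation graph kernel: a content kernel (TF-IDF cosine) and a citation-overlap kernel (Jaccard over cited-patent sets) are blended and fed to an SVM with a precomputed kernel. All patents, citations, and labels are invented.

      # Minimal sketch: blending a content kernel with a citation kernel
      # for patent classification (toy data).
      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import SVC

      texts = ["carbon nanotube transistor", "quantum dot display",
               "nanotube field emission device", "organic led display panel"]
      cited = [{"p1", "p2"}, {"p3"}, {"p1"}, {"p3", "p4"}]
      labels = [0, 1, 0, 1]

      X = TfidfVectorizer().fit_transform(texts)
      K_content = (X @ X.T).toarray()          # cosine-like content kernel

      n = len(cited)                           # Jaccard citation kernel
      K_cite = np.array([[len(cited[i] & cited[j]) / len(cited[i] | cited[j])
                          for j in range(n)] for i in range(n)])

      K = 0.5 * K_content + 0.5 * K_cite       # simple convex combination
      clf = SVC(kernel="precomputed").fit(K, labels)
      print(clf.predict(K))                    # predictions on training patents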
  • Li, X., Lin, Y., Chen, H., & Roco, M. C. (2007). Worldwide nanotechnology development: A comparative study of USPTO, EPO, and JPO patents (1976-2004). Journal of Nanoparticle Research, 9(6), 977-1002.
    More info
    Abstract: To assess worldwide development of nanotechnology, this paper compares the numbers and contents of nanotechnology patents in the United States Patent and Trademark Office (USPTO), European Patent Office (EPO), and Japan Patent Office (JPO). It uses the patent databases as indicators of nanotechnology trends via bibliographic analysis, content map analysis, and citation network analysis of nanotechnology patents per country, institution, and technology field. The numbers of nanotechnology patents published in the USPTO and EPO have continued to increase quasi-exponentially since 1980, while those published in the JPO stabilized after 1993. Institutions and individuals located in the same region as a repository's patent office make a higher contribution to nanotechnology patent publication in that repository (a "home advantage" effect). The USPTO and EPO databases had similar high-productivity contributing countries and technology fields with large numbers of patents, but quite different high-impact countries and technology fields when ranked by the average number of citations received. Bibliographic analysis of USPTO and EPO patents shows that researchers in the United States and Japan published larger numbers of patents than other countries, and that their patents were more frequently cited by other patents. Nanotechnology patents covered physics research topics in all three repositories. In addition, the USPTO showed the broadest coverage of biomedical and electronics areas. The analysis of citations by technology field indicates that the USPTO had a clear pattern of knowledge diffusion from highly cited fields to less cited fields, while in the EPO knowledge exchange occurred mainly among highly cited fields. © 2007 Springer Science+Business Media B.V.
  • Li, X., Zhang, Z., Chen, H., & Li, J. (2007). Graph kernel-based learning for gene function prediction from gene interaction network. Proceedings - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007, 368-373.
    More info
    Abstract: Prediction of gene functions is a major challenge to biologists in the post-genomic era. Interactions between genes and their products compose networks and can be used to infer gene functions. Most previous studies used heuristic approaches based on either local or global information of gene interaction networks to assign unknown gene functions. In this study, we propose a graph kernel-based method that can capture the structure of gene interaction networks to predict gene functions. We conducted an experimental study on a test-bed of P53-related genes. The experimental results demonstrated better performance for our proposed method as compared with baseline methods. © 2007 IEEE.
  • Yang, C. C., Ng, T. D., Wang, J., Wei, C., & Chen, H. (2007). Analyzing and visualizing gray Web forum structure. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4430 LNCS, 21-33.
    More info
    Abstract: The Web is a platform for users to search for information to fulfill their information needs, but it is also an ideal platform on which to express personal opinions and comments. A virtual community is formed when a number of members participate in this kind of communication. Nowadays, teenagers are spending extensive amounts of time communicating with strangers in these virtual communities. At the same time, criminals and terrorists are taking advantage of these virtual communities to recruit members and identify victims. Many Web forum users may not be aware that their participation in these virtual communities has violated the laws of their countries, for example by downloading pirated software or multimedia content. Police officers cannot combat this kind of criminal activity using traditional approaches. We must rely on computing technologies to analyze and visualize the activities within these virtual communities to identify suspects and extract the active groups. In this work, we introduce social network analysis and information visualization techniques for the Gray Web Forum - forums that may threaten public safety. © Springer-Verlag Berlin Heidelberg 2007.
  • Zhou, Y., Qin, J., Lai, G., & Chen, H. (2007). Collection of U.S. extremist online forums: A web mining approach. Proceedings of the Annual Hawaii International Conference on System Sciences.
    More info
    Abstract: Extremists' exploitation of computer-mediated communications such as online forums has recently gained much attention from academia and the government. However, due to the covert nature of these forums and the dynamic nature of the Internet, no systematic methodologies have been developed for the collection and analysis of online information created by extremists. In this study, we propose a systematic Web mining approach to collecting and monitoring extremist forums. Our proposed approach identifies extremist forums from various resources and addresses practical issues faced by researchers and experts in the extremist forum collection process. Such a collection provides a foundation for quantitative forum analysis. Using the proposed approach, we created a collection of 110 U.S. domestic extremist forums containing more than 640,000 documents. We also report our findings on multimedia usage patterns, participant distribution, and posting activity distribution. The collection-building results demonstrate the effectiveness and feasibility of our approach. Furthermore, the extremist forum collection we created could serve as an invaluable data source to enable a better understanding of extremist movements. © 2007 IEEE.
  • Abbasi, A., & Chen, H. (2006). Visualizing authorship for identification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 60-71.
    More info
    Abstract: As a result of growing misuse of online anonymity, researchers have begun to create visualization tools to facilitate greater user accountability in online communities. In this study we created an authorship visualization called Writeprints that can help identify individuals based on their writing style. The visualization creates unique writing style patterns that can be automatically identified in a manner similar to fingerprint biometric systems. Writeprints is a principal component analysis based technique that uses a dynamic feature-based sliding window algorithm, making it well suited to visualizing authorship across larger groups of messages. We evaluated the effectiveness of the visualization across messages from three English and Arabic forums in comparison with Support Vector Machines (SVM) and found that Writeprints provided excellent classification performance, significantly outperforming SVM in many instances. Based on our results, we believe the visualization can assist law enforcement in identifying cyber criminals and also help users authenticate fellow online members in order to deter cyber deception. © Springer-Verlag Berlin Heidelberg 2006.
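    The principal-component idea behind Writeprints can be sketched compactly: style features are computed over sliding windows of text and projected to two dimensions, where an author's windows form a recognizable point pattern. The three features below are illustrative stand-ins, far simpler than the Writeprints feature set.

      # Minimal sketch of a PCA-based "writeprint" over sliding text windows.
      import numpy as np
      from sklearn.decomposition import PCA

      def style_features(words):
          return [
              np.mean([len(w) for w in words]),                  # avg word length
              len(set(words)) / len(words),                      # vocab richness
              sum(w.endswith(",") for w in words) / len(words),  # comma rate
          ]

      text = ("I walked home, slowly, and thought about the long day behind "
              "me, wondering, as always, what tomorrow might bring. " * 10).split()
      windows = [text[i:i + 20] for i in range(0, len(text) - 20, 10)]
      F = np.array([style_features(w) for w in windows])

      # Each row is one window; the 2-D cloud is the author's visual pattern.
      print(PCA(n_components=2).fit_transform(F)[:3])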
  • Chau, M., Huang, Z., Qin, J., Zhou, Y., & Chen, H. (2006). Building a scientific knowledge web portal: The NanoPort experience. Decision Support Systems, 42(2), 1216-1238.
    More info
    Abstract: There has been tremendous growth in the amount of information and resources on the World Wide Web that are useful to researchers and practitioners in science domains. While the Web has made the communication and sharing of research ideas and results among scientists easier and faster than ever, its dynamic and unstructured nature also leaves scientists facing such problems as information overload, vocabulary differences, and a lack of analysis tools. To address these problems, it is highly desirable to have an integrated, "one-stop shopping" Web portal to support effective information searching and analysis as well as to enhance communication and collaboration among researchers in various scientific fields. In this paper, we review existing information retrieval techniques and related literature, and propose a framework for developing integrated Web portals that support information searching and analysis for scientific knowledge. Our framework incorporates collection building, meta-searching, keyword suggestion, and various content analysis techniques such as document summarization, document clustering, and topic map visualization. Patent analysis techniques such as citation analysis and content map analysis are also incorporated. To demonstrate the feasibility of our approach, we developed a knowledge portal based on this architecture, called NanoPort, in the field of nanoscale science and engineering. We report our experience and explore the various issues relevant to developing a Web portal for scientific domains. The system was compared to other search systems in the field and several design issues were identified. An evaluation study was conducted, and the results showed that subjects were more satisfied with the NanoPort system than with Scirus, a leading search engine for scientific articles. Through our prototype system, we demonstrated the feasibility of using such an integrated approach, and the study brought insight into applying the proposed domain-independent architecture to other areas of science and engineering in the future. © 2006 Elsevier B.V. All rights reserved.
  • Chen, H. (2006). Intelligence and security informatics: Information systems perspective. Decision Support Systems, 41(3), 555-559.
  • Chen, H., & Xu, J. (2006). Intelligence and Security Informatics. Annual Review of Information Science and Technology, 40, 229-289.
  • Chen, H., Atabakhsh, H., Wang, A. G., Kaza, S., Tseng, L. C., Wang, Y., Joshi, S., Petersen, T., & Violette, C. (2006). COPLINK center: Social network analysis and identity deception detection for law enforcement and homeland security intelligence and security informatics: A crime data mining approach to developing border safe research. ACM International Conference Proceeding Series, 151, 49-50.
    More info
    Abstract: In this paper, we describe the highlights of the COPLINK Center for law enforcement and homeland security project. Two new components of the project are described, namely, identity resolution and mutual information.
  • Chung, W., Bonillas, A., Lai, G., Xi, W., & Chen, H. (2006). Supporting non-English Web searching: An experiment on the Spanish business and the Arabic medical intelligence portals. Decision Support Systems, 42(3), 1697-1714.
    More info
    Abstract: Although non-English-speaking online populations are growing rapidly, support for searching non-English Web content is much weaker than for English content. Prior research has implicitly assumed English to be the primary language used on the Web, but this is not the case for many non-English-speaking regions. This research proposes a language-independent approach that uses meta-searching, statistical language processing, summarization, categorization, and visualization techniques to build high-quality domain-specific collections and to support searching and browsing of non-English information. Based on this approach, we developed SBizPort and AMedPort for the Spanish business and Arabic medical domains respectively. Experimental results showed that the portals achieved significantly better search accuracy, information quality, and overall satisfaction than benchmark search engines. Subjects strongly favored the portals' search and browse functionality and user interface. This research thus contributes to developing and validating a useful approach to non-English Web searching and providing an example of supporting decision-making in non-English Web domains. © 2006 Elsevier B.V. All rights reserved.
  • Chung, W., Chen, H., Chang, W., & Chou, S. (2006). Fighting cybercrime: A review and the Taiwan experience. Decision Support Systems, 41(3), 669-682.
    More info
    Abstract: Cybercrime is becoming ever more serious. Findings from the 2002 Computer Crime and Security Survey show an upward trend that demonstrates a need for a timely review of existing approaches to fighting this new phenomenon in the information age. In this paper, we define different types of cybercrime and review previous research and current status of fighting cybercrime in different countries that rely on legal, organizational, and technological approaches. We focus on a case study of fighting cybercrime in Taiwan and discuss problems faced. Finally, we propose several recommendations to advance the work of fighting cybercrime. © 2004 Elsevier B.V. All rights reserved.
  • Hsu, F., Hu, P. J., & Chen, H. (2006). Examining the business-technology alignment in government agencies: A study of electronic record management systems in Taiwan. PACIS 2006 - 10th Pacific Asia Conference on Information Systems: ICT and Innovation Economy, 1090-1106.
    More info
    Abstract: For e-government to succeed, government agencies must manage their records and archives of which the sheer volume and diversity necessitate the use of electronic record management systems (ERMS). Using an established business-technology alignment model, we analyze an agency's strategic alignment choice and examine the outcomes and agency performance associated with that alignment. The specific research questions addressed in the study are as follows: (1) Do strategic alignment choices vary among agencies that differ in purpose or position within the overall government hierarchy? (2) Do agencies' alignment choices lead to different outcomes? and (3) Does performance in implementing, operating, and using ERMS vary among agencies that follow different alignment choices? We conducted a large-scale survey study of 3,319 government agencies in Taiwan. Our data support the propositions tested. Based on the findings, we discuss their implications for digital government research and practice.
  • Huang, Z., Chen, H., Guo, F., Xu, J. J., Wu, S., & Chen, W. (2006). Expertise visualization: An implementation and study based on cognitive fit theory. Decision Support Systems, 42(3), 1539-1557.
    More info
    Abstract: Expertise management systems are being widely adopted in organizations to manage tacit knowledge. These systems have successfully applied many information technologies developed for document management to support collection, processing, and distribution of expertise information. In this paper, we report a study on the potential of applying visualization techniques to support more effective and efficient exploration of the expertise information space. We implemented two widely applied dimensionality reduction visualization techniques, the self-organizing map (SOM) and multidimensional scaling (MDS), to generate compact but distorted (due to the dimensionality reduction) map visualizations for an expertise data set. We tested cognitive fit theory in our context by comparing the SOM and MDS displays with a standard table display for five tasks selected from a low-level, domain-independent visual task taxonomy. The experimental results, based on a survey data set of the research expertise of business school professors, suggested that using both SOM and MDS visualizations is more efficient than using the table display for the associate, compare, distinguish, and cluster tasks, but not the rank task. Users generally achieved comparable effectiveness for all tasks using the tabular and map displays in our study. © 2006 Elsevier B.V. All rights reserved.
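    As a small illustration of the dimensionality-reduction side of this study, the sketch below projects a toy expertise matrix to a 2-D map with MDS so that researchers with similar profiles land close together. The expertise vectors are invented; this is not the authors' system.

      # Minimal sketch: an MDS "expertise map" from a toy expertise matrix.
      import numpy as np
      from sklearn.manifold import MDS

      # Rows: researchers; columns: strength in hypothetical expertise areas.
      expertise = np.array([
          [5, 1, 0],   # mostly databases
          [4, 2, 0],
          [0, 5, 1],   # mostly data mining
          [0, 4, 2],
          [1, 0, 5],   # mostly e-commerce
      ])
      coords = MDS(n_components=2, random_state=0).fit_transform(expertise)
      print(coords)  # similar researchers receive nearby 2-D positions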
  • Huang, Z., Chen, H., Li, X., & Roco, M. C. (2006). Connecting NSF funding to patent innovation in nanotechnology (2001-2004). Journal of Nanoparticle Research, 8(6), 859-879.
    More info
    Abstract: Nanotechnology research has experienced rapid growth in knowledge and innovations; it has also attracted significant public funding in recent years. Several countries have recognized nanotechnology as a critical research domain that promises to revolutionize a wide range of fields of application. In this paper we present an analysis of the funding for nanoscale science and engineering (NSE) at the National Science Foundation (NSF) and its implications for technological innovation (number of patents) in this field from 2001 to 2004. Using a combination of basic bibliometric analysis and content visualization tools, we identify growth trends, research topic distribution, and the evolution of NSF funding and commercial patenting activities recorded at the United States Patent and Trademark Office (USPTO). The patent citations are used to compare the impact of the NSF-funded research on nanotechnology development with research supported by other sources in the United States and abroad. The analysis shows that NSF-funded researchers, and the patents they author, have significantly higher impact based on patent citation measures over the four-year period than other comparison groups. The impact of NSF-authored patents also grows faster over the lifetime of a patent, indicating the long-term importance of fundamental research. © Springer Science+Business Media Inc. 2006.
  • Xu, J., Chen, H., Zhou, Y., & Qin, J. (2006). On the topology of the dark web of terrorist groups. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 367-376.
    More info
    Abstract: In recent years, terrorist groups have used the WWW to spread their ideologies, disseminate propaganda, and recruit members. Studying terrorist websites may help us understand the characteristics of these websites and predict terrorist activities. In this paper, we propose to apply network topological analysis methods to systematically collected terrorist website data and to study the structural characteristics at the Web page level. We conducted a case study using these methods on three collections of Middle Eastern, US domestic, and Latin American terrorist websites. We found that these three networks have small-world and scale-free characteristics. We also found that smaller websites that share the same interests tend to form stronger inter-website linkage relationships. © Springer-Verlag Berlin Heidelberg 2006.
  • Li, J., Zheng, R., & Chen, H. (2006). From fingerprint to writeprint. Communications of the ACM, 49(4), 76-82.
    More info
    Abstract: Writeprint-based identification is becoming very popular in crime investigations due to increasing cybercrime incidents and the unavailability of fingerprints in cybercrime. A writeprint is composed of multiple features, such as vocabulary richness, sentence length, use of function words, layout of paragraphs, and keywords. These writeprint features can represent an author's writing style, which is usually consistent across his or her writings, and become the basis of authorship analysis. A GA-based feature selection model to identify writeprint features can generate different combinations of features to achieve the highest fitness value. The selected key writeprint features, corresponding to high classification accuracy, can effectively represent the distinct writing style of an author and can assist in identifying the authorship of online messages.
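    The GA-based selection described above can be sketched as a small genetic algorithm over feature masks. The fitness function below is a stand-in for classification accuracy (it simply rewards a hypothetical "useful" subset), so this illustrates the search procedure, not the authors' model.

      # Minimal sketch: GA feature selection over bit-string feature masks.
      import random

      N_FEATURES = 10

      def fitness(mask):
          # Stand-in for accuracy with the selected features: rewards a
          # hypothetical useful subset {0, 3, 7}, penalizes mask size.
          return sum(mask[i] for i in (0, 3, 7)) - 0.1 * sum(mask)

      def evolve(pop_size=20, generations=40):
          pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
                 for _ in range(pop_size)]
          for _ in range(generations):
              pop.sort(key=fitness, reverse=True)
              parents = pop[: pop_size // 2]           # elitist selection
              children = []
              for _ in range(pop_size - len(parents)):
                  a, b = random.sample(parents, 2)
                  cut = random.randrange(1, N_FEATURES)
                  child = a[:cut] + b[cut:]            # one-point crossover
                  if random.random() < 0.1:            # occasional mutation
                      i = random.randrange(N_FEATURES)
                      child[i] = 1 - child[i]
                  children.append(child)
              pop = parents + children
          return max(pop, key=fitness)

      print(evolve())  # mask with bits 0, 3, 7 set and few others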
  • Li, J., Wang, G., & Chen, H. (2006). Identity matching based on probabilistic relational models. Association for Information Systems - 12th Americas Conference On Information Systems, AMCIS 2006, 3, 1457-1464.
    More info
    Abstract: Identity management is critical to various organizational practices ranging from citizen services to crime investigation. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. In this study we propose a probabilistic relational model (PRM) based approach to match identities in databases. By exploring a database relational structure, we derive three categories of features, namely personal identity features, social activity features, and social relationship features. Based on these derived features, a probabilistic prediction model can be constructed to make a matching decision on a pair of identities. An experimental study using a real criminal dataset demonstrates the effectiveness of the proposed PRM-based approach. By incorporating social activity features, the average precision of identity matching increased from 53.73% to 54.64%; furthermore, the incorporation of social relationship features increased the average precision to 68.27%.
  • Li, J., Li, X., Su, H., Chen, H., & Galbraith, D. W. (2006). A framework of integrating gene relations from heterogeneous data sources: An experiment on Arabidopsis thaliana. Bioinformatics, 22(16), 2037-2043.
    More info
    PMID: 16820427;Abstract: One of the most important goals of biological investigation is to uncover gene functional relations. In this study we propose a framework for extraction and integration of gene functional relations from diverse biological data sources, including gene expression data, biological literature and genomic sequence information. We introduce a two-layered Bayesian network approach to integrate relations from multiple sources into a genome-wide functional network. An experimental study was conducted on a test-bed of Arabidopsis thaliana. Evaluation of the integrated network demonstrated that relation integration could improve the reliability of relations by combining evidence from different data sources. Domain expert judgments on the gene functional clusters in the network confirmed the validity of our approach for relation integration and network inference. © 2006 Oxford University Press.
  • Kaza, S., Wang, Y., & Chen, H. (2006). Suspect vehicle identification for border safety with modified mutual information. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 308-318.
    More info
    Abstract: The Department of Homeland Security monitors vehicles entering and leaving the country at land ports of entry. Some vehicles are targeted to search for drugs and other contraband. Customs and Border Protection agents believe that vehicles involved in illegal activity operate in groups. If the criminal links of one vehicle are known then their border crossing patterns can be used to identify other partner vehicles. We perform this association analysis by using mutual information (MI) to identify pairs of vehicles that are potentially involved in criminal activity. Domain experts also suggest that criminal vehicles may cross at certain times of the day to evade inspection. We propose to modify the mutual information formulation to include this heuristic by using cross-jurisdictional criminal data from border-area jurisdictions. We find that the modified MI with time heuristics performs better than classical MI in identifying potentially criminal vehicles. © Springer-Verlag Berlin Heidelberg 2006.
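    The mutual information measure at the core of this approach, and the time-of-day modification, can be illustrated on toy crossing data. The weighting scheme below is an invented stand-in for the authors' heuristic, and the crossing records are synthetic.

      # Minimal sketch: pairwise mutual information over border crossings,
      # with a time-of-day weight (synthetic data; not the authors' formula).
      import math
      from collections import Counter
      from itertools import combinations

      crossings = [("carA", 2), ("carB", 2), ("carA", 3), ("carB", 3),
                   ("carC", 14), ("carA", 2), ("carB", 2)]

      def night_weight(hour):          # heuristic: late-night crossings count more
          return 2.0 if hour < 5 or hour > 22 else 1.0

      by_hour = {}
      for v, h in crossings:
          by_hour.setdefault(h, set()).add(v)

      single = Counter(v for v, _ in crossings)
      pair = Counter()
      for h, vehicles in by_hour.items():
          for a, b in combinations(sorted(vehicles), 2):
              pair[(a, b)] += night_weight(h)   # weighted co-crossing count

      n = len(crossings)
      for (a, b), c in pair.items():
          p_ab = c / n
          mi = p_ab * math.log(p_ab / ((single[a] / n) * (single[b] / n)))
          print(a, b, round(mi, 3))   # high scores flag potential partner vehicles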
  • Kaza, S., Wang, Y., & Chen, H. (2006). Target vehicle identification for border safety with modified mutual information. ACM International Conference Proceeding Series, 151, 410-411.
    More info
    Abstract: In recent years border security has been identified as a critical part of homeland security. The Department of Homeland Security monitors vehicles entering and leaving the country at land borders. Some vehicles are targeted to search for drugs and other contraband. Customs and Border Protection agents believe that vehicles involved in illegal activity operate in groups. If the criminal links of one vehicle are known then their border crossing patterns can be used to identify other partner vehicles. We perform this association analysis by using mutual information (MI) to identify pairs of vehicles that are potentially involved in criminal activity. Domain experts also suggest that criminal vehicles may cross at certain times of the day to evade inspection. We propose to modify the MI formulation to include this heuristic by using cross-jurisdictional criminal data from border-area jurisdictions.
  • Marshall, B. B., Chen, H., Shen, R., & Fox, E. A. (2006). Moving digital libraries into the student learning space: The GetSmart experience. ACM Journal on Educational Resources in Computing, 6(1).
    More info
    Abstract: The GetSmart system was built to support theoretically sound learning processes in a digital library environment by integrating course management, digital library, and concept mapping components to support a constructivist, six-step, information search process. In the fall of 2002 more than 100 students created 1400 concept maps as part of selected computing classes offered at the University of Arizona and Virginia Tech. Those students conducted searches, obtained course information, created concept maps, collaborated in acquiring knowledge, and presented their knowledge representations. This article connects the design elements of the GetSmart system to targeted concept-map-based learning processes, describes our system and research testbed, and analyzes our system usage logs. Results suggest that students did in fact use the tools in an integrated fashion, combining knowledge representation and search activities. After concept mapping was included in the curriculum, we observed improvement in students' online quiz scores. Further, we observed that students in groups collaboratively constructed concept maps with multiple group members viewing and updating map details. © 2007 ACM.
  • Marshall, B., & Chen, H. (2006). Using importance flooding to identify interesting networks of criminal activity. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 14-25.
    More info
    Abstract: In spite of policy concerns and high costs, the law enforcement community is investing heavily in data sharing initiatives. Cross-jurisdictional criminal justice information (e.g., open warrants and convictions) is important, but different data sets are needed for investigational activities where requirements are not as clear and policy concerns abound. The community needs sharing models that employ obtainable data sets and support real-world investigational tasks. This work presents a methodology for sharing and analyzing investigation-relevant data. Our importance flooding application extracts interesting networks of relationships from large law enforcement data sets using user-controlled investigation heuristics and spreading activation. Our technique implements path-based interestingness rules to help identify promising associations to support creation of investigational link charts. In our experiments, the importance flooding approach outperformed relationship-weight-only models in matching expert-selected associations. This methodology is potentially useful for large cross-jurisdictional data sets and investigations. © Springer-Verlag Berlin Heidelberg 2006.
  • Marshall, B., Chen, H., & Madhusudan, T. (2006). Matching knowledge elements in concept maps using a similarity flooding algorithm. Decision Support Systems, 42(3), 1290-1306.
    More info
    Abstract: Concept mapping systems used in education and knowledge management emphasize flexibility of representation to enhance learning and facilitate knowledge capture. Collections of concept maps exhibit terminology variance, informality, and organizational variation. These factors make it difficult to match elements between maps in comparison, retrieval, and merging processes. In this work, we add an element anchoring mechanism to a similarity flooding (SF) algorithm to match nodes and substructures between pairs of simulated maps and student-drawn concept maps. Experimental results show significant improvement over simple string matching with combined recall accuracy of 91% for conceptual nodes and concept → link → concept propositions in student-drawn maps. © 2005 Elsevier B.V. All rights reserved.
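    The similarity-flooding step can be shown in miniature: initial string-match scores between the nodes of two concept maps are iteratively reinforced by the scores of their neighbor pairs until the rankings stabilize. The maps below are toy examples, and the update rule is a simplification of SF without the element anchoring described above.

      # Minimal sketch of similarity flooding between two tiny concept maps.
      from difflib import SequenceMatcher

      map1 = {"plant": ["root", "leaf"], "root": [], "leaf": []}
      map2 = {"plants": ["roots", "leaves"], "roots": [], "leaves": []}

      def s0(a, b):                     # initial string similarity
          return SequenceMatcher(None, a, b).ratio()

      sim = {(a, b): s0(a, b) for a in map1 for b in map2}
      for _ in range(5):                # fixed-point iteration
          new = {}
          for (a, b), v in sim.items():
              # Similarity "floods" in from pairs of linked neighbors.
              spill = sum(sim[(x, y)] for x in map1[a] for y in map2[b])
              new[(a, b)] = v + spill
          top = max(new.values())
          sim = {k: v / top for k, v in new.items()}   # normalize each round

      print({a: max(map2, key=lambda b: sim[(a, b)]) for a in map1})
      # -> {'plant': 'plants', 'root': 'roots', 'leaf': 'leaves'}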
  • Marshall, B., Su, H., McDonald, D., Eggers, S., & Chen, H. (2006). Aggregating automatically extracted regulatory pathway relations. IEEE Transactions on Information Technology in Biomedicine, 10(1), 100-108.
    More info
    PMID: 16445255;Abstract: Automatic tools to extract information from biomedical texts are needed to help researchers leverage the vast and increasing body of biomedical literature. While several biomedical relation extraction systems have been created and tested, little work has been done to meaningfully organize the extracted relations. Organizational processes should consolidate multiple references to the same objects over various levels of granularity, connect those references to other resources, and capture contextual information. We propose a feature decomposition approach to relation aggregation to support a five-level aggregation framework. Our BioAggregate tagger uses this approach to identify key features in extracted relation name strings. We show encouraging feature assignment accuracy and report substantial consolidation in a network of extracted relations. © 2006 IEEE.
  • McDonald, D. M., & Chen, H. (2006). Summary in context: Searching versus browsing. ACM Transactions on Information Systems, 24(1), 111-141.
    More info
    Abstract: The use of text summaries in information-seeking research has focused on query-based summaries. Extracting content that resembles the query alone, however, ignores the greater context of the document. Such context may be central to the purpose and meaning of the document. We developed a generic, a query-based, and a hybrid summarizer, each with differing amounts of document context. The generic summarizer used a blend of discourse information and information obtained through traditional surface-level analysis. The query-based summarizer used only query-term information, and the hybrid summarizer used some discourse information along with query-term information. The validity of the generic summarizer was shown through an intrinsic evaluation using a well-established corpus of human-generated summaries. All three summarizers were then compared in an information-seeking experiment involving 297 subjects. Results from the information-seeking experiment showed that the generic summaries outperformed all others in the browse tasks, while the query-based and hybrid summaries outperformed the generic summary in the search tasks. Thus, the document context of generic summaries helped users browse, while such context was not helpful in search tasks. Such results are interesting given that generic summaries have not been studied in search tasks and that the majority of Internet search engines rely solely on query-based summaries. © 2006 ACM.
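    The distinction between generic and query-based extraction can be made concrete with a toy extractive summarizer. The scoring rules below are invented for illustration and are far cruder than the authors' discourse-based summarizers.

      # Minimal sketch: query-based vs. generic extractive summarization.
      from collections import Counter

      doc = ("Digital libraries support learning. Summaries help users judge "
             "relevance. Query terms alone can miss document context. "
             "Discourse cues reveal what a document is mainly about.")
      sentences = doc.split(". ")

      def query_summary(sents, query, k=1):
          q = set(query.lower().split())      # score by query-term overlap
          return sorted(sents, key=lambda s: len(q & set(s.lower().split())),
                        reverse=True)[:k]

      def generic_summary(sents, k=1):
          tf = Counter(w for s in sents for w in s.lower().split())
          def score(s):                       # favor sentences of central terms
              return sum(tf[w] for w in s.lower().split()) / len(s.split())
          return sorted(sents, key=score, reverse=True)[:k]

      print(query_summary(sentences, "query terms"))
      print(generic_summary(sentences))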
  • Qin, J., Zhou, Y., Chau, M., & Chen, H. (2006). Multilingual web retrieval: An experiment in English-Chinese business intelligence. Journal of the American Society for Information Science and Technology, 57(5), 671-683.
    More info
    Abstract: As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted and combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise. © 2006 Wiley Periodicals, Inc.
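    The co-occurrence-based phrasal translation step can be sketched as a disambiguation problem: among the dictionary's candidate translations for each query term, choose the combination whose members co-occur most often in a target-language corpus. The dictionary entries and counts below are invented toy data, not the portal's resources.

      # Minimal sketch: co-occurrence-based translation disambiguation.
      from itertools import product

      candidates = {                 # query term -> candidate translations
          "bank":     ["yinhang", "hean"],     # financial bank vs. river bank
          "interest": ["lixi", "xingqu"],      # financial interest vs. hobby
      }
      cooc = {                       # target-corpus co-occurrence counts
          ("yinhang", "lixi"): 90, ("yinhang", "xingqu"): 3,
          ("hean", "lixi"): 1,      ("hean", "xingqu"): 2,
      }

      def score(combo):              # sum pairwise co-occurrence in the combo
          return sum(cooc.get((a, b), 0) + cooc.get((b, a), 0)
                     for i, a in enumerate(combo) for b in combo[i + 1:])

      best = max(product(*candidates.values()), key=score)
      print(dict(zip(candidates, best)))  # {'bank': 'yinhang', 'interest': 'lixi'}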
  • Qin, J., Zhou, Y., Reid, E., Lai, G., & Chen, H. (2006). Unraveling International Terrorist Groups' exploitation of the Web: Technical sophistication, media richness, and web interactivity. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3917 LNCS, 4-15.
    More info
    Abstract: Terrorists and extremists have become mainstream exploiters of the Internet beyond routine communication operations and have dramatically increased their ability to influence the outside world. Although this alternate side of the Internet, referred to as the "Dark Web," has recently received extensive government and media attention, terrorists/extremists' Internet usage is still under-researched because of the lack of systematic Dark Web content collection and analysis methodologies. To address this research gap, we explore an integrated approach for identifying and collecting terrorist/extremist Web contents. We also propose a framework called the Dark Web Attribute System (DWAS) to enable quantitative Dark Web content analysis from three perspectives: technical sophistication, media richness, and Web interactivity. Using the proposed methodology, we collected and examined more than 200,000 multimedia Web documents created by 86 Middle Eastern multi-lingual terrorist/extremist organizations. In our comparison of terrorist/extremist Web sites to U.S. government Web sites, we found that terrorist/extremist groups exhibited similar levels of Web knowledge as U.S. government agencies. We also found that terrorist/extremist groups are as effective as U.S. government agencies in terms of supporting communications and interaction using Web technologies. Based on our case study results, we believe that the DWAS is an effective framework for analyzing the technical sophistication of terrorist/extremist groups' Internet usage, and that our Dark Web analysis methodology could contribute to an evidence-based understanding of the applications of Web technologies in the global terrorism phenomenon. © Springer-Verlag Berlin Heidelberg 2006.
  • Salem, A., Reid, E., & Chen, H. (2006). Content analysis of jihadi extremist groups' videos. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 615-620.
    More info
    Abstract: This paper presents an exploratory study of jihadi extremist groups' videos using content analysis and a multimedia coding tool to explore the types of videos, groups' modus operandi, and production features. The videos convey messages powerful enough to mobilize members, sympathizers, and even new recruits to launch attacks that will once again be captured and disseminated via the Internet. The content collection and analysis of the groups' videos can help policy makers, intelligence analysts, and researchers better understand the groups' terror campaigns and modus operandi, and help suggest counter-intelligence strategies and tactics for troop training. © Springer-Verlag Berlin Heidelberg 2006.
  • Schumaker, R. P., & Chen, H. (2006). Textual analysis of stock market prediction using financial news articles. Association for Information Systems - 12th Americas Conference On Information Systems, AMCIS 2006, 3, 1422-1430.
    More info
    Abstract: This paper examines the role of financial news articles under three different textual representations: Bag of Words, Noun Phrases, and Named Entities, and their ability to predict discrete stock prices twenty minutes after an article's release. Using a Support Vector Machine (SVM) derivative, we show that our model had a statistically significant impact on predicting future stock prices compared to linear regression. We further demonstrate that a Noun Phrase representation scheme performs better than the de facto standard of Bag of Words.
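    The modeling setup above can be illustrated with a toy regression: news text becomes a bag-of-words matrix and a support vector regressor estimates the post-release price. The articles and prices below are invented; this is not the authors' system or data.

      # Minimal sketch: SVM regression of a post-release stock price on a
      # bag-of-words representation of news text (toy data).
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.svm import SVR

      articles = [
          "company beats earnings expectations with strong growth",
          "regulator opens probe and shares fall sharply",
          "record revenue and raised guidance announced",
          "lawsuit filed against company over product defect",
      ]
      price_20min_later = [101.2, 97.5, 102.8, 96.9]

      X = CountVectorizer().fit_transform(articles)   # bag-of-words matrix
      model = SVR(kernel="linear").fit(X, price_20min_later)
      print(model.predict(X[:1]))    # in-sample estimate for the first article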
  • Schumaker, R. P., Liu, Y., Ginsburg, M., & Chen, H. (2006). Evaluating mass knowledge acquisition using the ALICE chatterbot: The AZ-ALICE dialog system. International Journal of Human Computer Studies, 64(11), 1132-1140.
    More info
    Abstract: In this paper, we evaluate mass knowledge acquisition using modified ALICE chatterbots. In particular we investigate the potential of allowing subjects to modify chatterbot responses to see if distributed learning from a web environment can succeed. This experiment looks at dividing knowledge into general conversation and domain-specific categories, for which we selected telecommunications. It was found that subject participation in knowledge acquisition can contribute a significant improvement to both the conversational and telecommunications knowledge bases. We further found that participants were more satisfied with domain-specific responses than with general conversation. © 2006 Elsevier Ltd. All rights reserved.
  • Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A multi-layer Naïve Bayes model for approximate identity matching. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 479-484.
    More info
    Abstract: Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that improves existing techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 30% labeled instances, our model achieves performance comparable to that of fully supervised learning. © Springer-Verlag Berlin Heidelberg 2006.
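    The matching model above treats a pair of identity records as a feature vector and lets a Naive Bayes classifier decide match versus non-match. The sketch below illustrates that framing with invented records and only two similarity features; it is not the authors' multi-layer, semi-supervised model.

      # Minimal sketch: Naive Bayes identity matching over record-pair
      # similarity features (toy data).
      from difflib import SequenceMatcher
      from sklearn.naive_bayes import GaussianNB

      def pair_features(r1, r2):
          name_sim = SequenceMatcher(None, r1["name"], r2["name"]).ratio()
          dob_match = float(r1["dob"] == r2["dob"])
          return [name_sim, dob_match]

      train_pairs = [
          ({"name": "John Smith", "dob": "1970-01-02"},
           {"name": "Jon Smith",  "dob": "1970-01-02"}, 1),   # same person
          ({"name": "John Smith", "dob": "1970-01-02"},
           {"name": "Mary Jones", "dob": "1981-05-09"}, 0),   # different
          ({"name": "A. Garcia",  "dob": "1965-03-03"},
           {"name": "Ana Garcia", "dob": "1965-03-03"}, 1),
          ({"name": "Ana Garcia", "dob": "1965-03-03"},
           {"name": "John Smith", "dob": "1970-01-02"}, 0),
      ]
      X = [pair_features(a, b) for a, b, _ in train_pairs]
      y = [label for _, _, label in train_pairs]
      clf = GaussianNB().fit(X, y)

      probe = pair_features({"name": "Jon Smyth",  "dob": "1970-01-02"},
                            {"name": "John Smith", "dob": "1970-01-02"})
      print(clf.predict([probe]))   # likely [1]: treated as a match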
  • Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A probabilistic model for approximate identity matching. ACM International Conference Proceeding Series, 151, 462-463.
    More info
    Abstract: Identity management is critical to various governmental practices ranging from providing citizen services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves existing identity matching techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique as well as the approximate-match based record comparison algorithm. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that contains only 10% labeled instances, our model achieves performance comparable to that of fully supervised learning.
  • Wang, G. A., Xu, J. J., & Chen, H. (2006). Using social contextual information to match criminal identities. Proceedings of the Annual Hawaii International Conference on System Sciences, 4, 81b.
    More info
    Abstract: Criminal identity matching is crucial to crime investigation in law enforcement agencies. Existing techniques match identities that refer to the same individuals based on simple identity features. These techniques are subject to several problems. First, there is an effectiveness trade-off between the false negative and false positive rates. The improvement of one rate usually lowers the other. Second, in some situations such as identity theft, simple-feature-based techniques are unable to match identities that have completely different identity feature values. We argue that the information about the social context of an individual may provide additional information for revealing the individual's identity, helping improve the effectiveness of identity matching techniques. We define two types of social contextual features: role-based personal features and social group features. Experiments showed that social contextual features, especially the structural similarity and the relational similarity, significantly improved the precision without lowering the recall of criminal identity matching tasks. © 2006 IEEE.
  • Wang, J., Fu, T., Lin, H., & Chen, H. (2006). A framework for exploring Gray Web Forums: Analysis of forum-based communities in Taiwan. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 498-503.
    More info
    Abstract: This paper examines the "Gray Web Forums" in Taiwan. We study their characteristics and develop an analysis framework for assisting investigations on forum communities. Based on the statistical data collected from online forums, we found that the relationship between a posting and its responses is highly correlated to the forum nature. In addition, hot threads extracted based on the proposed metric can be used to assist analysts in identifying illegal or inappropriate contents. Furthermore, members' roles and activities in a virtual community can be identified by member level analysis. © Springer-Verlag Berlin Heidelberg 2006.
  • Zheng, R., Li, J., Chen, H., & Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3), 378-393.
    Abstract: With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific features) are extracted and inductive learning algorithms are used to build feature-based classification models to identify authorship of online messages. To examine this framework, we conducted experiments on English and Chinese online-newsgroup messages. We compared the discriminating power of the four types of features and of three classification techniques: decision trees, backpropagation neural networks, and support vector machines. The experimental results showed that the proposed approach was able to identify authors of online messages with satisfactory accuracy of 70 to 95%. All four types of message features contributed to discriminating authors of online messages. Support vector machines outperformed the other two classification techniques in our experiments. The high performance we achieved for both the English and Chinese datasets showed the potential of this approach in a multiple-language context.
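    As a toy illustration of this framework (a sketch, not the paper's feature set, which is far richer), the snippet below derives a few lexical, syntactic, and structural features from each message and trains a support vector machine; it assumes scikit-learn and NumPy are available.

        import re
        import numpy as np
        from sklearn.svm import LinearSVC

        def style_features(text):
            words = re.findall(r"[A-Za-z']+", text)
            n_words, n_chars = max(len(words), 1), max(len(text), 1)
            return [
                len(text),                                   # structural: message length
                n_words,                                     # lexical: word count
                sum(len(w) for w in words) / n_words,        # lexical: mean word length
                len({w.lower() for w in words}) / n_words,   # lexical: vocabulary richness
                sum(c in ".,;:!?" for c in text) / n_chars,  # syntactic: punctuation rate
                text.count("\n") + 1,                        # structural: line count
            ]

        def train_author_classifier(messages, authors):
            # messages: list of message strings; authors: parallel list of author labels
            X = np.array([style_features(m) for m in messages])
            return LinearSVC().fit(X, authors)

    Content-specific features (e.g., counts of domain keywords) would be appended to the same vector before training.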
  • Zhou, Y., Qin, J., & Chen, H. (2006). CMedPort: An integrated approach to facilitating Chinese medical information seeking. Decision Support Systems, 42(3), 1431-1448.
    Abstract: As the number of non-English resources available on the Web is increasing rapidly, developing information retrieval techniques for non-English languages is becoming an urgent and challenging issue. In this research to facilitate information seeking in a multilingual world, we focused on discovering how search-engine techniques developed for English could be generalized for use with other languages. We proposed a general framework incorporating a focused collection-building technique, a generic language processing ability, an integration of information resources, and a post-retrieval analysis module. Based on this approach, we developed CMedPort, a Chinese Web portal in the medical domain that not only allows users to search for Web pages from local collections and meta-search engines but also provides encoding conversion between simplified and traditional Chinese to support cross-regional search, as well as document summarization and categorization. User studies were conducted to compare the effectiveness and efficiency of CMedPort with those of three major Chinese search engines. Results indicate that CMedPort achieved similar accuracy for search tasks, but exhibited significantly higher recall than each of the three search engines as well as higher precision than two of the search engines for browse tasks. There were no significant differences among the efficiency measures for CMedPort and the benchmark systems. A post-questionnaire regarding system usability indicated that CMedPort achieved significantly higher user satisfaction than any of the three benchmark systems. The subjects especially liked CMedPort's categorizer, commenting that it helped improve understanding of search results. These encouraging outcomes suggest a promising future for applying our approach to Internet searching and browsing in a multilingual world. © 2005 Elsevier B.V. All rights reserved.
  • Zhou, Y., Qin, J., Lai, G., Reid, E., & Chen, H. (2006). Exploring the dark side of the Web: Collection and analysis of U.S. extremist online forums. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3975 LNCS, 621-626.
    Abstract: Contents in extremist online forums are invaluable data sources for extremism research. In this study, we propose a systematic Web mining approach to collecting and monitoring extremist forums. Our proposed approach identifies extremist forums from various resources and addresses practical issues faced by researchers and experts in the extremist forum collection process. Such a collection provides a foundation for quantitative forum analysis. Using the proposed approach, we created a collection of 110 U.S. domestic extremist forums containing more than 640,000 documents. The collection building results demonstrate the effectiveness and feasibility of our approach. Furthermore, the extremist forum collection we created could serve as an invaluable data source to enable a better understanding of extremist movements. © Springer-Verlag Berlin Heidelberg 2006.
  • Abbasi, A., & Chen, H. (2005). Applying authorship analysis to Arabic web content. Lecture Notes in Computer Science, 3495, 183-197.
    Abstract: The advent and rapid proliferation of internet communication has given rise to numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset. © Springer-Verlag Berlin Heidelberg 2005.
  • Abbasi, A., & Chen, H. (2005). Applying authorship analysis to extremist-group Web forum messages. IEEE Intelligent Systems, 20(5), 67-75.
    Abstract: The advantages of applying authorship analysis to extremist-group Web forum messages are discussed. Evaluating the linguistic features of Web messages and comparing them to known writing styles offers the intelligence community a tool for identifying patterns of terrorist communication. Authorship characterization attempts to formulate an author profile by making inferences about gender, education, and cultural backgrounds on the basis of writing style. The research showed significant discrimination power in the application of authorship identification techniques to English and Arabic extremist group forum messages.
  • Chau, M., Qin, J., Zhou, Y., Tseng, C., & Chen, H. (2005). SpidersRUs: Automated development of vertical search engines in different domains and languages. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 110-111.
    Abstract: In this paper we discuss the architecture of a tool designed to help users develop vertical search engines in different domains and different languages. We present the design of the tool and report an evaluation study showing that the system is easier to use than other existing tools. Copyright 2005 ACM.
  • Chen, H. (2005). Digital library development in the Asia Pacific. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3815 LNCS, 509-524.
    Abstract: Over the past decade the development of digital library activities within the Asia Pacific has been steadily increasing. Through a meta-analysis of the publications and content within the International Conference on Asian Digital Libraries (ICADL) and other major regional digital library conferences over the past few years, we see an increase in the level of activity in Asian digital library research. This reflects strong, continuing interest among digital library researchers and practitioners internationally. Digital library research in the Asia Pacific is uniquely positioned to help develop digital libraries of significant cultural heritage and indigenous knowledge and to advance cross-cultural and cross-lingual digital library research. © Springer-Verlag Berlin Heidelberg 2005.
  • Chen, H. (2005). Introduction to the special topic issue: Intelligence and security informatics. Journal of the American Society for Information Science and Technology, 56(3), 217-220.
  • Chen, H., & Wang, F. (2005). Artificial intelligence for homeland security. IEEE Intelligent Systems, 20(5), 12-16.
  • Chen, H., Atabakhsh, H., Tseng, C., Marshall, B., Kaza, S., Eggers, S., Gowda, H., Shah, A., Petersen, T., & Violette, C. (2005). Visualization in law enforcement. Conference on Human Factors in Computing Systems - Proceedings, 1268-1271.
    Abstract: Visualization techniques have proven to be critical in helping crime analysis. By interviewing and observing Criminal Intelligence Officers (CIO) and civilian crime analysts at the Tucson Police Department (TPD), we found that two types of tasks are important for crime analysis: crime pattern recognition and criminal association discovery. We developed two separate systems that provide automatic visual assistance on these tasks. To help identify crime patterns, a Spatial Temporal Visualization (STV) system was designed to integrate a synchronized view of three types of visualization techniques: a GIS view, a timeline view and a periodic pattern view. The Criminal Activities Network (CAN) system extracts, visualizes and analyzes criminal relationships using spring-embedded and blockmodeling algorithms. This paper discusses the design and functionality of these two systems and the lessons learned from the development process and interaction with law enforcement officers.
  • Chung, W., Chen, H., & Nunamaker Jr., J. F. (2005). A visual framework for knowledge discovery on the web: An empirical study of business intelligence exploration. Journal of Management Information Systems, 21(4), 57-84.
    Abstract: Information overload often hinders knowledge discovery on the Web. Existing tools lack analysis and visualization capabilities. Search engine displays often overwhelm users with irrelevant information. This research proposes a visual framework for knowledge discovery on the Web. The framework incorporates Web mining, clustering, and visualization techniques to support effective exploration of knowledge. Two new browsing methods were developed and applied to the business intelligence domain: Web community uses a genetic algorithm to organize Web sites into a tree format; knowledge map uses a multidimensional scaling algorithm to place Web sites as points on a screen. Experimental results show that knowledge map outperformed Kartoo, a commercial search engine with graphical display, in terms of effectiveness and efficiency. Web community was found to be more effective, efficient, and usable than the result list. Our visual framework thus helps to alleviate information overload on the Web and offers practical implications for search engine developers. © 2005 M.E. Sharpe, Inc.
  • Chung, W., Chen, H., Chaboya, L. G., O'Toole, C. D., & Atabakhsh, H. (2005). Evaluating event visualization: A usability study of COPLINK Spatio-Temporal Visualizer. International Journal of Human Computer Studies, 62(1), 127-157.
    Abstract: Event visualization holds the promise of alleviating information overload in human analysis and numerous tools and techniques have been developed and evaluated. However, previous work does not specifically address either the coordination of event dimensions with the types of tasks involved or the way that visualizing different event dimensions can benefit human analysis. In this paper, we propose a taxonomy of event visualization and present a methodology for evaluating a coordinated event visualization tool called COPLINK Spatio-Temporal Visualizer (STV). The taxonomy encompasses various event dimensions, application domains, visualization metaphors, evaluation methods and performance measures. The evaluation methodology examines different event dimensions and different task types, thus juxtaposing two important elements of evaluating a tool. To achieve both internal and external validity, a laboratory experiment with students and a field study with crime analysis experts were conducted. Findings of our usability study show that STV could support crime analysis involving multiple, coordinated event dimensions as effectively as it could analyze individual, uncoordinated event dimensions. STV was significantly more effective and efficient than Microsoft Excel in performing coordinated tasks and was significantly more efficient in doing uncoordinated tasks related to temporal, spatial and aggregated information. Also, STV compared favorably with Excel in completing uncoordinated tasks related to temporal and spatial information. Subjects' comments showed STV to be intuitive, useful and preferable to existing crime analysis methods. © 2004 Elsevier Ltd. All rights reserved.
  • Chung, W., Elhourani, T., Bonillas, A., Lai, G., Wei, X., & Chen, H. (2005). Supporting information seeking in multinational organizations: A knowledge portal approach. Proceedings of the Annual Hawaii International Conference on System Sciences, 272.
    Abstract: As multinational organizations increasingly use the Web to seek information, there is a need for better support of searching the Web across different regions. However, support for Internet searching in non-English-speaking regions is much weaker than that in English-speaking regions. To alleviate these problems, we propose a knowledge portal approach to supporting cross-regional searching in multinational organizations. The approach was used to build two Web portals in the Spanish business and Arabic medical domains. Experimental results show that our portals achieved significantly better performance (in terms of search accuracy and user satisfaction) than existing search engines in the corresponding domains. The encouraging findings point to a promising future for the approach in facilitating cross-regional searching in multinational organizations.
  • Chung, W., Lai, G., Bonillas, A., Elhourani, T., Tseng, T., & Chen, H. (2005). Building web directories in different languages for decision support: A semi-automatic approach. Association for Information Systems - 11th Americas Conference on Information Systems, AMCIS 2005: A Conference on a Human Scale, 1, 467-475.
    Abstract: Web directories organize voluminous information into hierarchical structures, helping users to quickly locate relevant information and to support decision-making. The development of existing Web directories either relies on expert participation that may not be available or uses automatic approaches that lack precision. As more users access the Web in their native languages, better approaches to organizing and developing non-English Web directories are needed. In this paper, we propose a semi-automatic approach to building domain-specific Web directories in different languages by combining human precision and machine efficiency. Using the approach, we have built Web directories in the Spanish business (SBiz) and Arabic medical (AMed) domains. Experimental results show that the SBiz and AMed directories achieved significantly better recall, F value, and satisfaction rating than benchmark directories. These encouraging results show that the approach can be used to build high-quality Web directories to support decision-making.
  • Hu, P. J., Lin, C., & Chen, H. (2005). User acceptance of intelligence and security informatics technology: A study of COPLINK. Journal of the American Society for Information Science and Technology, 56(3), 235-244.
    Abstract: The importance of Intelligence and Security Informatics (ISI) has significantly increased with the rapid and large-scale migration of local/national security information from physical media to electronic platforms, including the Internet and information systems. Motivated by the significance of ISI in law enforcement (particularly in the digital government context) and the limited investigations of officers' technology-acceptance decision-making, we developed and empirically tested a factor model for explaining law-enforcement officers' technology acceptance. Specifically, our empirical examination targeted the COPLINK technology and involved more than 280 police officers. Overall, our model shows a good fit to the data collected and exhibits satisfactory power for explaining law-enforcement officers' technology acceptance decisions. Our findings have several implications for research and technology management practices in law enforcement, which are also discussed.
  • Huang, Z., Chen, H., Yan, L., & Roco, M. C. (2005). Longitudinal nanotechnology development (1991 - 2002): National science foundation funding and its impact on patents. Journal of Nanoparticle Research, 7(4-5), 343-376.
    Abstract: Nanotechnology holds the promise to revolutionize a wide range of products, processes and applications. It is recognized by over sixty countries as critical for their development at the beginning of the 21st century. A significant public investment of over $1 billion annually is devoted to nanotechnology research in the United States. This paper provides an analysis of the National Science Foundation (NSF) funding of nanoscale science and engineering (NSE) and its relationship to the innovation as reflected in the United States Patent and Trade Office (USPTO) patent data. Using a combination of bibliometric analysis and visualization tools, we have identified several general trends, the key players, and the evolution of technology topics in the NSF funding and commercial patenting activities. This study documents the rapid growth of innovation in the field of nanotechnology and its correlation to funding. Statistical analysis shows that the NSF-funded researchers and their patents have higher impact factors than other private and publicly funded reference groups. This suggests the importance of fundamental research on nanotechnology development. The number of cites per NSF-funded inventor is about 10 as compared to 2 for all inventors of NSE-related patents recorded at USPTO, and the corresponding Authority Score is 20 as compared to 1.8. © Springer 2005.
  • Huang, Z., Li, X., & Chen, H. (2005). Link prediction approach to collaborative filtering. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 141-142.
    Abstract: Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most commonly used and successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms to explore transitive user-item associations. In this paper, we extend the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms. © 2005 ACM.
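    One of the simplest such linkage measures can be sketched directly (an illustration, not the paper's implementation; the data layout is assumed): score each candidate item by the number of length-three paths connecting it to the user in the bipartite user-item graph.

        def path3_scores(interactions, user):
            # interactions: dict mapping each user to the set of items they consumed
            my_items = interactions[user]
            scores = {}
            for other, their_items in interactions.items():
                if other == user:
                    continue
                overlap = len(my_items & their_items)  # user -> item -> other paths
                for item in their_items - my_items:    # extend each path: other -> item
                    scores[item] = scores.get(item, 0) + overlap
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        data = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
        print(path3_scores(data, "u1"))  # [('c', 2), ('d', 1)]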
  • Xu, J., & Chen, H. (2005). Criminal network analysis and visualization. Communications of the ACM, 48(6), 100-107.
    Abstract: The introduction of various technological solutions for uncovering terrorist networks to enhance public safety and national security is discussed. Under the first generation, the manual approach, an analyst must first construct an association matrix by identifying criminal associations from raw data, based on which a link chart can be drawn for visualization purposes. Under the second generation, the graphic-based approach, tools such as Analysts' Notebook, Netmap, and XANALYS LINK Explorer automatically produce graphical representations of criminal networks. The third generation, the social network analysis (SNA) approach, is expected to provide more advanced analytical functionality to assist crime investigation.
  • Kaza, S., Marshall, B., Xu, J., Wang, A. G., Gowda, H., Atabakhsh, H., Petersen, T., Violette, C., & Chen, H. (2005). Border Safe: Cross-jurisdictional information sharing, analysis, and visualization. Lecture Notes in Computer Science, 3495, 669-670.
  • Kaza, S., Wang, T., Gowda, H., & Chen, H. (2005). Target vehicle identification for border safety using mutual information. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, 2005, 1141-1146.
    Abstract: The security of border and transportation systems is a critical component of the national strategy for homeland security. The security concerns at the border are not independent of law enforcement in border-area jurisdictions because information known by local law enforcement agencies may provide valuable leads useful for securing the border and transportation infrastructure. The combined analysis of law enforcement information and data generated by vehicle license plate readers at the international borders can be used to identify suspicious vehicles at ports of entry. This not only generates better quality leads for border protection agents but may also serve to reduce wait times for commerce, vehicles, and people as they cross the border. In this paper we use the mutual information concept to identify vehicles that frequently cross the border together with vehicles involved in criminal activity. We find that the mutual information measure can be used to identify vehicles that can be potentially targeted at the border. © 2005 IEEE.
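    The core computation can be sketched under an assumed input format (co-occurrence counts of vehicle pairs observed crossing within the same time window): pointwise mutual information between a border-crossing vehicle and a vehicle known from law enforcement records.

        import math

        def pmi(pair_count, count_x, count_y, n_observations):
            # How much more often vehicles x and y co-occur than chance would predict
            if pair_count == 0:
                return float("-inf")
            p_xy = pair_count / n_observations
            p_x, p_y = count_x / n_observations, count_y / n_observations
            return math.log2(p_xy / (p_x * p_y))

        # Hypothetical counts: vehicle X crossed 40 times, suspect vehicle Y 25 times,
        # together within the same window 12 times, out of 10,000 crossing observations.
        print(pmi(12, 40, 25, 10000))  # strongly positive -> potential target vehicle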
  • Leroy, G., & Chen, H. (2005). Genescene: An ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts. Journal of the American Society for Information Science and Technology, 56(5), 457-468.
    Abstract: The increasing amount of publicly available literature and experimental data in biomedicine makes it hard for biomedical researchers to stay up-to-date. Genescene is a toolkit that will help alleviate this problem by providing an overview of published literature content. We combined a linguistic parser with Concept Space, a co-occurrence based semantic net. Both techniques extract complementary biomedical relations between noun phrases from MEDLINE abstracts. The parser extracts precise and semantically rich relations from individual abstracts. Concept Space extracts relations that hold true for the collection of abstracts. The Gene Ontology, the Human Genome Nomenclature, and the Unified Medical Language System, are also integrated in Genescene. Currently, they are used to facilitate the integration of the two relation types, and to select the more interesting and high-quality relations for presentation. A user study focusing on p53 literature is discussed. All MEDLINE abstracts discussing p53 were processed in Genescene. Two researchers evaluated the terms and relations from several abstracts of interest to them. The results show that the terms were precise (precision 93%) and relevant, as were the parser relations (precision 95%). The Concept Space relations were more precise when selected with ontological knowledge (precision 78%) than without (60%). © 2005 Wiley Periodicals, Inc.
  • Li, J., Su, H., & Chen, H. (2005). Optimal search-based gene selection for cancer prognosis. Association for Information Systems - 11th Americas Conference on Information Systems, AMCIS 2005: A Conference on a Human Scale, 6, 2672-2679.
    Abstract: Gene array data have been widely used for cancer diagnosis in recent years. However, high dimensionality has been a major problem for gene array-based classification. Gene selection is critical for accurate classification and for identifying the marker genes that discriminate different tumor types. This paper creates a framework of gene selection methods based on previous studies. We focused on optimal search-based gene subset selection methods that evaluate the group performance of genes and help to pinpoint a globally optimal set of marker genes. Notably, this study is the first to introduce tabu search to gene selection from high-dimensional gene array data. Experimental studies on several gene array datasets demonstrated the effectiveness of optimal search-based gene subset selection in identifying marker genes.
  • Marshall, B., Su, H., McDonald, D., & Chen, H. (2005). Linking ontological resources using aggregatable substance identifiers to organize extracted relations. Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005, 162-173.
    PMID: 15759623. Abstract: Systems that extract biological regulatory pathway relations from free-text sources are intended to help researchers leverage vast and growing collections of research literature. Several systems to extract such relations have been developed but little work has focused on how those relations can be usefully organized (aggregated) to support visualization systems or analysis algorithms. Ontological resources that enumerate name strings for different types of biomedical objects should play a key role in the organization process. In this paper we delineate five potentially useful levels of relational granularity and propose the use of aggregatable substance identifiers to help reduce lexical ambiguity. An aggregatable substance identifier applies to a gene and its products. We merged 4 extensive lexicons and compared the extracted strings to the text of five million MEDLINE abstracts. We report on the ambiguity within and between name strings and common English words. Our results show an 89% reduction in ambiguity for the extracted human substance name strings when using an aggregatable substance approach.
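    The identifier idea can be illustrated with a small, hypothetical alias table: every name string for a gene or its products resolves to one aggregatable identifier, so relations extracted over any alias are counted together and ambiguity between surface forms is reduced.

        # Hypothetical lexicon entries; the paper merges four curated lexicons
        LEXICON = {
            "tp53": "SUB:TP53", "p53": "SUB:TP53", "tumor protein p53": "SUB:TP53",
            "mdm2": "SUB:MDM2", "mdm2 protein": "SUB:MDM2",
        }

        def aggregate(relations):
            # relations: (name_a, verb, name_b) triples extracted from abstracts
            counts = {}
            for a, verb, b in relations:
                key = (LEXICON.get(a.lower(), a), verb, LEXICON.get(b.lower(), b))
                counts[key] = counts.get(key, 0) + 1
            return counts

        triples = [("p53", "inhibits", "MDM2"), ("TP53", "inhibits", "mdm2 protein")]
        print(aggregate(triples))  # both surface forms collapse into one aggregated relation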
  • Marshall, B., Quiñones, K., Hua, S. u., Eggers, S., & Chen, H. (2005). Visualizing aggregated biological pathway relations. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 67-68.
    Abstract: The Genescene development team has constructed an aggregation interface for automatically-extracted biomedical pathway relations that is intended to help researchers identify and process relevant information from the vast digital library of abstracts found in the National Library of Medicine's PubMed collection. Users view extracted relations at various levels of relational granularity in an interactive and visual node-link interface. Anecdotal feedback reported here suggests that this multigranular visual paradigm aligns well with various research tasks, helping users find relevant articles and discover new information. Copyright 2005 ACM.
  • McDonald, D. M., Chen, H., & Schumaker, R. P. (2005). Transforming open-source documents to terror networks: The Arizona TerrorNet. AAAI Spring Symposium - Technical Report, SS-05-01, 62-69.
    Abstract: Homeland security researchers and analysts more than ever must process large volumes of textual information. Information extraction techniques have been proposed to help alleviate the burden of information overload. Information extraction techniques, however, require retraining and/or knowledge re-engineering when document types vary, as in the homeland security domain. Also, while effectively reducing the volume of the information, information extraction techniques do not point researchers to unanticipated interesting relationships identified within the text. We present the Arizona TerrorNet, a system that utilizes less-specified information extraction rules to extract less-choreographed relationships between known terrorists. Extracted relations are combined in a network and visualized using a network visualizer. We processed 200 unseen documents using TerrorNet, which extracted over 500 relationships between known terrorists. An Al Qaeda network expert made a preliminary inspection of the network and confirmed many of the network links.
  • Ong, T., Chen, H., Sung, W., & Zhu, B. (2005). Newsmap: A knowledge map for online news. Decision Support Systems, 39(4), 583-597.
    Abstract: Information technology has made possible the capture of and access to a large number of data and knowledge bases, which in turn has brought about the problem of information overload. Text mining to turn textual information into knowledge has become a very active research area, but much of the research remains restricted to the English language. Due to the differences in linguistic characteristics and methods of natural language processing, many existing text analysis approaches have yet to be shown to be useful for the Chinese language. This research focuses on the automatic generation of a hierarchical knowledge map, NewsMap, based on online Chinese news, particularly the finance and health sections. Whether in print or online, news still represents one important knowledge source that people produce and consume on a daily basis. The hierarchical knowledge map can be used as a tool for browsing business intelligence and medical knowledge hidden in news articles. In order to assess the quality of the map, an empirical study was conducted which shows that the categories of the hierarchical knowledge map generated by NewsMap are better than those generated by regular news readers, both in terms of recall and precision, on the sub-level categories but not on the top-level categories. NewsMap employs an improved interface combining a 1D alphabetical hierarchical list and a 2D Self-Organizing Map (SOM) island display. Another empirical study compared the two visualization displays and found that users' performance can be improved by taking advantage of the visual cues of the 2D SOM display. © 2004 Elsevier B.V. All rights reserved.
  • Qin, J., & Chen, H. (2005). Using genetic algorithm in building domain-specific collections: An experiment in the nanotechnology domain. Proceedings of the Annual Hawaii International Conference on System Sciences, 102.
    Abstract: As the key technique for building domain-specific search engines, focused crawling has drawn a lot of attention from researchers in the past decade. However, as Web structure analysis techniques have advanced, several problems in traditional focused crawler design have been revealed that can result in domain-specific collections of low quality. In this work, we studied the problems of focused crawling that are caused by using local search algorithms. We also proposed using a global search algorithm, the Genetic Algorithm, in focused crawling to address these problems. We conducted evaluation experiments to examine the effectiveness of our approach. The results showed that our approach could build domain-specific collections of higher quality than traditional focused crawling techniques. Furthermore, we used the concept of Web communities to evaluate how comprehensively the focused crawlers could traverse the Web search space, which can be a good complement to traditional focused crawler evaluation methods.
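    To make the global-search idea concrete, below is a toy, self-contained sketch of genetic-algorithm-style focused crawling over an in-memory "web" (a dict of page keywords and out-links); real fetching and the paper's exact genetic operators are not reproduced.

        # Toy web: url -> (page keywords, out-links); purely illustrative data
        WEB = {
            "a": ({"nano", "tube"}, ["b", "c"]),
            "b": ({"nano", "wire"}, ["d"]),
            "c": ({"sports", "news"}, ["d"]),
            "d": ({"nano", "particle"}, ["a"]),
        }
        TOPIC = {"nano", "tube", "wire", "particle"}

        def fitness(url):
            # Jaccard similarity between page keywords and the target domain lexicon
            keywords, _ = WEB[url]
            return len(keywords & TOPIC) / len(keywords | TOPIC)

        def evolve(seeds, generations=5, survivors=3):
            population = list(seeds)
            for _ in range(generations):
                # selection: keep the fittest pages found so far (a global, not local, view)
                ranked = sorted(set(population), key=fitness, reverse=True)[:survivors]
                # reproduction: expand the survivors through their out-links
                population = ranked + [link for url in ranked for link in WEB[url][1]]
            return sorted(set(population), key=fitness, reverse=True)

        print(evolve(["a"]))  # pages ordered by topical fitness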
  • Qin, J., Xu, J. J., Hu, D., Sageman, M., & Chen, H. (2005). Analyzing terrorist networks: A case study of the global salafi jihad network. Lecture Notes in Computer Science, 3495, 287-304.
    Abstract: It is very important for us to understand the functions and structures of terrorist networks to win the battle against terror. However, previous studies of terrorist network structure have generated little actionable results. This is mainly due to the difficulty in collecting and accessing reliable data and the lack of advanced network analysis methodologies in the field. To address these problems, we employed several advanced network analysis techniques, ranging from social network analysis to Web structural mining, on a Global Salafi Jihad network dataset collected through a large-scale empirical study. Our study demonstrated the effectiveness and usefulness of advanced network techniques in the terrorist network analysis domain. We also introduced the Web structural mining technique into the terrorist network analysis field, which, to the best of our knowledge, has never been used in this domain. More importantly, the results from our analysis provide not only insights for the terrorism research community but also empirical implications that may help law enforcement, intelligence, and security communities to make our nation safer. © Springer-Verlag Berlin Heidelberg 2005.
  • Qin, J., Zhou, Y., Lai, G., Reid, E., Sageman, M., & Chen, H. (2005). The dark web portal project: Collecting and analyzing the presence of terrorist groups on the web. Lecture Notes in Computer Science, 3495, 623-624.
  • Qin, J., Zhou, Y., Xu, J. J., & Chen, H. (2005). Studying the structure of terrorist networks: A web structural mining approach. Association for Information Systems - 11th Americas Conference on Information Systems, AMCIS 2005: A Conference on a Human Scale, 2, 523-530.
    Abstract: Because terrorist organizations often operate in network forms where individual terrorists collaborate with each other to carry out attacks, we could gain valuable knowledge about the terrorist organizations by studying the structural properties of such terrorist networks. However, previous studies of terrorist network structure have generated little actionable results. This is due to the difficulty in collecting and accessing reliable data and the lack of advanced network analysis methodologies in the field. To address these problems, we introduced the Web structural mining technique into the terrorist network analysis field, which, to the best of our knowledge, has never been done before. We employed the proposed technique on a Global Salafi Jihad network dataset collected through a large-scale empirical study. Results from our analysis not only provide insights for the terrorism research community but also support decision making in law enforcement, intelligence, and security domains to make our nation safer.
  • Reid, E., & Chen, H. (2005). Mapping the contemporary terrorism research domain: Researchers, publications, and institutions analysis. Lecture Notes in Computer Science, 3495, 322-339.
    Abstract: The ability to map the contemporary terrorism research domain involves mining, analyzing, charting, and visualizing a research area according to experts, institutions, topics, publications, and social networks. As the increasing flood of new, diverse, and disorganized digital terrorism studies continues, the application of domain visualization techniques is increasingly critical for understanding the growth of scientific research, tracking the dynamics of the field, discovering potential new areas of research, and creating a big picture of the field's intellectual structure as well as its challenges. In this paper, we present an overview of contemporary terrorism research by applying domain visualization techniques to the literature and author citation data from the years 1965 to 2003. The data were gathered from ten databases such as the ISI Web of Science and then analyzed using an integrated knowledge mapping framework that includes selected techniques such as self-organizing map (SOM), content map analysis, and co-citation analysis. The analysis revealed (1) 42 key terrorism researchers and their institutional affiliations; (2) their influential publications; (3) a shift from focusing on terrorism as a low-intensity conflict to an emphasis on it as a strategic threat to world powers with increased focus on Osama Bin Laden; and (4) clusters of terrorism researchers who work in similar research areas as identified by co-citation and block-modeling maps. © Springer-Verlag Berlin Heidelberg 2005.
  • Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., & Chen, H. (2005). Collecting and analyzing the presence of terrorists on the Web: A case study of Jihad Websites. Lecture Notes in Computer Science, 3495, 402-411.
    Abstract: The Internet, which has enabled global businesses to flourish, has become the very same channel for mushrooming 'terrorist news networks.' Terrorist organizations and their sympathizers have found a cost-effective resource to advance their causes by posting high-impact Websites with short shelf-lives. Because of their evanescent nature, terrorism research communities require unrestrained access to digitally archived Websites to mine their contents and pursue various types of analyses. However, organizations that specialize in capturing, archiving, and analyzing Jihad terrorist Websites employ different, manual-based analysis techniques that are inefficient and not scalable. This study proposes the development of automated or semi-automated procedures and systematic methodologies for capturing Jihad terrorist Website data and its subsequent analyses. By analyzing the content of hyperlinked terrorist Websites and constructing visual social network maps, our study is able to generate an integrated approach to the study of Jihad terrorism, its network structure, component clusters, and cluster affinity. © Springer-Verlag Berlin Heidelberg 2005.
  • Schumaker, R. P., Chen, H., Wang, T., & Wilkerson, J. (2005). Terror tracker system: A web portal for terrorism research. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 416.
  • Schumaker, R., & Chen, H. (2005). Question answer TARA: A terrorism activity resource application. Lecture Notes in Computer Science, 3495, 619-620.
  • Wang, A. G., Atabakhsh, H., Petersen, T., & Chen, H. (2005). Discovering identity problems: A case study. Lecture Notes in Computer Science, 3495, 368-373.
    Abstract: Identity resolution is central to fighting against crime and terrorist activities in various ways. Current information systems and technologies deployed in law enforcement agencies are neither adequate nor effective for identity resolution. In this research we conducted a case study in a local police department on problems that produce difficulties in retrieving identity information. We found that more than half (55.5%) of the suspects had either a deceptive or an erroneous counterpart existing in the police system. About 30% of the suspects had used a false identity (i.e., intentional deception), while 42% had similar records due to various types of unintentional errors. We built a taxonomy of identity problems based on our findings. © Springer-Verlag Berlin Heidelberg 2005.
  • Xiang, Y., Chau, M., Atabakhsh, H., & Chen, H. (2005). Visualizing criminal relationships: Comparison of a hyperbolic tree and a hierarchical list. Decision Support Systems, 41(1), 69-83.
    Abstract: In crime analysis, law enforcement officials have to process a large amount of criminal data and figure out their relationships. It is important to identify different associations among criminal entities. In this paper, we propose the use of a hyperbolic tree view and a hierarchical list view to visualize criminal relationships. A prototype system called COPLINK Criminal Relationship Visualizer was developed. An experiment was conducted to test the effectiveness and the efficiency of the two views. The results show that the hyperbolic tree view is more effective for an "identify" task and more efficient for an "associate" task. The participants generally thought it was easier to use the hierarchical list, with which they were more familiar. When asked about the usefulness of the two views, about half of the participants thought that the hyperbolic tree was more useful, while the other half thought otherwise. Our results indicate that both views can help in criminal relationship visualization. While the hyperbolic tree view performs better in some tasks, the users' experiences and preferences will impact the decision on choosing the visualization technique. © 2004 Elsevier B.V. All rights reserved.
  • Xu, J. J., & Chen, H. (2005). CrimeNet explorer: A framework for criminal network knowledge discovery. ACM Transactions on Information Systems, 23(2), 201-226.
    Abstract: Knowledge about the structure and organization of criminal networks is important for both crime investigation and the development of effective strategies to prevent crimes. However, except for network visualization, criminal network analysis remains primarily a manual process. Existing tools do not provide advanced structural analysis techniques that allow extraction of network knowledge from large volumes of criminal-justice data. To help law enforcement and intelligence agencies discover criminal network knowledge efficiently and effectively, in this research we proposed a framework for automated network analysis and visualization. The framework included four stages: network creation, network partition, structural analysis, and network visualization. Based upon it, we have developed a system called CrimeNet Explorer that incorporates several advanced techniques: a concept space approach, hierarchical clustering, social network analysis methods, and multidimensional scaling. Results from controlled experiments involving student subjects demonstrated that our system could achieve higher clustering recall and precision than did untrained subjects when detecting subgroups from criminal networks. Moreover, subjects identified central members and interaction patterns between groups significantly faster with the help of structural analysis functionality than with only visualization functionality. No significant gain in effectiveness was present, however. Our domain experts also reported that they believed CrimeNet Explorer could be very useful in crime investigation. © 2005 ACM.
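    The network-creation stage can be illustrated with a short sketch (an assumed data layout, not CrimeNet Explorer's code): treating each incident report as the set of person names it mentions, a concept-space-style step weights the link between two people by how often they co-occur across reports.

        from itertools import combinations

        def cooccurrence_network(incidents):
            # incidents: list of sets of person names appearing in the same report
            weights = {}
            for people in incidents:
                for a, b in combinations(sorted(people), 2):
                    weights[(a, b)] = weights.get((a, b), 0) + 1
            return weights  # weighted edge list feeding the partition and SNA stages

        reports = [{"smith", "jones"}, {"smith", "jones", "lee"}, {"lee", "chan"}]
        print(cooccurrence_network(reports))  # ('jones', 'smith') carries the heaviest edge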
  • Zhang, P., Sun, J., & Chen, H. (2005). Frame-based argumentation for group decision task generation and identification. Decision Support Systems, 39(4), 643-659.
    Abstract: One of the most important stages of group decision-making is the generation and identification of decision tasks. In this paper, we define a decision task with five elements: decision makers, decision executors, decision objectives, decision problems, and decision constraints. Based on this definition, we present a conceptual model for the generation and identification of group decision tasks in an organization. In addition, we describe a prototype of a group argumentation support system (GASS) that applies a frame-based information structure in electronic brainstorming (EBS) and argumentation to support group decision task generation and identification. Using four group performance indicators, the prototype was evaluated in a lab experiment to determine its effectiveness and efficiency. © 2004 Elsevier B.V. All rights reserved.
  • Zhou, Y., Qin, J., Chen, H., & Nunamaker, J. F. (2005). Multilingual Web retrieval: An experiment on a multilingual business intelligence portal. Proceedings of the Annual Hawaii International Conference on System Sciences, 43.
    Abstract: The amount of non-English information on the Web has proliferated so rapidly in recent years that it often is difficult for a user to retrieve documents in an unfamiliar language. In this study, we report the design and evaluation of a multilingual Web portal in the business domain in English, Chinese, Japanese, Spanish, and German. Web pages relevant to the domain were collected. Search queries were translated using bilingual dictionaries, while phrasal translation and co-occurrence analysis were used for query translation disambiguation. Pivot translations were also used for language pairs where bilingual dictionaries were not available. A user evaluation study showed that on average, multilingual performance achieved 72.99% of monolingual performance. In evaluating pivot translation, we found that it achieved 40% of monolingual retrieval performance, which was not as good as direct translation. Overall, our results are encouraging and show promise for the successful application of MLIR techniques to Web retrieval.
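    The co-occurrence disambiguation step can be sketched as follows (the toy dictionary and corpus statistics are hypothetical): for each query term, keep the candidate translation that co-occurs most strongly in the target-language corpus with some candidate translation of each of the other query terms.

        def disambiguate(query_terms, dictionary, cooccur):
            # dictionary: source term -> list of candidate target-language translations
            # cooccur: frozenset({w1, w2}) -> co-occurrence count in a target corpus
            chosen = []
            for i, term in enumerate(query_terms):
                others = [t for j, t in enumerate(query_terms) if j != i]
                def score(candidate):
                    return sum(max(cooccur.get(frozenset((candidate, c)), 0)
                                   for c in dictionary[other]) for other in others)
                chosen.append(max(dictionary[term], key=score))
            return chosen

        dic = {"bank": ["riverbank", "finance_bank"], "interest": ["hobby", "interest_rate"]}
        stats = {frozenset(("finance_bank", "interest_rate")): 50,
                 frozenset(("riverbank", "hobby")): 2}
        print(disambiguate(["bank", "interest"], dic, stats))  # the financial senses win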
  • Zhou, Y., Qin, J., Lai, G., Reid, E., & Chen, H. (2005). Building knowledge management system for researching terrorist groups on the web. Association for Information Systems - 11th Americas Conference on Information Systems, AMCIS 2005: A Conference on a Human Scale, 5, 2524-2536.
    Abstract: Nowadays, terrorist organizations have found a cost-effective resource to advance their causes by posting high-impact Web sites on the Internet. This alternate side of the Web is referred to as the "Dark Web." While counterterrorism researchers seek to obtain and analyze information from the Dark Web, several problems prevent effective and efficient knowledge discovery: the dynamic and hidden character of terrorist Web sites, information overload, and language barrier problems. This study proposes an intelligent knowledge management system to support the discovery and analysis of multilingual terrorist-created Web data. We developed a systematic approach to identify, collect, and store up-to-date multilingual terrorist Web data. We also propose to build an intelligent Web-based knowledge portal integrated with advanced text and Web mining techniques, such as summarization, categorization, and cross-lingual retrieval, to facilitate knowledge discovery from Dark Web resources. We believe our knowledge portal provides counterterrorism research communities with valuable datasets and tools for knowledge discovery and sharing.
  • Zhou, Y., Qin, J., Reid, E., Lai, G., & Chen, H. (2005). Studying the presence of terrorism on the web: A knowledge portal approach. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 402.
  • Zhu, B., & Chen, H. (2005). Information visualization. Annual Review of Information Science and Technology, 39, 139-177.
  • Zhu, B., & Chen, H. (2005). Using 3D interfaces to facilitate the spatial knowledge retrieval: A geo-referenced knowledge repository system. Decision Support Systems, 40(2), 167-182.
    Abstract: Retrieving knowledge from a knowledge repository includes both the process of finding information of interest and the process of converting incoming information to a person's own knowledge. This paper explores the application of 3D interfaces in supporting the retrieval of spatial knowledge by presenting the development and evaluation of a geo-referenced knowledge repository system. As the computer screen becomes crowded with the high volume of information available, 3D interfaces become a promising candidate for making better use of screen space. A 3D interface is also more similar to the 3D terrain surface it represents than its 2D counterpart. However, almost all previous empirical studies failed to find supportive evidence for the application of 3D interfaces. Realizing that those studies required users to observe the 3D object from a given perspective by providing one static interface, we developed 3D interfaces with interactive animation, which allow users to control how a visual object should be displayed. The empirical study demonstrated that this is a promising approach to facilitating spatial knowledge retrieval. © 2004 Elsevier B.V. All rights reserved.
  • Atabakhsh, H., Larson, C., Petersen, T., Violette, C., & Chen, H. (2004). Information sharing and collaboration policies within government agencies. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3073, 467-475.
    Abstract: This paper describes the necessity for government agencies to share data as well as obstacles to overcome in order to achieve information sharing. We study two domains: law enforcement and disease informatics. Some of the ways in which we were able to overcome the obstacles, such as data security and privacy issues, are explained. We conclude by highlighting the lessons learned while working towards our goals. © Springer-Verlag Berlin Heidelberg 2004.
  • Chau, M., & Chen, H. (2004). Using content-based and link-based analysis in building vertical search engines. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3334, 515-518.
    Abstract: This paper reports our research in the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study was conducted and showed that the proposed approaches performed better than the benchmark approaches. © Springer-Verlag Berlin Heidelberg 2004.
  • Chen, H. (2004). Biomedical informatics and security informatics research in digital library. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3334, 1-12.
    Abstract: The Internet is changing the way we live and do business. It offers a tremendous opportunity for libraries, governments, and businesses to better deliver their contents and services and interact with their many constituents. After ten years of active research, there appears to be a need to advance the science of "informatics" in digital library, especially in several non-traditional but critical application areas. In this paper, we introduce two promising informatics research areas for digital library researchers, namely, Biomedical Informatics and Security Informatics. We discuss some common research elements between these two areas and present several case studies that aim to highlight the relevance and importance of such research in digital library. © Springer-Verlag Berlin Heidelberg 2004.
  • Chen, H. (2004). Digital library research in the US: An overview with a knowledge management perspective. Program, 38(3), 157-167.
    Abstract: The provision of information resources and services is now readily available online via digital libraries furnished by a wide variety of information providers. Information is no longer just text and pictures, and is now available in a wide variety of multimedia formats. Digital libraries represent a new form of information technology in which content management, service delivery and social impact matter as much as technological advancement. In addition, for digital library researchers there is a need to transform information access to knowledge creation and management. Based on research in the USA in the Digital Libraries Initiative and the National Science Digital Library programmes, a review is provided of significant past and emerging digital library research activities, and research based on new knowledge management concepts and technologies is suggested.
  • Chen, H., & Chau, M. (2004). Web Mining: Machine Learning for Web Applications. Annual Review of Information Science and Technology, 38, 289-329+xvii-xviii.
  • Chen, H., Chung, W., Xu, J. J., Wang, G., Qin, Y., & Chau, M. (2004). Crime data mining: A general framework and some examples. Computer, 37(4), 50-56.
    Abstract: A general framework for crime data mining that draws on experience gained with the Coplink project at the University of Arizona is presented. By increasing efficiency and reducing errors, this scheme facilitates police work and enables investigators to allocate their time to other valuable tasks.
  • Chung, W., Zhang, Y., Huang, Z., Wang, G., Ong, T., & Chen, H. (2004). Internet searching and browsing in a multilingual world: An experiment on the Chinese business intelligence portal (CBizPort). Journal of the American Society for Information Science and Technology, 55(9), 818-831.
    Abstract: The rapid growth of the non-English-speaking Internet population has created a need for better searching and browsing capabilities in languages other than English. However, existing search engines may not serve the needs of many non-English-speaking Internet users. In this paper, we propose a generic and integrated approach to searching and browsing the Internet in a multilingual world. Based on this approach, we have developed the Chinese Business Intelligence Portal (CBizPort), a meta-search engine that searches for business information of mainland China, Taiwan, and Hong Kong. Additional functions provided by CBizPort include encoding conversion (between Simplified Chinese and Traditional Chinese), summarization, and categorization. Experimental results of our user evaluation study show that the searching and browsing performance of CBizPort was comparable to that of regional Chinese search engines, and CBizPort could significantly augment these search engines. Subjects' verbal comments indicate that CBizPort performed best in terms of analysis functions, cross-regional searching, and user-friendliness, whereas regional search engines were more efficient and more popular. Subjects especially liked CBizPort's summarizer and categorizer, which helped in understanding search results. These encouraging results suggest a promising future of our approach to Internet searching and browsing in a multilingual world.
  • Huang, Z., Chen, H., Chen, Z., & Roco, M. C. (2004). International nanotechnology development in 2003: Country, institution, and technology field analysis based on USPTO patent database. Journal of Nanoparticle Research, 6(4), 325-354.
    Abstract: Nanoscale science and engineering (NSE) have seen rapid growth and expansion into new areas in recent years. This paper provides an international patent analysis using the U.S. Patent and Trademark Office (USPTO) data searched by keywords of the entire text: title, abstract, claims, and specifications. A fraction of these patents fully satisfy the National Nanotechnology Initiative definition of nanotechnology (which requires exploiting specific phenomena and direct manipulation at the nanoscale), while others only make use of NSE tools and methods of investigation. In previous work we proposed an integrated patent analysis and visualization framework of patent content mapping for the NSE field and of knowledge flow pattern identification until 2002. In this paper, the results are updated for 2003, and the new trends are presented. The number of USPTO patents originating from all countries that include nanotechnology-related keywords in 2003 is about 8600, an increase of about 50% over the last 3 years, which is significantly larger than the increase of about 4% for patents in all technology fields (USPTO, 2004). The top five countries are the U.S. (5228 patents in 2003), Japan (926), Germany (684), Canada (244), and France (183). Fastest growing are the Republic of Korea (84 patents in 2003) and the Netherlands (81). For the first time in 2003, four electronics companies reached the top five institutions: IBM (198 patents), Micron Technologies (129), Advanced Micro Devices (128), Intel (90), and University of California (89). However, overall, the single technology field "Chemistry: molecular biology and microbiology" and the chemical industry remain in the lead. The citation networks show an increase in international interactions and a relative change in the role of various countries, institutions, and technological fields over time.
  • Huang, Z., Chen, H., Guo, F., Xu, J. J., Wu, S., & Chen, W. (2004). Visualizing the expertise space. Proceedings of the Hawaii International Conference on System Sciences, 37, 585-594.
    Abstract: Expertise management systems are being widely adopted in organizations to manage tacit knowledge embedded in employees' heads. These systems have successfully applied many information technologies developed in fields such as information retrieval and document management to support expertise information collection, processing, and distribution. In this paper, we investigate the potential of applying visualization techniques to support exploration of an expertise space. We implemented two widely applied dimensionality reduction visualization techniques, the self-organizing map and multidimensional scaling, to generate expert map and expertise field map visualizations based on an expertise data set. Our proposed approach is generic for automatic mapping of the expertise space of an organization, research field, scientific domain, etc. Our initial analysis of the visualization results indicated that the expert map and expertise field map captured useful underlying structures of the expertise space and had the potential to support more efficient and effective expertise information searching and browsing.
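    For intuition, here is a minimal Python sketch of classical multidimensional scaling, one of the two dimensionality-reduction techniques applied above; the expertise profiles below are invented for illustration, not the paper's data set.

      import numpy as np

      # Toy expertise profiles: rows are experts, columns are expertise fields
      # (invented data standing in for a real expertise data set).
      profiles = np.array([
          [5, 1, 0, 2],   # expert A
          [4, 2, 0, 1],   # expert B
          [0, 5, 4, 0],   # expert C
          [1, 4, 5, 0],   # expert D
          [2, 0, 1, 5],   # expert E
      ], dtype=float)

      # Pairwise Euclidean distances between expertise profiles.
      diff = profiles[:, None, :] - profiles[None, :, :]
      D = np.sqrt((diff ** 2).sum(axis=-1))

      # Classical MDS: double-center the squared distance matrix,
      # then embed with the top two eigenpairs to get 2-D map positions.
      n = D.shape[0]
      J = np.eye(n) - np.ones((n, n)) / n
      B = -0.5 * J @ (D ** 2) @ J
      vals, vecs = np.linalg.eigh(B)
      top = np.argsort(vals)[::-1][:2]
      coords = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

      for name, (x, y) in zip("ABCDE", coords):
          print(f"expert {name}: ({x:+.2f}, {y:+.2f})")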
  • Huang, Z., Chen, H., Hsu, C., Chen, W., & Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems, 37(4), 543-558.
    Abstract: Corporate credit rating analysis has attracted considerable research interest in the literature. Recent studies have shown that Artificial Intelligence (AI) methods achieve better performance than traditional statistical methods. This article introduces a relatively new machine learning technique, support vector machines (SVM), to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy of around 80% for both the BNN and SVM methods for the United States and Taiwan markets; however, only a slight improvement from SVM was observed. Another direction of the research is to improve the interpretability of the AI-based models. We applied recent research results in neural network model interpretation and obtained the relative importance of the input financial variables from the neural network models. Based on these results, we conducted a comparative analysis of the determining factors in the United States and Taiwan markets. © 2003 Elsevier B.V. All rights reserved.
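    As a rough illustration of such a benchmark, the sketch below pits scikit-learn stand-ins for the two models against each other; the features, labeling rule, and hyperparameters are invented, not those of the study.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)

      # Synthetic stand-ins for financial ratios; the "rating" label follows
      # an invented linear rule plus noise.
      X = rng.normal(size=(400, 5))
      y = (X[:, 0] - 0.8 * X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)

      models = {
          "SVM": SVC(kernel="rbf", C=1.0),
          "BNN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                               random_state=0),
      }
      for name, model in models.items():
          acc = cross_val_score(model, X, y, cv=5).mean()
          print(f"{name}: {acc:.1%} cross-validated accuracy")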
  • Huang, Z., Chung, W., & Chen, H. (2004). A graph model for e-commerce recommender systems. Journal of the American Society for Information Science and Technology, 55(3), 259-274.
    Abstract: Information overload on the Web has created enormous challenges to customers selecting products for online purchases and to online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and relevant recommendations than using only collaborative information. However, comparisons among different methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed.
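    The sketch below illustrates the flavor of the graph representation: two-step customer-book paths play the role of association mining, and longer, damped paths stand in for high-degree association retrieval. The purchase matrix is invented; this is not the paper's implementation.

      import numpy as np

      # Toy purchase matrix: rows = customers, columns = books.
      M = np.array([
          [1, 1, 0, 0, 0],
          [1, 0, 1, 0, 0],
          [0, 1, 0, 1, 0],
          [0, 0, 1, 1, 1],
      ], dtype=float)
      target = 0  # recommend for customer 0

      # Book-book co-purchase counts: 2-step paths book -> customer -> book.
      co = M.T @ M
      np.fill_diagonal(co, 0)

      # High-degree analogue: also count damped 4-step paths.
      high = co + 0.5 * (co @ co)

      scores = M[target] @ high          # aggregate association strength
      scores[M[target] > 0] = -np.inf    # exclude books already bought
      print("recommended book:", int(np.argmax(scores)))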
  • Xu, J. J., Marshall, B., Kaza, S., & Chen, H. (2004). Analyzing and visualizing criminal network dynamics: A case study. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3073, 359-377.
    Abstract: Dynamic criminal network analysis is important for national security but also very challenging. However, little research has been done in this area. In this paper we propose to use several descriptive measures from social network analysis research to help detect and describe changes in criminal organizations. These measures include centrality for individuals, and density, cohesion, and stability for groups. We also employ visualization and animation methods to present the evolution process of criminal networks. We conducted a field study with several domain experts to validate our findings from the analysis of the dynamics of a narcotics network. The feedback from our domain experts showed that our approaches and the prototype system could be very helpful for capturing the dynamics of criminal organizations and assisting crime investigation and criminal prosecution. © Springer-Verlag Berlin Heidelberg 2004.
  • Lim, E., Chen, H., Neuhold, E. J., Sugimoto, S., & Li, J. (2004). Introduction to the journal on digital libraries special issue on Asian digital libraries. International Journal on Digital Libraries, 4(4), 245-246.
  • Lin, C., Hu, P. J., & Chen, H. (2004). Technology Implementation Management in Law Enforcement: COPLINK System Usability and User Acceptance Evaluations. Social Science Computer Review, 22(1), 24-36.
    Abstract: Increasingly, government agencies are facing the challenge of effective implementation of information technologies critical to their digital government programs and initiatives. This article reports two user-centric evaluation studies of COPLINK, an integrated knowledge management system that supports and enhances law enforcement officers' crime-fighting activities. Specifically, the evaluations concentrate on system usability and user acceptance in the law enforcement setting. The article describes the study designs, highlights the analysis results, and discusses their implications for digital government research and practices. Findings from these studies provide valuable insights into digital government system evaluation and, at the same time, shed light on how government agencies can design adequate management interventions to foster technology acceptance and use.
  • Lin, M., Chau, M., Nunamaker Jr., J. F., & Chen, H. (2004). Segmentation of lecture videos based on text: A method combining multiple linguistic features. Proceedings of the Hawaii International Conference on System Sciences, 37, 23-32.
    Abstract: In multimedia-based e-Learning systems, there is a strong need to segment lecture videos into topic units in order to organize the videos for browsing and to provide search capability. Automatic segmentation is highly desirable because of the high cost of manual segmentation. While much research has been conducted on topic segmentation of transcribed spoken text, most attempts rely on domain-specific cues and a formal presentation format, and require extensive training; none of these features exist in lecture videos with unscripted and spontaneous speech. In addition, lecture videos usually have few scene changes, which implies that the visual information most video segmentation methods rely on is not available. Furthermore, even when there are scene changes, they do not match the topic transitions. In this paper, we make use of the transcribed speech text extracted from the audio track of video to segment lecture videos into topics. We review related research and propose a new segmentation approach. Our approach utilizes features such as noun phrases and combines multiple content-based and discourse-based features. Our preliminary results show that noun phrases are salient features and that combining multiple features is a promising way to improve segmentation accuracy.
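    The sketch below shows one content-based ingredient of such an approach, a lexical-cohesion check in the TextTiling spirit (not the authors' full multi-feature method): a topic boundary is flagged wherever the similarity between adjacent windows of sentences drops below a threshold. The transcript is invented.

      from collections import Counter
      import math

      def cosine(a, b):
          shared = set(a) & set(b)
          dot = sum(a[t] * b[t] for t in shared)
          na = math.sqrt(sum(v * v for v in a.values()))
          nb = math.sqrt(sum(v * v for v in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      def boundaries(sentences, window=2, threshold=0.1):
          # Flag a boundary wherever lexical cohesion between the windows
          # before and after a gap falls below the threshold.
          bows = [Counter(s.lower().split()) for s in sentences]
          cuts = []
          for gap in range(window, len(bows) - window + 1):
              left = sum(bows[gap - window:gap], Counter())
              right = sum(bows[gap:gap + window], Counter())
              if cosine(left, right) < threshold:
                  cuts.append(gap)
          return cuts

      transcript = [
          "neural networks learn weighted connections",
          "backpropagation adjusts the weights from errors",
          "now let us turn to databases",
          "a relational database stores tables and keys",
      ]
      print(boundaries(transcript))  # [2]: boundary before the database topic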
  • Marshall, B., Kaza, S., Xu, J. J., Atabakhsh, H., Petersen, T., Violette, C., & Chen, H. (2004). Cross-jurisdictional Criminal Activity Networks to support border and transportation security. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, 100-105.
    Abstract: Border and transportation security is a critical part of the Department of Homeland Security's (DHS) national strategy. DHS strategy calls for the creation of "smart borders" where information from local, state, federal, and international sources can be combined to support risk-based management tools for border-management agencies. This paper proposes a framework for effectively integrating such data to create cross-jurisdictional Criminal Activity Networks (CANs). Using the approach outlined in the framework, we created a CAN system as part of the DHS-funded BorderSafe project. This paper describes the system, reports on feedback received from investigating officers, and highlights key issues and challenges.
  • Marshall, B., McDonald, D., Chen, H., & Chung, W. (2004). EBizPort: Collecting and analyzing business intelligence information. Journal of the American Society for Information Science and Technology, 55(10), 873-891.
    Abstract: To make good decisions, businesses try to gather good intelligence information. Yet managing and processing a large amount of unstructured information and data stand in the way of greater business knowledge. An effective business intelligence tool must be able to access quality information from a variety of sources in a variety of forms, and it must support people as they search for and analyze that information. The EBizPort system was designed to address information needs for the business/IT community. EBizPort's collection-building process is designed to acquire credible, timely, and relevant information. The user interface provides access to collected and metasearched resources using innovative tools for summarization, categorization, and visualization. The effectiveness, efficiency, usability, and information quality of the EBizPort system were measured. EBizPort significantly outperformed Brint, a business search portal, in search effectiveness, information quality, user satisfaction, and usability. Users particularly liked EBizPort's clean and user-friendly interface. Results from our evaluation study suggest that the visualization function added value to the search and analysis process, that the generalizable collection-building technique can be useful for domain-specific information searching on the Web, and that the search interface was important for Web search and browse support.
  • McDonald, D. M., Chen, H., Su, H., & Marshall, B. B. (2004). Extracting gene pathway relations using a hybrid grammar: The Arizona Relation Parser. Bioinformatics, 20(18), 3370-3378.
    PMID: 15256411; Abstract: Motivation: Text-mining research in the biomedical domain has been motivated by the rapid growth of new research findings. Improving the accessibility of findings has the potential to speed hypothesis generation. Results: We present the Arizona Relation Parser, which differs from other parsers in its use of a broad-coverage syntax-semantic hybrid grammar. While syntax grammars have generally been tested over more documents, semantic grammars have outperformed them in precision and recall. We combined access to syntax and semantic information in a single grammar. The parser was trained using 40 PubMed abstracts and then tested using 100 unseen abstracts, half for precision and half for recall. Expert evaluation showed that the parser extracted biologically relevant relations with 89% precision. Recall of expert-identified relations was 35% with semantic filtering and 61% before semantic filtering. Such results approach the higher-performing semantic parsers. However, the AZ parser was tested over a greater variety of writing styles and semantic content. © Oxford University Press 2004; all rights reserved.
  • Reid, E., Qin, J., Chung, W., Xu, J. J., Zhou, Y., Schumaker, R., Sageman, M., & Chen, H. (2004). Terrorism Knowledge Discovery Project: A knowledge discovery approach to addressing the threats of terrorism. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3073, 125-145.
    Abstract: Ever since the 9-11 incident, the multidisciplinary field of terrorism has experienced tremendous growth. As the domain has benefited greatly from recent advances in information technologies, more complex and challenging new issues have emerged from numerous counter-terrorism-related research communities as well as governments at all levels. In this paper, we describe an advanced knowledge discovery approach to addressing terrorism threats. We experimented with our approach in a project called the Terrorism Knowledge Discovery Project, which consists of several custom-built knowledge portals. The main focus of this project is to provide advanced methodologies for analyzing terrorism research, terrorists, and the terrorized groups (victims). Once completed, the system can also become a major learning resource and tool that the general community can use to heighten their awareness and understanding of the global terrorism phenomenon, to learn how best to respond to terrorism and, eventually, to garner significant grass-roots support for the government's efforts to keep America safe. © Springer-Verlag Berlin Heidelberg 2004.
  • Wang, G., Chen, H., & Atabakhsh, H. (2004). Automatically detecting deceptive criminal identities. Communications of the ACM, 47(3), 70-76.
    Abstract: Patterns of criminal identity deception uncovered from actual criminal records, and an algorithmic approach to revealing deceptive identities, are discussed. The testing results show that no false positive errors occurred, which demonstrates the effectiveness of the algorithm. The errors fall in the false negative category, in which some truly related records are missed. The threshold value is set to capture the maximum possible number of truly similar records. An adaptive threshold will be required to make the process fully automated in future research.
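    A minimal sketch of threshold-based identity matching in this spirit follows; the per-field similarity measure and the threshold value are illustrative assumptions, not the paper's exact metric.

      from difflib import SequenceMatcher

      FIELDS = ("name", "dob", "ssn", "address")

      def field_sim(a, b):
          # Normalized string similarity, standing in for an
          # edit-distance-style field comparison.
          return SequenceMatcher(None, a.lower(), b.lower()).ratio()

      def records_match(r1, r2, threshold=0.85):
          # Flag two records as the same person when the average
          # similarity over the identity fields clears the threshold.
          score = sum(field_sim(r1[f], r2[f]) for f in FIELDS) / len(FIELDS)
          return score >= threshold

      a = {"name": "John Q. Smith", "dob": "1970-03-12",
           "ssn": "123-45-6789", "address": "100 Main St"}
      b = {"name": "Jon Q Smith", "dob": "1970-03-12",
           "ssn": "123-45-6789", "address": "100 Main Street"}
      print(records_match(a, b))  # True: small edits suggest a likely alias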
  • Wang, G., Chen, H., & Atabakhsh, H. (2004). Criminal identity deception and deception detection in law enforcement. Group Decision and Negotiation, 13(2), 111-127.
    Abstract: Criminals often falsify their identities intentionally in order to deter police investigations. In this paper we focus on uncovering patterns of criminal identity deception observed through a case study performed at a local law enforcement agency. We define criminal identity deception based on an understanding of the various theories of deception. We interview a police detective expert and discuss the characteristics of criminal identity deception. A taxonomy for criminal identity deception was built to represent the different patterns that were identified in the case study. We also discuss methods currently employed by law enforcement agencies to detect deception. Police database systems contain little information that can help reveal deceptive identities. Thus, in order to identify deception, police officers rely mainly on investigation. Current methods for detecting deceptive criminal identities are neither effective nor efficient. Therefore we propose an automated solution to help solve this problem.
  • Xu, J. J., & Chen, H. (2004). Fighting organized crimes: Using shortest-path algorithms to identify associations in criminal networks. Decision Support Systems, 38(3), 473-487.
    Abstract: Effective and efficient link analysis techniques are needed to help law enforcement and intelligence agencies fight organized crimes such as narcotics violations, terrorism, and kidnapping. In this paper, we propose a link analysis technique that uses shortest-path algorithms, priority-first-search (PFS) and two-tree PFS, to identify the strongest association paths between entities in a criminal network. To evaluate effectiveness, we compared the PFS algorithms with crime investigators' typical association-search approach, as represented by a modified breadth-first-search (BFS). Our domain expert considered the association paths identified by the PFS algorithms to be useful about 70% of the time, whereas the modified BFS algorithm's precision rates were only 30% for a kidnapping network and 16.7% for a narcotics network. Efficiency of the two-tree PFS was better for the small, dense kidnapping network, and the PFS was better for the large, sparse narcotics network. © Published by Elsevier B.V.
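    To make the priority-first-search idea concrete: if association strengths are edge weights in (0, 1], the strongest association path is a shortest path under the cost -log(strength). A minimal sketch on an invented network:

      import heapq, math

      # Toy association network; names and weights are invented.
      graph = {
          "A": {"B": 0.9, "C": 0.3},
          "B": {"A": 0.9, "D": 0.8},
          "C": {"A": 0.3, "D": 0.9},
          "D": {"B": 0.8, "C": 0.9, "E": 0.7},
          "E": {"D": 0.7},
      }

      def strongest_path(src, dst):
          # Priority-first search: always expand the lowest-cost node,
          # where cost = -log(strength) so strengths multiply along a path.
          pq = [(0.0, src, [src])]
          best = {src: 0.0}
          while pq:
              cost, node, path = heapq.heappop(pq)
              if node == dst:
                  return path, math.exp(-cost)
              for nbr, w in graph[node].items():
                  c = cost - math.log(w)
                  if c < best.get(nbr, float("inf")):
                      best[nbr] = c
                      heapq.heappush(pq, (c, nbr, path + [nbr]))
          return None, 0.0

      path, strength = strongest_path("A", "E")
      print(" -> ".join(path), f"(strength {strength:.2f})")  # A -> B -> D -> E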
  • Buetow, T., Chaboya, L., O'Toole, C., Cushna, T., Daspit, D., Petersen, T., Atabakhsh, H., & Chen, H. (2003). A spatio temporal visualizer for law enforcement. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 181-194.
    Abstract: Analysis of crime data has long been a labor-intensive effort. Crime analysts are required to query numerous databases and sort through results manually. To alleviate this, we have integrated three different visualization techniques into one application called the Spatio Temporal Visualizer (STV). STV includes three views: a timeline; a periodic display; and a Geographic Information System (GIS). This allows for the dynamic exploration of criminal data and provides a visualization tool for our ongoing COPLINK project. This paper describes STV, its various components, and some of the lessons learned through interviews with target users at the Tucson Police Department. © Springer-Verlag Berlin Heidelberg 2003.
  • Chang, W., Chung, W., Chen, H., & Chou, S. (2003). An international perspective on fighting cybercrime. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 379-384.
    Abstract: Cybercrime is becoming ever more serious. Findings from the 2002 Computer Crime and Security Survey show an upward trend that demonstrates a need for a timely review of existing approaches to fighting this new phenomenon in the information age. In this paper, we provide an overview of cybercrime and present an international perspective on fighting cybercrime. We review current status of fighting cybercrime in different countries, which rely on legal, organizational, and technological approaches, and recommend four directions for governments, lawmakers, intelligence and law enforcement agencies, and researchers to combat cybercrime. © Springer-Verlag Berlin Heidelberg 2003.
  • Chau, M., & Chen, H. (2003). Comparison of three vertical search spiders. Computer, 36(5), 56-62+4.
    Abstract: The Web's dynamic, unstructured nature makes locating resources difficult. Vertical search engines solve part of the problem by keeping indexes only in specific domains. They also offer more opportunity to apply domain knowledge in the spider applications that collect content for their databases. The authors used three approaches to investigate algorithms for improving the performance of vertical search engine spiders: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that uses a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.
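    The sketch below illustrates the second of these, best-first traversal: the frontier is a priority queue keyed by a relevance heuristic, so promising pages are expanded first. An invented in-memory graph stands in for fetched Web pages, and the per-page score stands in for the hyperlink-analysis heuristic.

      import heapq

      # Toy Web graph: url -> (heuristic relevance score, outgoing links).
      pages = {
          "seed": (0.2, ["a", "b"]),
          "a":    (0.9, ["c", "d"]),
          "b":    (0.1, ["d"]),
          "c":    (0.8, []),
          "d":    (0.3, ["e"]),
          "e":    (0.7, []),
      }

      def best_first_crawl(seed, budget=4):
          # Always expand the highest-scoring page in the frontier.
          frontier = [(-pages[seed][0], seed)]
          visited, order = set(), []
          while frontier and len(order) < budget:
              _, url = heapq.heappop(frontier)
              if url in visited:
                  continue
              visited.add(url)
              order.append(url)
              for link in pages[url][1]:
                  if link not in visited:
                      heapq.heappush(frontier, (-pages[link][0], link))
          return order

      print(best_first_crawl("seed"))  # ['seed', 'a', 'c', 'd'] on this toy graph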
  • Chau, M., Huang, Z., & Chen, H. (2003). Teaching key topics in computer science and information systems through a web search engine project. ACM Journal on Educational Resources in Computing, 3(3).
    Abstract: Advances in computer and Internet technologies have made it more and more important for information technology professionals to acquire experience in a variety of areas, including new technologies, system integration, database administration, and project management. To give students a chance to acquire such skills, we designed a project called "Build Your Search Engine in 90 Days," in which students were required to build a domain-specific Web search engine in a semester. In this paper we review the tools and resources available to students and report our experiences in having students work on this project in a course at the University of Arizona. We also review two tools we developed for students in this project, called AI Spider and AI Indexer. We highlight a few search engines that were created by the students and suggest some future directions for improving the tools and expanding the project.
  • Chen, H. (2003). Digital Government: Technologies and practices. Decision Support Systems, 34(3), 223-227.
  • Chen, H. (2003). Introduction to the JASIST special topic section on web retrieval and mining: A machine learning perspective. Journal of the American Society for Information Science and Technology, 54(7), 621-624.
  • Chen, H. (2003). Special issue: "Web retrieval and mining". Decision Support Systems, 35(1), 1-5.
  • Chen, H., Lally, A. M., Zhu, B., & Chau, M. (2003). HelpfulMed: Intelligent searching for medical information over the Internet. Journal of the American Society for Information Science and Technology, 54(7), 683-694.
    Abstract: Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even among documents that purport to be "medically related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and to browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems that require human mediation to stay current. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better in perceived cluster recall.
  • Chen, H., Schroeder, J., Hauck, R. V., Ridgeway, L., Atabakhsh, H., Gupta, H., Boarman, C., Rasmussen, K., & Clements, A. W. (2003). COPLINK connect: Information and knowledge management for law enforcement. Decision Support Systems, 34(3), 271-285.
    Abstract: Information and knowledge management in a knowledge-intensive and time-critical environment presents a challenge to information technology professionals. In law enforcement, multiple data sources are used, each having different user interfaces. COPLINK Connect addresses these problems by providing one easy-to-use interface that integrates different data sources such as incident records, mug shots and gang information, and allows diverse police departments to share data easily. User evaluations of the application allowed us to study the impact of COPLINK on law-enforcement personnel as well as to identify requirements for improving the system. COPLINK Connect is currently being deployed at Tucson Police Department (TPD). © 2002 Elsevier Science B.V. All rights reserved.
  • Hu, P. J., Lin, C., & Chen, H. (2003). Examining technology acceptance by individual law enforcement officers: An exploratory study. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 209-222.
    Abstract: Management of technology implementation has been a critical challenge to organizations, public or private. In particular, user acceptance is paramount to the ultimate success of a newly implemented technology in adopting organizations. This study examined acceptance of COPLINK, a suite of IT applications designed to support law enforcement officers' analyses of criminal activities. We developed a factor model that explains or predicts individual officers' acceptance decision-making and empirically tested this model using a survey study that involved more than 280 police officers. Overall, our model shows a reasonably good fit to officers' acceptance assessments and exhibits satisfactory explanatory power. Our analysis suggests a prominent core influence path from efficiency gain to perceived usefulness and then to intention to accept. Subjective norm also appears to have a significant effect on user acceptance through the mediation of perceived usefulness. Several managerial implications derived from our study findings are also discussed. © Springer-Verlag Berlin Heidelberg 2003.
  • Huang, Z., Chen, H., Yip, A., Ng, G., Guo, F., Chen, Z., & Roco, M. C. (2003). Longitudinal patent analysis for nanoscale science and engineering: Country, institution and technology field. Journal of Nanoparticle Research, 5(3-4), 333-363.
    Abstract: Nanoscale science and engineering (NSE) and related areas have seen rapid growth in recent years. The speed and scope of development in the field have made it essential for researchers to be informed on the progress across different laboratories, companies, industries and countries. In this project, we experimented with several analysis and visualization techniques on NSE-related United States patent documents to support various knowledge tasks. This paper presents results on the basic analysis of nanotechnology patents between 1976 and 2002, content map analysis and citation network analysis. The data have been obtained on individual countries, institutions and technology fields. The top 10 countries with the largest number of nanotechnology patents are the United States, Japan, France, the United Kingdom, Taiwan, Korea, the Netherlands, Switzerland, Italy and Australia. The fastest growth in the last 5 years has been in the chemical and pharmaceutical fields, followed by semiconductor devices. The results demonstrate the potential of information-based discovery and visualization technologies to capture knowledge regarding nanotechnology performance, knowledge transfer, and development trends through analysis of patent documents.
  • Xu, J. J., & Chen, H. (2003). Untangling criminal networks: A case study. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 232-248.
    Abstract: Knowledge about criminal networks has important implications for crime investigation and the anti-terrorism campaign. However, lack of advanced, automated techniques has limited law enforcement and intelligence agencies' ability to combat crime by discovering structural patterns in criminal networks. In this research we used the concept space approach, clustering technology, social network analysis measures and approaches, and multidimensional scaling methods for automatic extraction, analysis, and visualization of criminal networks and their structural patterns. We conducted a case study with crime investigators from the Tucson Police Department. They validated the structural patterns discovered from gang and narcotics criminal enterprises. The results showed that the approaches we proposed could detect subgroups, central members, and between-group interaction patterns correctly most of the time. Moreover, our system could extract the overall structure for a network that might be useful in the development of effective disruptive strategies for criminal networks. © Springer-Verlag Berlin Heidelberg 2003.
  • Leroy, G., Chen, H., & Martinez, J. D. (2003). A shallow parser based on closed-class words to capture relations in biomedical text. Journal of Biomedical Informatics, 36(3), 145-158.
    PMID: 14615225; Abstract: Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. There were 296 relations correctly extracted from the abstracts, resulting in 90% precision of the relations and an average of 11 correct relations per abstract. © 2003 Elsevier Inc. All rights reserved.
  • Leroy, G., Lally, A. M., & Chen, H. (2003). The use of dynamic contexts to improve casual Internet searching. ACM Transactions on Information Systems, 21(3), 229-253.
    Abstract: Research has shown that most users' online information searches are suboptimal. Query optimization based on a relevance-feedback or genetic algorithm using dynamic query contexts can help casual users search the Internet. These algorithms can draw on implicit user feedback, based on the surrounding links and text in a search engine result set, to expand user queries with a variable number of keywords in two ways: positive expansion adds terms to a user's keywords with a Boolean "and"; negative expansion adds terms to the user's keywords with a Boolean "not." Each algorithm was examined for three user groups: high, middle, and low achievers, who were classified according to their overall performance. The interactions of users with different levels of expertise with different expansion types or algorithms were evaluated. The genetic algorithm with negative expansion tripled recall and doubled precision for low achievers, but high achievers displayed an opposite trend and seemed to be hindered in this condition. The effect of other conditions was less substantial.
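    A minimal sketch of the two expansion modes follows; the feedback terms are invented stand-ins for terms mined from the surrounding links and text of the result set.

      def expand_query(keywords, feedback_terms, mode="positive", k=2):
          # Positive expansion ANDs feedback terms into the query;
          # negative expansion ANDs their negations in instead.
          op = " AND " if mode == "positive" else " AND NOT "
          return " AND ".join(keywords) + op + op.join(feedback_terms[:k])

      user_terms = ["jaguar", "speed"]
      feedback = ["cat", "habitat"]      # invented implicit-feedback terms

      print(expand_query(user_terms, feedback, "positive"))
      # jaguar AND speed AND cat AND habitat
      print(expand_query(user_terms, ["car"], "negative"))
      # jaguar AND speed AND NOT car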
  • Qin, J., Zhou, Y., Chau, M., & Chen, H. (2003). Supporting multilingual information retrieval in Web applications: An English-Chinese Web portal experiment. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2911, 149-152.
    Abstract: Cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) techniques have been widely studied, but they are not often applied to and evaluated for Web applications. In this paper, we present our research in developing and evaluating a multilingual English-Chinese Web portal in the business domain. A dictionary-based approach has been adopted that combines phrasal translation, co-occurrence analysis, and pre- and post-translation query expansion. The approach was evaluated by domain experts and the results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision when compared with simple word-by-word translation. © Springer-Verlag Berlin Heidelberg 2003.
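    The sketch below illustrates the co-occurrence idea behind that result: each combination of candidate translations is scored by how often its terms co-occur in a corpus, instead of translating each word in isolation. The dictionary entries and counts are invented.

      from itertools import product

      # Candidate translations per query word, with an invented
      # co-occurrence table playing the role of corpus statistics.
      candidates = {
          "interest": ["利息", "兴趣"],   # financial interest vs. hobby
          "rate":     ["利率", "速率"],   # interest rate vs. speed
      }
      cooccur = {("利息", "利率"): 120, ("利息", "速率"): 2,
                 ("兴趣", "利率"): 1,   ("兴趣", "速率"): 15}

      def translate_phrase(words):
          # Pick the combination whose terms co-occur most often.
          combos = product(*(candidates[w] for w in words))
          return max(combos, key=lambda c: cooccur.get(c, 0))

      print(translate_phrase(["interest", "rate"]))  # ('利息', '利率')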
  • Romano Jr., N. C., Donovan, C., Chen, H., & Nunamaker Jr., J. F. (2003). A methodology for analyzing Web-based qualitative data. Journal of Management Information Systems, 19(4), 213-246.
    Abstract: The volume of qualitative data (QD) available via the Internet is growing at an increasing pace, and firms are anxious to extract and understand users' thought processes, wants and needs, attitudes, and purchase intentions contained therein. An information systems (IS) methodology to meaningfully analyze this vast resource of QD could provide useful information, knowledge, or wisdom firms could use for a number of purposes, including new product development and quality improvement, target marketing, accurate "user-focused" profiling, and future sales prediction. In this paper, we present an IS methodology for analysis of Internet-based QD consisting of three steps: elicitation; reduction through IS-facilitated selection, coding, and clustering; and visualization to provide at-a-glance understanding. Outcomes include information (relationships), knowledge (patterns), and wisdom (principles) explained through visualizations and drill-down capabilities. We first present the generic methodology and then discuss an example that employs it to analyze free-form comments from potential consumers who viewed soon-to-be-released film trailers; the example illustrates how the methodology and tools can provide rich and meaningful affective, cognitive, contextual, and evaluative information, knowledge, and wisdom. The example revealed that qualitative data analysis (QDA) accurately reflected film popularity. QDA also provided a predictive measure of the relative magnitude of film popularity between the most popular film and the least popular one, based on actual first-week box-office sales. The methodology and tools used in this preliminary study illustrate that value can be derived from analysis of Internet-based QD and suggest that further research in this area is warranted.
  • Schroeder, J., Xu, J. J., & Chen, H. (2003). CrimeLink Explorer: Using domain knowledge to facilitate automated crime association analysis. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 168-180.
    Abstract: Link (association) analysis has been used in law enforcement and intelligence domains to extract and search associations between people from large datasets. Nonetheless, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges and enable crime investigators to conduct automated, effective, and efficient link analysis, we proposed three techniques: the concept space approach, a shortest-path algorithm, and a heuristic approach that captures domain knowledge to determine the importance of associations. We implemented a system called CrimeLink Explorer based on the proposed techniques. Results from our user study involving ten crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently. Additionally, subjects concluded that association paths found based on the heuristic approach were more accurate than those found based on the concept space approach. © Springer-Verlag Berlin Heidelberg 2003.
  • Yang, C. C., Chen, H., & Hong, K. (2003). Visualization of large category map for Internet browsing. Decision Support Systems, 35(1), 89-102.
    Abstract: Information overload is a critical problem on the World Wide Web. The category map, developed on the basis of Kohonen's self-organizing map (SOM), has proven to be a promising browsing tool for the Web. The SOM algorithm automatically categorizes a large Internet information space into manageable sub-spaces. It compresses and transforms a complex information space into a two-dimensional graphical representation, which provides a user-friendly interface for users to explore the automatically generated mental model. However, as the amount of information increases, the size of the category map must increase accordingly in order to accommodate the important concepts in the information space. This increases the visual load of the category map: a large pool of information is packed closely together in a display window of limited size, where local details are difficult to see clearly. In this paper, we propose fisheye views and fractal views to support the visualization of the category map. Fisheye views are developed based on the distortion approach, while fractal views are developed based on the information reduction approach. The purpose of fisheye views is to enlarge the regions of interest and diminish the regions farther away while maintaining the global structure. Fractal views, on the other hand, are an approximation mechanism to abstract complex objects and control the amount of information to be displayed. We have developed a prototype system and conducted a user evaluation to investigate the performance of fisheye views and fractal views. The results show that both fisheye views and fractal views significantly increase the effectiveness of visualizing the category map. In addition, fractal views are significantly better than fisheye views, but the combination of the two does not increase performance compared with either technique individually. © 2002 Elsevier Science B.V. All rights reserved.
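    A minimal sketch of a fisheye distortion in the Sarkar-Brown style assumed here: positions near the focus are magnified and the periphery is compressed, while the overall display range is preserved.

      def fisheye(x, focus, d=3.0):
          # Normalize to [-1, 1] around the focus, apply the distortion
          # g(t) = (d + 1) * t / (d * t + 1), then map back to [0, 1].
          span = max(focus, 1.0 - focus)
          t = (x - focus) / span
          g = (d + 1) * abs(t) / (d * abs(t) + 1)
          return focus + (g if t >= 0 else -g) * span

      # Ten evenly spaced category-map nodes, focus on the center node:
      for x in [i / 9 for i in range(10)]:
          print(f"{x:.2f} -> {fisheye(x, focus=0.5):.2f}")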
  • Zhao, J. L., Bi, H. H., & Chen, H. (2003). Collaborative workflow management for interagency crime analysis. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 266-280.
    Abstract: To strengthen homeland security, there is a critical need for new tools that can facilitate real time collaboration among various law enforcement agencies. Through a field study, we find that law enforcement work is knowledge intensive and involves complex collaborative processes interrelating a large number of disparate units in a loosely defined virtual organization. To support knowledge intensive collaboration, we propose a new workflow centric framework to seamlessly integrate previously separate techniques from the fields of information retrieval and workflow management. Specifically, we develop a collaborative workflow management framework for interagency crime analysis. The key contribution of our research is that by integrating various state-of-the-art techniques innovatively, the proposed system can support real time collaboration processes in a virtual organization that evolves dynamically. © Springer-Verlag Berlin Heidelberg 2003.
  • Zheng, R., Qin, Y., Huang, Z., & Chen, H. (2003). Authorship analysis in cybercrime investigation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 59-73.
    Abstract: Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies. © Springer-Verlag Berlin Heidelberg 2003.
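    The sketch below illustrates the feature-based framing with a handful of crude style markers and a decision tree as the inductive learner; the features, snippets, and labels are invented and far smaller than a realistic feature set.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      FUNCTION_WORDS = ["the", "of", "and", "to", "in"]

      def style_vector(text):
          # Style markers: mean sentence length, mean word length, and
          # function-word frequencies.
          words = text.lower().split()
          sentences = [s for s in text.split(".") if s.strip()]
          vec = [len(words) / max(len(sentences), 1),
                 sum(map(len, words)) / max(len(words), 1)]
          vec += [words.count(w) / max(len(words), 1) for w in FUNCTION_WORDS]
          return vec

      docs = [("short posts. quick deals. buy now.", "A"),
              ("cheap pills. act fast. no questions.", "A"),
              ("the shipment of the goods arrived in the harbor today.", "B"),
              ("in the morning the courier delivered the parcel to me.", "B")]
      X = np.array([style_vector(t) for t, _ in docs])
      y = [label for _, label in docs]

      clf = DecisionTreeClassifier(random_state=0).fit(X, y)
      unseen = "the box was left in the yard of the house."
      print(clf.predict([style_vector(unseen)]))  # likely ['B']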
  • Zhou, Y., Qin, J., & Chen, H. (2003). CMedPort: Intelligent searching for Chinese medical information. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2911, 34-45.
    Abstract: Most information retrieval techniques have been developed for English and other Western languages. As the second-largest Internet language, Chinese provides a good setting for studying how search engine techniques developed for English can be generalized for use in other languages to facilitate Internet searching and browsing in a multilingual world. This paper reviews different techniques used in search engines and proposes an integrated approach to the development of a Chinese medical portal, CMedPort. The techniques integrated into CMedPort include meta-search engines, cross-regional search, summarization, and categorization. A user study was conducted to compare the effectiveness, efficiency, and user satisfaction of CMedPort and three major Chinese search engines. Preliminary results from the user study show that CMedPort achieves accuracy in searching tasks similar to, and higher effectiveness and efficiency in browsing tasks than, Openfind, a Taiwanese search engine portal. We believe that the proposed approach can be used to support Chinese information seeking in Web-based digital library applications. © Springer-Verlag Berlin Heidelberg 2003.
  • Chau, M., Chen, H., Qin, J., Zhou, Y., Qin, Y., Sung, W., & McDonald, D. (2002). Comparison of two approaches to building a vertical search tool: A case study in the nanotechnology domain. Proceedings of the ACM International Conference on Digital Libraries, 135-144.
    Abstract: As the Web has been growing exponentially, it has become increasingly difficult to search for desired information. In recent years, many domain-specific (vertical) search tools have been developed to serve the information needs of specific fields. This paper describes two approaches to building a domain-specific search tool. We report our experience in building two different tools in the nanotechnology domain - (1) a server-side search engine, and (2) a client-side search agent. The designs of the two search systems are presented and discussed, and their strengths and weaknesses are compared. Some future research directions are also discussed.
  • Chau, M., Chen, H., Qin, J., Zhou, Y., Sung, W., Qin, Y., McDonald, D., Lally, A., Chen, Y., & Landon, M. (2002). NanoPort: A web portal for nanoscale science and technology. Proceedings of the ACM International Conference on Digital Libraries, 373-.
    Abstract: An integrated Web portal aiming to provide a one-stop shopping service to satisfy the information needs of researchers and practitioners in the field of nanotechnology, or nanoscale science and engineering (NSE), is described. The portal, called NanoPort, features vertical searching, meta-searching, noun phrasing, a self-organized topic map, and automatic summarization.
  • Hauck, R. V., Atabakhsh, H., Ongvasith, P., Gupta, H., & Chen, H. (2002). Using Coplink to analyze criminal-justice data. Computer, 35(3), 30-37.
    Abstract: The Coplink project was initiated to address problems in criminal justice systems. University of Arizona researchers originally developed the concept space approach to facilitate semantic retrieval of information. User studies show that this system also improves searching and browsing in the engineering and biomedicine domains.
  • Huang, Z., Chung, W., Ong, T., & Chen, H. (2002). A graph-based recommender system for digital library. Proceedings of the ACM International Conference on Digital Libraries, 65-73.
    Abstract: Research shows that recommendations comprise a valuable service for users of a digital library [11]. While most existing recommender systems rely either on a content-based approach or a collaborative approach to make recommendations, there is potential to improve recommendation quality by using a combination of both approaches (a hybrid approach). In this paper, we report how we tested the idea of using a graph-based recommender system that naturally combines the content-based and collaborative approaches. Due to the similarity between our problem and a concept retrieval task, a Hopfield net algorithm was used to exploit high-degree book-book, user-user and book-user associations. Sample hold-out testing and preliminary subject testing were conducted to evaluate the system; these showed that combining the content-based and collaborative approaches improved both precision and recall. However, no significant improvement was observed from exploiting high-degree associations.
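    A minimal sketch of Hopfield-style spreading activation over an invented association matrix: activation starts at the query node, propagates repeatedly through weighted links until the pattern stabilizes, and the remaining nodes are ranked by final activation.

      import numpy as np

      # Symmetric associations over nodes {user u, books b1..b4};
      # the weights are invented usage/content links.
      W = np.array([
          #  u    b1   b2   b3   b4
          [0.0, 0.9, 0.0, 0.4, 0.0],  # u
          [0.9, 0.0, 0.8, 0.0, 0.0],  # b1
          [0.0, 0.8, 0.0, 0.0, 0.7],  # b2
          [0.4, 0.0, 0.0, 0.0, 0.2],  # b3
          [0.0, 0.0, 0.7, 0.2, 0.0],  # b4
      ])

      def spread(start, iters=20, decay=0.8):
          a = np.zeros(W.shape[0])
          a[start] = 1.0
          for _ in range(iters):
              nxt = np.tanh(decay * (W @ a))  # damped propagation step
              nxt[start] = 1.0                # keep the query node clamped
              if np.allclose(nxt, a, atol=1e-4):
                  break
              a = nxt
          return a

      print(np.round(spread(start=0), 2))
      # b2 outranks the directly linked b3 thanks to a strong 2-step path.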
  • Leroy, G., & Chen, H. (2002). Filling preposition-based templates to capture information from medical abstracts. Pacific Symposium on Biocomputing, 350-361.
    PMID: 11928489; Abstract: Due to the recent explosion of information in the biomedical field, it is hard for a single researcher to review the complex network involving genes, proteins, and interactions. We are currently building GeneScene, a toolkit that will assist researchers in reviewing existing literature, and report on the first phase in our development effort: extracting the relevant information from medical abstracts. We are developing a medical parser that extracts information, fills basic preposition-based templates, and combines the templates to capture the underlying sentence logic. We tested our parser on 50 unseen abstracts and found that it extracted 246 templates with a precision of 70%. In comparison with many other techniques, more information was extracted without sacrificing precision. Future improvement in precision will be achieved by correcting three categories of errors.
  • McDonald, D., & Chen, H. (2002). Using sentence-selection heuristics to rank text segments in TXTRACTOR. Proceedings of the ACM International Conference on Digital Libraries, 28-35.
    Abstract: TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. The purpose of identifying text segments is to maximize topic diversity, which is an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by using segmentation alone. The proposed summary is created in a three-step process, which includes 1) sentence evaluation 2) segment identification and 3) segment ranking. As the required length of the summary changes, low-ranking segments can then be dropped from (or higher ranking segments added to) the summary. We compared the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm to validate the approach.
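    A minimal sketch of the kind of sentence-selection heuristics involved (position, cue words, length), here used to rank and extract units directly; the cue list and weights are invented, and the segment-identification step is omitted.

      CUE_WORDS = {"significant", "conclude", "results", "propose"}

      def score_sentence(sent, position, total):
          # Reward early position, cue words, and longer sentences.
          words = sent.lower().split()
          position_score = 1.0 - position / total
          cue_score = sum(w.strip(".,") in CUE_WORDS for w in words)
          return position_score + cue_score + 0.05 * len(words)

      def summarize(text, n=2):
          sents = [s.strip() for s in text.split(".") if s.strip()]
          ranked = sorted(range(len(sents)),
                          key=lambda i: score_sentence(sents[i], i, len(sents)),
                          reverse=True)
          keep = sorted(ranked[:n])          # restore document order
          return ". ".join(sents[i] for i in keep) + "."

      doc = ("We propose a new summarizer. It splits text into segments. "
             "Heuristics rank each segment. Results show significant gains. "
             "Minor details are omitted.")
      print(summarize(doc))
      # We propose a new summarizer. Results show significant gains.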
  • Zhu, B., & Chen, H. (2002). Visualizing the archive of a computer mediated communication process. Proceedings of the ACM International Conference on Digital Libraries, 385-.
    Abstract: The archive of computer-mediated communication (CMC) process contains knowledge shared and information about participants' behavior patterns. However, most CMC systems focus only on organizing the content of discussions. We propose to demo a prototype system that integrates a social visualization technique with existing information analysis technologies to graphically summarize both the content and behavior of a CMC process.
  • Chen, H., Hauck, R. V., Atabakhsh, H., Gupta, H., Boarman, C., Schroeder, J., & Ridgeway, L. (2001). COPLINK: Information and knowledge management for law enforcement. Proceedings of SPIE - The International Society for Optical Engineering, 4232, 293-304.
    Abstract: The problem of information and knowledge management in the knowledge-intensive and time-critical environment of law enforcement has posed an interesting problem for information technology professionals in the field. Coupled with this challenging environment are issues relating to the integration of multiple systems, each having different functionalities, resulting in difficulty for the end user. COPLINK offers a cost-efficient way of Web-enabling stovepipe law enforcement information sharing systems by employing a model that allows different police departments to more easily share data amongst themselves through an easy-to-use interface that integrates different data sources. The COPLINK project has two major components: the COPLINK Database (DB) Application and the COPLINK Concept Space (CS) Application. The COPLINK DB design facilitates retrieval of case details based on known information. COPLINK CS is an investigative tool that captures the relationships between objects (e.g., people, locations, vehicles, organizations, crime types) in the entire database, allowing investigators and detectives to perform investigative associations and case analysis. This paper describes how we have applied the design criteria of platform independence, stability, scalability, and an intuitive graphical user interface to develop the COPLINK systems. Results are reported from user evaluations conducted on both applications to study the impact of COPLINK on law enforcement personnel. The COPLINK DB Application is currently being deployed at the Tucson Police Department, and the Concept Space is undergoing further modifications. Future development efforts for the COPLINK project are also discussed.
  • Hauck, R. V., Sewell, R. R., Ng, T. D., & Chen, H. (2001). Concept-based searching and browsing: A geoscience experiment. Journal of Information Science, 27(4), 199-210.
    Abstract: In the recent literature, we have seen the expansion of information retrieval techniques to include a variety of different collections of information. Collections can have certain characteristics that can lead to different results for the various classification techniques. In addition, the ways and reasons that users explore each collection can affect the success of the information retrieval technique. The focus of this research was to extend the application of our statistical and neural network techniques to the domain of geological science information retrieval. For this study, a test bed of 22,636 geoscience abstracts was obtained through the NSF/DARPA/NASA funded Alexandria Digital Library Initiative project at the University of California at Santa Barbara. This collection was analyzed using algorithms previously developed by our research group: a concept space algorithm for searching and a Kohonen self-organizing map (SOM) algorithm for browsing. Included in this paper are discussions of our techniques, user evaluations and lessons learned.
  • Leroy, G., & Chen, H. (2001). Meeting medical terminology needs - The ontology-enhanced Medical Concept Mapper. IEEE Transactions on Information Technology in Biomedicine, 5(4), 261-270.
    PMID: 11759832; Abstract: This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer-generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.
  • Marchionini, G., Craig, A., Brandt, L., Klavans, J., & Chen, H. (2001). Digital libraries supporting digital government. Proceedings of First ACM/IEEE-CS Joint Conference on Digital Libraries, 395-397.
    Abstract: An overview of several digital government projects and initiatives that combine technical and conceptual threads is presented. The projects aim to make Federal statistical data more easily available and usable by the broadest possible audiences. A framework for mapping questions onto interface mechanisms, which depends on using metadata as an intermediary between user needs and agency data, was also developed.
  • Roussinov, D. G., & Chen, H. (2001). Information navigation on the web by clustering and summarizing query results. Information Processing and Management, 37(6), 789-816.
    Abstract: We report our experience with a novel approach to interactive information seeking that is grounded in the idea of summarizing query results through automated document clustering. We went through a complete system development and evaluation cycle: designing the algorithms and interface for our prototype, implementing them and testing with human users. Our prototype acted as an intermediate layer between the user and a commercial Internet search engine (AltaVista), thus allowing searches of a significant portion of the World Wide Web. In our final evaluation, we processed data from 36 users and concluded that our prototype improved search performance over using the same search engine (AltaVista) directly. We also analyzed the effects of various related demographic and task-related parameters. © 2001 Elsevier Science Ltd.
  • Chen, H. (2000). Introduction to the special topic issue: Part 1. Journal of the American Society for Information Science and Technology, 51(3), 213-215.
    Abstract: Digital libraries represent a form of information technology in which social impact matters as much as technological advancement. It is hard to evaluate a new technology in the absence of real users and large collections. The best way to develop new technology is in multi-year large-scale research projects that use real-world electronic testbeds for actual users and aim at developing new, comprehensive, and user-friendly technologies for digital libraries. Typically, these testbed projects also examine the broad social, economic, legal, ethical, and crosscultural contexts and impacts of digital library research.
  • Chen, H. (2000). Introduction to the special topic issue: Part 2. Journal of the American Society for Information Science and Technology, 51(4), 311-312.
  • Houston, A. L., Chen, H., Schatz, B. R., Hubbard, S. M., Sewell, R. R., & Ng, T. D. (2000). Exploring the use of concept spaces to improve medical information retrieval. Decision Support Systems, 30(2), 171-186.
    Abstract: This research investigated the application of techniques successfully used in previous information retrieval research to the more challenging area of medical informatics. It was performed on a biomedical document collection testbed, CANCERLIT, provided by the National Cancer Institute (NCI), which contains information on all types of cancer therapy. The quality or usefulness of terms suggested by three different thesauri, one based on MeSH terms, one based solely on terms from the document collection, and one based on the Unified Medical Language System (UMLS) Metathesaurus, was explored with the ultimate goal of improving CANCERLIT information search and retrieval. Researchers affiliated with the University of Arizona Cancer Center evaluated lists of related terms suggested by different thesauri for 12 different directed searches in the CANCERLIT testbed. The preliminary results indicated that among the thesauri, there were no statistically significant differences in either term recall or precision. Surprisingly, there was almost no overlap of relevant terms suggested by the different thesauri for a given search. This suggests that recall could be significantly improved by using a combined thesaurus approach.
  • Romano Jr., N. C., Bauer, C., Chen, H., & Nunamaker Jr., J. F. (2000). MindMine comment analysis tool for collaborative attitude solicitation, analysis, sense-making and visualization. Proceedings of the Hawaii International Conference on System Sciences, 19-.
    More info
    Abstract: This paper describes a study to explore the integration of Group Support Systems (GSS) and Artificial Intelligence (AI) technology to provide solicitation, analytical, visualization and sense-making support for attitudes from large distributed marketing focus groups. The paper describes two experiments and the concomitant evolutionary design and development of an attitude analysis process and the MindMine Comment Analysis Tool. The analysis process circumvents many of the problems associated with traditional data gathering via closed-ended questionnaires and potentially biased interviews by providing support for online free response evaluative comments. MindMine allows teams of raters to analyze comments from any source, including electronic meetings, discussion groups or surveys, whether they are Web-based or same-place. The analysis results are then displayed as visualizations that enable the team quickly to make sense of attitudes reflected in the comment set, which we believe provide richer information and a more detailed understanding of attitudes.
  • Tolle, K. M., & Chen, H. (2000). Comparing noun phrasing techniques for use with medical digital library tools. Journal of the American Society for Information Science and Technology, 51(4), 352-370.
    More info
    Abstract: In an effort to assist medical researchers and professionals in accessing information necessary for their work, the AI Lab at the University of Arizona is investigating the use of a natural language processing (NLP) technique called noun phrasing. The goal of this research is to determine whether noun phrasing could be a viable technique to include in medical information retrieval applications. Four noun phrase generation tools were evaluated as to their ability to isolate noun phrases from medical journal abstracts. Tests were conducted using the National Cancer Institute's CANCERLIT database. The NLP tools evaluated were Massachusetts Institute of Technology's (MIT's) Chopper, The University of Arizona's Automatic Indexer, Lingsoft's NPtool, and The University of Arizona's AZ Noun Phraser. In addition, the National Library of Medicine's SPECIALIST Lexicon was incorporated into two versions of the AZ Noun Phraser to be evaluated against the other tools as well as a nonaugmented version of the AZ Noun Phraser. Using the metrics relative subject recall and precision, our results show that, with the exception of Chopper, the phrasing tools were fairly comparable in recall and precision. It was also shown that augmenting the AZ Noun Phraser by including the SPECIALIST Lexicon from the National Library of Medicine resulted in improved recall and precision.
  • Tolle, K. M., Chen, H., & Chow, H. (2000). Estimating drug/plasma concentration levels by applying neural networks to pharmacokinetic data sets. Decision Support Systems, 30(2), 139-151.
    More info
    Abstract: Predicting blood concentration levels of pharmaceutical agents in human subjects can be made difficult by missing data and variability within and between human subjects. Biometricians use a variety of software tools to analyze pharmacokinetic information in order to conduct research about a pharmaceutical agent. This paper compares a feedforward backpropagation neural network with NONMEM, the software most commonly used for pharmacokinetic analysis, in predicting blood serum concentration levels of the drug tobramycin in pediatric cystic fibrosis and hematologic-oncologic disorder patients. Mean squared standard error is used to establish the comparability of the two estimation methods. The motivation for this research is the desire to provide clinicians and pharmaceutical researchers with a cost-effective, user-friendly, and timely analysis tool for effectively predicting blood concentration ranges in human subjects.
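    The model class used here, a feedforward network trained by backpropagation on patient and dosing features, can be sketched in a few lines of numpy. The architecture, learning rate, and synthetic data below are illustrative assumptions, not the study's configuration.
      # Sketch: one-hidden-layer regression network trained by backpropagation.
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 6))                  # stand-in patient/dose features
      y = X @ rng.normal(size=(6, 1)) + 0.1 * rng.normal(size=(100, 1))

      W1 = rng.normal(scale=0.1, size=(6, 8)); b1 = np.zeros(8)
      W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros(1)
      lr = 0.01
      for _ in range(2000):
          h = np.tanh(X @ W1 + b1)                   # forward pass
          pred = h @ W2 + b2
          err = pred - y                             # squared-error gradient
          gW2 = h.T @ err / len(X); gb2 = err.mean(0)
          dh = (err @ W2.T) * (1 - h**2)             # backpropagate through tanh
          gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
          W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
      print("training MSE:", float((err**2).mean()))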
  • Yang, C. C., Yen, J., & Chen, H. (2000). Intelligent Internet searching agent based on hybrid simulated annealing. Decision Support Systems, 28(3), 269-277.
    More info
    Abstract: The World-Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem for the users of such services. It has been estimated that the amount of information stored on the Internet doubles every 18 months; the number of homepages may be growing even faster, with some estimates putting the doubling time at six months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss a modified version of the simulated annealing algorithm used to develop an intelligent personal spider (agent), which is based on automatic textual analysis of the Internet documents and hybrid simulated annealing.
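    A minimal sketch of the annealing idea as applied to a personal spider: wander the link graph, always accept more relevant pages, and accept less relevant ones with a probability that shrinks as the temperature cools. The neighbor and scoring functions are stubs the caller must supply; the parameter values are illustrative, not the paper's.
      # Sketch: simulated annealing over candidate pages (scoring/fetching are stubs).
      import math, random

      def anneal_spider(start_pages, neighbors, score, t0=1.0, cooling=0.95, steps=200):
          """neighbors(page) -> linked pages; score(page) -> relevance in [0, 1]."""
          current = random.choice(start_pages)
          best, t = current, t0
          for _ in range(steps):
              candidates = neighbors(current)
              if not candidates:
                  break
              nxt = random.choice(candidates)
              delta = score(nxt) - score(current)
              # Accept improvements always; accept worse pages with probability e^(delta/t).
              if delta > 0 or random.random() < math.exp(delta / t):
                  current = nxt
                  if score(current) > score(best):
                      best = current
              t *= cooling                           # cool the temperature
          return best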
  • Zhu, B., & Chen, H. (2000). Validating a geographical image retrieval system. Journal of the American Society for Information Science and Technology, 51(7), 625-634.
    More info
    Abstract: This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition we describe an experiment to validate the performance of this image retrieval system against that of human subjects in an effort to address the scarcity of research evaluating performance of an algorithm against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms. © 2000 John Wiley & Sons, Inc.
  • Zhu, B., Ramsey, M., & Chen, H. (2000). Creating a large-scale content-based airphoto image digital library. IEEE Transactions on Image Processing, 9(1), 163-167.
    More info
    PMID: 18255383;Abstract: This paper describes a content-based image retrieval digital library that supports geographical image retrieval over a testbed of 800 aerial photographs, each 25 megabytes in size. In addition, this paper introduces a methodology to evaluate the performance of the algorithms in the prototype system. The major contributions of this paper are twofold. 1) We suggest an approach that incorporates various image processing techniques, including Gabor filters, image enhancement, and image compression, as well as information analysis techniques such as the self-organizing map (SOM), into an effective large-scale geographical image retrieval system. 2) We present two experiments that evaluate the performance of the Gabor-filter-extracted features, along with the corresponding similarity measure, against that of human perception, addressing the lack of studies assessing the consistency between an image representation algorithm or image categorization method and the human mental model.
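    Gabor texture features of the kind this testbed extracts can be sketched directly in numpy: correlate the image with oriented sinusoids under a Gaussian envelope and summarize each response. Kernel parameters below are illustrative, not the paper's settings.
      # Sketch: Gabor texture features for a grayscale image tile (parameters illustrative).
      import numpy as np
      from numpy.lib.stride_tricks import sliding_window_view

      def gabor_kernel(freq, theta, sigma=3.0, size=15):
          half = size // 2
          y, x = np.mgrid[-half:half + 1, -half:half + 1]
          xr = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates
          yr = -x * np.sin(theta) + y * np.cos(theta)
          envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
          return envelope * np.cos(2 * np.pi * freq * xr)  # even (cosine) component

      def gabor_features(img, freqs=(0.1, 0.2), thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
          """Mean absolute filter response per (frequency, orientation) pair.
          img must be a 2-D array at least as large as the kernel."""
          feats = []
          for f in freqs:
              for th in thetas:
                  k = gabor_kernel(f, th)
                  windows = sliding_window_view(img, k.shape)  # valid-mode correlation
                  resp = (windows * k).sum(axis=(-2, -1))
                  feats.append(np.abs(resp).mean())
          return np.array(feats)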
  • Chen, H. (1999). Semantic research for digital libraries. D-Lib Magazine, 5(10), 52-65.
  • Chen, H., & Houston, A. L. (1999). Digital Libraries: Social Issues and Technological Advances. Advances in Computers, 48(C), 257-314.
    More info
    Abstract: The location and provision of information services has changed dramatically over the last ten years. There is no need to leave the home or office to locate and access information now readily available on-line via digital gateways furnished by a wide variety of information providers (e.g., libraries, electronic publishers, businesses, organizations, individuals). Information access is no longer restricted to what is physically available in the nearest library; it is electronically accessible from a wide variety of globally distributed information repositories: "digital libraries". In this chapter we focus on digital libraries, starting with a discussion of the historical visionaries, definitions, driving forces, enabling technologies, and some key research issues. We discuss some of the US and international digital library projects and research initiatives. We then describe some of the emerging techniques for building large-scale digital libraries, including a discussion of semantic interoperability, the "Grand Challenge" of digital library research. Finally, we offer our conclusions and a discussion of some future directions for digital libraries. © 1999 Academic Press Inc.
  • Houston, A. L., Chen, H., Hubbard, S. M., Schatz, B. R., Ng, T. D., Sewell, R. R., & Tolle, K. M. (1999). Medical data mining on the internet: research on a cancer information system. Artificial Intelligence Review, 13(5), 437-466.
    More info
    Abstract: This paper discusses several data mining algorithms and techniques that we have developed at the University of Arizona Artificial Intelligence Lab. We have implemented these algorithms and techniques into several prototypes, one of which focuses on medical information developed in cooperation with the National Cancer Institute (NCI) and the University of Illinois at Urbana-Champaign. We propose an architecture for medical knowledge information systems that will permit data mining across several medical information sources and discuss a suite of data mining tools that we are developing to assist NCI in improving public access to and use of their existing vast cancer information collections.
  • Lin, C., Chen, H., & Nunamaker, J. (1999). Verifying the proximity hypothesis for self-organizing maps. Proceedings of the Hawaii International Conference on System Sciences, 33-.
    More info
    Abstract: The Kohonen Self-Organizing Map (SOM) is an unsupervised learning technique for summarizing high-dimensional data. When applied to textual data, SOM has been shown to be able to group together related concepts in a data collection. This article presents research in which we sought to validate this property of SOM, called the Proximity Hypothesis. We demonstrated that the Kohonen SOM was able to perform concept clustering effectively, based on its concept precision and recall scores as judged by human experts. We believe this research has established the Kohonen SOM algorithm as a promising textual classification technique for addressing the long-standing "information overload" problem.
  • Lin, C., Chen, H., & Nunamaker, J. F. (1999). Verifying the Proximity and Size Hypothesis for Self-Organizing Maps. Journal of Management Information Systems, 16(3), 57-70.
    More info
    Abstract: The Kohonen Self-Organizing Map (SOM) is an unsupervised learning technique for summarizing high-dimensional data so that similar inputs are, in general, mapped close to one another. When applied to textual data, SOM has been shown to be able to group together related concepts in a data collection and to present major topics within the collection with larger regions. This article presents research in which we sought to validate these properties of SOM, called the Proximity and Size Hypotheses, through a user evaluation study. Building upon our previous research in automatic concept generation and classification, we demonstrated that the Kohonen SOM was able to perform concept clustering effectively, based on its concept precision and recall scores as judged by human experts. We also demonstrated a positive relationship between the size of an SOM region and the number of documents contained in the region. We believe this research has established the Kohonen SOM algorithm as an intuitively appealing and promising neural-network-based textual classification technique for addressing part of the longstanding "information overload" problem.
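    For readers unfamiliar with the algorithm, a minimal Kohonen SOM training loop is sketched below: each input vector pulls its best-matching grid unit, and a shrinking Gaussian neighborhood around it, toward the input. Grid size, learning rate, and neighborhood schedule are illustrative assumptions, not the papers' settings.
      # Sketch: minimal Kohonen SOM training loop (numpy; parameters are illustrative).
      import numpy as np

      def train_som(data, grid=(10, 10), epochs=50, lr0=0.5, radius0=3.0, seed=0):
          rng = np.random.default_rng(seed)
          h, w = grid
          weights = rng.normal(size=(h, w, data.shape[1]))
          coords = np.stack(np.mgrid[0:h, 0:w], axis=-1)   # (h, w, 2) unit positions
          for epoch in range(epochs):
              lr = lr0 * (1 - epoch / epochs)
              radius = max(radius0 * (1 - epoch / epochs), 0.5)
              for x in rng.permutation(data):
                  # Best-matching unit: grid node whose weight vector is nearest x.
                  d = np.linalg.norm(weights - x, axis=-1)
                  bmu = np.unravel_index(d.argmin(), (h, w))
                  # Gaussian neighborhood pulls nearby nodes toward the input.
                  g = np.exp(-((coords - bmu)**2).sum(-1) / (2 * radius**2))
                  weights += lr * g[..., None] * (x - weights)
          return weights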
  • McQuaid, M. J., Ong, T., Chen, H., & Nunamaker Jr., J. F. (1999). Multidimensional scaling for group memory visualization. Decision Support Systems, 27(1), 163-176.
    More info
    Abstract: We describe an attempt to overcome information overload through information visualization - in a particular domain, group memory. A brief review of information visualization is followed by a brief description of our methodology. We discuss our system, which uses multidimensional scaling (MDS) to visualize relationships between documents, and which we tested on 60 subjects, mostly students. We found three important (and statistically significant) differences between task performance on an MDS-generated display and on a randomly generated display. With some qualifications, we conclude that MDS speeds up and improves the quality of manual classification of documents and that the MDS display agrees with subject perceptions of which documents are similar and should be displayed together.
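    Classical (Torgerson) multidimensional scaling of the kind used for such displays reduces a pairwise distance matrix to 2-D coordinates by double-centering and eigendecomposition. A compact numpy sketch follows; constructing the document-distance matrix is left to the caller, and this is a generic MDS formulation rather than the paper's specific pipeline.
      # Sketch: classical (Torgerson) MDS from a pairwise distance matrix (numpy).
      import numpy as np

      def classical_mds(D, dims=2):
          """D: symmetric pairwise distance matrix; returns n x dims coordinates."""
          n = D.shape[0]
          J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
          B = -0.5 * J @ (D**2) @ J                  # double-centered squared distances
          vals, vecs = np.linalg.eigh(B)
          order = np.argsort(vals)[::-1][:dims]      # keep the largest eigenvalues
          return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))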
  • Ramsey, M. C., Chen, H., Zhu, B., & Schatz, B. R. (1999). A collection of visual thesauri for browsing large collections of geographic images. Journal of the American Society for Information Science, 50(9), 826-834.
    More info
    Abstract: Digital libraries of geo-spatial multimedia content are currently deficient in providing fuzzy, concept-based retrieval mechanisms to users. The main challenge is that indexing and thesaurus creation are extremely laborintensive processes for text documents and especially for images. Recently, 800,000 declassified satellite photographs were made available by the United States Geological Survey. Additionally, millions of satellite and aerial photographs are archived in national and local map libraries. Such enormous collections make human indexing and thesaurus generation methods impossible to utilize. In this article we propose a scalable method to automatically generate visual thesauri of large collections of geo-spatial media using fuzzy, unsupervised machine-learning techniques. © 1999 John Wiley & Sons, Inc.
  • Romano Jr., N. C., Roussinov, D., Nunamaker Jr., J. F., & Chen, H. (1999). Collaborative information retrieval environment: Integration of information retrieval with group support systems. Proceedings of the Hawaii International Conference on System Sciences, 33-.
    More info
    Abstract: This paper describes how user experiences with information retrieval (IR) systems and group support systems (GSS) have shed light on a promising new era of collaborative research and led to the development of a prototype that merges the two paradigms into a collaborative information retrieval environment (CIRE). The theory developed from initial user experiences with the prototype and plans to empirically test the efficacy of this new paradigm through controlled experimentation are also discussed.
  • Roussinov, D. G., & Chen, H. (1999). Document clustering for electronic meetings: An experimental comparison of two techniques. Decision Support Systems, 27(1), 67-79.
    More info
    Abstract: In this article, we report our implementation and comparison of two text clustering techniques. One is based on Ward's clustering and the other on Kohonen's Self-Organizing Maps. We have evaluated how closely clusters produced by a computer resemble those created by human experts. We have also measured the time that it takes for an expert to "clean up" the automatically produced clusters. The technique based on Ward's clustering was found to be more precise. Both techniques worked equally well in detecting associations between text documents. We used text messages obtained from group brainstorming meetings.
  • Schatz, B., & Chen, H. (1999). Digital Libraries: Technological Advances and Social Impacts. Computer, 32(2), 45-X.
    More info
    Abstract: Public awareness of the Net as a critical infrastructure in the 1990s has spurred a new revolution in the technologies for information retrieval in digital libraries.
  • Schatz, B., Mischo, W., Cole, T., Bishop, A., Harum, S., Johnson, E., Neumann, L., Chen, H., & Ng, T. D. (1999). Federated Search of Scientific Literature. Computer, 32(2), 51-58.
    More info
    Abstract: The Illinois Digital Library Project has developed an infrastructure for federated repositories. The deployed testbed indexes articles from many scientific journals and publishers in a production stream that can be searched as though they form a single collection.
  • Zhu, B., Ramsey, M., Ng, T. D., Chen, H., & Schatz, B. (1999). Creating a Large-Scale Digital Library for Georeferenced Information. D-Lib Magazine, 5(7-8), 51-66.
    More info
    Abstract: Digital libraries with multimedia geographic content present special challenges and opportunities in today's networked information environment. One of the most challenging research issues for geospatial collections is to develop techniques to support fuzzy, concept-based, geographic information retrieval. Based on an artificial intelligence approach, this project presents a Geospatial Knowledge Representation System (GKRS) prototype that integrates multiple knowledge sources (textual, image, and numerical) to support concept-based geographic information retrieval. Based on semantic network and neural network representations, GKRS loosely couples different knowledge sources and adopts spreading activation algorithms for concept-based knowledge inferencing. Both textual analysis and image processing techniques have been employed to create textual and visual geographical knowledge structures. This paper suggests a framework for developing a complete GKRS-based system and describes in detail the prototype system that has been developed so far.
  • Chen, H., Chung, Y., Ramsey, M., & Yang, C. C. (1998). A smart Itsy Bitsy spider for the Web. Journal of the American Society for Information Science, 49(7), 604-618.
    More info
    Abstract: As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed two Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-based spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained significantly higher recall value than that of the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithm allows users to find other potential relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for design of a truly interactive and dynamic Web agent.
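    The best-first-search variant of such a spider can be sketched as a priority-queue crawl that always expands the most promising unvisited page. The link-fetching and similarity functions below are stubs standing in for the paper's link- and keyword-based scoring; the budget parameter is an assumption.
      # Sketch: best-first-search personal spider (link fetching/scoring are stubs).
      import heapq

      def best_first_spider(start_urls, get_links, similarity, budget=100):
          """similarity(url) -> relevance to the user's starting pages (higher is better)."""
          frontier = [(-similarity(u), u) for u in start_urls]
          heapq.heapify(frontier)
          visited, results = set(start_urls), []
          while frontier and len(results) < budget:
              neg_score, url = heapq.heappop(frontier)   # most promising page first
              results.append((url, -neg_score))
              for link in get_links(url):
                  if link not in visited:
                      visited.add(link)
                      heapq.heappush(frontier, (-similarity(link), link))
          return results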
  • Chen, H., Houston, A. L., Sewell, R. R., & Schatz, B. R. (1998). Internet browsing and searching: User evaluations of category map and concept space techniques. Journal of the American Society for Information Science, 49(7), 582-603.
    More info
    Abstract: The Internet provides an exceptional testbed for developing algorithms that can improve browsing and searching large information spaces. Browsing and searching tasks are susceptible to problems of information overload and vocabulary differences. Much of the current research is aimed at the development and refinement of algorithms to improve browsing and searching by addressing these problems. Our research was focused on discovering whether two of the algorithms our research group has developed, a Kohonen algorithm category map for browsing, and an automatically generated concept space algorithm for searching, can help improve browsing and/or searching the Internet. Our results indicate that a Kohonen self-organizing map (SOM)-based algorithm can successfully categorize a large and eclectic Internet information space (the Entertainment subcategory of Yahoo!) into manageable sub-spaces that users can successfully navigate to locate a homepage of interest to them. The SOM algorithm worked best with browsing tasks that were very broad, and in which subjects skipped around between categories. Subjects especially liked the visual and graphical aspects of the map. Subjects who tried to do a directed search, and those that wanted to use the more familiar mental models (alphabetic or hierarchical organization) for browsing, found that the map did not work well. The results from the concept space experiment were especially encouraging. There were no significant differences among the precision measures for the set of documents identified by subject-suggested terms, thesaurus-suggested terms, and the combination of subject- and thesaurus-suggested terms. The recall measures indicated that the combination of subject- and thesaurus-suggested terms exhibited significantly better recall than subject-suggested terms alone. Furthermore, analysis of the homepages indicated that there was limited overlap between the homepages retrieved by the subject-suggested and thesaurus-suggested terms. Since the retrieved homepages for the most part were different, this suggests that a user can enhance a keyword-based search by using an automatically generated concept space. Subjects especially liked the level of control that they could exert over the search, and the fact that the terms suggested by the thesaurus were "real" (i.e., originating in the homepages) and therefore guaranteed to have retrieval success.
  • Chen, H., Martinez, J., Kirchhoff, A., Ng, T. D., & Schatz, B. R. (1998). Alleviating search uncertainty through concept associations: Automatic indexing, co-occurrence analysis, and parallel computing. Journal of the American Society for Information Science, 49(3), 206-216.
    More info
    Abstract: In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in concept recall, but in concept precision the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase "variety" in search terms and thereby reduce search uncertainty.
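    The co-occurrence analysis at the heart of the concept space approach can be sketched at toy scale: count term pairs within documents, then rank each term's neighbors with a frequency-normalized weight. The normalization used below is a simple stand-in for the paper's weighting scheme, not its exact formula.
      # Sketch: co-occurrence "concept space" from indexed documents (toy scale).
      from collections import defaultdict
      from itertools import combinations

      def build_concept_space(doc_terms):
          """doc_terms: list of term sets, one per document.
          Returns term -> [(related term, co-occurrence weight), ...]."""
          co = defaultdict(lambda: defaultdict(int))
          df = defaultdict(int)
          for terms in doc_terms:
              for t in terms:
                  df[t] += 1
              for a, b in combinations(sorted(terms), 2):
                  co[a][b] += 1
                  co[b][a] += 1
          space = {}
          for t, related in co.items():
              # Normalize by document frequency so common terms do not dominate.
              ranked = sorted(((r, c / df[r]) for r, c in related.items()),
                              key=lambda pair: -pair[1])
              space[t] = ranked[:10]
          return space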
  • Chen, H., Nunamaker Jr., J., Orwig, R., & Titkova, O. (1998). Information visualization for collaborative computing. Computer, 31(8), 75-81.
    More info
    Abstract: A prototype tool classifies output from an electronic meeting system into a manageable list of concepts, topics, or issues that a group can further evaluate. In an experiment with output from the GroupSystems electronic meeting system, the tool's recall ability was comparable to that of a human facilitator, but took roughly a sixth of the time.
  • Chen, H., Shankaranarayanan, G., She, L., & Iyer, A. (1998). A machine learning approach to inductive query by examples: An experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing. Journal of the American Society for Information Science, 49(8), 693-705.
    More info
    Abstract: Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other newer inductive learning techniques including symbolic learning, genetic algorithms, and simulated annealing. These newer techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information systems. In this article, we first provide an overview of these newer techniques and their use in information retrieval research. In order to familiarize readers with the techniques, we present three promising methods: the symbolic ID3 algorithm, evolution-based genetic algorithms, and simulated annealing. We discuss their knowledge representations and algorithms in the unique context of information retrieval. An experiment using an 8,000-record COMPEN database was performed to examine the performance of these inductive query-by-example techniques in comparison with the performance of the conventional relevance feedback method. The machine learning techniques were shown to be able to help identify new documents which are similar to documents initially suggested by users, and documents which contain similar concepts to each other. Genetic algorithms, in particular, were found to out-perform relevance feedback in both document recall and precision. We believe these inductive machine learning techniques hold promise for the ability to analyze users' preferred documents (or records), identify users' underlying information needs, and also suggest search alternatives for database management systems and Internet applications.
  • Chen, H., Zhang, Y., & Houston, A. L. (1998). Semantic indexing and searching using a Hopfield net. Journal of Information Science, 24(1), 3-18.
    More info
    Abstract: This paper presents a neural network approach to document semantic indexing. A Hopfield net algorithm was used to simulate human associative memory for concept exploration in the domain of computer science and engineering. INSPEC, a collection of more than 320,000 document abstracts from leading journals, was used as the document testbed. Benchmark tests confirmed that three parameters (the maximum number of activated nodes, the maximum allowable error ε, and the maximum number of iterations) were useful in positively influencing network convergence behavior without negatively impacting central processing unit performance. Another series of benchmark tests was performed to determine the effectiveness of various filtering techniques in reducing the negative impact of noisy input terms. Preliminary user tests confirmed our expectation that the Hopfield net algorithm is potentially useful as an associative memory technique to improve document recall and precision by solving discrepancies between indexer vocabularies and end-user vocabularies.
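    A sketch of the Hopfield-style spreading activation described here, exposing the three convergence parameters the benchmark tests examined (maximum activated nodes, maximum allowable error ε, and maximum iterations). The transfer function and parameter values are illustrative assumptions, not the paper's exact formulation.
      # Sketch: Hopfield-style spreading activation over a term network (numpy).
      import numpy as np

      def hopfield_search(W, seed_idx, max_nodes=20, eps=1e-3, max_iter=50):
          """W: term-by-term association weights; seed_idx: indices of query terms.
          The three stopping parameters mirror those benchmarked in the paper."""
          act = np.zeros(W.shape[0])
          act[list(seed_idx)] = 1.0
          for _ in range(max_iter):
              nxt = np.tanh(W @ act)                  # sigmoid-like transfer function
              nxt[list(seed_idx)] = 1.0               # keep query terms clamped
              # Keep only the strongest activations to bound the result set.
              cutoff = np.sort(nxt)[::-1][min(max_nodes, len(nxt) - 1)]
              nxt = np.where(nxt >= cutoff, nxt, 0.0)
              if np.abs(nxt - act).sum() < eps:       # convergence test
                  break
              act = nxt
          return np.argsort(act)[::-1][:max_nodes]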
  • Hsinchun, C., Yi-Ming, C., Ramsey, M., & Yang, C. C. (1998). An intelligent personal spider (agent) for dynamic Internet/Intranet searching. Decision Support Systems, 23(1), 41-58.
    More info
    Abstract: As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with search on the Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded on automatic textual analysis and general-purpose search algorithms, is expected to be an improvement over the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the web, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed earlier, followed by a recent enhancement of a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are mainly based on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be obvious. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets. © 1998 Elsevier Science B.V. All rights reserved.
  • Ramsey, M. C., Ong, T., & Chen, H. (1998). Multilingual input system for the Web - an open multimedia approach of keyboard and handwriting recognition for Chinese and Japanese. Proceedings of the Forum on Research and Technology Advances in Digital Libraries, ADL, 188-194.
    More info
    Abstract: The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input mediums and languages of interest.
  • Yang, C. C., Yen, J., & Chen, H. (1998). Intelligent Internet searching engine based on hybrid simulated annealing. Proceedings of the Hawaii International Conference on System Sciences, 4, 415-422.
    More info
    Abstract: The World-Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem for the users of such services. It has been estimated that the amount of information stored on the Internet doubles every 18 months; the number of home pages may be growing even faster, with some estimates putting the doubling time at six months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss using a modified version of the simulated annealing algorithm to develop an intelligent personal spider (agent), which is based on automatic textual analysis of the Internet documents and hybrid simulated annealing.
  • Chen, H., Chung, Y., Ramsey, M., Yang, C. C., Ma, P., & Yen, J. (1997). Intelligent spider for Internet searching. Proceedings of the Hawaii International Conference on System Sciences, 4, 178-188.
    More info
    Abstract: As World-Wide Web (WWW) based Internet services become more popular, information overload has become a pressing research problem. Difficulties with searching on the Internet worsen as the amount of information available on the Internet increases. A scalable approach to support Internet search is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. A new approach to building an intelligent personal spider (agent), based on automatic textual analysis of Internet documents, is proposed in this paper. Best first search and genetic algorithms have been tested in developing the intelligent spider. These personal spiders are able to dynamically and intelligently analyze the contents of user-selected homepages, used as the starting point in searching for the most relevant homepages based on links and indexing. To be an intelligent agent, a spider must be able to make adjustments according to the progress of the search; current search engines, however, do not support communication between users and robots. The spider presented in this paper uses a Java-based user interface so that users can adjust the control parameters as the search progresses and observe intermediate results. The performance of the genetic algorithm based and best first search based spiders is also reported.
  • Chen, H., Ng, T. D., Martinez, J., & Schatz, B. R. (1997). A concept space approach to addressing the vocabulary problem in scientific information retrieval: An experiment on the worm community system. Journal of the American Society for Information Science, 48(1), 17-31.
    More info
    Abstract: This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving four biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus. In a follow-up document retrieval study involving eight fly biologists, an actual worm database (Worm Community System), and the conjoined fly-worm thesaurus, subjects were able to find more relevant documents (an increase from about 9 documents to 20) and to improve the document recall level (from 32.41 to 65.28%) when using the thesaurus, although the precision level did not improve significantly. Implications of adopting the concept space approach for addressing the vocabulary problem in Internet and digital libraries applications are also discussed.
  • Chen, H., Smith, T. R., Larsgaard, M. L., Hill, L. L., & Ramsey, M. (1997). A geographic knowledge representation system for multimedia geospatial retrieval and analysis. International Journal on Digital Libraries, 1(2), 132-152.
    More info
    Abstract: Digital libraries serving multimedia information that may be accessed in terms of geographic content and relationships are creating special challenges and opportunities for networked information systems. An especially challenging research issue concerning collections of geo-referenced information relates to the development of techniques supporting geographic information retrieval (GIR) that is both fuzzy and concept-based. Viewing the meta-information environment of a digital library as a heterogeneous set of services that support users in terms of GIR, we define a geographic knowledge representation system (GKRS) in terms of a core set of services of the meta-information environment that is required in supporting concept-based access to collections of geospatial information. In this paper, we describe an architecture for a GKRS and its implementation in terms of a prototype system. Our GKRS architecture loosely couples a variety of multimedia knowledge sources that are in part represented in terms of the semantic network and neural network representations developed in artificial intelligence research. Both textual analysis and image processing techniques are employed in creating these textual and iconic geographical knowledge structures. The GKRS also employs spreading activation algorithms in support of concept-based knowledge retrieval. The paper describes implementational details of several of the components of the GKRS as well as discussing both the lessons learned from, and future directions of, our research. © Springer-Verlag 1997.
  • Chow, H., Tolle, K. M., Roe, D. J., Elsberry, V., & Chen, H. (1997). Application of neural networks to population pharmacokinetic data analysis. Journal of Pharmaceutical Sciences, 86(7), 840-845.
    More info
    PMID: 9232526;Abstract: This research examined the applicability of using a neural network approach to analyze population pharmacokinetic data. Such data were collected retrospectively from pediatric patients who had received tobramycin for the treatment of bacterial infection. The information collected included patient-related demographic variables (age, weight, gender, and other underlying illness), the individual's dosing regimens (dose and dosing interval), time of blood drawn, and the resulting tobramycin concentration. Neural networks were trained with this information to capture the relationships between the plasma tobramycin levels and the following factors: patient-related demographic factors, dosing regimens, and time of blood drawn. The data were also analyzed using a standard population pharmacokinetic modeling program, NONMEM. The observed vs predicted concentration relationships obtained from the neural network approach were similar to those from NONMEM. The residuals of the predictions from neural network analyses showed a positive correlation with that from NONMEM. Average absolute errors were 33.9 and 37.3% for neural networks and 39.9% for NONMEM. Average prediction errors were found to be 2.59 and -5.01% for neural networks and 17.7% for NONMEM. We concluded that neural networks were capable of capturing the relationships between plasma drug levels and patient-related prognostic factors from routinely collected sparse within-patient pharmacokinetic data. Neural networks can therefore be considered to have potential to become a useful analytical tool for population pharmacokinetic data analysis.
  • Orwig, R. E., Chen, H., & Nunamaker Jr., J. F. (1997). A graphical, self-organizing approach to classifying electronic meeting output. Journal of the American Society for Information Science, 48(2), 157-170.
    More info
    Abstract: This article describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classification of electronic brainstorming output and an evaluation of the results. Electronic brainstorming is one of the most productive tools in the Electronic Meeting System called GroupSystems. A major step in group problem solving involves the classification of electronic brainstorming output into a manageable list of concepts, topics, or issues that can be further evaluated by the group. This step is problematic due to information overload and the cognitive demand of processing a large quantity of textual data. This research builds upon previous work in automating the meeting classification process using a Hopfield neural network. Evaluation of the Kohonen output comparing it with Hopfield and human expert output using the same set of data found that the Kohonen SOM performed as well as a human expert in representing term association in the meeting output and outperformed the Hopfield neural network algorithm. In addition, recall of consensus meeting concepts and topics using the Kohonen algorithm was equivalent to that of the human expert. However, precision of the Kohonen results was poor. The graphical representation of textual data produced by the Kohonen SOM suggests many opportunities for improving information organization of textual information. Increasing uses of electronic mail, computer-based bulletin board systems, and world-wide web services present unique challenges and opportunities for a system-aided classification approach. This research has shown that the Kohonen SOM may be used to automatically create "a picture that can represent a thousand (or more) words."
  • Chen, H., Houston, A., Nunamaker, J., & Yen, J. (1996). Computer toward intelligent meeting agents. Computer, 29(8), 62-69.
    More info
    Abstract: An experiment with an AI-based software agent shows that it can help users organize and consolidate ideas from electronic brainstorming. The agent recalled concepts as effectively as experienced human meeting facilitators and in a fifth of the time.
  • Chen, H., Schatz, B., Ng, T., Martinez, J., Kirchhoff, A., & Chienting, L. (1996). A parallel computing approach to creating engineering concept spaces for semantic retrieval: The Illinois digital library initiative project. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 771-782.
    More info
    Abstract: This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We have experimented previously with such a technique for a smaller molecular biology domain (Worm Community System, with 10+ MB of document collection) with encouraging results. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept space approach on parallel supercomputers. Our test collection included 2+ GB of computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising. The Power Challenge was later selected to create a comprehensive computer engineering concept space of about 270,000 terms and 4,000,000+ links using 24.5 hours of CPU time. Our system evaluation involving 12 knowledgeable subjects revealed that the automatically created computer engineering concept space generated significantly higher concept recall than the human-generated INSPEC computer engineering thesaurus. However, INSPEC was more precise than the automatic concept space. Our current work mainly involves creating concept spaces for other major engineering domains and developing robust graph matching and traversal algorithms for cross-domain, concept-based retrieval. Future work also will include generating individualized concept spaces for assisting user-specific concept-based information retrieval. © 1996 IEEE.
  • Chen, H., Schuffels, C., & Orwig, R. (1996). Internet Categorization and Search: A Self-Organizing Approach. Journal of Visual Communication and Image Representation, 7(1), 88-102.
    More info
    Abstract: The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing. © 1996 Academic Press, Inc.
  • Lin, C., & Chen, H. (1996). An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(1), 75-88.
    More info
    PMID: 18263007;Abstract: An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to generate additional semantically-relevant terms for search. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups to represent (summarize) the subject matter of the database. The concept space approach to information classification and retrieval has been adopted by the authors in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This research reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Due to the unique ideographic nature of the Chinese language, a Chinese term-phrase indexing paradigm considering the ideographic characteristics of Chinese was developed as a multilingual information classification model. By applying the neural network based concept classification technique, the model presents a novel way of organizing unstructured multilingual information. © 1996 IEEE.
  • Schatz, B. R., Johnson, E. H., Cochrane, P. A., & Chen, H. (1996). Interactive term suggestion for users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. Proceedings of the ACM International Conference on Digital Libraries, 126-133.
    More info
    Abstract: The basic problem in information retrieval is that large-scale searches can only match terms specified by the user to terms appearing in documents in the digital library collection. Intermediate sources that support term suggestion can thus enhance retrieval by providing alternative search terms for the user. Term suggestion increases the recall, while interaction enables the user to attempt to not decrease the precision. We are building a prototype user interface that will become the Web interface for the University of Illinois Digital Library Initiative (DLI) testbed. It supports the principle of multiple views, where different kinds of term suggestors can be used to complement search and each other. This paper discusses its operation with two complementary term suggestors, subject thesauri and co-occurrence lists, and compares their utility. Thesauri are generated by human indexers and place selected terms in a subject hierarchy. Co-occurrence lists are generated by computer and place all terms in frequency order of occurrence together. This paper concludes with a discussion of how multiple views can help provide good quality search for the Net. This is a paper about the design of a retrieval system prototype that allows users to simultaneously combine terms offered by different suggestion techniques, not about comparing the merits of each in a systematic and controlled way. It offers no experimental results.
  • Schatz, B., & Chen, H. (1996). Building large-scale digital libraries. Computer, 29(5), 22-26.
    More info
    Abstract: Digital libraries basically store materials in electronic format and manipulate large collections of those materials effectively. Therefore research into digital libraries is really research into network information systems. The key technological issues are how to search and display desired selections from and across large collections. While practical digital libraries must focus on issues of access costs and digitization technology, digital library research concentrates on how to develop the necessary infrastructure to effectively mass-manipulate the information on the Net.
  • Schatz, B., Mischo, W. H., Cole, T. W., Hardin, J. B., Bishop, A. P., & Chen, H. (1996). Federating diverse collections of scientific literature. Computer, 29(5), 28-35.
    More info
    Abstract: A University of Illinois project is developing an infrastructure for indexing scientific literature so that multiple Internet sources can be searched as a single federated digital library.
  • Chow, H., Chen, H., Ng, T., Myrdal, P., & Yalkowsky, S. H. (1995). Using backpropagation networks for the estimation of aqueous activity coefficients of aromatic organic compounds. Journal of Chemical Information and Computer Sciences, 35(4), 723-728.
    More info
    PMID: 7657730;Abstract: This research examined the applicability of using a neural network approach to the estimation of aqueous activity coefficients of aromatic organic compounds from fragmented structural information. A set of 95 compounds was used to train the neural network, and the trained network was tested on a set of 31 compounds. A comparison was made between the results and those obtained using multiple linear regression analysis. With the proper selection of neural network parameters, the backpropagation network provided a more accurate prediction of the aqueous activity coefficients for testing data than did regression analysis. This research indicates that neural networks have the potential to become a useful analytical technique for quantitative prediction of structure-activity relationships. © 1995 American Chemical Society.
  • Chen, H. (1994). Algorithmic approach to building concept space for a scientific community. Proceedings of the Hawaii International Conference on System Sciences, 4, 201-210.
    More info
    Abstract: This research reports an algorithmic approach to the generation of an organizational (community) memory for a scientific community. The techniques used included object filtering, automatic indexing, and cluster analysis. The testbed for our research was the Worm Community System, which contained various forms of (C. elegans) worm-related knowledge and literature, currently in use by molecular biologists in the C. elegans-related research community. The resulting organizational memory was represented as a knowledge base (or frame-based thesaurus). It included 2,709 researchers' names, 798 gene names, 20 experimental methods, and 4,302 subject descriptors.
  • Chen, H. (1994). Collaborative systems: solving the vocabulary problem. Computer, 27(5), 58-66.
    More info
    Abstract: Vocabulary differences have created difficulties for on-line information retrieval systems and are even more of a problem in computer-supported cooperative work (CSCW), where collaborators with different backgrounds engage in the exchange of ideas and information. Our research group at the University of Arizona has investigated two questions related to the vocabulary problem in CSCW. First, what are the nature and characteristics of the vocabulary problem in collaboration, and are they different from those observed in information retrieval or in human-computer interactions research? Second, how can computer technologies and information systems be designed to help alleviate the vocabulary problem and foster seamless collaboration? We examine the vocabulary problem in CSCW and suggest a robust algorithmic solution to the problem.
  • Chen, H. (1994). Machine learning approach to document retrieval: An overview and an experiment. Proceedings of the Hawaii International Conference on System Sciences, 3, 631-640.
    More info
    Abstract: In this article we first provide an overview of AI techniques and then present a machine learning based document retrieval system we developed. GANNET (Genetic Algorithms and Neural Nets System) performed concept (keyword) optimization for user-selected documents during document retrieval using the genetic algorithms. It then used the optimized concepts to perform concept exploration in a large network of related concepts through the Hopfield net parallel relaxation procedure. Our preliminary experiment showed that GANNET helped improve search recall by identifying the underlying concepts (keywords) which best describe the user-selected documents.
  • Chen, H., & Kim, J. (1994). GANNET: A machine learning approach to document retrieval. Journal of Management Information Systems, 10(4), 7-41.
    More info
    Abstract: Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also have made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other, newer artificial intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. The newer techniques have provided great opportunities for researchers to experiment with diverse paradigms for effective information processing and retrieval. In this article we first provide an overview of newer techniques and their usage in information science research. We then present in detail the algorithms we adopted for a hybrid Genetic Algorithms and Neural Nets based system, called GANNET. GANNET performed concept (keyword) optimization for user-selected documents during information retrieval using the genetic algorithms. It then used the optimized concepts to perform concept exploration in a large network of related concepts through the Hopfield net parallel relaxation procedure. Based on a test collection of about 3,000 articles from DIALOG and an automatically created thesaurus, and using Jaccard's score as a performance measure, our experiment showed that GANNET improved the Jaccard's scores by about 50 percent and it helped identify the underlying concepts (keywords) that best describe the user-selected documents. © 1995 M.E. Sharpe, Inc.
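    The genetic-algorithm half of GANNET can be sketched as keyword-set evolution under a Jaccard-score fitness against the user-selected documents. The crossover scheme, mutation rate, and population settings below are illustrative assumptions, not the system's actual configuration.
      # Sketch: GA keyword optimization with a Jaccard-score fitness (toy scale).
      import random

      def jaccard(a, b):
          return len(a & b) / len(a | b) if a | b else 0.0

      def ga_optimize_keywords(candidates, target_docs, pop=30, gens=40, k=8):
          """candidates: list of keywords to draw from; target_docs: list of
          keyword sets for the user-selected documents."""
          def fitness(chrom):
              return sum(jaccard(chrom, d) for d in target_docs) / len(target_docs)
          population = [set(random.sample(candidates, k)) for _ in range(pop)]
          for _ in range(gens):
              population.sort(key=fitness, reverse=True)
              parents = population[:pop // 2]           # elitist selection
              children = []
              while len(children) < pop - len(parents):
                  a, b = random.sample(parents, 2)
                  child = set(random.sample(sorted(a | b), k))   # crossover
                  if random.random() < 0.2:                      # mutation
                      child.pop()
                      while len(child) < k:
                          child.add(random.choice(candidates))
                  children.append(child)
              population = parents + children
          return max(population, key=fitness)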
  • Chen, H., & She, L. (1994). Inductive query by examples (IQBE): A machine learning approach. Proceedings of the Hawaii International Conference on System Sciences, 3, 428-437.
    More info
    Abstract: This paper presents an incremental, inductive learning approach to query-by-examples for information retrieval (IR) and database management systems (DBMS). After briefly reviewing conventional information retrieval techniques and the prevailing database query paradigms, we introduce the ID5R algorithm, previously developed by Utgoff, for "intelligent" and system-supported query processing.
  • Chen, H., Rinde, P. B., She, L., Sutjahjo, S., Sommer, C., & Neely, D. (1994). Expert prediction, symbolic learning, and neural networks an experiment on greyhound racing. IEEE expert, 9(6), 6pp.
    More info
    Abstract: Uncertainty, an inevitable problem in problem solving, can be reduced by seeking the advice of an expert, here realized as computer algorithms such as machine learning. Machine learning encompasses different types of solutions. The present investigation examines a different problem-solving scenario, game playing. For this purpose, greyhound racing, a complex domain involving almost 50 performance variables for each of the eight competing dogs in a race, is considered. For every race, each dog's past history is complete and freely available to bettors. This article discusses the experimental procedures as well as the results obtained.
  • Chen, H. (1992). Knowledge-based document retrieval: Framework and design. Journal of Information Science, 18(4), 293-314.
    More info
    Abstract: This article presents research on the design of knowledge-based document retrieval systems. We adopted a semantic network structure to represent subject knowledge and classification scheme knowledge and modeled experts' search strategies and user modeling capability as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Our system, the design of which was based on the blackboard architecture, was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement.
  • Chen, H., Lynch, K. J., Himler, A. K., & Goodman, S. E. (1992). Information management in research collaboration. International Journal of Man-Machine Studies, 36(3), 419-445.
    More info
    Abstract: Much of the work in business and academia is performed by groups of people. While significant advancement has been achieved in enhancing individual productivity by making use of information technology, little has been done to improve group productivity. Prior research suggests that we should know more about individual differences among group members as they respond to technology if we are to develop useful systems that can support group activities. We report results of a cognitive study in which researchers were observed performing three complex information entry and indexing tasks using an Integrated Collaborative Research System. The observations have revealed a taxonomy of knowledge and cognitive processes involved in the indexing and management of information in a research collaboration environment. A detailed comparison of knowledge elements and cognitive processes exhibited by senior researchers and junior researchers has been made in this article. Based on our empirical findings, we have developed a framework to explain the information management process during research collaboration. Directions for improving design of Integrated Collaborative Research Systems are also suggested. © 1992.
  • Roche, E. M., Goodman, S. E., & Chen, H. (1992). The Landscape of International Computing. Advances in Computers, 35(C), 325-371.
  • Chen, H., & Dhar, V. (1991). Cognitive process as a basis for intelligent retrieval systems design. Information Processing and Management, 27(5), 405-432.
    More info
    Abstract: Two studies were conducted to investigate the cognitive processes involved in online document-based information retrieval. These studies led to the development of five computational models of online document retrieval. These models were then incorporated into the design of an "intelligent" document-based retrieval system. Following a discussion of this system, we discuss the broader implications of our research for the design of information retrieval systems. © 1991.
  • Chen, H., & Dhar, V. (1990). Knowledge-based approach to the design of document-based retrieval systems. Array, 281-290.
    More info
    Abstract: This article presents a knowledge-based approach to the design of document-based retrieval systems. We conducted two empirical studies investigating users' behavior with an online catalog. The studies revealed a range of knowledge elements that are necessary for performing a successful search. We proposed a semantic network-based representation to capture these knowledge elements. The findings derived from our empirical studies were used to construct a knowledge-based retrieval system. We performed a laboratory experiment to evaluate the search performance of our system. The experiment showed that our system outperformed a conventional retrieval system in recall and user satisfaction. The implications of our study for the design of document-based retrieval systems are also discussed.
  • Chen, H., & Dhar, V. (1990). User misconceptions of information retrieval systems. International Journal of Man-Machine Studies, 32(6), 673-692.
    More info
    Abstract: We report results of an investigation where thirty subjects were observed performing subject-based search in an online catalog system. The observations have revealed a range of misconceptions users have when performing subject-based search. We have developed a taxonomy that characterizes these misconceptions and a knowledge representation which explains these misconceptions. Directions for improving search performance are also suggested. © 1990 Academic Press Limited.
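
The two GANNET entries above pair a genetic algorithm, which optimizes the keyword set describing user-selected documents, with a Hopfield-net parallel relaxation that spreads activation from those keywords through a network of related concepts. The following is a minimal Python sketch of the relaxation step only; the symmetric co-occurrence weight matrix, the sigmoid transfer function, and the stopping rule are illustrative assumptions, not the papers' exact formulation.

    import numpy as np

    def hopfield_concept_exploration(weights, seed_idx, theta=0.5, eps=1e-4, max_iter=100):
        """Spread activation from seed keywords over a concept network.

        weights  : (n, n) symmetric matrix of concept-association strengths in [0, 1]
        seed_idx : indices of the (GA-optimized) seed keywords
        Returns concept indices whose converged activation exceeds theta.
        """
        act = np.zeros(weights.shape[0])
        act[list(seed_idx)] = 1.0                        # clamp seed concepts "on"
        for _ in range(max_iter):
            net = weights @ act                          # parallel update of every unit
            new = 1.0 / (1.0 + np.exp(-(net - theta)))   # sigmoid transfer function
            new[list(seed_idx)] = 1.0                    # seeds stay clamped
            if np.abs(new - act).sum() < eps:            # relaxation has converged
                act = new
                break
            act = new
        return [i for i in np.argsort(-act) if act[i] > theta]

    # Toy network: concepts 0 and 1 are both strongly tied to concept 2.
    W = np.array([[0.0, 0.1, 0.8, 0.0],
                  [0.1, 0.0, 0.7, 0.1],
                  [0.8, 0.7, 0.0, 0.2],
                  [0.0, 0.1, 0.2, 0.0]])
    print(hopfield_concept_exploration(W, seed_idx=[0]))   # seed 0 pulls in related concepts
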

Proceedings Publications

  • Chen, H., Ampel, B., Samtani, S., Zhu, H., & Ullman, S. (2020, November). Labeling Hacker Exploits for Proactive Cyber Threat Intelligence: A Deep Transfer Learning Approach. In IEEE International Conference on Intelligence and Security Informatics.
  • Ebrahimi, M., Samtani, S., Chai, Y., & Chen, H. (2020, May). Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach. In IEEE Symposium on Security and Privacy (IEEE S&P 2020) Workshop on Deep Learning for Security (DLS).
  • Lazarine, B., Samtani, S., Patton, M., Zhu, H., Ullman, S., Ampel, B., & Chen, H. (2020, November). Identifying Vulnerable GitHub Repositories and Users in Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach. In IEEE International Conference on Intelligence and Security Informatics.
  • Lin, F., Liu, Y., Ebrahimi, M., Ahmad-Post, Z., Hu, J., Xin, J., Samtani, S., Li, W., & Chen, H. (2020, November). Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach. In IEEE International Conference on Intelligence and Security Informatics.
  • Liu, Y., Lin, F., Ahmad-Post, Z., Ebrahimi, M., Zhang, N., Hu, J., Li, W., Xin, J., & Chen, H. (2020, November). Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web. In IEEE International Conference on Intelligence and Security Informatics.
  • Ullman, S., Samtani, S., Lazarine, B., Zhu, H., Ampel, B., Patton, M., & Chen, H. (2020, November). Smart Vulnerability Assessment for Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach. In IEEE International Conference on Intelligence and Security Informatics.
  • Zhang, N., Ebrahimi, M., Li, W., & Chen, H. (2020, November). A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web. In IEEE International Conference on Intelligence and Security Informatics.
  • Ampel, B., Patton, M. W., & Chen, H. (2019, July). Performance Modeling of Hyperledger Sawtooth Blockchain. In IEEE International Conference on Intelligence and Security Informatics.
  • Arnold, N., Ebrahimi, M., Zhang, N., Lazarine, B., Samtani, S., Patton, M. W., & Chen, H. (2019, July). Dark Net Ecosystem Cyber Threat Intelligence Tool. In IEEE International Conference on Intelligence and Security Informatics.
  • Chen, H., Zeng, D., Yan, X., & Xing, C. (2019, Summer). Lecture Notes in Computer Science. In International Conference on Smart Health.
  • Du, P. Y., Ebrahimi, M., Zhang, N., Brown, R., Chen, H., & Samtani, S. (2019, July). Identifying High-Impact Opioid Products and Key Sellers in Dark Net Marketplaces: An Interpretable Text Analytics Approach. In IEEE International Conference on Intelligence and Security Informatics.
  • Chen, H. (2018, July). Lecture Notes in Computer Science 10983. In International Conference on Smart Health.
  • Chen, H. (2018, June). Motion Sensor-Based Assessment on Fall Risk and Parkinson’s Disease Severity: A Deep Multi-Source Multi-Task Learning (DMML) Approach. In IEEE International Conference on Health Informatics.
  • Chen, H. (2018, November). Detecting Cyber Threats in Non-English Dark Net Markets: A Cross-Lingual Transfer Learning Approach: An Exploratory Study. In IEEE International Conference on Intelligence and Security Informatics.
  • Chen, H. (2018, November). Identifying, Collecting and Presenting Hacker Community Data: Forums, IRC, Carding Shops and DNMs. In IEEE International Conference on Intelligence and Security Informatics.
  • Chen, H., Patton, M. W., Samtani, S., & Harrell, C. (2018, November). Vulnerability Assessment, Remediation, and Automated Reporting: Case Studies of Higher Education Institutions. In IEEE ISI 2018.
  • Chen, H., Patton, M. W., Samtani, S., & Williams, R. (2018, November). Incremental Hacker Forum Exploit Collection and Classification for Proactive Cyber Threat Intelligence: An Exploratory Study. In IEEE ISI 2018.
  • Chen, H., Samtani, S., Patton, M. W., & McMahon, E. (2018, November). Benchmarking Vulnerability Assessment Tools for Enhanced Cyber-Physical System (CPS) Resiliency. In IEEE ISI 2018.
  • El, M., McMahon, E., Samtani, S., Patton, M. W., & Chen, H. (2017, August). Benchmarking vulnerability scanners: An experiment on SCADA devices and scientific instruments. In 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 83-88.
  • Grisham, J., Samtani, S., Patton, M. W., & Chen, H. (2017, August). Identifying mobile malware and key threat actors in online hacker forums for proactive cyber threat intelligence. In 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 13-18.
  • McMahon, E., Williams, R., El, M., Samtani, S., Patton, M. W., & Chen, H. (2017, August). Assessing medical device vulnerabilities on the Internet of Things. In 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 176-178.
  • Williams, R., McMahon, E., Samtani, S., Patton, M. W., & Chen, H. (2017, August). Identifying vulnerabilities of consumer Internet of Things (IoT) devices: A scalable approach. In 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 179-181.
  • Benjamin, V. A., & Chen, H. (2016, September). Identifying Language Groups within Multilingual Cybercriminal Forums. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Chen, X., Zhang, Y., Xu, J., Xing, C., & Chen, H. (2016, April). Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics in Chinese Health Websites. In DASFAA.
  • Ercolani, V., & Chen, H. (2016, September). Shodan Visualized. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Fang, Z., Zhao, X., Wei, Q., Chen, G., Zhang, Y., Xing, C., Li, W., & Chen, H. (2016, September). Exploring Key Hackers and Cybersecurity Threats in Chinese Hacker Communities. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Grisham, J., Barreras, C., Afarin, C., Patton, M. W., & Chen, H. (2016, September). Identifying Top Listers in Alphabay Using Latent Dirichlet Allocation. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Huang, S. Y., & Chen, H. (2016, September). Exploring the Online Underground Marketplaces through Topic-Based Social Network and Clustering. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Jiang, S., & Chen, H. (2016, May). NATERGM: A Model for Examining the Role of Nodal Attributes in Dynamic Social Media Networks. In IEEE 32nd International Conference on Data Engineering (ICDE).
  • Jicha, A., Patton, M. W., & Chen, H. (2016, September). SCADA Honeypots: An In-depth Analysis of Conpot. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Jicha, R., Patton, M. W., & Chen, H. (2016, September). Identifying Devices across the IPv4 Address Space. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Li, W., Yin, J., & Chen, H. (2016, December). Identify High Quality Carding Services in Underground Economy using Nonparametric Supervised Topic Model. In the 37th International Conference on Information Systems (ICIS).
  • Li, W., Yin, J., & Chen, H. (2016, December). Identify Key Data Breach Services with Nonparametric Supervised Topic Model. In Proceedings of the 2016 Workshop on Information Technologies and Systems (WITS).
  • Li, W., Yin, J., & Chen, H. (2016, September). Targeting Key Data Breach Services in Underground Supply Chain. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Li, Z., Sun, D., Chen, H., & Huang, S. Y. (2016, September). Identifying the Socio-Spatial Dynamics of Terrorist Attacks in the Middle East. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Rohrmann, R., Patton, M. W., & Chen, H. (2016, September). Anonymous Port Scanning: Performing Network Reconnaissance Through Tor. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Samtani, S., & Chen, H. (2016, September). Using Social Network Analysis to Identify Key Hackers for Keylogging Tools in Hacker Forums. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Samtani, S., Chinn, K., Larson, C., & Chen, H. (2016, September). AZSecure Hacker Assets Portal: Cyber Threat Intelligence and Malware Analysis. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Samtani, S., Yu, S., Zhu, H., Patton, M. W., & Chen, H. (2016, September). Identifying SCADA Vulnerabilities Using Passive and Active Vulnerability Assessment Techniques. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Zhao, K., Zhang, Y., Xing, C., Li, W., & Chen, H. (2016, September). Chinese Underground Market Jargon Analysis Based on Unsupervised Learning. In 2016 IEEE International Conference on Intelligence and Security Informatics.
  • Benjamin, V., & Chen, H. (2015, May). Developing Understanding of Hacker Language through the use of Lexical Semantics. In 2015 IEEE International Conference on Intelligence and Security Informatics, ISI 2015.
  • Benjamin, V., & Chen, H. (2015, May). Exploring Threats and Vulnerabilities in Hacker web: Forums, IRC and Carding Shops. In 2015 IEEE International Conference on Intelligence and Security Informatics, ISI 2015.
  • Chen, X., Zhang, Y., Xu, J., Xing, C., & Chen, H. (2016, January). Health-related Spammer Detection on Chinese Social Media. In International Conference on Smart Health, ICSH 2015.
  • Chuang, J., Maimoon, L., Yu, S., Zhu, H., Nybroe, C., Hsiao, O., Li, U. S., Liu, H., & Chen, H. (2016, January). SilverLink: Smart Home Health Monitoring for Senior Care. In International Conference on Smart Health, ICSH 2015.
  • Jiang, L., Li, Q., Li, P., & Chen, H. (2015, January). Tensor-based Learning for Perceiving Information-driven Stock Movements. In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, Texas.
  • Liu, X., & Chen, H. (2015, November). Identifying Novel Adverse Drug Events from Health Social Media Using Distant Learning. In American Medical Informatics Association (AMIA) 2015 Annual Symposium.
  • Liu, X., Zhang, B., Susarla, A., Padman, R., & Chen, H. (2015, December). Improving YouTube Self-care Video Search: A Deep Learning Approach for Patient Knowledge Extraction. In 2015 Workshop on Information Technologies and Systems (WITS).
  • Samtani, G., Maimoon, L., Chuang, J., Nybroe, C., Liu, X., Wiil, U., Li, S., & Chen, H. (2016, January). DiabeticLink: An Internationally Collaborative Cyber-Enabled Empowerment Platform. In International Conference on Smart Health, ICSH 2015.
  • Samtani, S., Chinn, R., & Chen, H. (2015, May). Exploring Hacker Assets in Underground Forums. In 2015 IEEE International Conference on Intelligence and Security Informatics.
  • Zhang, Y., Zhang, Y., Xu, J., Xing, C., & Chen, H. (2016, January). Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models. In International Conference on Smart Health, ICSH 2015.
  • Zhang, Y., Zhang, Y., Yin, Y., Xu, J., Xing, C., & Chen, H. (2016, January). Chronic Disease Related Entity Extraction in Online Chinese Question & Answer Services. In International Conference on Smart Health, ICSH 2015.
  • Chen, H. (2014, July). DiabeticLink: An Integrated and Intelligent Cyber-enabled Health Social Platform for Diabetic Patients. In International Conference on Smart Health, ICSH 2014, Lecture Notes in Computer Science 8549.
  • Chen, H. (2014, July). Intelligence and Security Informatics - Pacific Asia Workshop, PAISI 2014, Tainan, Taiwan. Proceedings. In PAISI 2014, Lecture Notes in Computer Science 8440.
  • Chen, H. (2014, July). International Conference on Smart Health, ICSH 2014, Beijing, China, July 2014. Proceedings. In International Conference on Smart Health, ICSH 2014, Lecture Notes in Computer Science 8549.
  • Chen, H., & Li, W. (2014, September). Identifying Top Sellers in Underground Economy Using Deep Learning-based Sentiment Analysis. In Proceedings of 2014 IEEE International Conference on Intelligence and Security Informatics, The Netherlands.
  • Chen, H., & Lin, Y. (2014, September). Time-to-event Modeling for Predicting Hacker Community Participant Trajectory. In Proceedings of 2014 IEEE International Conference on Intelligence and Security Informatics, The Netherlands.
  • Chen, H., Abbasi, A., Li, W., & Benjamin, V. (2014, September). Descriptive Analytics: Investigating Expert Hackers in Hacker Forums. In Proceedings of 2014 IEEE International Conference on Intelligence and Security Informatics, The Netherlands.
  • Chen, H., Patton, M. W., Gross, E., Chinn, R., Forbis, S., & Walker, L. (2014, September). Uninvited Connections: A Study of the Vulnerable Devices on the Internet of Things (IoT). In Proceedings of 2014 IEEE International Conference on Intelligence and Security Informatics, The Netherlands.
  • Chen, H., Zheng, X., Zeng, D., Zhang, Y., Xing, C., & Neill, D. B. (2014, July). International Conference on Smart Health, Proceedings. In International Conference on Smart Health.
    More info
    Edited proceedings of the conference; I am also a co-founder of the conference.
  • Chen, X., Zhang, Y., Xing, C., Liu, X., & Chen, H. (2014, July). Diabetes-Related Topic Detection in Chinese Health Websites Using Deep Learning. In SMART HEALTH, ICSH 2014, 8549, 13-24.
    More info
    With 98.4 million people diagnosed with diabetes in China, most Chinese health websites provide diabetes-related news and articles in a diabetes subsection for patients. However, most of the articles are uncategorized and lack a clear topic or theme, resulting in a time-consuming information-seeking experience. To address this issue, we propose an advanced deep learning approach to detect topics in diabetes-related articles from health websites. Our research framework is the first to incorporate deep learning for topic detection in Chinese diabetes articles. It can identify topics of diabetes articles with high performance and potentially assist health information seeking. To evaluate our framework, an experiment was conducted on a test bed of 12,000 articles. The results showed the framework achieved an accuracy of 70% in detecting topics and significantly outperformed the SVM-based approach.
  • Chuang, J., Hsiao, O., Wu, P., Chen, J., Liu, X., De La Cruz, H., Li, S., & Chen, H. (2014, July). DiabeticLink: An Integrated and Intelligent Cyber-Enabled Health Social Platform for Diabetic Patients. In SMART HEALTH, ICSH 2014, 8549, 63-74.
    More info
    Given the demand of patient-centered care and limited healthcare resources, we believe that the community of diabetic patients is in need of an integrated cyber-enabled patient empowerment and decision support tool to promote diabetes prevention and self-management. Most existing tools are scattered and focused on solving a specific problem from a single angle. DiabeticLink offers an integrated and intelligent web-based platform that enables patient social connectivity and self-management, and offers behavior change aids using advanced health analytics techniques. DiabeticLink released a beta version in Taiwan in July 2013. The next versions of the DiabeticLink system are under active development and will be launched in the U.S., Denmark, and China in 2014. We describe the system functionalities and discuss the user testing and lessons learned from real-world experience. We also describe plans for future development.
  • Li, X., Zhang, T., Song, L., Zhang, Y., Xing, C., & Chen, H. (2014, July). A Control Study on the Effects of HRV Biofeedback Therapy in Patients with Post-Stroke Depression. In SMART HEALTH, ICSH 2014, 8549, 213-224.
    More info
    Stroke is often associated with emotional disorders, among which post-stroke depression (PSD) has a high incidence. We applied Heart Rate Variability (HRV) biofeedback to train PSD patients in a prospective randomized controlled study. The purpose of this study was to investigate the effectiveness of HRV biofeedback on stroke patients' emotional improvement, autonomic nerve function, and prognosis. In the feedback group, the patients learned to breathe at the resonant frequency to increase their low-frequency (LF) power and to adjust their respiration to synchronize with heart-rate fluctuations. Our findings suggest that HRV biofeedback may be a valid treatment, especially for improving depression levels and sleep disturbance in PSD patients.
  • Liu, X., Liu, J., & Chen, H. (2014, July). Identifying Adverse Drug Events from Health Social Media: A Case Study on Heart Disease Discussion Forums. In SMART HEALTH, ICSH 2014, 8549, 25-36.
    More info
    Health social media sites have emerged as major platforms for discussions of treatments and drug side effects, making them a promising source for listening to patients' voices in adverse drug event reporting. However, extracting patient adverse drug event reports from social media continues to be a challenge in health informatics research. To utilize the fertile health social media data for drug safety research, we develop advanced information extraction techniques for identifying adverse drug events in health social media. A case study is conducted on a heart disease discussion forum to evaluate the performance. Our approach achieves an f-measure of 82% in the recognition of medical events and treatments, an f-measure of 69% for identifying adverse drug events, and an f-measure of 90% in patient report extraction. Analysis of the extracted adverse drug events suggests that health social media can provide supplemental information on adverse drug events and drug interactions. It provides a less biased insight into the distribution of adverse events among the heart disease population compared to data from a drug regulatory agency.
  • Song, X., Jiang, S., Yan, X., & Chen, H. (2014, July). Collaborative Friendship Networks in Online Healthcare Communities: An Exponential Random Graph Model Analysis. In SMART HEALTH, ICSH 2014, 8549, 75-87.
    More info
    Health 2.0 provides patients an unprecedented way to connect with each other online. However, less attention has been paid to how patient collaborative friendship networks (CFNs) form in online healthcare communities. This study examines the relationship between collaborative friendship formation and patients' characteristics. Results from an Exponential Random Graph Model (ERGM) analysis indicate that gender homophily does not appear in CFNs, while health homophily, such as treatment homophily and health-status homophily, increases the likelihood of collaborative friendship formation. This study provides insights for improving website design to help foster close relationships among patients and deepen levels of engagement.
    (A general-form ERGM specification is sketched after this list.)
  • Yin, Y., Zhang, Y., Liu, X., Zhang, Y., Xing, C., & Chen, H. (2014, July). HealthQA: A Chinese QA Summary System for Smart Health. In SMART HEALTH, ICSH 2014, 8549, 51-62.
    More info
    Although online health expert QA services can provide high-quality information for health consumers, there is no Chinese question-answering system built on the knowledge in existing expert answers, leading to duplicated effort by medical experts and reduced efficiency. To address this issue, we develop a Chinese QA system for smart health (HealthQA), which provides a timely, automatic, and valuable QA service. HealthQA collects diabetes expert question-answer data from three major QA websites in China. We develop a hierarchical clustering method to group similar questions and answers, an extended similarity evaluation algorithm for retrieving relevant answers, and a ranking-based summarization for representing the answer. ROUGE and manual tests show that our system significantly outperforms a search engine.
    (An illustrative sketch of the retrieval stage follows this list.)
  • Yu, S., Zhu, H., Jiang, S., & Chen, H. (2014, July). Emoticon Analysis for Chinese Health and Fitness Topics. In SMART HEALTH, ICSH 2014, 8549, 1-12.
    More info
    An emoticon is a metacommunicative pictorial representation of facial expressions, which serves to convey information about the sender's emotional state. To complement non-verbal communication, emoticons are frequently used in Chinese online social media, especially in discussions of health and fitness topics. However, limited research has been done to effectively analyze emoticons in a Chinese context. In this study, we developed an emoticon analysis system to extract emoticons from Chinese text and classify them into one of seven affect categories. The system is based on a kinesics model which divides emoticons into semantic areas (eyes, mouths, etc.), with an improvement for adaptation to the Chinese context. Empirical tests were conducted to evaluate the effectiveness of the proposed system in extracting and classifying emoticons, based on a corpus of more than one million sentences of Chinese health- and fitness-related online messages. Results showed the system to be effective in detecting and extracting emoticons from text, and in interpreting the emotion conveyed by emoticons.
    (An illustrative extraction sketch follows this list.)
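
The ERGM analysis in the Song et al. entry above models the probability of the observed friendship network as a function of network statistics. In general form, with two illustrative statistics (an edge count and a homophily count; the paper's full specification is not reproduced here):

    \Pr(Y = y) \;=\; \frac{\exp\!\big(\theta_{1} \sum_{i<j} y_{ij} \;+\; \theta_{2} \sum_{i<j} y_{ij}\, \mathbf{1}[x_i = x_j]\big)}{\kappa(\theta)}

where y_{ij} = 1 if patients i and j are friends, x_i is an attribute such as treatment or health status, and \kappa(\theta) normalizes over all possible networks. A significantly positive \theta_{2} for treatment or health status corresponds to the homophily effects reported above, while a \theta_{2} near zero for gender matches the finding that gender homophily does not appear.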
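The HealthQA entry above describes a three-stage pipeline: cluster similar expert Q&A pairs, retrieve relevant answers for a new question by similarity, and summarize the retrieved answers by ranking. Below is a minimal sketch of the retrieval stage, with TF-IDF cosine similarity standing in for the paper's extended similarity evaluation algorithm; the QA archive is hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical archive of expert question-answer pairs.
    qa_pairs = [
        ("What is a normal fasting blood sugar?", "Typically 70-99 mg/dL for adults."),
        ("How does metformin lower blood sugar?", "Mainly by reducing hepatic glucose production."),
        ("Can exercise lower HbA1c?", "Regular aerobic exercise can reduce HbA1c modestly."),
    ]

    vectorizer = TfidfVectorizer()
    question_matrix = vectorizer.fit_transform([q for q, _ in qa_pairs])

    def retrieve(new_question, top_k=2):
        """Rank archived answers by cosine similarity between questions."""
        sims = cosine_similarity(vectorizer.transform([new_question]), question_matrix)[0]
        best = sims.argsort()[::-1][:top_k]
        return [(qa_pairs[i][1], round(float(sims[i]), 3)) for i in best]

    print(retrieve("Does daily exercise reduce HbA1c?"))

In the full system, the retrieved answers from a question cluster would then be fed to the ranking-based summarizer rather than returned verbatim.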
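The kinesics model in the Yu et al. entry above treats an emoticon as a composition of semantic areas such as eyes and mouth, which suits both Western-style (":-)") and Eastern-style ("(^_^)") emoticons common in Chinese messages. A minimal sketch follows, assuming illustrative character classes and a toy affect lookup rather than the paper's lexicon.

    import re

    # Semantic areas: eyes, optional nose, mouth (Western style, read sideways).
    EYES, NOSE, MOUTH = r"[:;=8xX]", r"[-~o*']?", r"[)(\]\[dDpP3oO/\\|]"
    WESTERN = re.compile(EYES + NOSE + MOUTH)
    # Eastern style, read upright: a short expression between (half/full-width) parentheses.
    EASTERN = re.compile(r"[(\uFF08][^()\uFF08\uFF09]{1,5}[)\uFF09]")

    # Toy affect lookup keyed on the mouth / inner area (illustrative only).
    AFFECT = {")": "joy", "(": "sadness", "D": "joy", "P": "playful",
              "^_^": "joy", "T_T": "sorrow"}

    def extract_emoticons(text):
        """Find candidate emoticons and guess an affect label for each."""
        found = []
        for m in WESTERN.finditer(text):
            found.append((m.group(), AFFECT.get(m.group()[-1], "unknown")))
        for m in EASTERN.finditer(text):
            found.append((m.group(), AFFECT.get(m.group()[1:-1], "unknown")))
        return found

    # "Ran five kilometers today (^_^) but my knees hurt :("
    print(extract_emoticons("今天跑了五公里 (^_^) 但是膝盖好疼 :("))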

Presentations

  • Benjamin, V., Zhang, B., & Chen, H. (2015, November). Predicting Hacker IRC Participation using Discrete-time Duration Modeling with Repeated Events. INFORMS 2015 Annual Meeting. Philadelphia, PA: The Institute for Operations Research and the Management Sciences (INFORMS).
  • Liu, X., Zhang, B., Susarla, A., Padman, R., & Chen, H. (2015, October). Visual Social Media Analytics for Patient Centric Care. Workshop on Health IT and Economics. College Park, Maryland: Center for Health Information and Decision Systems, University of Maryland.
  • Lin, Y., Lin, M., & Chen, H. (2014, November). Beyond adoption: Does meaningful use of EHR improve quality of care? INFORMS Conference on Information Systems and Technology. San Francisco, CA: INFORMS.

Others

  • Leroy, G., Chen, H., & Rindflesch, T. C. (2014, May-June). Smart and Connected Health (Introduction). IEEE Intelligent Systems.
  • Liu, X., Jiang, S., Chen, H., Larson, C. A., & Roco, M. C. (2014, August 31). Nanotechnology knowledge diffusion: Measuring the impact of the research networking and a strategy for improvement. Journal of Nanoparticle Research.
    More info
    Given the global increase in public funding for nanotechnology research and development, it is even more important to support projects with a promising return on investment. A main return is the benefit to other researchers and to the entire field through knowledge diffusion, invention, and innovation. The social network of researchers is one of the channels through which this happens. This study considers the scientific publication network in the field of nanotechnology, and evaluates how knowledge diffusion through coauthorship and citations is affected in large institutions by the location and connectivity of individual researchers in the network. The relative position and connectivity of a researcher is measured by various social network metrics, including degree centrality, Bonacich power centrality, structural holes, and betweenness centrality. Leveraging the Cox regression model, we analyzed the temporal relationships between knowledge diffusion and social network measures of researchers in five leading universities in the United States, using papers published from 2000 to 2010. The results showed that the most significant effects on knowledge diffusion in the field of nanotechnology came from the structural holes of the network and the degree centrality of individual researchers. The data suggest that a researcher has the potential to perform better in knowledge creation and diffusion in boundary-spanning positions between different communities and with a high level of connectivity in the knowledge network. These observations may lead to improved strategies in planning, conducting, and evaluating multidisciplinary nanotechnology research. The paper also identifies the researchers who made the most significant contributions to nanotechnology knowledge diffusion in the networks of five leading U.S. universities.
    (An illustrative sketch pairing network metrics with a Cox model follows below.)
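
The methodology above pairs standard social-network metrics with survival analysis: each researcher's network position supplies covariates, and a Cox model relates them to the time until a diffusion event such as being cited. A minimal sketch under those assumptions, using networkx for the metrics and lifelines for the Cox fit; the graph, durations, and events below are toy data, not the paper's.

    import networkx as nx
    import pandas as pd
    from lifelines import CoxPHFitter

    # Toy co-authorship graph; an edge links researchers who have co-published.
    G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

    degree = nx.degree_centrality(G)
    betweenness = nx.betweenness_centrality(G)
    constraint = nx.constraint(G)   # Burt's constraint: low values ~ rich in structural holes

    nodes = list("abcde")
    df = pd.DataFrame({
        "duration":    [5, 8, 2, 6, 9],   # years observed until event or censoring (toy)
        "event":       [1, 1, 1, 0, 0],   # 1 = diffusion event (e.g., citation) occurred
        "degree":      [degree[n] for n in nodes],
        "betweenness": [betweenness[n] for n in nodes],
        "constraint":  [constraint[n] for n in nodes],
    })

    cph = CoxPHFitter(penalizer=0.1)   # small ridge penalty keeps the toy fit stable
    cph.fit(df, duration_col="duration", event_col="event")
    cph.print_summary()                # hazard ratios for the network covariates

With only five toy rows the fit is numerically fragile; a real analysis, like the paper's, needs many researcher-level observations.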

Profiles With Related Publications

  • Mark W Patton
  • Joe S Valacich
  • Bin Zhang
  • Jay F Nunamaker
  • Randall A Brown
  • Junming Yin
  • Susan A Brown
  • Denise Roe
  • H-H. Sherry Chow
  • Daniel A McDonald
  • Gondy Leroy
  • David W Galbraith
  • Bernard W Futscher
  • Jesse D Martinez
  • Yong Liu
