Mining the Social Web to Monitor Public Health and HIV Risk Behaviors

Abstract: Surveillance and monitoring of risk behavior and disease is a top priority of many health-related agencies and organizations, including UNAIDS, the Centers for Disease Control and Prevention (CDC), and local public health and epidemiology departments. Identifying and localizing changes in risk behaviors and disease can dramatically improve public health outcomes and reduce health-related costs by showing where interventions are needed and how to direct public health efforts. For example, HIV researchers, public health departments, and government organizations have attempted to identify and monitor HIV risk behaviors (e.g., sexual intercourse and illicit drug use) and HIV outbreaks to improve prevention and treatment efforts and curb a growing HIV epidemic. Social media use has been increasing rapidly, and data from these technologies might be leveraged to identify HIV risk behaviors, such as sexual- and drug-related risk behaviors. Although researchers have developed methods of using social media to monitor HIV and public health behaviors and outcomes, these methods require extensive manual time, technical expertise, and multiple software platforms to process these big data. Advances in technology infrastructure, data mining, and machine learning can be leveraged to build a single automated platform for extracting free-text social media conversations, labeling these conversations to identify health risk-related behaviors, and using these labels to monitor disease outbreaks. We propose to create a single automated platform that collects social media (Twitter) data; identifies, codes, and labels tweets that suggest HIV risk behaviors; and provides output that HIV researchers, public health workers, and policymakers can use to monitor HIV risk behaviors and outcomes.
The tools developed from this application will be open source, tailored for use by epidemiologists and public health departments, and available for integration with other software tools to improve the effectiveness of public health monitoring systems.

Project Number: 5U01HG008488-03