Abstract: Social “big data” holds information with wide-ranging implications for addressing issues along the HIV care continuum. Social big data refers to information from social media and online platforms on which individuals and communities create, share, and discuss content. One in four people worldwide, or over a billion people, are publicly documenting their activities, intentions, moods, opinions, and social interactions on these sites. They are doing so with increasing volume and velocity, including 400 million “tweets” per day on Twitter and 4.75 billion content items shared per day on Facebook. With an increasing number of these platforms supporting access to publicly available user data, social big data analysis is a promising new approach for obtaining organic observations of behavior that can be used to monitor and predict real-world public health problems, such as HIV incidence. New tools, such as social data analysis, are therefore needed to supplement existing HIV data collection methods. In preliminary research, our team developed the first approach that identifies psychological and behavioral characteristics from social big data (>550 million tweets) found to be associated with HIV diagnoses. Because groups at the highest risk for HIV (e.g., minority populations) are the fastest-growing groups of Twitter users, and because social media users have been found to publicly share personal information, we identified and collected tweets suggesting HIV risk behaviors (e.g., drug use and high-risk sexual behaviors) and modeled them alongside CDC statistics on HIV diagnoses. We found a significant positive relationship between HIV-related tweets and county-level HIV cases, controlling for socioeconomic status measures and other variables. However, this approach is not currently scalable for use by HIV researchers and public health organizations.
Although public health agencies are interested in mining social data to address HIV, current tools are not accessible to most health scientists because they require advanced computer science expertise. For example, analyzing 500 million tweets a day requires expertise in big data engineering, advanced machine learning, natural language processing, and artificial intelligence. A single platform for mining social data, designed and tested by and for HIV researchers, could have a significant impact on HIV prevention, testing, and treatment. We seek to create a single automated platform that collects social media data; identifies, codes, and labels tweets that suggest HIV-related behaviors; and ultimately predicts regional HIV incidence. Because of the potential ethical issues associated with mining people’s data, we also seek to interview staff at local and regional HIV organizations, as well as participants affected by HIV, to gain their perspectives on the ethical issues associated with this approach. The software developed from this application will be shared with HIV researchers and health care workers to provide additional tools that can be used to combat the spread of HIV.

Project Number: 1R56AI125105-01A1

https://reporter.nih.gov/search/jRCGXVrkakWONMVPH59sPA/project-details/9317061

Contact PI/ Project Leader

YOUNG, SEAN, ASSOCIATE PROFESSOR (syoung5@hs.uci.edu)

Organization

UNIVERSITY OF CALIFORNIA LOS ANGELES

PUBLIC HEALTH RELEVANCE (Project Narrative): Surveillance and monitoring of HIV and related risk behaviors are a top priority. This project is of particularly high impact because it seeks to develop software that allows researchers to analyze real-time conversations from social media big data to monitor HIV diagnoses. It will also provide data on the ethical issues associated with the increasing number of these “social data mining” approaches. The software developed from this application will be shared with HIV researchers and health care workers to provide additional tools that can be used to combat the spread of HIV.

Project Start Date: 01-September-2016

Project End Date: 31-August-2019

Budget Start Date: 01-September-2016

Budget End Date: 31-August-2019

NIH Categorical Spending

Funding IC: NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES / FY Total Cost by IC: $671,438