Feb 9th, 2022
Over 800,000 people have died from drug overdoses since 1999 (ref). The estimated population with Opioid Use Disorder (OUD) is over 2.1 million in the United States (ref). Throughout the COVID-19 pandemic, this opioid crisis has worsened with an estimated 100,306 drug overdose deaths in the US during the 12-month period ending in April 2021, which is an increase of 28.5% from the 78,056 deaths during the same period the year before (ref).
To mitigate the adverse consequences of this opioid crisis, strategies, such as improving access to overdose reversal and medication for addiction treatment, are emerging at the state and local level in the US. Medication for OUD treatment (MOUDs), including methadone hydrochloride, buprenorphine hydrochloride, and extended-release naltrexone hydrochloride, have been clinically proven to be effective at reducing opioid use and adverse outcomes (ref). In this notebook, we will assess the MOUD capacity at the state and county levels across the US, and identify the high risk counties with high drug related death rate and low MOUD accessibility, using the data from project Opioid Environment Policy Scan (OEPS). The OEPS data is created and led by the Healthy Regions and Policies Lab at University of Chicago, part of the Methodology and Advanced Analytics Resource Center (MAARC)
Susan Paykin, Dylan Halpern, Qinyun Lin, Moksha Menghaney, Angela Li, Rachel Vigil, Margot Bolanos Gamez, Alexa Jin, Ally Muszynski, and Marynia Kolak. (2021). GeoDaCenter/opioid-policy-scan: Opioid Environment Policy Scan Data Warehouse (v1.0). Zenodo. https://doi.org/10.5281/zenodo.5842465
Note: To view code clocks, please click the
Show Codebutton at the upper-left corner.
# suppress warnings (to include warnings,set warn to 0)
options(warn=-1)
# import Python libraries
library(sf)
library(tidygeocoder)
library(tmap)
library(tidyverse)
library(readxl)
library(units)
library(repr)
The MOUDs location across the US is stored in a comma-delimited file called us-wide-moudsCleaned.csv. The next two blocks read the file into dataframe, conduct basic data cleanup, and show the first two lines of the data.
Each line shows a record of MOUDs provider including its detailed location and which MOUD (under category column) the provider can prescribe.
# Read the file of MOUD locations
moud_location = '/home/qiong/JINK/PXP-9010_JCOIN_MOUD_notebook/OEPS_download_files/us-wide-moudsCleaned.csv'
class_vector <-rep(c("character"),17)
moud_clinics<-read.csv(moud_location, stringsAsFactors=FALSE,
                       na.strings=c("NA"), colClasses=class_vector)
# Dataframe clean up
names(moud_clinics) <-c(
     "name1",
     "name2" ,
     "street1" ,
     "street2" ,
     "city" ,
     "state" ,
     "zip" ,
     "zip4" ,
     "category",
     "countyGEOID" ,
     "countyName" ,
     "source",
     "geom1"  ,
     "geom2",
     "Longitude",
     "Latitude")
moud_clinics$geom <- paste(moud_clinics$geom1,",",moud_clinics$geom2)
dropcol <- c("geom1", "geom2")
moud_clinics <- moud_clinics[, !(names(moud_clinics) %in% dropcol)]
# show the head of the data frame
head(moud_clinics, n=2)
Below is the summary count table of MOUDs for each category (methadone, buprenorphine, and naltrexone). Keep in mind that a MOUD provider can appear in multiple categories. For instance, one provider can prescribe both buprenorphine and naltrexone.
# Some providers under Naltrexone category have been double registered under both SAMHSA and vivitrolWeb.
# We need to remove duplicates first
# Create a new variable with complete full address
moud_clinics$full_address <- paste(moud_clinics$name1,",",moud_clinics$name2, ",", moud_clinics$street1,",",moud_clinics$street2, ",", moud_clinics$city, ",", moud_clinics$state, ",",moud_clinics$zip)
moud_noduplicate <- moud_clinics %>% 
    arrange(desc(full_address), desc(category)) %>% 
    group_by(full_address) %>% 
    distinct(category, .keep_all = TRUE)
# Count the number of MOUD providers under each category
moud_noduplicate %>% group_by(category) %>% summarise(n = n())
Below we generated a summary count table for each MOUD category in each state. As mentioned above, same provider can appear in more than one category.
# Create a dataframe of MOUD category counts by state
methadone_state <- moud_noduplicate %>% filter(category == 'methadone') %>% group_by(state) %>% summarise(n = n())
colnames(methadone_state)[2] <- "methadone"
buprenorphine_state <- moud_noduplicate %>% filter(category == 'buprenorphine') %>% group_by(state) %>% summarise(n = n())
colnames(buprenorphine_state)[2]<- "buprenorphine"
naltrexone_state <- moud_noduplicate %>% filter(category == 'naltrexone/vivitrol') %>% group_by(state) %>% summarise(n = n())
colnames(naltrexone_state)[2]<- "naltrexone"
merged_df <- merge(methadone_state, buprenorphine_state, by='state', all=TRUE) %>% merge(naltrexone_state, by='state', all=TRUE)
head(merged_df, n=5)
To calculate the accessibilty of MOUDs in each state, we merged the MOUD counts by state dataframe with another dataframe (Health01_S.csv), which contains the estimated state population from 2009-18. Below is the first few lines of a new dataframe MOUD_death_pop with additional columns for MOUDs rate in each state.
# Calculate the MOUDs rate at the state level
# Read the drug related death rate dataset which has the State population number
OD_death_S <-read.csv("/home/qiong/JINK/PXP-9010_JCOIN_MOUD_notebook/OEPS_download_files/Health01_S.csv",
                     stringsAsFactors=FALSE, colClasses=c("STATEFP"="character", "state"="character"))
# Data cleaning
# remove last row
nrow <-dim(OD_death_S)[1]
OD_death_S <- OD_death_S[1:(nrow-1),]
# The population is the aggregation over ten years (09-18)
OD_death_S$pop <- OD_death_S$pop/10
# remove the District of Columbia, only 48 State left in the data frame
OD_death_S <- OD_death_S[!(OD_death_S$STATEFP=="11"), ]
# convert the state full name into 2 letter abbreviation
OD_death_S$state <- state.abb[match(OD_death_S$state,state.name)]
# Merge the MOUD count by state table with state level drug related death rate table
MOUD_death_pop <- merge(merged_df, OD_death_S, by='state', all.y=TRUE)
#Calculate the rate of each MOUD category per 100K population
MOUD_death_pop$methRt <- (MOUD_death_pop$methadone/MOUD_death_pop$pop)* 100000
MOUD_death_pop$bupreRt <- (MOUD_death_pop$buprenorphine/MOUD_death_pop$pop)* 100000
MOUD_death_pop$nalRt <- (MOUD_death_pop$naltrexone/MOUD_death_pop$pop)* 100000
MOUD_death_pop[1:5,]
MOUD_death_pop[,c("rawDeathRt","methRt","bupreRt","nalRt")] %>% summary()
Below is a plot of calculated MOUDs rate and drug related death rate by state. We highlighted the top 5 states for each value (state methadone rate, state buprenorphine rate, state naltrexone rate, and drug related death rate).
# Prepare the dataframe for bar ploting
# Make the 'state' column as row names 
MOUD_death_pop2 <- MOUD_death_pop[,-1]
rownames(MOUD_death_pop2) <- as.character(MOUD_death_pop[,1])
# Convert datafram to matrix
MOUD_death_pop3 <-as.matrix(sapply(MOUD_death_pop2, as.numeric))
rownames(MOUD_death_pop3) <- as.character(MOUD_death_pop[,1])
# Find the top 5 states in drug related death rate, methadone rate, buprenorphine rate, and naltrexone rate
death_top5 <- MOUD_death_pop %>% arrange(desc(rawDeathRt)) %>% subset(select=state) %>% head(n=5) %>% pull() %>% as.character()
methadone_top5<- MOUD_death_pop %>% arrange(desc(methRt)) %>% subset(select=state) %>% head(n=5) %>% pull() %>% as.character()
buprenorphine_top5<- MOUD_death_pop %>% arrange(desc(bupreRt)) %>% subset(select=state) %>% head(n=5) %>% pull() %>% as.character()
naltrexone_top5<- MOUD_death_pop %>% arrange(desc(nalRt)) %>% subset(select=state) %>% head(n=5) %>% pull() %>% as.character()
# Highlight the top 5 states in drug related death rate, methadone provider rate,
# buprenorphine provider rate, and naltrexone provider rate
state_vector <- MOUD_death_pop['state'] %>% pull() %>% as.character()
death_col <- rep(c("#C3C3C3"),each=48)
death_col[state_vector %in% death_top5] <- "#FF0000"
methadone_col <- rep(c("#C3C3C3"),each=48)
methadone_col[state_vector %in% methadone_top5] <- "#4C00FFFF"
buprenorphine_col <- rep(c("#C3C3C3"),each=48)
buprenorphine_col[state_vector %in% buprenorphine_top5] <- "#00FF4DFF"
naltrexone_col <- rep(c("#C3C3C3"),each=48)
naltrexone_col[state_vector %in% naltrexone_top5] <- "#BDFF00FF"
# Adjust plots arrangement and sizes
options(repr.plot.width=17, repr.plot.height=12)
par(mfrow = c(4,1))
barplot(MOUD_death_pop3[,8], main="Methadone Provider Rate by State",cex.main=2,
  ylab="Rate per 100K",col = methadone_col, ylim=c(0, 1.8), yaxt = "n")
axis(2,at=seq(0, 1.8, by=0.2),labels=TRUE)
barplot(MOUD_death_pop3[,9], main="Buprenorphine Provider Rate by State",cex.main=2,
  ylab="Rate per 100K",col = buprenorphine_col, ylim=c(0, 4.7), yaxt = "n")
axis(2,at=seq(0,4.7, by=1),labels=TRUE)
barplot(MOUD_death_pop3[,10], main="Naltrexone Provider Rate by State",cex.main=2,
  ylab="Rate per 100K",col = naltrexone_col, ylim=c(0, 7), yaxt = "n")
axis(2,at=seq(0,7, by=1),labels=TRUE)
barplot(MOUD_death_pop3[,7], main="Drug Related Death Rate by State",cex.main=2,
  ylab="Rate per 100K",col = death_col, ylim=c(0, 30), yaxt = "n")
axis(2,at=seq(0, 30, by=5),labels=TRUE)
As mentioned above, a MOUDs provider can appear more than once in the dataframe because it can prescribe multiple MOUD categories. Next, we tried to extract unique MOUD providers from original dataset and create a new column called all_meds which includes all the MOUDs that each uique provider can prescribe.
# Some of the provider are able to prescribe more than one category of MOUD
# Extract unique provider and create a new variable for all categories of MOUD that each provider is able to prescribe
moud_uniq <- moud_noduplicate %>% arrange(desc(full_address),desc(category)) %>% group_by(full_address) %>% mutate(all_meds = paste0(category, collapse = ",")) 
moud_uniq <- moud_uniq %>% distinct(full_address, .keep_all=TRUE)
moud_uniq[1:5,c(16,17)]
Below is a summary count table for the variable all_meds.
# The counts of MOUD providers that offer multiple MOUD categories
moud_uniq %>% group_by(all_meds) %>% summarise(n = n()) %>% arrange(desc(n))
We first calcualted MOUDs rate in each category at county level. Below is the shp file that was used to define county boundaries on the map.
# Read the shape fiel of US counties
county_sf <- st_read("/home/qiong/JINK/PXP-9010_JCOIN_MOUD_notebook/OEPS_download_files/counties2018.shp")
# Show first 5 counties from IL in county shapefile 
county_sf %>% filter(STATEFP=="17") %>% head(n=5)
dim(county_sf)
We created a dataframe with MOUDS provider counts in each category at county level, and then merged it with the county shape (shp) dataframe. Notice that there are a lot of counties in the US with no MOUD provider.
# calculate the count of MOUD in each category within each county
methadone_county <- moud_noduplicate %>% filter(category=="methadone") %>% group_by(countyGEOID) %>% tally()
buprenorphine_county <- moud_noduplicate %>% filter(category=="buprenorphine") %>% group_by(countyGEOID) %>% tally()
naltrexone_county <- moud_noduplicate %>% filter(category=="naltrexone/vivitrol") %>% group_by(countyGEOID) %>% tally()
colnames(methadone_county)[2]<-"methadone_count"
colnames(buprenorphine_county)[2]<-"buprenorphine_count"
colnames(naltrexone_county)[2]<-"naltrexone_count"
# merge the count of MOUD providers with the county_sf 
county_sf_moud <- merge(county_sf, methadone_county, by.x='GEOID', by.y="countyGEOID", all.x=TRUE) %>% merge(buprenorphine_county, by.x='GEOID', by.y="countyGEOID", all.x=TRUE) %>% merge(naltrexone_county, by.x='GEOID', by.y="countyGEOID", all.x=TRUE)
county_sf_moud %>% head(n=5)
Below is a summary table of MOUDs counts at county level in the US.
# Print the summary of columns of methadone, buprenorphine, and naltrexone counts 
summary(county_sf_moud[,c(7,8,9)] %>% st_set_geometry(NULL))
We also calcualted the number of counties in the US don't have any MOUDs provider.
# combine county level moud providers 
merged_county_moud <- merge(methadone_county, buprenorphine_county, by='countyGEOID', all=TRUE) %>% merge(naltrexone_county, by='countyGEOID', all=TRUE)
3142-dim(merged_county_moud)[1]
We counted the unique MOUDs providers in each county across the country. We merged this dataframe with another table of drug related death rate at county level, which includes estimated county population. The population column was then used to calcualte the MOUDs rate (unique providers) in each county.
# Clean up the unique moud dataset by removing the MOUD without countyGEOID
# For instance county from Puerto Rico doesn't have county GEOID
moud_uniq_subset <- moud_uniq[!(is.na(moud_uniq$countyGEOID)),]
moud_uniq_county_count <- moud_uniq_subset %>% group_by(countyGEOID)%>% tally()
# Read county level drug-related death rate file
OD_death_c <- read.csv("/home/qiong/JINK/PXP-9010_JCOIN_MOUD_notebook/OEPS_download_files/Health01_C.csv",
                      stringsAsFactors=FALSE, colClasses=c("COUNTYFP"="character", "state.code"="character"))
OD_death_c$state.code <- str_pad(OD_death_c$state.code, 2, pad = "0")
# The Population of each county is an aggrevation over 10 years (09-18)
OD_death_c$pop <- OD_death_c$pop/10
OD_death_c[1:5,]
# Calculate the unique MOUD providers rate per 100K population in each county
OD_death_c_uniq_moud <- merge(OD_death_c, moud_uniq_county_count, by.x='COUNTYFP', by.y='countyGEOID', all.x=TRUE)
colnames(OD_death_c_uniq_moud)[8] <- "unique_moud"
OD_death_c_uniq_moud$moud_rate <- (OD_death_c_uniq_moud$unique_moud/OD_death_c_uniq_moud$pop)*100000
OD_death_c_uniq_moud[!(is.na(OD_death_c_uniq_moud$unique_moud)),] %>% arrange(desc(moud_rate)) %>% head(n=5)
We then plotted the unique MOUD provider rate in each county below. The drug related death rate at county level was also plotted to show the disparities between drug-related death rate and OUD treatment capacity.
# Plot county level moud rate and drug related death rate on map
# merge with county_sf dataframe
county_sf_uniq_moud <- merge(county_sf, OD_death_c_uniq_moud, by.x='GEOID', by.y="COUNTYFP", all.x=TRUE)
# Two states and DC to exclude from the map for display
exclude_states <- c("02","11","15")
county_sf_uniq_moud_subset <- county_sf_uniq_moud[!(county_sf_uniq_moud$STATEFP %in% exclude_states),]
# Read the state shape file for plotting state boundaries
state_sf <- st_read("/home/qiong/JINK/PXP-9010_JCOIN_MOUD_notebook/OEPS_download_files/states2018.shp")
state_sf_subset <- state_sf[!(state_sf$STATEFP %in% exclude_states),]
# Plot county level moud rate and death rate
options(repr.plot.width=17, repr.plot.height=12)
tmap_mode("plot")
tm_moud_unique <- tm_shape(county_sf_uniq_moud_subset) + tm_borders(alpha = 0.3) +
    tm_polygons("moud_rate", style="quantile",pal="BuPu",
              title = "Moud Provider Rate per 100K") +
    tm_shape(state_sf_subset) + tm_borders(lwd=1.5) + tm_text("STUSPS", size=1.5)+
    tm_layout(legend.stack = "horizontal") 
tm_death_rate<-tm_shape(county_sf_uniq_moud_subset) + tm_borders(alpha = 0.3) +
    tm_polygons("rawDeathRt", pal="YlOrRd", style="quantile"
               , title = "Drug Related Death Rate per 100K") +
    tm_shape(state_sf_subset) + tm_borders(lwd = 1.5)  + tm_text("STUSPS", size=1.5) +
    tm_layout(legend.stack = "horizontal")
tmap_arrange(tm_moud_unique, tm_death_rate)
The Appalachian region suffers a higher opioid overdose mortality rate comapred to the rest of the country. Below shows zoomed-in plots of unique MOUD providers rate and drug related death rate of Appalacgian counties on map.
# Download a list of counties that belong to the Appalachian region
appalachian_counties <-read_csv("https://gist.githubusercontent.com/akanik/979b38bfd3afa797ec33944fac7e26d7/raw/491add919d0ba147dc0fcc1616575affe5371e3e/arc-gov-appal-counties.csv")
appalachian_counties <- as.data.frame(appalachian_counties)
appalachian_counties$county[appalachian_counties$county=="De Kalb"] <- "DeKalb"
appalachian_counties$county <- paste(appalachian_counties$county, ",",appalachian_counties$state)
# Generate a translation table between county, state and GEOID
county_fips <- county_sf[,c('STATEFP','GEOID','NAME')] %>% st_set_geometry(NULL)
state_fips <- state_sf[,c('GEOID','NAME')] %>% st_set_geometry(NULL)
colnames(state_fips)[2] <- "state_name" 
county_fips <- merge(county_fips, state_fips, by.x='STATEFP', by.y='GEOID', all.x=TRUE)
county_fips$county<- paste(county_fips$NAME, ",", county_fips$state_name)
# Add GEOID to the Appalachian dataframe
appalachian_counties_fips <- merge(appalachian_counties, county_fips, 
                              by='county', all.x=TRUE)
# Filter the county_sf_uniq_moud_subset based on the Appalachian region
appalachian_county_sf_uniq_moud <- county_sf_uniq_moud_subset[(county_sf_uniq_moud_subset$GEOID %in% appalachian_counties_fips$GEOID),]
# Plot Appalachian counties on map with MOUD rate and drug-related death rate
options(repr.plot.width=17, repr.plot.height=12)
tmap_mode("plot")
tm_moud_unique_app <- tm_shape(appalachian_county_sf_uniq_moud) + tm_borders(alpha = 0.3) +
    tm_polygons("moud_rate", style="quantile",pal="BuPu",
              title = "Moud Provider Rate per 100K") +
    tm_shape(state_sf_subset) + tm_borders(lwd=1.5) + tm_text("STUSPS", size=1.5)+
    tm_layout(legend.outside=TRUE,legend.position=c("right", "bottom")) 
tm_death_rate_app<-tm_shape(appalachian_county_sf_uniq_moud) + tm_borders(alpha = 0.3) +
    tm_polygons("rawDeathRt", pal="YlOrRd", style="quantile"
               , title = "Drug Related Death Rate per 100K") +
    tm_shape(state_sf_subset) + tm_borders(lwd = 1.5)  + tm_text("STUSPS", size=1.5) +
    tm_layout(legend.outside=TRUE,legend.position=c("right", "bottom"))
tmap_arrange(tm_moud_unique_app, tm_death_rate_app)
summary(appalachian_county_sf_uniq_moud[,c("moud_rate", "rawDeathRt")], na.rm=TRUE)
Below is the summary of drug related death rate and unique MOUDs rate at county level. We then used the medium number as threshold to identify the high risk counties in the US.
OD_death_c_uniq_moud[,c("rawDeathRt", "moud_rate")] %>% summary(na.rm=TRUE)
We calculated how many counties in the US were considered as high risk county using the threshold mentioned above.
# The counties with moud_rate as NA means there is no moud provider within the county
county_sf_uniq_moud$moud_rate[is.na(county_sf_uniq_moud$moud_rate)] <- 0
high_risk_counties_sf <- county_sf_uniq_moud[!(is.na(county_sf_uniq_moud$rawDeathRt)), ] %>% filter(moud_rate <=5.4243 & rawDeathRt >=15.37)
high_risk_counties_sf %>% dim()
Below shows how many states have at least one high risk county and the leading states with the highest nunber of high risk counties.
high_risk_counties_sf %>% group_by(state) %>% tally() %>% dim()
high_risk_counties_sf %>% group_by(state) %>% tally() %>% arrange(desc(n)) %>% head(n=5)
We summed up the population of all high risk counties from the same state, and calcualted the high risk population percentage for each state. The table below shows the leading states with highest population percentage affected by the high risk of MOUDs accessibility.
high_risk_pop <- high_risk_counties_sf %>% group_by(state) %>% summarise(hs_pop = sum(pop)) %>% st_set_geometry(NULL)
high_risk_pop$state <- state.abb[match(high_risk_pop$state,state.name)]
high_risk_pop <- merge(high_risk_pop, OD_death_S[,c("state","pop")], by.x="state", by.y="state", all.x=TRUE)
# Calculate the percentage of population in high risk counties
high_risk_pop$hs_percent <- (high_risk_pop$hs_pop/high_risk_pop$pop)*100
# Summary of high risk population percent
summary(high_risk_pop$hs_percent, na.rm=TRUE)
# Show states with highest and low portion of high risk county population
high_risk_pop %>% arrange(desc(hs_percent)) %>% head()
high_risk_pop %>% arrange(hs_percent) %>% head()
We then plotted these high risk counties on the map. Each dot represents a high risk county on the map. The size and the color of the dot coresponds to the drug related death rate of higher risk county.
# Three states to exclude from the map
exclude_states <- c("02","11","15")
high_risk_counties_sf_subset <- high_risk_counties_sf[!(high_risk_counties_sf$state.code %in% exclude_states),]
# Plot Appalachian counties on map with MOUD rate and drug related death rate
options(repr.plot.width=20, repr.plot.height=12)
tmap_mode("plot")
tm_high_risk_death<-tm_shape(high_risk_counties_sf_subset) + tm_borders(alpha = 0.3) +
    tm_bubbles(size="rawDeathRt", alpha=0.8, col = "rawDeathRt", pal="YlOrRd", 
               style="quantile") +
    tm_shape(state_sf_subset) + tm_borders(lwd = 1.5)  + tm_text("STUSPS", size=1.5) +
    tm_layout(legend.stack = "horizontal")
tmap_arrange(tm_high_risk_death)
The Center for Spatial Data Science at UChicago has created an Opioid Environment Toolkit. This toolkit provides an introduction to GIS and spatial analysis for opioid environment applications that allows researchers, analysts, and practitioners to support their communities with better data analytics and visualization services.