
This research project started after we spent hundreds of hours on the phone trying to find a home for someone with severe schizophrenia and no income or insurance, and realized how difficult it was to get a simple list of places where Medicaid would pay for their housing.

This fact, that we couldn't access a comprehensive list of group homes or assisted living facilities for someone in need, made us realize that we shouldn't blame ourselves for the difficulty.

We had failed over and over again to find reliable housing for this person through byzantine government services designed to help people in need: this problem was therefore bigger than the hundreds of hours we'd spent on the phone for a single person.

This meant we could start collecting such a dataset to help others in need find housing, and to help policymakers access this valuable data, increase access, and analyze health disparities. Further, new machine learning and data science methods could then be applied to build better policy.

This website contains the first public dataset of assisted living facilities in the United States, covering over 44,000 facilities from all 50 states and the District of Columbia.

What is an assisted living facility?

An assisted living facility is sometimes known as a group home. It is a place where someone can live, have access to social supports such as transportation, and receive assistance with activities of daily living. For example, an individual with serious mental illness who lives in a group home may need staff assistance with taking medications, while an elderly person living in an assisted living facility may require help with activities of daily living, such as getting dressed or eating.

Why does this matter?

In the United States, some states allow people who are disabled or have a serious mental illness such as schizophrenia to receive money for housing via the Medicaid program. However, finding a group home is extremely difficult and can require many phone calls to social workers, hospitals, and various other services. Until we created this dataset, there was no way for someone with mental illness (or their family or care team) to access a list of addresses where they could live with government support. If we care about a social safety net, having data on what it looks like matters.

Here is the dataset (13MB) as a CSV. Here is the GitHub repo, which contains all scripts used to process, clean, and analyze the data, as well as all raw data gathered from state licensing agencies.

The data was collected over the course of summer 2021. The dataset, along with a conceptual replication of previous work and a geospatial analysis of assisted living accessibility, was accepted at NeurIPS 2021 as part of the Machine Learning in Public Health (MLPH) workshop.

:page_facing_up: Here is the preprint paper.

Here is a brief presentation on the dataset that was given at the NeurIPS 2021 Machine Learning in Public Health workshop:

{{< rawhtml >}} <center><iframe width="560" height="315" src="" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center> {{< /rawhtml >}}

Below you can see the full dataset embedded as an Excel spreadsheet. If you first click on the dataset, you can then press Command+F (Ctrl+F on Windows or Linux) to search for anything. The dataset may take a couple of seconds to load. We're currently working on improving this embed, adding better search functionality, and more.

{{< rawhtml >}} <iframe src="" width="1500" height="450" frameborder="0" scrolling="no"></iframe> <br/><br/> {{< /rawhtml >}}
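For searches that a plain text search cannot express, such as "facilities in one state with at least a given number of beds", the CSV can also be queried directly. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names `Facility Name`, `State`, and `Capacity`, and that states are stored as two-letter abbreviations; the helper `find_facilities` and the sample rows are hypothetical and made up for illustration, not taken from the dataset.

```python
import csv
import io


def find_facilities(rows, state, min_capacity=0):
    """Return rows in `state` whose capacity is at least `min_capacity`.

    Rows with a missing or non-numeric capacity are kept only when
    no minimum capacity is requested.
    """
    matches = []
    for row in rows:
        if (row.get("State") or "").strip().upper() != state.upper():
            continue
        try:
            capacity = int((row.get("Capacity") or "").strip())
        except ValueError:
            capacity = None  # capacity not reported for this facility
        if min_capacity > 0 and (capacity is None or capacity < min_capacity):
            continue
        matches.append(row)
    return matches


# Made-up rows for illustration only; these are not real facilities.
SAMPLE_CSV = """Facility Name,State,Capacity
Example Home A,NJ,12
Example Home B,NJ,
Example Home C,PA,8
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
in_nj = find_facilities(rows, "NJ")
in_nj_large = find_facilities(rows, "NJ", min_capacity=10)
for row in in_nj_large:
    print(row["Facility Name"], row["Capacity"])
```

To run this against the downloaded file, replace the `io.StringIO(SAMPLE_CSV)` source with an opened file handle and adjust the column names to match the actual header.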

If you have any questions, feel free to reach out to Anton Stengel at:

astengel at princeton dot edu.

Background about dataset

Since assisted living facilities are not federally regulated, there is no standardized definition of what constitutes a facility. We followed the NCAL's 2019 Assisted Living State Regulatory Review to define which licensing types count for each state.

We collected data by

  1. retrieving CSVs and PDFs from state websites
  2. scraping data from state websites
  3. directly contacting state licensing agencies
  4. submitting Freedom of Information Act requests

Additionally, we augmented the dataset with relevant county-level metrics retrieved from the Census. Here are the variables for each facility instance. The Percent Filled column specifies what percent of facilities in the dataset have a value for the given variable.

| Variable | Description | Percent Filled |
| --- | --- | --- |
| Facility Name | The name of the facility | 100% |
| Facility ID | The facility identification number | 65% |
| License Number | The facility license number | 48% |
| Address | The primary physical address of the facility | 100% |
| City | The city of the address | 98% |
| State | The state of the address | 100% |
| Zip Code | The zip code of the address | 97% |
| County | The county of the address | 100% |
| County FIPS | The FIPS code of the county | 100% |
| Latitude | Latitude of the address | 100% |
| Longitude | Longitude of the address | 100% |
| Facility Type Primary | The primary licensing type of the facility | 100% |
| Facility Type Secondary | The secondary licensing type of the facility | 41% |
| Capacity | The total capacity of the facility in beds | 86% |
| Ownership Type | The ownership structure of the facility | 27% |
| Licensee | The license holder of the facility | 48% |
| Phone Number | Primary phone number associated with facility | 98% |
| Email Address | Primary email address associated with facility | 35% |
| Date Accessed | Date that the data was retrieved from state licensing agency | 100% |
| Total County AL Need | The computed need-based metric for county of facility | 100% |
| County Percent of Population 65 or Older | Retrieved from 2015-2019 ACS data | 100% |
| County Median Age | Retrieved from 2015-2019 ACS data | 100% |
| County Homeownership Rate | Retrieved from 2015-2019 ACS data | 100% |
| County College Education or Higher Rate | Retrieved from 2015-2019 ACS data | 100% |
| County Percent Black Population | Retrieved from 2015-2019 ACS data | 100% |
| County Median Home Value of Owned Homes | Retrieved from 2015-2019 ACS data | 100% |
| County Percent Hispanic Population | Retrieved from 2015-2019 ACS data | 100% |
| County Percent of Population 85 or Older | Retrieved from 2015-2019 ACS data | 100% |
| County Median Household Income | Retrieved from 2015-2019 ACS data | 100% |
| County Unemployment Rate | Retrieved from 2020 ACS data | 100% |
| County Less Than High School Diploma Rate | Retrieved from 2015-2019 ACS data | 100% |
| County Percent White Population | Retrieved from 2015-2019 ACS data | 100% |
| County Gender Ratio | Retrieved from 2015-2019 ACS data | 100% |
| County Poverty Rate | Retrieved from 2015-2019 ACS data | 100% |
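
As an illustration of what the Percent Filled column measures, the sketch below computes the share of rows with a non-empty value for each requested column. It uses only the Python standard library. The function name `percent_filled` is hypothetical, the column names are assumed to match the table above, and the sample rows are made up for illustration rather than drawn from the dataset; this is not necessarily how the published percentages were computed.

```python
import csv
import io


def percent_filled(rows, columns):
    """Return {column: percent of rows with a non-empty value}."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    result = {}
    for col in columns:
        # Treat missing keys, None, and whitespace-only strings as unfilled.
        filled = sum(1 for row in rows if (row.get(col) or "").strip())
        result[col] = 100.0 * filled / len(rows)
    return result


# Made-up rows for illustration only; these are not real facilities.
SAMPLE_CSV = """Facility Name,State,Capacity,Email Address
Example Home A,NJ,12,
Example Home B,NJ,,info@example.org
Example Home C,PA,8,
Example Home D,PA,20,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = percent_filled(rows, ["Facility Name", "Capacity", "Email Address"])
for column, pct in stats.items():
    print(f"{column}: {pct:.0f}%")
```

On the four sample rows, every row has a name, three have a capacity, and one has an email address, so the sketch reports 100%, 75%, and 25% respectively.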