United States census block groups converted to singlepart polygons and with water removed, 2013-2017
dataset
posted on 2024-09-12, 20:13, authored by Dexter H. Locke
Many block groups (and other Census geographies) are stored as multipart polygons. In other words, a single row in the attribute table corresponds to multiple discrete, non-overlapping polygons in the Geographic Information System (GIS). One concern with multipart polygons is that the area of overlap with another spatial dataset can be distorted: the area of a multipart polygon is the sum of all of its sub-polygons, which are not necessarily the ones that overlap target features such as drive-time polygons, neighborhood files, or other districts. A single area value is attributed to all of the smaller discrete, non-overlapping pieces. A second concern with multipart polygons is how they affect spatially explicit socioeconomic and demographic analyses, because a single block group polygon can contain non-residential areas where people do not live, such as areas of water. Therefore, the precision and realism of Census geographies can be improved by erasing the water area. These data ameliorate both the multipart and water issues.
This data publication provides singlepart polygons of United States census block groups with the water removed for the entire United States between 2013 and 2017. Data are provided as both an Esri shapefile and a geopackage and include: GEOID, state name, county name, area of the census block group, area of the singlepart polygon (that is, of the particular row), and the proportion of the larger multipart polygon that the singlepart polygon represents. Also included is an R Markdown file with the script used to complete this conversion.
The purpose of these data is to improve estimates derived from overlaying Census data with other polygons. Rectifying both the multipart polygon issue and the water issue yields an improved, analysis-ready set of polygons. The conversion was computationally intensive, which might explain why such data are not already widely available and why making them available to others might be of value.
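As a rough illustration of the area fields described above, the following Python sketch splits one multipart feature into singlepart rows and computes each part's share of the total area. It is not the R Markdown script distributed with the data; the function names, the output field names (`area_total`, `area_part`, `prop`), the placeholder GEOID, and the toy coordinates are all assumptions made for this example, and water removal is not shown.

```python
def ring_area(ring):
    """Planar area of a simple polygon ring given as (x, y) tuples (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def explode_multipart(geoid, parts):
    """Return one row per part, with the part's area and its share of the whole.

    Field names are hypothetical, not the column names used in the dataset.
    """
    areas = [ring_area(part) for part in parts]
    area_total = sum(areas)
    return [
        {
            "GEOID": geoid,
            "part": index,
            "area_total": area_total,
            "area_part": area,
            "prop": area / area_total,
        }
        for index, area in enumerate(areas)
    ]


# Toy multipart feature: a 2 x 2 square plus a separate 1 x 1 square.
rows = explode_multipart(
    "000000000000",  # placeholder GEOID, not a real block group
    [
        [(0, 0), (2, 0), (2, 2), (0, 2)],
        [(10, 10), (11, 10), (11, 11), (10, 11)],
    ],
)
for row in rows:
    print(row["part"], row["area_part"], round(row["prop"], 3))
```

In an overlay, only the parts that actually intersect a target feature would then contribute, weighted by their own area rather than by the area of the whole multipart polygon.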
Further development is encouraged! The script should work with only minor modification for other Census geographies and/or other years of data. For example, the script currently runs batches of counties per state. This is a prime example of a so-called "embarrassingly parallel" problem: each loop iteration is independent of the results before or after it, so modifying the script to take advantage of additional cores would decrease the time needed to re-run the process.
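The parallelization idea could look roughly like the sketch below, written in Python rather than the R used by the published script. `process_state` is a hypothetical stand-in for the real per-state work (reading geometries, exploding multipart polygons, erasing water), and `run_all` and its parameters are likewise assumptions for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_state(state_fips):
    """Hypothetical placeholder for the per-state batch of county processing."""
    return state_fips, f"processed {state_fips}"


def run_all(state_codes, worker=process_state,
            executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run the worker once per state; no iteration depends on another.

    ProcessPoolExecutor spreads CPU-bound work across cores.
    ThreadPoolExecutor can be passed instead for quick, lightweight checks.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields (state, result) pairs in input order.
        return dict(pool.map(worker, state_codes))
```

A call such as `run_all(["01", "02", "04"], max_workers=4)` would process the three states concurrently. When using processes, the call is best placed under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform.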
These data were collected using funding from the U.S. Government and can be used without additional permissions or fees. If you use these data in a publication, presentation, or other research product, please use the following citation:
Locke, Dexter H. 2022. United States census block groups converted to singlepart polygons and with water removed, 2013-2017. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2022-0054