---
title: Explore and Download
keywords: fastai
sidebar: home_sidebar
summary: "In this tutorial, the basics of Colabs are introduced and an American Community Survey (ACS) dataset is downloaded."
description: "In this tutorial, the basics of Colabs are introduced and an American Community Survey (ACS) dataset is downloaded."
---
This coding notebook is the first in a series.
An interactive version can be found here.
This Colab and more can be found at https://github.com/BNIA/colabs
Content covered in previous tutorials will be used in later tutorials.
New code and/or information should have explanations and/or descriptions attached.
Concepts or code covered in previous tutorials will be used without being explained in their entirety.
If content cannot be found in the current tutorial and is not covered in previous tutorials, please let me know.
This notebook has been optimized for Google Colab run in a Chrome browser.
Statements found on the index page regarding views expressed, responsibility, errors and omissions, use at your own risk, and licensing extend throughout this tutorial.
In this notebook, the basics of Colabs are introduced.
Instructions: Read all text and execute all code in order.
How to execute code:
If you would like to see the code you are executing, double-click the label 'Run: '. Code is accompanied by brief inline descriptions.
Try it! Go ahead and run the cell below. The result is a flow chart of how this tutorial may be used.
#@title Run: View User Path
%%html
<img src="https://charleskarpati.com/images/viewuserpath_short.png">
Census data comes in 2 flavors:
1) American Community Survey (ACS)
2) Decennial Census
Census data can come in a variety of levels.
These levels define the specificity of the data.
I.e., whether data is reporting on individual communities or entire cities is contingent on the data's granularity.
The data we will be downloading in this tutorial, ACS Data, can be found at the Tract level and no closer.
Aggregating Tracts is the way BNIA calculates some of their yearly community indicators!
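As a toy illustration of that aggregation step, tract-level estimates can be rolled up to communities with a pandas groupby. The tract IDs, community names, and counts below are hypothetical, not real BNIA data:

```python
import pandas as pd

# Hypothetical tract-level estimates with a tract-to-community crosswalk.
tracts = pd.DataFrame({
    'tract': ['030100', '030200', '040100'],
    'community': ['Downtown', 'Downtown', 'Midtown'],
    'households': [1200, 800, 1500],
})

# Aggregate tracts up to the community level.
indicators = tracts.groupby('community')['households'].sum().reset_index()
print(indicators)
```

Real indicators involve more careful handling (e.g., tracts split across community boundaries), but the rollup idea is the same.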
Each of the bolded words in the content below is a level identifiable through a Geographic Reference Code.
For more information on Geographic Reference Codes, refer to the table of contents for the section on that matter.
Run the following code to see how these different levels nest into each other!
#@title Run: Census Granularities
%%html
<img src="https://charleskarpati.com/images/census_granularities.png">
State, County, and Tract ID's are called Geographic Reference Codes.
This information is crucial to know when accessing data.
In order to successfully pull data, Census State and County Codes must be provided.
The code herein is configured by default to pull data on Baltimore City, MD and its constituent Tracts.
In order to find your State and County code:
Either
A) Click the link https://geocoding.geo.census.gov/geocoder/geographies/address; after entering a unique address, you can locate the state and county codes under the associated values 'Counties' and 'State'
OR
B) Click https://www.census.gov/geographies/reference-files/time-series/geo/tallies.html
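The geocoder lookup in option A can also be scripted. The sketch below only builds the request URL; the endpoint and the `benchmark`/`vintage` parameter values follow the Census geocoding service's query format but should be treated as assumptions to verify, and the street address is just an example:

```python
from urllib.parse import urlencode

def build_geocoder_url(street, city, state):
    """Build a Census geocoder request URL for looking up the
    geographies (including state and county codes) of an address."""
    base = 'https://geocoding.geo.census.gov/geocoder/geographies/address'
    params = {
        'street': street,
        'city': city,
        'state': state,
        'benchmark': 'Public_AR_Current',  # assumed benchmark name
        'vintage': 'Current_Current',      # assumed vintage name
        'format': 'json',
    }
    return base + '?' + urlencode(params)

url = build_geocoder_url('417 E Fayette St', 'Baltimore', 'MD')
print(url)
```

Fetching that URL (e.g. with `urllib.request.urlopen`) returns JSON whose geography records include the state and county FIPS codes.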
Searching for a dataset is the first step in the data processing pipeline.
In this tutorial we plan on processing ACS data in a programmatic fashion.
This tutorial will not just allow you to search and explore ACS tables and inspect their contents (attributes), but will also let you download, format, and clean the data!
Although a table explorer section is provided, it is suggested that rather than use that approach, you explore the available data tables and retrieve their IDs using the dedicated websites provided below:
American Fact Finder may assist you in your data locating and download needs: https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml Fact Finder provides a nice interface for exploring available datasets. From Fact Finder you can grab a table's ID and continue the tutorial. Alternatively, you can download the data for your community directly via the interface, then continue the tutorial by loading the downloaded dataset as an external resource; instructions on how to do this are provided further below in this tutorial.
Update : 12/18/2019 " American FactFinder (AFF) will remain as an "archive" system for accessing historical data until spring 2020. " - American Fact Finder Website
The New American Fact Finder : https://data.census.gov/cedsci/
This new website is provided by the Census Bureau. Its 'Advanced Search' feature has all the filtering abilities of the older, deprecated (soon discontinued) American Fact Finder website. It is still a bit buggy to date and may not apply all filters. Filters include years (you can only pick one year at a time), geography (state, county, tract), topic, surveys, and Table ID. The filters you apply are shown at the bottom of the query, and submitting the search will yield data tables ready for download as well as Table IDs that you may snag for use in this tutorial.
Tutorial Notes:
These tables are created by the census and are pre-compiled views of the data.
ACS Website Notes:
Detailed Tables contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population counts. There are over 20,000 variables in this dataset.
Subject Tables provide an overview of the estimates available in a particular topic. The data are presented as population counts and percentages. There are over 18,000 variables in this dataset.
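The Detailed/Subject distinction matters when querying the API: 'S' (Subject) tables live under a different path than 'B' (Detailed) tables. A minimal sketch of choosing the base URL, assuming the two-digit year convention used later in this tutorial:

```python
def acs_base_url(year, tableId):
    """Return the ACS 5-year API base URL for a table.
    Subject ('S') tables live under a /subject path;
    Detailed ('B') tables do not."""
    if tableId.startswith('S'):
        return f'https://api.census.gov/data/20{year}/acs/acs5/subject'
    return f'https://api.census.gov/data/20{year}/acs/acs5'

print(acs_base_url('17', 'B19049'))  # Detailed table
print(acs_base_url('17', 'S1701'))   # Subject table
```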
For more Information (via API) Please Visit
You will need all of this installed and imported in order for anything following it to work.
!pip install ipywidgets geopandas
import ipywidgets as widgets
from IPython.core.interactiveshell import InteractiveShell
from ipywidgets import interact, interact_manual
import urllib.request as urllib
from urllib.parse import urlencode
import socket
import pandas as pd
import json
import numpy as np
from pandas.io.json import json_normalize
import csv
import geopandas as gpd
import psycopg2
from shapely import wkb
from shapely.wkt import loads
import os
import sys
import fiona
import matplotlib.pyplot as plt
import glob
import imageio
You can access Google Drive directories:
You can also import files directly into a temporary folder in the virtual Colab environment.
By default you are positioned in the /content/ folder.
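Mounting Drive uses the Colab-only `google.colab` module, so that call is left commented out below; the runnable part simply checks your current working directory (which in a fresh Colab session is /content):

```python
import os

# In Colab, Google Drive can be mounted like this (Colab-only import):
# from google.colab import drive
# drive.mount('/content/drive')

# Check where you currently are in the filesystem.
cwd = os.getcwd()
print(cwd)
```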
Please Note: The following section details a programmatic way to access and explore the census data catalogs. It is advised that rather than use this section of the tutorial, you read the section 'Searching For Data' --> 'Search Advice' above, which provides links to dedicated websites hosted by the Census Bureau explicitly for your data exploration needs!
Retrieve and search available ACS datasets through the ACS's table directory.
The table directory contains TableId's and Descriptions for each datatable the ACS provides.
By running the next cell, an interactive searchbox will filter the directory for keywords within the description.
Be sure to grab the TableId once you find a table with a description of interest.
response = urllib.urlopen('https://api.census.gov/data/2017/acs/acs5/groups/')
metaDataTable = json_normalize( json.loads(response.read())['groups'] )
metaDataTable.set_index('name', drop=True, inplace=True)
description = input("Search ACS Table Directory by Keyword: ")
metaDataTable[ metaDataTable['description'].str.contains(description.upper()) ]
Once a table has been picked from the explorer, you can inspect its column names in the next part.
This will help ensure it has the data you need!
tableId = input("Please enter a Table ID to inspect: ")
url = f'https://api.census.gov/data/2017/acs/acs5/groups/{tableId}.json'
metaDataTable = pd.read_json(url).reset_index(drop=False)
metaDataTable = pd.merge(
json_normalize(data=metaDataTable['variables']),
metaDataTable['index'] , left_index=True, right_index=True )
metaDataTable = metaDataTable[['index', 'concept']].dropna(subset=['concept'])
The data structure we receive is different than the prior table's.
Intake and processing is different as a result.
Now let's explore what we got, just like before.
The only difference is that the column names are automatically included in this query.
url = 'https://api.census.gov/data/2017/acs/acs5/subject/variables.json'
data = json.loads(urllib.urlopen(url).read())['variables']
objArr = []
for key, value in data.items():
value['name'] = key
objArr.append(value)
metaDataTable = json_normalize(objArr).set_index('name', drop=True)
metaDataTable = metaDataTable[ ['attributes', 'concept', 'group', 'label', 'limit', 'predicateType' ] ]
concept = input("Search ACS Subject Table Directory by Keyword: ")
metaDataTable[ metaDataTable['concept'].str.contains(concept.upper(), na=False) ]
Intro
Hopefully, by now you know which datatable you would like to download!
The following Python function will do that for you.
Description: This function returns ACS data given appropriate params.
Purpose: Retrieves ACS data from the web
Services
Input:
Output:
How it works
Before our program retrieves the actual data, it will want the table's metadata.
The function changes the URL it requests data from depending on whether the user has requested an 'S' or 'B' type table.
Multiple calls for data must be made, as a single table may have several hundred columns in it.
Our program pulls not just tract-level data but also the aggregate for the county.
Finally, we can download the data in two different formats if desired.
If we choose to save the data, it is saved once with the Table IDs + column names, and once without the Table IDs.
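The full `retrieve_acs_data()` implementation is not reproduced in this excerpt, so the following is only a minimal sketch under assumptions: it builds the 'S'-vs-'B' URL, pulls one batch of tract-level data via the API's `group()` query, and optionally saves a CSV. The real function also batches column requests and fetches the county aggregate, which this sketch omits, and the filename convention shown is hypothetical.

```python
import urllib.request
import json
import pandas as pd

def build_acs_url(state, county, tract, tableId, year):
    """Build an ACS 5-year API query URL. Subject ('S') tables
    live under a /subject path; Detailed ('B') tables do not."""
    base = f'https://api.census.gov/data/20{year}/acs/acs5'
    if tableId.startswith('S'):
        base += '/subject'
    return (f'{base}?get=group({tableId})'
            f'&for=tract:{tract}&in=state:{state}%20county:{county}')

def retrieve_acs_data(state, county, tract, tableId, year, save=False):
    """Download one ACS table as a DataFrame (simplified sketch)."""
    url = build_acs_url(state, county, tract, tableId, year)
    with urllib.request.urlopen(url) as response:
        rows = json.loads(response.read())
    # The API returns a list of lists; the first row holds the headers.
    df = pd.DataFrame(rows[1:], columns=rows[0])
    if save:
        df.to_csv(f'{tableId}_5y{year}_est.csv', index=False)  # hypothetical naming
    return df
```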
#@title Run: Class Diagram retrieve_acs_data()
%%html
<img src="https://charleskarpati.com/images/class_diagram_retrieve_acs_data.png">
#@title Run: retrieve_acs_data Flow Chart
%%html
<img src="https://charleskarpati.com/images/flow_chart_retrieve_acs_data.png">
#@title Run: Gantt Chart retrieve_acs_data()
%%html
<img src="https://charleskarpati.com/images/gannt_chart_retrieve_acs_data.png">
#@title Run: Sequence Diagram retrieve_acs_data()
%%html
<img src="https://charleskarpati.com/images/sequence_diagram_retrieve_acs_data.png">
Now use this function to Download the Data!
# Our download function will use Baltimore City's tract, county and state as internal parameters
# Changing these values in the cell below to different geographic reference codes will change those parameters
tract = '*'
county = '510' # Baltimore City # '059' '153'
state = '24' # Maryland
# Specify the download parameters the function will receive here
tableId = 'B19049' # 'B19001'
year = '17'
saveAcs = True
# state, county, tract, tableId, year, saveOriginal, save
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs, True)
df.head()