---
title: Geo-Data Intake and Operations
keywords: fastai
sidebar: home_sidebar
summary: "This notebook was made to demonstrate how to work with geographic data."
description: "This notebook was made to demonstrate how to work with geographic data."
---

This Coding Notebook is the third in a series.

An interactive version can be found here: Open In Colab.

This colab and more can be found on our webpage.

  • Content covered in previous tutorials will be used in later tutorials.

  • New code and/or information will have explanations and/or descriptions attached.

  • Concepts or code covered in previous tutorials will be used without being explained in their entirety.

  • The Dataplay Handbook uses development techniques covered in the Datalabs Guidebook.

  • If content cannot be found in the current tutorial and is not covered in a previous tutorial, please let me know.

  • This notebook has been optimized for Google Colabs run in a Chrome browser.

  • Statements found on the index page on views expressed, responsibility, errors and omissions, use at risk, and licensing extend throughout the tutorial.

About this Tutorial:

What's Inside?

The Tutorial

In this notebook, the basics of working with geographic data are introduced.

  • Reading in data (points/geoms)
    -- Convert lat/lng columns to point coordinates
    -- Geocoding addresses to coordinates
    -- Changing coordinate reference systems
    -- Connecting to PostGIS DB's
  • Basic Operations
  • Saving shape data
  • Get Polygon Centroids
  • Working with Points and Polygons
    -- Map Points and Polygons
    -- Get Points in Polygons
    -- Create Choropleths
    -- Create Heatmaps (KDE?)

Objectives

By the end of this tutorial users should have an understanding of:

  • How to read in and process geo-data as a geo-dataframe.
  • The Coordinate Reference System and Coordinate Encoding
  • Basic geo-visualization strategies

Background

Datatypes and Geo-data

Geographic data must be encoded properly in order to attain the full potential of its spatial nature.

If you have read in a dataset using pandas, its datatype will be a DataFrame.

It may be converted into a Geo-Dataframe using Geopandas, as demonstrated in the sections below.

You can check a variable's datatype at any time using the dtypes command:

yourGeoDataframe.dtypes
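For instance, a plain DataFrame and its converted Geo-Dataframe report different types. A minimal sketch with hypothetical data (assumes geopandas is installed, which the setup cell below takes care of):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# A plain pandas DataFrame with lng/lat stored as ordinary numbers
df = pd.DataFrame({'name': ['Baltimore'], 'x': [-76.6], 'y': [39.2]})
print(type(df))    # a pandas DataFrame

# The same data promoted to a GeoDataFrame with a real geometry column
gdf = gpd.GeoDataFrame(df, geometry=[Point(-76.6, 39.2)], crs='EPSG:4326')
print(type(gdf))   # a geopandas GeoDataFrame
```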

Coordinate Reference Systems (CRS)

Make sure the appropriate spatial Coordinate Reference System (CRS) is used when reading in your data!

ala wiki:

A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities

CRS 4326 (EPSG:4326) is the CRS most people are familiar with when referring to latitudes and longitudes.

Baltimore's 4326 CRS should be at (39.2, -76.6)

BNIA uses CRS 2248 internally. Additional information: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

Ensure your geodataframes' coordinates are using the same CRS by checking the geopandas attribute:

yourGeoDataframe.crs
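A minimal sketch of checking and changing a CRS (the single point is hypothetical; assumes geopandas and pyproj are installed):

```python
import geopandas as gpd
from shapely.geometry import Point

# A hypothetical point in Baltimore, starting in EPSG:4326 (lat/lng degrees)
gdf = gpd.GeoDataFrame(geometry=[Point(-76.6, 39.2)], crs='EPSG:4326')
print(gdf.crs)                     # inspect the current CRS

# Reproject into BNIA's internal CRS (EPSG:2248, Maryland State Plane, feet)
gdf_2248 = gdf.to_crs(epsg=2248)
```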

Coordinate Encoding

When first receiving a spatial dataset, the spatial column may need to be encoded to convert its 'text' values into 'coordinate' datatypes before it can be understood/processed accordingly.

Namely, there are two ways to encode text into coordinates:

  • df[geom] = df[geom].apply(lambda x: loads( str(x) ))
  • df[geom] = [Point(xy) for xy in zip(df.x, df.y)]

The first approach can be used for text taking the form "Point(-76, 39)" and will encode the text to coordinates. The second approach is useful when creating a point from two columns containing lat/lng information, and will create Point coordinates from the two columns.
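Both encodings can be sketched with shapely and hypothetical data (`loads` comes from `shapely.wkt`, and the WKT strings below are illustrative):

```python
import pandas as pd
from shapely.geometry import Point
from shapely.wkt import loads

geom = 'geometry'

# Approach 1: the column holds WKT text such as "POINT (-76.6 39.2)"
df1 = pd.DataFrame({geom: ['POINT (-76.6 39.2)', 'POINT (-76.5 39.3)']})
df1[geom] = df1[geom].apply(lambda x: loads(str(x)))

# Approach 2: build Points from separate x/y (lng/lat) columns
df2 = pd.DataFrame({'x': [-76.6, -76.5], 'y': [39.2, 39.3]})
df2[geom] = [Point(xy) for xy in zip(df2.x, df2.y)]
```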

More on this later

Raster Vs Vector Data

There exist two types of geospatial data, raster and vector. Each has different file formats.

This lab will only cover vector data.

Vector Data

Vector Data: Individual points stored as (x,y) coordinate pairs. These points can be joined to create lines or polygons.
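Sketched with shapely (the coordinates are illustrative only):

```python
from shapely.geometry import Point, LineString, Polygon

coords = [(0, 0), (1, 0), (1, 1)]

point = Point(coords[0])    # a single (x, y) pair
line = LineString(coords)   # points joined into a line
poly = Polygon(coords)      # points closed into a polygon
```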

Format of Vector data

Esri Shapefile — .shp, .dbf, .shx Description - Industry standard, most widely used. The three files listed above are needed to make a shapefile. Additional file formats may be included.

Geographic JavaScript Object Notation — .geojson, .json Description — Second most popular; GeoJSON is typically used in web-based mapping and stores its coordinates as JSON.

Geography Markup Language — .gml Description — Similar to GeoJSON, but GML uses more data to represent the same information (larger files).

Google Keyhole Markup Language — .kml, .kmz Description — XML-based and predominantly used for Google Earth. KMZ is the newer, zipped version of KML.

Raster Data

Raster Data: Cell-based data where each cell represents geographic information. An aerial photograph is one such example, where each pixel has a color value.

Raster Data Files:

  • GeoTIFF — .tif, .tiff, .ovr
  • ERDAS Imagine — .img
  • IDRISI Raster — .rst, .rdc

Information Sourced From: https://towardsdatascience.com/getting-started-with-geospatial-works-1f7b47955438

Vector Data: Census Geographic Data:

Guided Walkthrough

SETUP:

Import Modules

{% raw %}
# @title Run: Install Modules
{% endraw %} {% raw %}
%%capture
! pip install -U -q PyDrive
! pip install geopy
! pip install geopandas
! pip install geoplot
! pip install dexplot
! pip install dataplay
{% endraw %} {% raw %}
{% endraw %}

Configure Environment

{% raw %}
# This will just beautify the output

import pandas as pd

pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# pd.set_option('display.expand_frame_repr', False)
# pd.set_option('display.precision', 2)
# pd.reset_option('max_colwidth')
pd.set_option('max_colwidth', 20)
# pd.reset_option('max_colwidth')
{% endraw %}

(Optional) Local File Access

{% raw %}
# (Optional) Run this cell to gain access to Google Drive (Colabs only) 
from google.colab import drive

# Colabs operates in a virtualized environment.
# Colabs' default directory is at ~/content.
# We mount Drive into a temporary folder at '~/content/drive'.

drive.mount('/content/drive')
{% endraw %} {% raw %}
cd drive/'My Drive'/colabs/DATA
{% endraw %} {% raw %}
ls
{% endraw %}

File Access Convenience Functions

{% raw %}
import os, sys

# Find the relative path to a file by walking down from a root directory
def findFile(root, file):
    for d, subD, f in os.walk(root):
        if file in f:
            return "{1}/{0}".format(file, d)

# To 'import' a script you wrote, add its directory to the sys path
def addPath(root, file): sys.path.append(os.path.dirname(os.path.abspath( findFile(root, file) )))
{% endraw %}

Retrieve GIS Data

Approach 1: Reading in Data Directly

If you are using Geopandas, direct imports only work with GeoJSON and shape files.

{% raw %}
# A dataset taken from the public database provided by BNIAJFI, hosted by Esri / ArcGIS
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
import geopandas as gpd

csa_gdf = gpd.read_file("https://opendata.arcgis.com/datasets/b738a8587b6d479a8824d937892701d8_0.geojson")
{% endraw %} {% raw %}
csa_gdf.plot()
{% endraw %}

Approach 2: Converting Pandas into Geopandas

Approach 2: Example 1

This approach loads a map using a geometry column

{% raw %}
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like our data represented as
geom = 'geometry' # The column where our spatial information lives.
{% endraw %} {% raw %}
from shapely.wkt import loads
from geopandas import GeoDataFrame

# A url to a public Dataset
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'

# Read in the dataframe
tract_df = pd.read_csv(url)

# Convert the geometry column datatype from a string of text into a coordinate datatype
tract_df[geom] = tract_df[geom].apply(lambda x: loads( str(x) ))

# Process the dataframe as a geodataframe with a known CRS and geom column
tract_gdf = GeoDataFrame(tract_df, crs=in_crs, geometry=geom)
{% endraw %} {% raw %}
tract_gdf.plot()
{% endraw %} {% raw %}
tract_gdf.head()
TRACTCE10 GEOID10 NAME10 CSA Tract geometry
0 151000 24510151000 1510.0 Dorchester/Ashbu... 1510 POLYGON ((-76.67...
1 80700 24510080700 807.0 Greenmount East 807 POLYGON ((-76.58...
2 80500 24510080500 805.0 Clifton-Berea 805 POLYGON ((-76.58...
3 150500 24510150500 1505.0 Greater Mondawmin 1505 POLYGON ((-76.65...
4 120100 24510120100 1201.0 North Baltimore/... 1201 POLYGON ((-76.60...
{% endraw %}

Approach 2: Example 2: BANKS

This example uses data constructed at the end of Tutorial 1.

Be sure to open the menu in the left drawer, hit the 'Files' tab, and upload it.

{% raw %}
# Primary Table
# Description: I created a public dataset from a google xlsx sheet 'Bank Addresses and Census Tract' from a workbook of the same name.
# Table: FDIC Baltimore Banks
# Columns: Bank Name, Address(es), Census Tract
left_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv'
left_col = 'Census Tract'

# Alternate Primary Table
# Description: Same workbook, different Sheet: 'Branches per tract' 
# Columns: Census Tract, Number branches per tract
# left_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSHFrRSHva1f82ZQ7Uxwf3A1phqljj1oa2duGlZDM1vLtrm1GI5yHmpVX2ilTfMHQ/pub?gid=1698745725&single=true&output=csv'
# left_col = 'Number branches per tract'

# Crosswalk Table
# Table: Crosswalk Census Communities
# 'TRACT2010', 'GEOID2010', 'CSA2010'
crosswalk_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv'
use_crosswalk = True
crosswalk_left_col = 'TRACT2010'
crosswalk_right_col = 'GEOID2010'

# Secondary Table
# Table: Baltimore Boundaries
# 'TRACTCE10', 'GEOID10', 'CSA', 'NAME10', 'Tract', 'geometry'
right_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
right_col ='GEOID10'

interactive = True
merge_how = 'outer'

banksPd = mergeDatasets( left_ds=left_ds, left_col=left_col, 
              use_crosswalk=use_crosswalk, crosswalk_ds=crosswalk_ds,
              crosswalk_left_col = crosswalk_left_col, crosswalk_right_col = crosswalk_right_col,
              right_ds=right_ds, right_col=right_col, 
              merge_how=merge_how, interactive = interactive )
 Handling Left Dataset
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True
Left Dataset and Columns are Valid

 Handling Right Dataset
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True
Right Dataset and Columns are Valid

 Checking the merge_how Parameter
merge_how operator is Valid outer
checkDataSetExists False

 Checking the Crosswalk Parameter

 Handling Crosswalk Left Dataset Loading
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True

 Handling Crosswalk Right Dataset Loading
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True

 Assessment Completed

 Ensuring Left->Crosswalk compatability
Converting Local Key from float64 to Int

 Ensuring Crosswalk->Right compatability
PERFORMING MERGE LEFT->CROSSWALK
left_on TRACT2010 right_on GEOID2010 how outer

 Local Column Values Not Matched 
[-1321321321321325            400100            401101            401102
            401507            403401            403500            403803
            411306            411406            411408            420100
            420301            420701            420800            430800
            430900            440100            440200            440702
            441101            450300            452000            490601
            490602            491100            491201            491600
            492300            750101]
43

 Crosswalk Unique Column Values
[ 10100  10200  10300  10400  10500  20100  20200  20300  30100  30200
  40100  40200  60100  60200  60300  60400  70100  70200  70300  70400
  80101  80102  80200  80301  80302  80400  80500  80600  80700  80800
  90100  90200  90300  90400  90500  90600  90700  90800  90900 100100
 100200 100300 110100 110200 120100 120201 120202 120300 120400 120500
 120600 120700 130100 130200 130300 130400 130600 130700 130803 130804
 130805 130806 140100 140200 140300 150100 150200 150300 150400 150500
 150600 150701 150702 150800 150900 151000 151100 151200 151300 160100
 160200 160300 160400 160500 160600 160700 160801 160802 170100 170200
 170300 180100 180200 180300 190100 190200 190300 200100 200200 200300
 200400 200500 200600 200701 200702 200800 210100 210200 220100 230100
 230200 230300 240100 240200 240300 240400 250101 250102 250103 250203
 250204 250205 250206 250207 250301 250303 250401 250402 250500 250600
 260101 260102 260201 260202 260203 260301 260302 260303 260401 260402
 260403 260404 260501 260604 260605 260700 260800 260900 261000 261100
 270101 270102 270200 270301 270302 270401 270402 270501 270502 270600
 270701 270702 270703 270801 270802 270803 270804 270805 270901 270902
 270903 271001 271002 271101 271102 271200 271300 271400 271501 271503
 271600 271700 271801 271802 271900 272003 272004 272005 272006 272007
 280101 280102 280200 280301 280302 280401 280402 280403 280404 280500
  10000]
PERFORMING MERGE LEFT->RIGHT
left_col GEOID2010 right_col GEOID10 how outer
{% endraw %} {% raw %}
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like our data represented as
geom = 'geometry' # The column where our spatial information lives.

# Description: This was created in the previous tutorial. 
# Description: It can be accessed via a google drive, or by using the upload file feature in the left hand menu. 
# Columns: Bank Information with TRACT, CSA, and GEOMETRY columns.

# To create this dataset I had to perform a full outer join in the previous tutorial. 
# In this way geometries are included even if the merge does not find a direct match. 
# This means at least one (near) empty record will exist for each community, holding (at minimum) the geographic information and name of that community.
# That way, even if no point-level information exists in a community, its geo-boundaries are still carried over in the merge.

# If a user wanted to create a heatmap of this data, they would first have to perform an aggregation of their columns onto unique geometry columns.
# It is the aggregate of a column that gets colorized on the heatmap. 
# Aggregation operations can easily be performed using a pivot table in Excel. 
# I hope to embed support for this functionality in the future. 
# Heatmaps are covered in the next tutorial. 
# Pre-aggregated information is required to continue on to the next tutorial.
#url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv'

# Read in the dataframe
#df = pd.read_csv(url)

# Convert the geometry column datatype from a string of text into a coordinate datatype
banksPd[geom] = banksPd[geom].apply(lambda x: loads( str(x) ))

# Process the dataframe as a geodataframe with a known CRS and geom column
banksGdf = GeoDataFrame(banksPd, crs=in_crs, geometry=geom)
{% endraw %} {% raw %}
banksGdf.plot()
{% endraw %} {% raw %}
banksGdf.head()
Bank Name Address(es) Census Tract GEOID2010 TRACTCE10 GEOID10 NAME10 CSA Tract geometry
0 Arundel Federal ... 333 E. Patapsco ... 250401.0 2.45e+10 250401 24510250401 2504.01 Brooklyn/Curtis ... 2504 POLYGON ((-76.59...
1 NaN 3601 S Hanover St 250401.0 2.45e+10 250401 24510250401 2504.01 Brooklyn/Curtis ... 2504 POLYGON ((-76.59...
2 Bank of America,... 20 N Howard St 40100.0 2.45e+10 40100 24510040100 401.00 Downtown/Seton Hill 401 POLYGON ((-76.61...
3 NaN 100 S Charles St... 40100.0 2.45e+10 40100 24510040100 401.00 Downtown/Seton Hill 401 POLYGON ((-76.61...
4 Branch Banking a... 2 N CHARLES ST 40100.0 2.45e+10 40100 24510040100 401.00 Downtown/Seton Hill 401 POLYGON ((-76.61...
{% endraw %}

Let's aggregate banks by CSA just for fun, huh?

{% raw %}
banksGdf['CSA'].value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True).head()
Downtown/Seton Hill                     16
Southwest Baltimore                      8
Oldtown/Middle East                      8
Medfield/Hampden/Woodberry/Remington     8
Fells Point                              7
Name: CSA, dtype: int64
{% endraw %}

That's interesting. Let's see if we can map that!

{% raw %}
# Aggregate dataframe by CSA
banksGdf['banksCount'] = 1
banksCount = banksGdf.groupby('CSA').sum(numeric_only=True) 
banksCount = banksCount.reset_index()
banksCount.head()
CSA Census Tract GEOID2010 TRACTCE10 GEOID10 NAME10 Tract banksCount
0 Allendale/Irving... 0.0 0.00e+00 1333309 147061333309 13333.09 13333 6
1 Beechfield/Ten H... 560802.0 4.90e+10 1091306 98041091306 10913.06 10913 4
2 Belair-Edison 0.0 0.00e+00 680806 98040680806 6808.06 6808 4
3 Brooklyn/Curtis ... 500802.0 4.90e+10 1252304 122551252304 12523.04 12523 5
4 Canton 20500.0 4.90e+10 30800 73530030800 308.00 308 3
{% endraw %}

So now we have the counts in a dataframe...

{% raw %}
# A url to a public Dataset
csa_gdf = gpd.read_file("https://opendata.arcgis.com/datasets/b738a8587b6d479a8824d937892701d8_0.geojson");
csa_gdf.head()
csa_gdf.columns
OBJECTID CSA2010 tpop10 male10 female10 paa17 pwhite17 pasi17 p2more17 ppac17 phisp17 racdiv17 age5_17 age18_17 age24_17 age64_17 age65_17 hhs10 femhhs17 fam17 hhsize10 mhhi17 hh25inc17 hh40inc17 hh60inc17 hh75inc17 hhm7517 hhpov17 hhchpov17 Shape__Area Shape__Length geometry
0 1 Allendale/Irving... 16726 7657 9069 90.28 6.53 0.11 1.00 0.00 2.06 20.12 6.61 17.37 9.00 53.29 13.72 6098 71.13 35.20 2.64 39495.63 32.99 17.72 19.91 11.95 17.43 20.70 32.77 6.38e+07 38770.17 POLYGON ((-76.65...
1 2 Beechfield/Ten H... 13391 5985 7406 75.32 18.86 0.42 3.31 0.31 1.78 41.02 7.93 14.58 9.71 55.37 12.42 5076 55.19 26.14 2.40 57572.50 20.42 13.90 18.18 10.87 36.64 10.47 23.92 4.79e+07 37524.95 POLYGON ((-76.69...
2 3 Belair-Edison 17380 7297 10083 85.65 10.03 0.57 1.70 0.81 1.24 27.26 5.42 22.81 8.41 54.45 8.91 6174 77.53 38.27 2.90 39624.48 34.10 16.28 20.07 8.11 21.45 20.27 34.56 4.50e+07 31307.31 POLYGON ((-76.56...
3 4 Brooklyn/Curtis ... 12900 5746 7154 37.96 39.68 2.53 3.61 1.31 14.91 73.93 10.91 16.09 8.25 57.45 7.30 5204 43.39 32.32 2.61 40275.28 31.40 18.32 18.52 9.08 22.68 24.21 46.41 1.76e+08 150987.70 MULTIPOLYGON (((...
4 5 Canton 8326 4094 4232 3.94 85.58 4.38 2.45 0.56 3.09 26.31 5.25 3.12 5.85 75.25 10.53 4310 10.55 11.03 1.86 111891.25 7.41 7.82 9.18 6.43 69.16 3.66 4.02 1.54e+07 23338.61 POLYGON ((-76.57...
Index(['OBJECTID', 'CSA2010', 'tpop10', 'male10', 'female10', 'paa17',
       'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17',
       'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10',
       'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17',
       'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17',
       'Shape__Area', 'Shape__Length', 'geometry'],
      dtype='object')
{% endraw %} {% raw %}
# merge it to our banks dataset
merged_banks = pd.merge(banksCount, csa_gdf, left_on='CSA', right_on='CSA2010', how='left')
merged_banks.head()
CSA Census Tract GEOID2010 TRACTCE10 GEOID10 NAME10 Tract banksCount OBJECTID CSA2010 tpop10 male10 female10 paa17 pwhite17 pasi17 p2more17 ppac17 phisp17 racdiv17 age5_17 age18_17 age24_17 age64_17 age65_17 hhs10 femhhs17 fam17 hhsize10 mhhi17 hh25inc17 hh40inc17 hh60inc17 hh75inc17 hhm7517 hhpov17 hhchpov17 Shape__Area Shape__Length geometry
0 Allendale/Irving... 0.0 0.00e+00 1333309 147061333309 13333.09 13333 6 1.0 Allendale/Irving... 16726.0 7657.0 9069.0 90.28 6.53 0.11 1.00 0.00 2.06 20.12 6.61 17.37 9.00 53.29 13.72 6098.0 71.13 35.20 2.64 39495.63 32.99 17.72 19.91 11.95 17.43 20.70 32.77 6.38e+07 38770.17 POLYGON ((-76.65...
1 Beechfield/Ten H... 560802.0 4.90e+10 1091306 98041091306 10913.06 10913 4 2.0 Beechfield/Ten H... 13391.0 5985.0 7406.0 75.32 18.86 0.42 3.31 0.31 1.78 41.02 7.93 14.58 9.71 55.37 12.42 5076.0 55.19 26.14 2.40 57572.50 20.42 13.90 18.18 10.87 36.64 10.47 23.92 4.79e+07 37524.95 POLYGON ((-76.69...
2 Belair-Edison 0.0 0.00e+00 680806 98040680806 6808.06 6808 4 3.0 Belair-Edison 17380.0 7297.0 10083.0 85.65 10.03 0.57 1.70 0.81 1.24 27.26 5.42 22.81 8.41 54.45 8.91 6174.0 77.53 38.27 2.90 39624.48 34.10 16.28 20.07 8.11 21.45 20.27 34.56 4.50e+07 31307.31 POLYGON ((-76.56...
3 Brooklyn/Curtis ... 500802.0 4.90e+10 1252304 122551252304 12523.04 12523 5 4.0 Brooklyn/Curtis ... 12900.0 5746.0 7154.0 37.96 39.68 2.53 3.61 1.31 14.91 73.93 10.91 16.09 8.25 57.45 7.30 5204.0 43.39 32.32 2.61 40275.28 31.40 18.32 18.52 9.08 22.68 24.21 46.41 1.76e+08 150987.70 MULTIPOLYGON (((...
4 Canton 20500.0 4.90e+10 30800 73530030800 308.00 308 3 5.0 Canton 8326.0 4094.0 4232.0 3.94 85.58 4.38 2.45 0.56 3.09 26.31 5.25 3.12 5.85 75.25 10.53 4310.0 10.55 11.03 1.86 111891.25 7.41 7.82 9.18 6.43 69.16 3.66 4.02 1.54e+07 23338.61 POLYGON ((-76.57...
{% endraw %} {% raw %}
# Lets check what datatype our geometry column is before we try to convert it!
merged_banks.geometry.dtype
<geopandas.array.GeometryDtype at 0x7fdb49c56320>
{% endraw %} {% raw %}
# Process the dataframe as a geodataframe with a known CRS and geom column
# Since the geometry column is already being interpreted as a geopandas dtype,
# we should readily be able to convert the dataframe without fuss.
banksCountgdf = GeoDataFrame(merged_banks, crs=2248, geometry='geometry')
{% endraw %} {% raw %}
# If you'd like, drop duplicate columns like so.
# merged_df = merged_df.drop('CSA', axis=1)
{% endraw %} {% raw %}
# In order for this choropleth to work, the total number of banks in each CSA must be tallied.
# This can be done programmatically, but I haven't added the code. 
# The column needs to be changed from CSA to whatever the new tallied column is named.
banksCountgdf.plot( column='banksCount', legend=True)
{% endraw %} {% raw %}
dxp.bar(x='CSA2010', y='banksCount', data=banksCountgdf)
{% endraw %}

Go ahead and update the form and run the cell to explore the data visually!

{% raw %}
tempBanksNoGeom = banksCountgdf.drop('geometry', axis=1)
{% endraw %} {% raw %}
#@title Form fields to view spatial data
#@markdown Forms support many types of fields.

legendOn = "True"  #@param ['True', 'False']
displayCol = "banksCount" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17'] {allow-input: true}
#@markdown ---

banksCountgdf.plot( column=displayCol, legend=legendOn)
dxp.bar(x='CSA2010', 
        y=displayCol, 
        data=banksCountgdf)
{% endraw %} {% raw %}
banksCountgdf.dropna().CSA2010.values
array(['Allendale/Irvington/S. Hilton', 'Beechfield/Ten Hills/West Hills',
       'Belair-Edison', 'Brooklyn/Curtis Bay/Hawkins Point', 'Canton',
       'Cedonia/Frankford', 'Cherry Hill', 'Chinquapin Park/Belvedere',
       'Claremont/Armistead', 'Clifton-Berea', 'Cross-Country/Cheswolde',
       'Dickeyville/Franklintown', 'Dorchester/Ashburton',
       'Downtown/Seton Hill', 'Edmondson Village', 'Fells Point',
       'Forest Park/Walbrook', 'Glen-Fallstaff',
       'Greater Charles Village/Barclay', 'Greater Govans',
       'Greater Mondawmin', 'Greater Roland Park/Poplar Hill',
       'Greater Rosemont', 'Greenmount East', 'Hamilton',
       'Harbor East/Little Italy', 'Harford/Echodale', 'Highlandtown',
       'Howard Park/West Arlington', 'Inner Harbor/Federal Hill',
       'Lauraville', 'Loch Raven', 'Madison/East End',
       'Medfield/Hampden/Woodberry/Remington', 'Midtown',
       'Midway/Coldstream', 'Morrell Park/Violetville',
       'Mount Washington/Coldspring', 'North Baltimore/Guilford/Homeland',
       'Northwood', 'Oldtown/Middle East',
       'Orangeville/East Highlandtown', 'Patterson Park North & East',
       'Penn North/Reservoir Hill', 'Pimlico/Arlington/Hilltop',
       'Poppleton/The Terraces/Hollins Market',
       'Sandtown-Winchester/Harlem Park', 'South Baltimore',
       'Southeastern', 'Southern Park Heights', 'Southwest Baltimore',
       'The Waverlies', 'Upton/Druid Heights',
       'Washington Village/Pigtown', 'Westport/Mount Winans/Lakeland'],
      dtype=object)
{% endraw %} {% raw %}
#@title Now lets compare some indicators
#@markdown Forms support many types of fields.

legendOn = "True"  #@param ['True', 'False']
displayColx = "banksCount" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17'] {allow-input: true}
displayColy = "female10" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17'] {allow-input: true}
# displayColm = "Westport/Mount Winans/Lakeland" #@param ['Allendale/Irvington/S. Hilton', 'Beechfield/Ten Hills/West Hills', 'Belair-Edison', 'Brooklyn/Curtis Bay/Hawkins Point', 'Canton', 'Cedonia/Frankford', 'Cherry Hill', 'Chinquapin Park/Belvedere', 'Claremont/Armistead', 'Clifton-Berea', 'Cross-Country/Cheswolde', 'Dickeyville/Franklintown', 'Dorchester/Ashburton', 'Downtown/Seton Hill', 'Edmondson Village', 'Fells Point', 'Forest Park/Walbrook', 'Glen-Fallstaff', 'Greater Charles Village/Barclay', 'Greater Govans', 'Greater Mondawmin', 'Greater Roland Park/Poplar Hill', 'Greater Rosemont', 'Greenmount East', 'Hamilton', 'Harbor East/Little Italy', 'Harford/Echodale', 'Highlandtown', 'Howard Park/West Arlington', 'Inner Harbor/Federal Hill', 'Lauraville', 'Loch Raven', 'Madison/East End', 'Medfield/Hampden/Woodberry/Remington', 'Midtown', 'Midway/Coldstream', 'Morrell Park/Violetville', 'Mount Washington/Coldspring', 'North Baltimore/Guilford/Homeland', 'Northwood', 'Oldtown/Middle East', 'Orangeville/East Highlandtown', 'Patterson Park North & East', 'Penn North/Reservoir Hill', 'Pimlico/Arlington/Hilltop', 'Poppleton/The Terraces/Hollins Market', 'Sandtown-Winchester/Harlem Park', 'South Baltimore', 'Southeastern', 'Southern Park Heights', 'Southwest Baltimore', 'The Waverlies', 'Upton/Druid Heights', 'Washington Village/Pigtown', 'Westport/Mount Winans/Lakeland'] {allow-input: true}
#@markdown ---

# print('Comparing', displayColm, "'s Indicators ", displayColx, " and " , displayColy )
dxp.kde(x=displayColx, y=displayColy, data= banksCountgdf.dropna() )
{% endraw %} {% raw %}
# dxp.kde(x=displayColx, y=displayColy, data= banksCountgdf[ banksCountgdf.CSA2010 != displayColm ].dropna() )
{% endraw %}

Approach 3: Method 1: Convert Column(s) to Coordinate

Later in this tutorial we will show how you may find the geometric bounds that correspond with a point's location (points in polygons). This can be a useful trick when we want to create a heatmap of point data within specified boundaries.

If a user wanted to create a heatmap of this data...

they would first have to perform an aggregation of their columns onto unique geometry columns.

Possible Path: (points in polygons -> crosswalk -> merge GeoJson).

It is the aggregate of a column that gets colorized on the heatmap.

Aggregation operations can easily be performed using a pivot table in Excel.

I hope to embed support for this functionality in the future.

Heatmaps are covered in the next tutorial.

Pre-aggregated information is required to continue on to the next tutorial.
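The points-in-polygons step of that path can be sketched with a geopandas spatial join. The square 'communities' and point records below are hypothetical, and note that older geopandas versions spell the `predicate` argument `op`:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two hypothetical square 'communities'
polys = gpd.GeoDataFrame(
    {'CSA': ['A', 'B']},
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
              Polygon([(2, 0), (4, 0), (4, 2), (2, 2)])],
    crs='EPSG:4326')

# Three hypothetical point records (e.g. bank locations)
pts = gpd.GeoDataFrame(
    {'bank': ['x', 'y', 'z']},
    geometry=[Point(1, 1), Point(3, 1), Point(3.5, 0.5)],
    crs='EPSG:4326')

# Tag each point with the polygon that contains it,
# then aggregate to one count per polygon.
joined = gpd.sjoin(pts, polys, how='left', predicate='within')
counts = joined.groupby('CSA').size()
```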

Approach 3: Method 1: Example 1:

This is the generic example, but it won't run since no URL is given.

{% raw %}
# More Information: https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html#from-longitudes-and-latitudes

# If your data has coordinates in two columns run this cell
# It will create a geometry column from the two.
# A public dataset is not provided for this example and will not run.

# Load DF HERE. Accidentally deleted the link; need to re-find it. 
# Just rely on Example 2 for now. 
exe_df = ''

exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
# exe_df = exe_df.replace(np.nan, 0, regex=True)

# An example of loading in an internal BNIA file
crs = {'init' :'epsg:2248'} 
geometry = [Point(xy) for xy in zip(exe_df.x, exe_df.y)]
exe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-524fd9aa5e67> in <module>()
      9 exe_df = ''
     10 
---> 11 exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
     12 exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
     13 # exe_df = exe_df.replace(np.nan, 0, regex=True)

TypeError: string indices must be integers
{% endraw %}

Approach 3: Method 1: Example 2: FOOD BANK PANTRIES

{% raw %}
# Alternate Primary Table
# Table: Food Bank And Pantry Sites, 
# XLSX Sheet: Baltimore City Pantries Tracts
# Columns: X	Y	OBJECTID	Name	Address	City_1	State	Zip	# in Zip	FIPS
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv'

# Read in the dataframe
food_df = pd.read_csv(url)
{% endraw %} {% raw %}
food_df['X'] = pd.to_numeric(food_df['X'], errors='coerce')
food_df['Y'] = pd.to_numeric(food_df['Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)

# An example of loading in an internal BNIA file
crs = {'init' :'epsg:2248'} 
geometry=[Point(xy) for xy in zip(food_df['X'], food_df['Y'])]
food_gdf = gpd.GeoDataFrame( food_df.drop(['X', 'Y'], axis=1), crs=crs, geometry=geometry)
{% endraw %} {% raw %}
food_gdf.head()
OBJECTID Name Address City_1 State Zip # in Zip FIPS geometry
0 1 Victory Forest -... 10000 Brunswick ... Silver Spring MD 20817 NaN NaN POINT (-77.05673...
1 2 Glassmanor Eleme... 1011 Marcy Avenue Oxon Hill MD 20745 NaN NaN POINT (-76.99036...
2 3 Apple Blossoms 1013 Cresthaven ... Silver Spring MD 20903 NaN NaN POINT (-76.99155...
3 4 Lakeview Apartme... 10250 West Lake Dr. Bethesda MD 20817 NaN NaN POINT (-77.14929...
4 5 Central Gardens 103 Cindy Lane Capitol Heights MD NaN NaN POINT (-76.88974...
{% endraw %} {% raw %}
food_gdf.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb3473c3c8>
{% endraw %}

Approach 3: Method 2: Geocoding Addresses to Coordinates

This function takes a while. The fewer columns/records there are, the faster it executes.

{% raw %}
# More information vist: https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders

# In this example we retrieve and map a dataset with no lat/lng but containing an address

# The url listed below is public.
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTMgdqWykZeIsMwCFllPuG1cd4gGDB6BUqaAOM0Lx9VGdCo2JJy9v_CR8ZaEDWO3Q/pub?gid=290715815&single=true&output=csv'
df = pd.read_csv(url)

# In this example our data is stored in the 'STREET' attribute
addrCol = 'STREET'
geometry = []
geolocator = Nominatim(user_agent="specify_your_app_name_here")

for index, row in df.iterrows():
  # We will try and return an address for each Street Name
  try: 
      # retrieve the geocoded information of our street address
      geol = geolocator.geocode(row[addrCol], timeout=None)
      
      print('Geocoding: ', geol.address) 
      # print(location.raw)
      
      # create a mappable coordinate point from the response object's lat/lang values.
      pnt = Point(geol.longitude, geol.latitude)
      
      # Append this value to the list of geometries
      geometry.append(pnt)
      
  except: 
      # If no street name was found decide what to do here.
      # df.loc[index]['geom'] = Point(0,0) # Alternate method
      geometry.append(Point(0,0))
      
# Finally, we stuff the geometry data we created back into the dataframe
df['geometry'] = geometry

# Convert the dataframe into a geodataframe and map it!
gdf = gpd.GeoDataFrame( df, geometry=geometry)
{% endraw %} {% raw %}
gdf.head()
{% endraw %}

Approach 4: Connecting to a PostGIS database

The following example pulls point geodata from a Postgres database.

We will pull the postgres point data in two manners.

  • An SQL query that uses ST_Transform(the_geom,4326) to transform the_geom's CRS from the database's binary encoding into standard lat/long coordinates
  • A plain SQL query, pulling the data in as EPSG:2248 with gpd.io.sql.read_postgis() and then converting the CRS using .to_crs(epsg=4326)
  • These examples will not work in Colab, as there is no local database to connect to; the code has been commented out for that reason
{% raw %}
# This Notebook can be downloaded to connect to a database
'''
conn = psycopg2.connect(host='', dbname='', user='', password='', port='')

# DB Import Method One
sql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'
pointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)
pointData = pointData.to_crs(epsg=4326)

# DB Import Method Two
sql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'
pointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)
pointData.head()
pointData.plot()
'''
{% endraw %}

Basics Operations

Inspection

{% raw %}
def geomSummary(gdf): return type(gdf), gdf.crs, gdf.columns;
# for p in df['Tract'].sort_values(): print(p)
geomSummary(csa_gdf)
(geopandas.geodataframe.GeoDataFrame, <Geographic 2D CRS: EPSG:4326>
 Name: WGS 84
 Axis Info [ellipsoidal]:
 - Lat[north]: Geodetic latitude (degree)
 - Lon[east]: Geodetic longitude (degree)
 Area of Use:
 - name: World
 - bounds: (-180.0, -90.0, 180.0, 90.0)
 Datum: World Geodetic System 1984
 - Ellipsoid: WGS 84
 - Prime Meridian: Greenwich, Index(['OBJECTID', 'CSA2010', 'tpop10', 'male10', 'female10', 'paa17',
        'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17',
        'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10',
        'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17',
        'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17',
        'Shape__Area', 'Shape__Length', 'geometry'],
       dtype='object'))
{% endraw %}

Converting CRS

{% raw %}
# Convert the CRS of the dataset into one you desire
# The gdf must be loaded with a known crs in order for the to_crs conversion to work
# We use this often to converting BNIAs custom CRS to the common type 
out_crs = 4326
csa_gdf = csa_gdf.to_crs(epsg=out_crs)
{% endraw %}

Saving

{% raw %}
# Here is code to commit a simple save
filename = 'TEST_FILE_NAME'
csa_gdf.to_file(filename+'.geojson', driver='GeoJSON')
{% endraw %} {% raw %}
# Here is code to save this new projection as a geojson file and read it back in
csa_gdf = csa_gdf.to_crs(epsg=2248) #just making sure
csa_gdf.to_file(filename+'.shp', driver='ESRI Shapefile')
csa_gdf = gpd.read_file(filename+'.shp')
{% endraw %}

Geometric Manipulations

Draw Tool

{% raw %}
import folium
from folium.plugins import Draw
# Draw tool. Create and export your own boundaries
m = folium.Map()
draw = Draw()
draw.add_to(m)
m = folium.Map(location=[-27.23, -48.36], zoom_start=12)
draw = Draw(export=True)
draw.add_to(m)
# m.save(os.path.join('results', 'Draw1.html'))
m
<folium.plugins.draw.Draw at 0x7fdb346b7470>
<folium.plugins.draw.Draw at 0x7fdb4de43518>
Make this Notebook Trusted to load map: File -> Trust Notebook
{% endraw %}

Boundary

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.boundary
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb340a5668>
{% endraw %}

envelope

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.envelope
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb3402b748>
{% endraw %}

convex_hull

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.convex_hull
newcsa.plot(column='CSA2010' )
# , cmap='OrRd', scheme='quantiles'
# newcsa.boundary.plot(  )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33fa22e8>
{% endraw %}

simplify

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.simplify(30)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33f86358>
{% endraw %}

buffer

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.buffer(0.01)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33edceb8>
{% endraw %}

rotate

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.rotate(30)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33ec5c18>
{% endraw %}

scale

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.scale(3, 2)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33e34ac8>
{% endraw %}

skew

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.skew(1, 10)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb33f86b38>
{% endraw %}

Advanced

Operations:

  • Reading in data (points/geoms)
    -- Convert lat/lng columns to point coordinates
    -- Geocoding addresses to coordinates
    -- Changing coordinate reference systems
    -- Connecting to PostGIS DBs
  • Basic operations
  • Saving shape data
  • Get polygon centroids
  • Working with points and polygons
    -- Map points and polygons
    -- Get points in polygons

Input(s):

  • Dataset (points/ bounds) url
  • Points/ bounds geometry column(s)
  • Points/ bounds crs's
  • Points/ bounds mapping color(s)
  • New filename

Output: File

Create Geospatial Functions

This function will handle common geospatial exploratory methods. It covers everything discussed in the Basic Operations section and more!

{% raw %}
{% endraw %} {% raw %}

workWithGeometryData[source]

workWithGeometryData(method=False, df=False, polys=False, ptsCoordCol=False, polygonsCoordCol=False, polyColorCol=False, polygonsLabel='polyOnPoint', pntsClr='red', polysClr='white')

{% endraw %} {% raw %}
{% endraw %} {% raw %}

map_points[source]

map_points(df, lat_col='latitude', lon_col='longitude', zoom_start=11, plot_points=False, pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15)

Creates a map given a dataframe of points. Can also produce a heatmap overlay

Args:
  df: dataframe containing points to map
  lat_col: column containing latitude (string)
  lon_col: column containing longitude (string)
  zoom_start: integer representing the initial zoom of the map
  plot_points: add points to map (boolean)
  pt_radius: size of each point
  draw_heatmap: add heatmap to map (boolean)
  heat_map_weights_col: column containing heatmap weights
  heat_map_weights_normalize: normalize heatmap weights (boolean)
  heat_map_radius: size of heatmap point

Returns: folium map object

{% endraw %}

Processing geometry is tedious enough to merit its own handler.

{% raw %}
{% endraw %} {% raw %}

readInGeometryData[source]

readInGeometryData(url=False, porg=False, geom=False, lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)

{% endraw %}

As you can see, we have a lot of points. Let's see if there is a better way to visualize this.

Example: Using the advanced Functions

Simple Examples

Example 0: Loading, describing, and plotting a simple shapefile.

{% raw %}
# Example 0: Loading, describing, and plotting a simple shapefile.
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
geom = readInGeometryData(url=url, geom='geometry', in_crs=2248, out_crs=2248)
workWithGeometryData('summary', geom) 
geom.plot()
RECIEVED url: https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv, 
 porg: g, 
 geom: geometry, 
 lat: False, 
 lng: False, 
 revgeocode: False, 
 in_crs: 2248, 
 out_crs: 2248
Index(['TRACTCE10', 'GEOID10', 'NAME10', 'CSA', 'Tract', 'geometry'], dtype='object')
TheStartOfSomethingNew
(geopandas.geodataframe.GeoDataFrame, <Projected CRS: EPSG:2248>
 Name: NAD83 / Maryland (ftUS)
 Axis Info [cartesian]:
 - X[east]: Easting (US survey foot)
 - Y[north]: Northing (US survey foot)
 Area of Use:
 - name: USA - Maryland
 - bounds: (-79.49, 37.97, -74.97, 39.73)
 Coordinate Operation:
 - name: SPCS83 Maryland zone (US Survey feet)
 - method: Lambert Conic Conformal (2SP)
 Datum: North American Datum 1983
 - Ellipsoid: GRS 1980
 - Prime Meridian: Greenwich, Index(['TRACTCE10', 'GEOID10', 'NAME10', 'CSA', 'Tract', 'geometry'], dtype='object'))
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb30a39ef0>
{% endraw %}

Example Points 1: Loading tax data from addresses, geocoded to coordinates.

{% raw %}
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTMgdqWykZeIsMwCFllPuG1cd4gGDB6BUqaAOM0Lx9VGdCo2JJy9v_CR8ZaEDWO3Q/pub?gid=290715815&single=true&output=csv'
# points = readInGeometryData(url=url, revgeocode='y', lat='STREET_')
# workWithGeometryData('summary', points) 
{% endraw %}

Example Points 2: Loading Food Pantries as Points

{% raw %}
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv'
crs = {'init' :'epsg:2248'} 
foodPantryLocations = readInGeometryData(url=url, porg='p', geom=False, lat='Y', lng='X', revgeocode=False,  save=False, in_crs=crs, out_crs=crs)
foodPantryLocations.plot()
RECIEVED url: https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv, 
 porg: p, 
 geom: False, 
 lat: Y, 
 lng: X, 
 revgeocode: False, 
 in_crs: {'init': 'epsg:2248'}, 
 out_crs: {'init': 'epsg:2248'}
Index(['X', 'Y', 'OBJECTID', 'Name', 'Address', 'City_1', 'State', 'Zip',
       '# in Zip', 'FIPS'],
      dtype='object')
/usr/local/lib/python3.6/dist-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb30800da0>
{% endraw %} {% raw %}
foodPantryLocations.head()
X Y OBJECTID Name Address City_1 State Zip # in Zip FIPS geometry
0 -77.06 39.02 1 Victory Forest -... 10000 Brunswick ... Silver Spring MD 20817 NaN NaN POINT (-77.05673...
1 -76.99 38.82 2 Glassmanor Eleme... 1011 Marcy Avenue Oxon Hill MD 20745 NaN NaN POINT (-76.99036...
2 -76.99 39.02 3 Apple Blossoms 1013 Cresthaven ... Silver Spring MD 20903 NaN NaN POINT (-76.99155...
3 -77.15 39.02 4 Lakeview Apartme... 10250 West Lake Dr. Bethesda MD 20817 NaN NaN POINT (-77.14929...
4 -76.89 38.89 5 Central Gardens 103 Cindy Lane Capitol Heights MD NaN NaN POINT (-76.88974...
{% endraw %}

Let's see how our map looks when we have points atop polygons

{% raw %}
panp = workWithGeometryData( 'pandp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, pntsClr='red', polysClr='white')
mapPointsandPolygons
{% endraw %}

Looking good! But the red dots are a bit too noisy. Let's create a choropleth instead!

We can start off by finding which points are inside of which polygons!

A choropleth map will be created at the bottom of the output once the code below this cell is executed for our Food Pantries data.

{% raw %}
# https://stackoverflow.com/questions/27606924/count-number-of-points-in-multipolygon-shapefile-using-python
ptsCoordCol = 'geometry'
polygonsCoordCol = 'geometry'

pointsInPolys = workWithGeometryData('pinp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, 'geometry' , 'geometry')
pointsInPolys.plot(column='number of points', legend=True)
Total Points:  220.0
Total Points in Polygons:  199
Prcnt Points in Polygons:  0.9045454545454545
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:74: FutureWarning:     You are passing non-geometry data to the GeoSeries constructor. Currently,
    it falls back to returning a pandas Series. But in the future, we will start
    to raise a TypeError instead.
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb30c85438>
{% endraw %} {% raw %}
pointsInPolys.head()
OBJECTID CSA2010 tpop10 male10 female10 paa17 pwhite17 pasi17 p2more17 ppac17 phisp17 racdiv17 age5_17 age18_17 age24_17 age64_17 age65_17 hhs10 femhhs17 fam17 hhsize10 mhhi17 hh25inc17 hh40inc17 hh60inc17 hh75inc17 hhm7517 hhpov17 hhchpov17 Shape__Are Shape__Len geometry number of points pointsinpolygon
0 1 Allendale/Irving... 16726 7657 9069 90.28 6.53 0.11 1.00 0.00 2.06 20.12 6.61 17.37 9.00 53.29 13.72 6098 71.13 35.20 2.64 39495.63 32.99 17.72 19.91 11.95 17.43 20.70 32.77 6.38e+07 38770.17 POLYGON ((-76.65... 7 7
1 2 Beechfield/Ten H... 13391 5985 7406 75.32 18.86 0.42 3.31 0.31 1.78 41.02 7.93 14.58 9.71 55.37 12.42 5076 55.19 26.14 2.40 57572.50 20.42 13.90 18.18 10.87 36.64 10.47 23.92 4.79e+07 37524.95 POLYGON ((-76.69... 2 2
2 3 Belair-Edison 17380 7297 10083 85.65 10.03 0.57 1.70 0.81 1.24 27.26 5.42 22.81 8.41 54.45 8.91 6174 77.53 38.27 2.90 39624.48 34.10 16.28 20.07 8.11 21.45 20.27 34.56 4.50e+07 31307.31 POLYGON ((-76.56... 0 0
3 4 Brooklyn/Curtis ... 12900 5746 7154 37.96 39.68 2.53 3.61 1.31 14.91 73.93 10.91 16.09 8.25 57.45 7.30 5204 43.39 32.32 2.61 40275.28 31.40 18.32 18.52 9.08 22.68 24.21 46.41 1.76e+08 150987.70 MULTIPOLYGON (((... 4 4
4 5 Canton 8326 4094 4232 3.94 85.58 4.38 2.45 0.56 3.09 26.31 5.25 3.12 5.85 75.25 10.53 4310 10.55 11.03 1.86 111891.25 7.41 7.82 9.18 6.43 69.16 3.66 4.02 1.54e+07 23338.61 POLYGON ((-76.57... 1 1
{% endraw %} {% raw %}
# And now that we have that settled, lets map it!
panp = workWithGeometryData( 'pandp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], pointsInPolys, polyColorCol='number of points')
mapPointsandPolygons
{% endraw %}

In that last example, we got a count of points in the polygon dataset.

If we wanted the polygon each point falls on, we'd do it like this!

{% raw %}
panp = getPolygonOnPoints(foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, 'geometry', 'geometry', 'CSA2010')
panp.head()
panp.plot()
Total Points:  220.0
Total Points in Polygons:  199
Prcnt Points in Polygons:  0.9045454545454545
X Y OBJECTID Name Address City_1 State Zip # in Zip FIPS geometry CSA2010
310 -76.61 39.29 317 2-1-1 MD@UWCM 100 S. Charles S... Baltimore MD 21201 5.0 40100.0 POINT (-76.61499... Downtown/Seton Hill
313 -76.62 39.31 502 University of Ba... 21 W. Mount Roya... Baltimore MD 21201 NaN 110200.0 POINT (-76.61716... Midtown
315 -76.68 39.28 626 Project PLASE Em... 3601 Old Frederi... Baltimore MD 21201 NaN 200800.0 POINT (-76.67662... Allendale/Irving...
316 -76.63 39.30 699 Historic Samuel ... 507 W Preston St... Baltimore MD 21201 NaN 170200.0 POINT (-76.62645... Upton/Druid Heights
321 -76.62 39.30 715 Westminster Hous... 524 North Charle... Baltimore MD 21201 NaN 110200.0 POINT (-76.61560... Midtown
<matplotlib.axes._subplots.AxesSubplot at 0x7fdb3073bf98>
{% endraw %}

Alternatively, we could map the centroid of each boundary within another boundary to find boundaries within boundaries
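A minimal sketch of that centroid idea, using bare shapely with made-up coordinates rather than the real CSA boundaries: take each inner boundary's centroid and test whether it falls within the outer boundary.

{% raw %}
```python
from shapely.geometry import Polygon

# Hypothetical boundaries (made-up coordinates, not real CSAs).
outer   = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
inner_a = Polygon([(1, 1), (4, 1), (4, 4), (1, 4)])      # fully inside outer
inner_b = Polygon([(8, 8), (14, 8), (14, 14), (8, 14)])  # straddles the edge

# Treat a boundary as belonging to the outer boundary if its centroid is within it.
for name, poly in [('inner_a', inner_a), ('inner_b', inner_b)]:
    print(name, poly.centroid.within(outer))
```
{% endraw %}

With GeoDataFrames, the same test can be vectorized via the `centroid` accessor and a points-in-polygons pass over those centroids.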

{% raw %}
map_points(food_df, lat_col='Y', lon_col='X', zoom_start=11, plot_points=True, pt_radius=15, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15)
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:51: FutureWarning: Method `add_children` is deprecated. Please use `add_child` instead.
Make this Notebook Trusted to load map: File -> Trust Notebook
{% endraw %}