---
title: Geo-Data Intake and Operations
keywords: fastai
sidebar: home_sidebar
summary: "This notebook was made to demonstrate how to work with geographic data."
description: "This notebook was made to demonstrate how to work with geographic data."
---
This Coding Notebook is the third in a series.
An interactive version can be found here.
This Colab and more can be found on our webpage.
Content covered in previous tutorials will be used in later tutorials.
New code and information should have explanations and/or descriptions attached.
Concepts or code covered in previous tutorials will be used without being explained in their entirety.
The Dataplay Handbook builds on development techniques covered in the Datalabs Guidebook.
If content cannot be found in the current tutorial and is not covered in a previous tutorial, please let me know.
This notebook has been optimized for Google Colab run in a Chrome browser.
Statements found on the index page regarding views expressed, responsibility, errors and omissions, use at your own risk, and licensing extend throughout this tutorial.
In this notebook, the basics of working with geographic data are introduced.
Geographic data must be encoded properly in order to take full advantage of its spatial nature.
If you have read in a dataset using Pandas, its data type will be a DataFrame.
It may be converted into a GeoDataFrame using GeoPandas, as demonstrated in the sections below.
You can check a variable's type at any time using the type() function:
type(yourGeoDataframe)
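As a minimal sketch (the CSV filename here is a hypothetical placeholder), the same records read as a plain DataFrame and only become a GeoDataFrame after conversion:

import pandas as pd
import geopandas as gpd

df = pd.read_csv('example_points.csv')  # hypothetical file with a spatial column
type(df)   # pandas.core.frame.DataFrame
gdf = gpd.GeoDataFrame(df)
type(gdf)  # geopandas.geodataframe.GeoDataFrame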
Make sure the appropriate spatial Coordinate Reference System (CRS) is used when reading in your data!
Per the wiki:
A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities
CRS 4326 is the CRS most people are familiar with when referring to latitudes and longitudes.
In CRS 4326, Baltimore should sit at roughly (39.2, -76.6).
BNIA uses CRS 2248 internally. Additional information: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html
Ensure your GeoDataFrames' coordinates use the same CRS, which you can check via the GeoPandas attribute:
yourGeoDataframe.crs
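For instance, before combining two layers (csa_gdf and tract_gdf here stand for the layers created later in this notebook), a quick check-and-reproject sketch might look like this:

# Both layers must share a CRS before any spatial comparison or merge.
print(csa_gdf.crs, tract_gdf.crs)
if csa_gdf.crs != tract_gdf.crs:
    tract_gdf = tract_gdf.to_crs(csa_gdf.crs)  # reproject one layer to match the other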
When first receiving a spatial dataset, the spatial column may need to be encoded to convert its 'text' values into 'coordinate' data types before it can be understood and processed accordingly.
Namely, there are two ways to encode text into coordinates:
The first approach works for text of the form "Point(-76, 39)" and will encode the text into coordinates. The second approach is useful when creating a point from two columns containing lat/lng information, and will create Point coordinates from the two columns.
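Here is a minimal sketch of both approaches; the column names 'geometry', 'lat', and 'lng' are placeholders, and each approach is demonstrated on real data later in this notebook:

from shapely.wkt import loads
from shapely.geometry import Point

# Approach 1: parse WKT text such as "POINT (-76 39)" into geometry objects
df['geometry'] = df['geometry'].apply(lambda wkt: loads(str(wkt)))

# Approach 2: build Point geometries from two lat/lng columns
df['geometry'] = [Point(xy) for xy in zip(df['lng'], df['lat'])]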
There exist two types of geospatial data, raster and vector. Each has different file formats.
This lab will only cover vector data.
Vector Data: Individual points stored as (x,y) coordinate pairs. These points can be joined to create lines or polygons.
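To make that concrete, here is a small shapely sketch building all three vector primitives from the same (x, y) pairs:

from shapely.geometry import Point, LineString, Polygon

coords = [(-76.6, 39.3), (-76.5, 39.3), (-76.5, 39.2)]
pt = Point(coords[0])      # a single (x, y) coordinate pair
line = LineString(coords)  # points joined into a line
poly = Polygon(coords)     # points joined into a closed polygon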
Formats of Vector Data:
Esri Shapefile — .shp, .dbf, .shx Description - Industry standard, most widely used. The three files listed above are needed to make a shapefile. Additional file formats may be included.
Geographic JavaScript Object Notation — .geojson, .json Description — Second most popular; GeoJSON is typically used in web-based mapping, storing the coordinates as JSON.
Geography Markup Language — .gml Description — Similar to GeoJSON, but GML files tend to carry more data for the same amount of information.
Google Keyhole Markup Language — .kml, .kmz Description — XML-based and predominantly used for Google Earth. KMZ is the newer, zipped version of KML.
Raster Data: Cell-based data where each cell represents geographic information. An aerial photograph is one such example, where each pixel has a color value.
Raster Data Files:
GeoTIFF — .tif, .tiff, .ovr
ERDAS Imagine — .img
IDRISI Raster — .rst, .rdc
Information Sourced From: https://towardsdatascience.com/getting-started-with-geospatial-works-1f7b47955438
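Since this lab only covers vector data, here is a minimal sketch of loading the vector formats above with GeoPandas; the filenames are hypothetical placeholders (KML may additionally require enabling fiona's KML driver):

import geopandas as gpd

shp_gdf  = gpd.read_file('boundaries.shp')      # Esri Shapefile (.dbf and .shx alongside)
json_gdf = gpd.read_file('boundaries.geojson')  # GeoJSON
gml_gdf  = gpd.read_file('boundaries.gml')      # GML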
Vector Data: Census Geographic Data:
# @title Run: Install Modules
%%capture
! pip install -U -q PyDrive
! pip install geopy
! pip install geopandas
! pip install geoplot
! pip install dexplot
! pip install dataplay
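The cells below use pandas, geopandas, shapely, geopy, dexplot, and dataplay without importing them, so a consolidated import cell along these lines is assumed to have been run (the dataplay module paths are my assumption from its documentation):

# @title Run: Import Modules
import os, sys
import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.wkt import loads
from shapely.geometry import Point
from geopy.geocoders import Nominatim
import dexplot as dxp
# Assumed dataplay module paths:
from dataplay.merge import mergeDatasets
from dataplay.geoms import readInGeometryData, workWithGeometryData, getPolygonOnPoints, map_points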
# This will just beautify the output
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# pd.set_option('display.expand_frame_repr', False)
# pd.set_option('display.precision', 2)
# pd.reset_option('max_colwidth')
pd.set_option('max_colwidth', 20)
# pd.reset_option('max_colwidth')
# (Optional) Run this cell to gain access to Google Drive (Colabs only)
from google.colab import drive
# Colab operates in a virtualized environment
# Colabs default directory is at ~/content.
# We mount Drive into a temporary folder at '~/content/drive'
drive.mount('/content/drive')
cd drive/'My Drive'/colabs/DATA
ls
# Find the relative path to a file by walking the directory tree
def findFile(root, file):
    for d, subD, f in os.walk(root):
        if file in f:
            return "{1}/{0}".format(file, d)

# To 'import' a script you wrote, add its parent folder to the sys path
def addPath(root, file): sys.path.append(os.path.dirname(os.path.abspath( findFile( root, file) )))
If you are using GeoPandas, direct imports only work with GeoJSON and shapefiles.
# A dataset taken from the public database provided by BNIA-JFI, hosted by Esri / ArcGIS
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
csa_gdf = gpd.read_file("https://opendata.arcgis.com/datasets/b738a8587b6d479a8824d937892701d8_0.geojson");
csa_gdf.plot()
This approach loads a map using a geometry column
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like to have our data represented as
geom = 'geometry' # The column where our spatial information lives.
# A url to a public Dataset
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
# Read in the dataframe
tract_df = pd.read_csv(url)
# Convert the geometry column datatype from a string of text into a coordinate datatype
tract_df[geom] = tract_df[geom].apply(lambda x: loads( str(x) ))
# Process the dataframe as a geodataframe with a known CRS and geom column
tract_gdf = GeoDataFrame(tract_df, crs=in_crs, geometry=geom)
tract_gdf.plot()
tract_gdf.head()
This example is using data constructed at the end of Tutorial 1.
Be sure to access the menu in the left drawer, hit the 'Files' tab and upload it.
# Primary Table
# Description: I created a public dataset from a google xlsx sheet 'Bank Addresses and Census Tract' from a workbook of the same name.
# Table: FDIC Baltimore Banks
# Columns: Bank Name, Address(es), Census Tract
left_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv'
left_col = 'Census Tract'
# Alternate Primary Table
# Description: Same workbook, different Sheet: 'Branches per tract'
# Columns: Census Tract, Number branches per tract
# left_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSHFrRSHva1f82ZQ7Uxwf3A1phqljj1oa2duGlZDM1vLtrm1GI5yHmpVX2ilTfMHQ/pub?gid=1698745725&single=true&output=csv'
# left_col = 'Number branches per tract'
# Crosswalk Table
# Table: Crosswalk Census Communities
# 'TRACT2010', 'GEOID2010', 'CSA2010'
crosswalk_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv'
use_crosswalk = True
crosswalk_left_col = 'TRACT2010'
crosswalk_right_col = 'GEOID2010'
# Secondary Table
# Table: Baltimore Boundaries
# 'TRACTCE10', 'GEOID10', 'CSA', 'NAME10', 'Tract', 'geometry'
right_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
right_col ='GEOID10'
# merge_how = 'geometry' # (stale value, overridden below)
interactive = True
merge_how = 'outer'
banksPd = mergeDatasets( left_ds=left_ds, left_col=left_col,
use_crosswalk=use_crosswalk, crosswalk_ds=crosswalk_ds,
crosswalk_left_col = crosswalk_left_col, crosswalk_right_col = crosswalk_right_col,
right_ds=right_ds, right_col=right_col,
merge_how=merge_how, interactive = interactive )
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like to have our data represented as
geom = 'geometry' # The column where our spatial information lives.
# Description: This dataset was created in the previous tutorial.
# Description: It can be accessed via Google Drive, or by using the upload file feature in the left-hand menu.
# Columns: Bank information with TRACT, CSA, and GEOMETRY columns.
# To create this dataset I had to perform a full outer join in the previous tutorial.
# That way, geometries are included even if the merge does not find a direct match.
# This means at least one (near) empty record will exist for each community, containing (at minimum) the geographic information and name of a community.
# That way, even if no point-level information existed in a community, the geoboundaries are still carried over during the merge.
# If a user wanted to create a heatmap of this data, they would first have to aggregate their columns onto unique geometry columns.
# It is the aggregate of a column that gets colorized on the heatmap.
# Aggregation operations can easily be performed using a pivot table in Excel.
# I hope to embed support for this functionality in the future.
# Heatmaps are covered in the next tutorial.
# Pre-aggregated information is required to continue on to the next tutorial.
#url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTViIZu-hbvhM3L7dIRAG95ISa7TNhUwdzlYxYzc1ygJoaYc3_scaXHe8Rtj5iwNA/pub?gid=1078028768&single=true&output=csv'
# Read in the dataframe
#df = pd.read_csv(url)
# Convert the geometry column datatype from a string of text into a coordinate datatype
banksPd[geom] = banksPd[geom].apply(lambda x: loads( str(x) ))
# Process the dataframe as a geodataframe with a known CRS and geom column
banksGdf = GeoDataFrame(banksPd, crs=in_crs, geometry=geom)
banksGdf.plot()
banksGdf.head()
Let's aggregate the banks by CSA, just for fun.
banksGdf['CSA'].value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True).head()
That's interesting. Let's see if we can map that!
# Aggregate dataframe by CSA
banksGdf['banksCount'] = 1
banksCount = banksGdf.groupby('CSA').sum(numeric_only=True)
banksCount = banksCount.reset_index()
banksCount.head()
So now we have the count in a dataframe.
# A url to a public Dataset
csa_gdf = gpd.read_file("https://opendata.arcgis.com/datasets/b738a8587b6d479a8824d937892701d8_0.geojson");
csa_gdf.head()
csa_gdf.columns
# merge it to our banks dataset
merged_banks = pd.merge(banksCount, csa_gdf, left_on='CSA', right_on='CSA2010', how='left')
merged_banks.head()
# Lets check what datatype our geometry column is before we try to convert it!
merged_banks.geometry.dtype
# Process the dataframe as a geodataframe with a known CRS and geom column
# Since the geometry column is already being interpreted as a geopandas dtype,
# we should readily be able to convert the dataframe without fuss.
banksCountgdf = GeoDataFrame(merged_banks, crs=2248, geometry='geometry')
# If you'd like, drop duplicate columns like so.
# merged_df = merged_df.drop('CSA', axis=1)
# In order for this choropleth to work, the total number of banks in each CSA must be tallied.
# We did this programmatically above by creating the 'banksCount' column and aggregating by CSA.
# The column plotted below is that new tallied column, 'banksCount'.
banksCountgdf.plot( column='banksCount', legend=True)
dxp.bar(x='CSA2010', y='banksCount', data=banksCountgdf)
Go ahead and update the form and run the cell to explore the data visually!
tempBanksNoGeom = banksCountgdf.drop('geometry', axis=1)
#@title Form fields to view spatial data
#@markdown Forms support many types of fields.
legendOn = "True" #@param ['True', 'False']
displayCol = "banksCount" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17"] {allow-input: true}
#@markdown ---
banksCountgdf.plot( column=displayCol, legend=(legendOn == 'True')) # compare to 'True': the form field yields a string, which is always truthy on its own
dxp.bar(x='CSA2010',
y=displayCol,
data=banksCountgdf)
banksCountgdf.dropna().CSA2010.values
#@title Now lets compare some indicators
#@markdown Forms support many types of fields.
legendOn = "True" #@param ['True', 'False']
displayColx = "banksCount" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17"] {allow-input: true}
displayColy = "female10" #@param ["banksCount", 'tpop10', 'male10', 'female10', 'paa17', 'pwhite17', 'pasi17', 'p2more17', 'ppac17', 'phisp17', 'racdiv17', 'age5_17', 'age18_17', 'age24_17', 'age64_17', 'age65_17', 'hhs10', 'femhhs17', 'fam17', 'hhsize10', 'mhhi17', 'hh25inc17', 'hh40inc17', 'hh60inc17', 'hh75inc17', 'hhm7517', 'hhpov17', 'hhchpov17"] {allow-input: true}
# displayColm = "Westport/Mount Winans/Lakeland" #@param ['Allendale/Irvington/S. Hilton', 'Beechfield/Ten Hills/West Hills', 'Belair-Edison', 'Brooklyn/Curtis Bay/Hawkins Point', 'Canton', 'Cedonia/Frankford', 'Cherry Hill', 'Chinquapin Park/Belvedere', 'Claremont/Armistead', 'Clifton-Berea', 'Cross-Country/Cheswolde', 'Dickeyville/Franklintown', 'Dorchester/Ashburton', 'Downtown/Seton Hill', 'Edmondson Village', 'Fells Point', 'Forest Park/Walbrook', 'Glen-Fallstaff', 'Greater Charles Village/Barclay', 'Greater Govans', 'Greater Mondawmin', 'Greater Roland Park/Poplar Hill', 'Greater Rosemont', 'Greenmount East', 'Hamilton', 'Harbor East/Little Italy', 'Harford/Echodale', 'Highlandtown', 'Howard Park/West Arlington', 'Inner Harbor/Federal Hill', 'Lauraville', 'Loch Raven', 'Madison/East End', 'Medfield/Hampden/Woodberry/Remington', 'Midtown', 'Midway/Coldstream', 'Morrell Park/Violetville', 'Mount Washington/Coldspring', 'North Baltimore/Guilford/Homeland', 'Northwood', 'Oldtown/Middle East', 'Orangeville/East Highlandtown', 'Patterson Park North & East', 'Penn North/Reservoir Hill', 'Pimlico/Arlington/Hilltop', 'Poppleton/The Terraces/Hollins Market', 'Sandtown-Winchester/Harlem Park', 'South Baltimore', 'Southeastern', 'Southern Park Heights', 'Southwest Baltimore', 'The Waverlies', 'Upton/Druid Heights', 'Washington Village/Pigtown', 'Westport/Mount Winans/Lakeland'] {allow-input: true}
#@markdown ---
# print('Comparing', displayColm, "'s Indicators ", displayColx, " and " , displayColy )
dxp.kde(x=displayColx, y=displayColy, data= banksCountgdf.dropna() )
# dxp.kde(x=displayColx, y=displayColy, data= banksCountgdf[ banksCountgdf.CSA2010 != displayColm ].dropna() )
Later in this tutorial we will show how you may find the geometric bounds that correspond with a point's location (points in polygons). This can be a useful trick when we want to create a heatmap of point data with specified boundaries.
If a user wanted to create a heatmap of this data...
they would first have to aggregate their columns onto unique geometry columns.
Possible path: (points in polygons -> crosswalk -> merge GeoJSON).
It is the aggregate of a column that gets colorized on the heatmap.
Aggregation operations can easily be performed using a pivot table in Excel.
I hope to embed support for this functionality in the future.
Heatmaps are covered in the next tutorial.
Pre-aggregated information is required to continue on to the next tutorial.
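The notebook performed this aggregation programmatically above with groupby; as a sketch, the same tally can be produced with a pandas pivot table, the programmatic analogue of the Excel pivot just mentioned:

# One row per CSA, banks tallied per CSA (equivalent to the earlier groupby).
bank_tally = pd.pivot_table(banksGdf, index='CSA', values='banksCount', aggfunc='sum').reset_index()
bank_tally.head()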
Approach 3: Method 1: Example 1:
This is a generic example, but it won't run since no URL is given.
# More Information: https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html#from-longitudes-and-latitudes
# If your data has coordinates in two columns run this cell
# It will create a geometry column from the two.
# A public dataset is not provided for this example and will not run.
# Load DF HERE. Accidentally deleted the link. Need to re-find it.
# Just rely on example 2 for now.
exe_df = ''
exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
# exe_df = exe_df.replace(np.nan, 0, regex=True)
# An example of loading in an internal BNIA file
crs = {'init' :'epsg:2248'}
geometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]
exe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry) # use the crs defined above
Approach 3: Method 1: Example 2: FOOD BANK PANTRIES
# Alternate Primary Table
# Table: Food Bank And Pantry Sites,
# XLSX Sheet: Baltimore City Pantries Tracts
# Columns: X Y OBJECTID Name Address City_1 State Zip # in Zip FIPS
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv'
# Read in the dataframe
food_df = pd.read_csv(url)
food_df['X'] = pd.to_numeric(food_df['X'], errors='coerce')
food_df['Y'] = pd.to_numeric(food_df['Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)
# An example of loading in an internal BNIA file
crs = {'init' :'epsg:2248'}
geometry=[Point(xy) for xy in zip(food_df['X'], food_df['Y'])]
food_gdf = gpd.GeoDataFrame( food_df.drop(['X', 'Y'], axis=1), crs=crs, geometry=geometry) # use the crs defined above
food_gdf.head()
food_gdf.plot()
This function takes a while. The fewer columns/records there are, the faster it executes.
# For more information visit: https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders
# In this example we retrieve and map a dataset with no lat/lng but containing an address
# The url listed below is public.
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTMgdqWykZeIsMwCFllPuG1cd4gGDB6BUqaAOM0Lx9VGdCo2JJy9v_CR8ZaEDWO3Q/pub?gid=290715815&single=true&output=csv'
df = pd.read_csv(url)
# In this example our data is stored in the 'STREET' attribute
addrCol = 'STREET'
geometry = []
geolocator = Nominatim(user_agent="specify_your_app_name_here")
for index, row in df.iterrows():
    # We will try and return an address for each street name
    try:
        # Retrieve the geocoded information for our street address
        geol = geolocator.geocode(row[addrCol], timeout=None)
        print('Geocoding: ', geol.address)
        # print(geol.raw)
        # Create a mappable coordinate point from the response object's lat/lng values.
        pnt = Point(geol.longitude, geol.latitude)
        # Append this value to the list of geometries
        geometry.append(pnt)
    except:
        # If no street name was found, decide what to do here.
        # df.loc[index]['geom'] = Point(0,0) # Alternate method
        geometry.append(Point(0,0))
# Finally, we stuff the geometry data we created back into the dataframe
df['geometry'] = geometry
# Convert the dataframe into a geodataframe and map it!
gdf = gpd.GeoDataFrame( df, geometry=geometry)
gdf.head()
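One caveat worth flagging: Nominatim throttles aggressive clients. geopy ships a RateLimiter wrapper that spaces out requests; here is a sketch of using it in place of the raw loop above:

from geopy.extra.rate_limiter import RateLimiter

# Wait at least 1 second between calls to stay polite to the free Nominatim service.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
locations = df[addrCol].apply(geocode)
df['geometry'] = locations.apply(lambda loc: Point(loc.longitude, loc.latitude) if loc else Point(0, 0))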
The following example pulls point geodata from a Postgres database.
We will pull the Postgres point data in two manners.
# This Notebook can be downloaded to connect to a database
'''
conn = psycopg2.connect(host='', dbname='', user='', password='', port='')
# DB Import Method One
sql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'
pointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)
pointData = pointData.to_crs(epsg=4326)
# DB Import Method Two
sql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'
pointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)
pointData.head()
pointData.plot()
'''
def geomSummary(gdf): return type(gdf), gdf.crs, gdf.columns
# for p in df['Tract'].sort_values(): print(p)
geomSummary(csa_gdf)
# Convert the CRS of the dataset into one you desire
# The gdf must be loaded with a known CRS in order for the to_crs conversion to work
# We use this often to convert BNIA's custom CRS to the common type
out_crs = 4326
csa_gdf = csa_gdf.to_crs(epsg=out_crs)
# Here is code to commit a simple save
filename = 'TEST_FILE_NAME'
csa_gdf.to_file(f"{filename}.geojson", driver='GeoJSON')
# Here is code to save this projection as a shapefile and read it back in
csa_gdf = csa_gdf.to_crs(epsg=2248) #just making sure
csa_gdf.to_file(filename+'.shp', driver='ESRI Shapefile')
csa_gdf = gpd.read_file(filename+'.shp')
Draw Tool
import folium
from folium.plugins import Draw
# Draw tool. Create and export your own boundaries
m = folium.Map()
draw = Draw()
draw.add_to(m)
m = folium.Map(location=[-27.23, -48.36], zoom_start=12)
draw = Draw(export=True)
draw.add_to(m)
# m.save(os.path.join('results', 'Draw1.html'))
m
Boundary
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.boundary
newcsa.plot(column='CSA2010' )
envelope
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.envelope
newcsa.plot(column='CSA2010' )
convex_hull
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.convex_hull
newcsa.plot(column='CSA2010' )
# , cmap='OrRd', scheme='quantiles'
# newcsa.boundary.plot( )
simplify
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.simplify(30)
newcsa.plot(column='CSA2010' )
buffer
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.buffer(0.01)
newcsa.plot(column='CSA2010' )
rotate
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.rotate(30)
newcsa.plot(column='CSA2010' )
scale
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.scale(3, 2)
newcsa.plot(column='CSA2010' )
skew
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.skew(1, 10)
newcsa.plot(column='CSA2010' )
Operations:
Input(s):
Output: File
This function will handle common geospatial exploratory methods. It covers everything discussed in the basic operations and more!
Processing geometry is tedious enough to merit its own handler.
As you can see, we have a lot of points. Let's see if there is a better way to visualize this.
# Example 0: Loading describing and plotting a simple shapefile.
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
# Let's create a dataframe using the library
geom = readInGeometryData(url=url, geom='geometry', in_crs=2248, out_crs=2248)
# And now let's do it again, just for fun, using the dataframe we just created from the URL
geomi = readInGeometryData(url=geom, geom='geometry', in_crs=2248, out_crs=2248)
workWithGeometryData('summary', geomi)
geomi.plot()
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTMgdqWykZeIsMwCFllPuG1cd4gGDB6BUqaAOM0Lx9VGdCo2JJy9v_CR8ZaEDWO3Q/pub?gid=290715815&single=true&output=csv'
# points = readInGeometryData(url=url, revgeocode='y', lat='STREET_')
# workWithGeometryData('summary', points)
foodPantryLocationsUrl = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv'
crs = {'init' :'epsg:2248'}
foodPantryLocations = readInGeometryData(url=foodPantryLocationsUrl, porg='p', geom=False, lat='Y', lng='X', revgeocode=False, save=False, in_crs=crs, out_crs=crs)
foodPantryLocations.plot()
foodPantryLocations.head()
Let's see how our map looks when we have points atop polygons.
panp = workWithGeometryData( 'pandp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, pntsClr='red', polysClr='white')
Looking good! But the red dots are a bit too noisy. Let's create a choropleth instead!
We can start off by finding which points are inside of which polygons!
A choropleth map of our Food Pantries data will be created at the bottom of the output once the code below this cell is executed.
# https://stackoverflow.com/questions/27606924/count-number-of-points-in-multipolygon-shapefile-using-python
ptsCoordCol = 'geometry'
polygonsCoordCol = 'geometry'
pointsInPolys = workWithGeometryData('pinp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, 'geometry' , 'geometry')
pointsInPolys.plot(column='number of points', legend=True)
pointsInPolys.head()
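For reference, if you are not using the dataplay helper, a plain GeoPandas spatial join produces the same per-polygon count. This sketch assumes both layers share a CRS (reproject with to_crs first if not); newer GeoPandas uses the predicate= keyword, while older releases call it op=:

baltimorePantries = foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ]
joined = gpd.sjoin(baltimorePantries, csa_gdf, predicate='within')  # tag each point with its CSA
counts = joined.groupby('CSA2010').size().rename('number of points').reset_index()
csa_counts = csa_gdf.merge(counts, on='CSA2010', how='left')
csa_counts.plot(column='number of points', legend=True)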
# And now that we have that settled, let's map it!
panp = workWithGeometryData( 'pandp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], pointsInPolys, polyColorCol='number of points')
In that last example, we got a count of points in each polygon of the polygon dataset.
If we wanted to know which polygon each point falls on, we'd do it like this!
panp = getPolygonOnPoints(foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csa_gdf, 'geometry', 'geometry', 'CSA2010')
panp.head()
panp.plot()
Alternatively, we could map the centroid of boundaries within another boundary to find boundaries within boundaries.
map_points(food_df, lat_col='Y', lon_col='X', zoom_start=11, plot_points=True, pt_radius=15, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15)