
Binary Spatial operations on Geo Dataframes¶
This time the spatial operations will be applied when we have as input two GDFs.
Getting ready¶
The links to the our maps on GitHub are here:
linkWorldMap="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/worldMaps.gpkg"
linkBrazil="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/BRAZIL/brazil_5880.gpkg"
linkIndicators="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/worldindicators.json"
Let me introduce another data source on seaports. Their locations in long/lat is stored in the file seaports.csv which we already have on GitHub:
linkSeaPorts='https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/seaports.csv'
Let's get some maps:
import geopandas as gpd
#world
world_rivers=gpd.read_file(linkWorldMap,layer='rivers')
#brazil
brazil5880=gpd.read_file(linkBrazil,layer='country')
airports_brazil5880=gpd.read_file(linkBrazil,layer='airports')
states_brazil5880=gpd.read_file(linkBrazil,layer='states')
municipalities_brazil5880=gpd.read_file(linkBrazil,layer='municipalities')
#some indicators
indicators=gpd.read_file(linkIndicators)
# the seaports
import pandas as pd
infoseaports=pd.read_csv(linkSeaPorts)
# the sesports data has too manny columns:
len(infoseaports.columns)
108
Let me keep some columns, and turn the DF into a GDF:
#rename
infoseaports.rename(columns={'Main Port Name':'seaport_name','Country Code':'country_name'},inplace=True)
#keep few columns
infoseaports=infoseaports.loc[:,['seaport_name', 'country_name','Latitude', 'Longitude']]
#spatial points (unprojected)
seaports=gpd.GeoDataFrame(data=infoseaports.copy(),
geometry=gpd.points_from_xy(infoseaports.Longitude,
infoseaports.Latitude),
crs=4326)# notice it is unprojected
# keep Brazil
seaports_bra=seaports[seaports['country_name']=='Brazil'].copy()
# reset indexes
seaports_bra.reset_index(drop=True, inplace=True)
# reprojecting
seaports_brazil5880=seaports_bra.to_crs(5880) # projected crs
Before proceeding, let me compute some GDFs we created the previous session:
- The centroid of Brazil
brazil5880_cen=brazil5880.centroid
- Large Brazilian airports
large_airports=airports_brazil5880.query("airport_type=='large_airport'")
- Amazon Rivers
AmazonSystem=world_rivers.query("SYSTEM=='Amazon'")
AmazonSystem_5880=AmazonSystem.to_crs(5880)
- The states in the East, West, South, and North
mid_x,mid_y=brazil5880_cen.x[0],brazil5880_cen.y[0]
# the north
N_brazil=states_brazil5880.cx[:,mid_y:]
# the south
S_brazil=states_brazil5880.cx[:,:mid_y]
# the west
W_brazil=states_brazil5880.cx[:mid_x,:]
# the east
E_brazil=states_brazil5880.cx[mid_x:,:]
- The mean fragility by region of the world
indicatorsByRegion=indicators.dissolve(
by="region", #groupby()
aggfunc={"fragility": "mean"}, #agg()
)
# You may need to install if using Colab
# !pip install mapclassify
I. Distance¶
Distance is a key binary operation as so many practical policy matters depend on knowing distances between objects in space.
Any pair of rightly projected GDFs have a distance between them. Below we can make query using distances:
Which are the airports whose distance to Brazil centroid is > 2500000?
# this is the centroid we have:
brazil5880_cen
0 POINT (5085264.134 8827720.201) dtype: geometry
Then,
airports_brazil5880[airports_brazil5880.distance(brazil5880_cen[0]) > 2500000]
airport_name | airport_type | elevation_ft | region | municipality | geometry | |
---|---|---|---|---|---|---|
4104 | Viatec Aviação Agrícola Airport | small_airport | 92.0 | Rio Grande do Sul | Santa Vitória do Palmar | POINT (5068397.782 6323060.053) |
4106 | Santa Vitória do Palmar Airport | small_airport | 82.0 | Rio Grande do Sul | Santa Vitória Do Palmar | POINT (5060939.818 6291358.318) |
6662 | Trindade Heliport | heliport | NaN | Espírito Santo | Vitória | POINT (7565615.936 7537143.55) |
The results can be confirmed visually:
base=airports_brazil5880[airports_brazil5880.distance(brazil5880_cen.iloc[0]) > 2500000].plot(marker='+',markersize=100)
airports_brazil5880.plot(ax=base,color='grey', markersize=0.1)
brazil5880_cen.plot(ax=base,color='red')
<Axes: >
Let me review how distances work between different kinds of geometries:
a. Distance between points¶
We have these points:
seaports_brazil5880.head()
seaport_name | country_name | Latitude | Longitude | geometry | |
---|---|---|---|---|---|
0 | Dtse / Gegua Oil Terminal | Brazil | -22.816667 | -43.150000 | POINT (6112865.27 7434885.819) |
1 | Porto De Mucuripe | Brazil | -3.716667 | -38.483333 | POINT (6723610.313 9573897.029) |
2 | Portocel | Brazil | -19.850000 | -40.050000 | POINT (6459540.7 7743872.147) |
3 | Santa Clara | Brazil | -20.883333 | -51.366667 | POINT (5273988.618 7687595.596) |
4 | Aratu | Brazil | -12.783333 | -38.500000 | POINT (6681955.632 8535904.773) |
large_airports.head()
airport_name | airport_type | elevation_ft | region | municipality | geometry | |
---|---|---|---|---|---|---|
0 | Guarulhos - Governador André Franco Montoro In... | large_airport | 2461.0 | São Paulo | São Paulo | POINT (5769392.959 7387510.487) |
1 | Rio Galeão – Tom Jobim International Airport | large_airport | 28.0 | Rio de Janeiro | Rio De Janeiro | POINT (6102623.812 7436387.271) |
4 | Presidente Juscelino Kubitschek International ... | large_airport | 3497.0 | Distrito Federal | Brasília | POINT (5651010.181 8235390.09) |
5 | Deputado Luiz Eduardo Magalhães International ... | large_airport | 64.0 | Bahia | Salvador | POINT (6700330.807 8520415.591) |
9 | Tancredo Neves International Airport | large_airport | 2721.0 | Minas Gerais | Belo Horizonte | POINT (6051718.72 7797008.905) |
For results of distance operation to be more insightful, we may do this:
# index will be names of airports/seaports instead of numbers
large_airports.set_index('airport_name',inplace=True)
seaports_brazil5880.set_index('seaport_name',inplace=True)
Notice:
seaports_brazil5880.head()
country_name | Latitude | Longitude | geometry | |
---|---|---|---|---|
seaport_name | ||||
Dtse / Gegua Oil Terminal | Brazil | -22.816667 | -43.150000 | POINT (6112865.27 7434885.819) |
Porto De Mucuripe | Brazil | -3.716667 | -38.483333 | POINT (6723610.313 9573897.029) |
Portocel | Brazil | -19.850000 | -40.050000 | POINT (6459540.7 7743872.147) |
Santa Clara | Brazil | -20.883333 | -51.366667 | POINT (5273988.618 7687595.596) |
Aratu | Brazil | -12.783333 | -38.500000 | POINT (6681955.632 8535904.773) |
large_airports.head()
airport_type | elevation_ft | region | municipality | geometry | |
---|---|---|---|---|---|
airport_name | |||||
Guarulhos - Governador André Franco Montoro International Airport | large_airport | 2461.0 | São Paulo | São Paulo | POINT (5769392.959 7387510.487) |
Rio Galeão – Tom Jobim International Airport | large_airport | 28.0 | Rio de Janeiro | Rio De Janeiro | POINT (6102623.812 7436387.271) |
Presidente Juscelino Kubitschek International Airport | large_airport | 3497.0 | Distrito Federal | Brasília | POINT (5651010.181 8235390.09) |
Deputado Luiz Eduardo Magalhães International Airport | large_airport | 64.0 | Bahia | Salvador | POINT (6700330.807 8520415.591) |
Tancredo Neves International Airport | large_airport | 2721.0 | Minas Gerais | Belo Horizonte | POINT (6051718.72 7797008.905) |
The distance from each airport to the "Dtse / Gegua Oil Terminal"
large_airports.distance(seaports_brazil5880.geometry.loc['Dtse / Gegua Oil Terminal'])
airport_name Guarulhos - Governador André Franco Montoro International Airport 3.467242e+05 Rio Galeão – Tom Jobim International Airport 1.035093e+04 Presidente Juscelino Kubitschek International Airport 9.241846e+05 Deputado Luiz Eduardo Magalhães International Airport 1.234298e+06 Tancredo Neves International Airport 3.672493e+05 Eduardo Gomes International Airport 2.854494e+06 Hercílio Luz International Airport 7.681356e+05 Val de Cans/Júlio Cezar Ribeiro International Airport 2.462792e+06 dtype: float64
What about computing...
All the distances between large aiports and seaports (in km)
# apply creates a LOOP, computes distances from each seaport to all large airports
seaports_brazil5880.geometry.apply\
(lambda seaport: large_airports.geometry.distance(seaport)/1000)
airport_name | Guarulhos - Governador André Franco Montoro International Airport | Rio Galeão – Tom Jobim International Airport | Presidente Juscelino Kubitschek International Airport | Deputado Luiz Eduardo Magalhães International Airport | Tancredo Neves International Airport | Eduardo Gomes International Airport | Hercílio Luz International Airport | Val de Cans/Júlio Cezar Ribeiro International Airport |
---|---|---|---|---|---|---|---|---|
seaport_name | ||||||||
Dtse / Gegua Oil Terminal | 346.724171 | 10.350933 | 924.184620 | 1234.297632 | 367.249276 | 2854.493600 | 768.135646 | 2462.791784 |
Porto De Mucuripe | 2385.543307 | 2225.886834 | 1715.246883 | 1053.738618 | 1899.676213 | 2397.739727 | 2901.295202 | 1141.963646 |
Portocel | 776.722303 | 471.101490 | 946.209008 | 813.018817 | 411.269111 | 2867.979624 | 1232.521248 | 2266.253017 |
Santa Clara | 579.203362 | 865.876382 | 664.999298 | 1651.678293 | 785.388683 | 2189.659524 | 805.599343 | 2185.898767 |
Aratu | 1466.826530 | 1242.804930 | 1073.851757 | 24.032516 | 971.167168 | 2609.953079 | 1975.180167 | 1690.370196 |
Cabedelo | 2248.124981 | 2023.173930 | 1747.914940 | 786.022123 | 1751.230863 | 2830.123680 | 2757.957782 | 1641.263825 |
Imbituba | 578.310920 | 814.278300 | 1377.148556 | 2031.882238 | 1072.105305 | 3042.768039 | 63.612808 | 2983.998277 |
Angra Dos Reis | 225.003606 | 111.795735 | 883.019058 | 1308.028199 | 380.901616 | 2791.717379 | 671.807826 | 2455.089996 |
Natal | 2341.056650 | 2128.327377 | 1799.131283 | 891.421964 | 1843.689307 | 2774.776071 | 2853.930050 | 1556.780342 |
Guamare Oil Terminal | 2371.907633 | 2177.693211 | 1779.896593 | 953.130770 | 1876.302092 | 2647.096318 | 2887.478112 | 1411.272089 |
Itaqui | 2344.169726 | 2275.985960 | 1535.410406 | 1347.564542 | 1915.248841 | 1744.522370 | 2836.644708 | 476.002360 |
Manaus | 2682.252079 | 2836.382779 | 1935.849306 | 2625.311641 | 2529.751943 | 11.154274 | 2980.307205 | 1298.255235 |
Itacoatiara | 2592.193077 | 2730.217800 | 1820.822272 | 2468.851545 | 2413.939317 | 178.249697 | 2913.273479 | 1126.561239 |
Belem | 2456.456648 | 2452.107621 | 1604.006437 | 1709.617689 | 2088.711020 | 1296.677448 | 2913.649490 | 8.290566 |
Macae | 496.852182 | 160.532855 | 973.748856 | 1135.432221 | 383.200281 | 2918.055893 | 905.988525 | 2458.151928 |
Niteroi | 348.597382 | 16.028283 | 932.309968 | 1239.746717 | 375.348945 | 2862.315099 | 765.202458 | 2470.939163 |
Pelotas | 1093.531165 | 1343.662618 | 1822.421018 | 2557.070047 | 1590.903559 | 3287.745109 | 585.001710 | 3396.677050 |
Rio De Janeiro | 343.209409 | 13.287640 | 930.999018 | 1243.545205 | 375.961548 | 2860.201645 | 760.258369 | 2471.340121 |
Recife | 2142.801315 | 1910.277530 | 1674.666425 | 676.860725 | 1647.057100 | 2845.133868 | 2650.142734 | 1687.882366 |
Tramandai | 814.930209 | 1054.174075 | 1584.514525 | 2272.782881 | 1310.905328 | 3164.410125 | 299.764293 | 3180.525010 |
Madre De Deus | 1462.725337 | 1241.121137 | 1063.232802 | 36.749390 | 966.657897 | 2596.778994 | 1971.741706 | 1679.002816 |
Laguna | 610.078831 | 843.875837 | 1407.360604 | 2062.999867 | 1103.719364 | 3064.635623 | 95.009235 | 3013.520141 |
Port De Aracaju | 1733.070695 | 1504.384506 | 1303.001081 | 267.564965 | 1237.441529 | 2687.325788 | 2240.825988 | 1659.464068 |
Porto Santana | 2646.507607 | 2677.012736 | 1791.151164 | 2023.853225 | 2318.125252 | 1040.555921 | 3076.376196 | 335.441624 |
Porto De Maceio | 1934.791716 | 1699.567552 | 1497.384909 | 468.512961 | 1440.007502 | 2788.984232 | 2440.819954 | 1693.889037 |
Port De Salvador | 1448.644611 | 1222.959201 | 1065.257319 | 22.089660 | 953.390744 | 2616.268890 | 1956.445482 | 1704.676897 |
Rio Grande | 1106.010760 | 1349.213774 | 1845.254974 | 2567.264855 | 1603.088647 | 3322.620918 | 593.951743 | 3423.363634 |
Santarem | 2494.223469 | 2582.870485 | 1666.149767 | 2153.673543 | 2242.151898 | 596.990455 | 2873.540176 | 703.734669 |
Porto Do Forno | 458.612166 | 127.835449 | 1007.569738 | 1207.778202 | 425.934098 | 2947.271281 | 842.325416 | 2513.571318 |
Port Of Ilheus | 1250.183516 | 1012.231807 | 963.853808 | 227.158223 | 760.366657 | 2653.948977 | 1752.959780 | 1830.939739 |
Itajai | 445.212169 | 711.429716 | 1230.766858 | 1908.751545 | 942.378862 | 2912.927857 | 84.183089 | 2837.650203 |
Cameta | 2378.168716 | 2391.088714 | 1522.497285 | 1718.258865 | 2029.693661 | 1176.378545 | 2824.439489 | 149.389065 |
Gebig | 234.293252 | 102.687867 | 890.971113 | 1306.374359 | 383.834407 | 2801.212589 | 675.394423 | 2460.997434 |
Camocim | 2386.572249 | 2263.438786 | 1649.578153 | 1174.469354 | 1918.592061 | 2136.332613 | 2896.762517 | 867.120057 |
Tutoia | 2359.329043 | 2258.261495 | 1590.076044 | 1231.305489 | 1905.363602 | 1975.390802 | 2863.730319 | 706.603901 |
Porto De Suape | 2103.354422 | 1869.279589 | 1643.841933 | 637.106285 | 1607.968400 | 2841.333120 | 2610.100411 | 1695.716388 |
Sao Sebastiao | 116.468574 | 246.255403 | 922.876460 | 1439.559506 | 489.364806 | 2790.737899 | 534.907206 | 2520.013507 |
Porto Alegre | 873.579840 | 1131.972395 | 1608.962449 | 2338.373437 | 1370.955950 | 3133.273638 | 370.518357 | 3192.817209 |
Sao Francisco | 382.581936 | 667.068214 | 1156.757681 | 1848.757620 | 879.673853 | 2846.986805 | 158.134767 | 2763.531465 |
Santos | 60.285901 | 336.840080 | 916.131756 | 1503.504871 | 539.576757 | 2750.220613 | 471.909979 | 2523.542123 |
Vitoria | 724.103309 | 412.874702 | 945.370886 | 871.332211 | 388.004920 | 2877.110506 | 1173.859123 | 2300.907362 |
Vila Do Conde | 2447.390678 | 2447.041476 | 1593.895601 | 1719.067617 | 2083.991348 | 1267.605039 | 2902.031022 | 35.886867 |
Tubarao | 733.536010 | 421.815958 | 951.016346 | 865.639116 | 395.886973 | 2881.588731 | 1182.657631 | 2301.239995 |
Paranagua | 310.122786 | 613.685944 | 1072.847448 | 1775.347212 | 804.518079 | 2778.830902 | 241.366210 | 2680.325904 |
Guaiba Island Terminal | 253.451432 | 83.581854 | 896.378698 | 1294.218692 | 379.377843 | 2810.644009 | 690.358499 | 2461.628137 |
If we save the matrix...
D_Matrix_sea_air=seaports_brazil5880.geometry.apply \
(lambda seaport: large_airports.geometry.distance(seaport)/1000)
We can compute some distance stats from there:
Some summary of distances from each seaport to all large airports
Stat_sea_air=pd.DataFrame()
Stat_sea_air['mean']=D_Matrix_sea_air.mean(axis=1) # mean D to all airports
Stat_sea_air['min']=D_Matrix_sea_air.min(axis=1)# min D to all airports
Stat_sea_air['max']=D_Matrix_sea_air.max(axis=1)# max D to all airports
# see some
Stat_sea_air.head(10)
mean | min | max | |
---|---|---|---|
seaport_name | |||
Dtse / Gegua Oil Terminal | 1121.028458 | 10.350933 | 2854.493600 |
Porto De Mucuripe | 1965.136304 | 1053.738618 | 2901.295202 |
Portocel | 1223.134327 | 411.269111 | 2867.979624 |
Santa Clara | 1216.037957 | 579.203362 | 2189.659524 |
Aratu | 1381.773293 | 24.032516 | 2609.953079 |
Cabedelo | 1973.226516 | 786.022123 | 2830.123680 |
Imbituba | 1495.513055 | 63.612808 | 3042.768039 |
Angra Dos Reis | 1103.420427 | 111.795735 | 2791.717379 |
Natal | 2023.639130 | 891.421964 | 2853.930050 |
Guamare Oil Terminal | 2013.097102 | 953.130770 | 2887.478112 |
Of course, the idmax and idmin could come in handy:
Farthest airport to each seaport
D_Matrix_sea_air.idxmax(axis=1).head()
seaport_name Dtse / Gegua Oil Terminal Eduardo Gomes International Airport Porto De Mucuripe Hercílio Luz International Airport Portocel Eduardo Gomes International Airport Santa Clara Eduardo Gomes International Airport Aratu Eduardo Gomes International Airport dtype: object
Farthest seaport to each airport
D_Matrix_sea_air.idxmax(axis=0).head()
airport_name Guarulhos - Governador André Franco Montoro International Airport Manaus Rio Galeão – Tom Jobim International Airport Manaus Presidente Juscelino Kubitschek International Airport Manaus Deputado Luiz Eduardo Magalhães International Airport Manaus Tancredo Neves International Airport Manaus dtype: object
Closest airport to each seaport
D_Matrix_sea_air.idxmin(axis=1).head()
seaport_name Dtse / Gegua Oil Terminal Rio Galeão – Tom Jobim International Airport Porto De Mucuripe Deputado Luiz Eduardo Magalhães International ... Portocel Tancredo Neves International Airport Santa Clara Guarulhos - Governador André Franco Montoro In... Aratu Deputado Luiz Eduardo Magalhães International ... dtype: object
Closest seaport to each airport
# closest seaport to each airport
D_Matrix_sea_air.idxmin(axis=0).head()
airport_name Guarulhos - Governador André Franco Montoro International Airport Santos Rio Galeão – Tom Jobim International Airport Dtse / Gegua Oil Terminal Presidente Juscelino Kubitschek International Airport Santa Clara Deputado Luiz Eduardo Magalhães International Airport Port De Salvador Tancredo Neves International Airport Dtse / Gegua Oil Terminal dtype: object
b. Distance between line and point¶
Once we know understand how distance and idxmin/idxmax work, we can feel comfortable in this stage.
Let's use these rivers from before:
AmazonSystem_5880.set_index("RIVER",inplace=True)
AmazonSystem_5880
SYSTEM | geometry | |
---|---|---|
RIVER | ||
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... |
Guapore | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... |
Japura | Amazon | MULTILINESTRING ((3782432.967 9748590.005, 382... |
Madeira | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... |
Madre de Dios | Amazon | MULTILINESTRING ((3407019.812 8246998.613, 343... |
Purus | Amazon | MULTILINESTRING ((3008899.534 8748725.178, 305... |
Putamayo | Amazon | MULTILINESTRING ((2478636.973 10162886.59, 246... |
Rio Branco | Amazon | MULTILINESTRING ((4372657.932 10403233.815, 43... |
Rio Juruena | Amazon | MULTILINESTRING ((4488340.769 8382801.957, 456... |
Rio Maranon | Amazon | MULTILINESTRING ((2538298.165 8770999.896, 252... |
Rio Negro | Amazon | MULTILINESTRING ((3242119.495 10234498.68, 327... |
Rio Teles Pires | Amazon | MULTILINESTRING ((4890008.905 8388474.8, 49295... |
Tapajos | Amazon | MULTILINESTRING ((4956154.209 9738475.606, 490... |
Ucayali | Amazon | MULTILINESTRING ((3104425.924 8220993.168, 313... |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... |
Then,
Distance from river Tapajos to Guarulhos airport
airName='Guarulhos - Governador André Franco Montoro International Airport'
rivName='Tapajos'
AmazonSystem_5880.geometry.loc[rivName].distance(large_airports.geometry.loc[airName])/1000
2170.030197547702
We can compute the distance matrix now:
D_Matrix_amazRivs_air=AmazonSystem_5880.geometry.apply \
(lambda river: large_airports.geometry.distance(river)/1000)
Stat_amz_air=pd.DataFrame()
Stat_amz_air['mean']=D_Matrix_amazRivs_air.mean(axis=1) # mean D to all airports
Stat_amz_air['min']=D_Matrix_amazRivs_air.min(axis=1)# min D to all airports
Stat_amz_air['max']=D_Matrix_amazRivs_air.max(axis=1)# max D to all airports
# see some
Stat_amz_air.head(10)
mean | min | max | |
---|---|---|---|
RIVER | |||
Amazon | 1764.132480 | 25.549158 | 2865.306190 |
Guapore | 1822.694503 | 983.345006 | 2397.410373 |
Japura | 2495.652965 | 502.953032 | 3214.712005 |
Madeira | 1949.986161 | 153.127907 | 2652.416364 |
Madre de Dios | 2237.270472 | 983.345006 | 2934.051647 |
Purus | 2143.371707 | 150.306441 | 2848.394230 |
Putamayo | 2784.484656 | 853.437589 | 3437.832352 |
Rio Branco | 2353.394115 | 264.452689 | 3241.167974 |
Rio Juruena | 1498.931579 | 525.908802 | 2139.336751 |
Rio Maranon | 3216.884463 | 1479.515487 | 3961.463537 |
# closest river to each airport
D_Matrix_amazRivs_air.idxmin(axis=0).head()
airport_name Guarulhos - Governador André Franco Montoro International Airport Xingu Rio Galeão – Tom Jobim International Airport Xingu Presidente Juscelino Kubitschek International Airport Xingu Deputado Luiz Eduardo Magalhães International Airport Xingu Tancredo Neves International Airport Xingu dtype: object
# farthest river to each airport
D_Matrix_amazRivs_air.idxmax(axis=0).head()
airport_name Guarulhos - Governador André Franco Montoro International Airport Rio Maranon Rio Galeão – Tom Jobim International Airport Rio Maranon Presidente Juscelino Kubitschek International Airport Rio Maranon Deputado Luiz Eduardo Magalhães International Airport Rio Maranon Tancredo Neves International Airport Rio Maranon dtype: object
c. Between Polygon and Point¶
Let me re use the world rivers to get the rivers in a couple of systems:
river_systems=world_rivers.query("SYSTEM in ['Amazon','Parana']")
river_systems
RIVER | SYSTEM | geometry | |
---|---|---|---|
1 | Amazon | Amazon | MULTILINESTRING ((-61.2773 -3.60706, -60.68466... |
24 | Guapore | Amazon | MULTILINESTRING ((-65.1024 -10.27593, -65.1201... |
29 | Japura | Amazon | MULTILINESTRING ((-64.94595 -2.23269, -64.5731... |
37 | Madeira | Amazon | MULTILINESTRING ((-65.1024 -10.27593, -65.0844... |
38 | Madre de Dios | Amazon | MULTILINESTRING ((-68.84791 -15.35547, -68.594... |
55 | Paraguay | Parana | MULTILINESTRING ((-53.47152 -16.67963, -54.309... |
56 | Parana | Parana | MULTILINESTRING ((-58.48091 -27.30186, -58.402... |
59 | Purus | Amazon | MULTILINESTRING ((-72.21624 -10.77936, -71.803... |
60 | Putamayo | Amazon | MULTILINESTRING ((-76.65652 1.36565, -76.77346... |
62 | Rio Branco | Amazon | MULTILINESTRING ((-59.64679 3.62898, -60.19013... |
64 | Rio Juruena | Amazon | MULTILINESTRING ((-58.74846 -14.57408, -58.045... |
65 | Rio Maranon | Amazon | MULTILINESTRING ((-76.49432 -10.32964, -76.523... |
66 | Rio Negro | Amazon | MULTILINESTRING ((-69.80154 2.04259, -69.48542... |
67 | Rio Paranaiba | Parana | MULTILINESTRING ((-46.02263 -19.02853, -46.130... |
68 | Rio Teles Pires | Amazon | MULTILINESTRING ((-55.02068 -14.5688, -54.6526... |
80 | Tapajos | Amazon | MULTILINESTRING ((-54.39421 -2.36508, -54.8251... |
85 | Ucayali | Amazon | MULTILINESTRING ((-71.67654 -15.38325, -71.373... |
92 | Xingu | Amazon | MULTILINESTRING ((-54.72902 -15.05463, -53.670... |
97 | Rio Grande, South America | Parana | MULTILINESTRING ((-44.37152 -22.17241, -43.860... |
Let me combine per system:
ama_para=river_systems.dissolve(by='SYSTEM')
ama_para.drop(columns='RIVER',inplace=True)
ama_para
geometry | |
---|---|
SYSTEM | |
Amazon | MULTILINESTRING ((-61.2773 -3.60706, -60.68466... |
Parana | MULTILINESTRING ((-53.47152 -16.67963, -54.309... |
We still have lines:
ama_para.plot(cmap='viridis')
<Axes: >
But we will have polygons after this:
ama_para.convex_hull.plot(cmap='viridis')
<Axes: >
As we have a geoseries of two geometries...
ama_para.convex_hull,type(ama_para.convex_hull)
(SYSTEM Amazon POLYGON ((-68.59431 -15.48713, -71.67654 -15.3... Parana POLYGON ((-58.13602 -34.19917, -59.41318 -33.6... dtype: geometry, geopandas.geoseries.GeoSeries)
Let's turn that into a GDF:
ama_para_hulls=ama_para.convex_hull.to_frame()
ama_para_hulls.rename(columns={0:'geometry'},inplace=True)
ama_para_hulls=ama_para_hulls.set_geometry('geometry')
ama_para_hulls.crs="EPSG:5880"
#voila
ama_para_hulls
geometry | |
---|---|
SYSTEM | |
Amazon | POLYGON ((-68.594 -15.487, -71.677 -15.383, -7... |
Parana | POLYGON ((-58.136 -34.199, -59.413 -33.614, -6... |
And now, the distance matrix:
D_Matrix_SYSHulls_air=ama_para_hulls.geometry.apply \
(lambda system: large_airports.geometry.distance(system)/1000)
D_Matrix_SYSHulls_air
airport_name | Guarulhos - Governador André Franco Montoro International Airport | Rio Galeão – Tom Jobim International Airport | Presidente Juscelino Kubitschek International Airport | Deputado Luiz Eduardo Magalhães International Airport | Tancredo Neves International Airport | Eduardo Gomes International Airport | Hercílio Luz International Airport | Val de Cans/Júlio Cezar Ribeiro International Airport |
---|---|---|---|---|---|---|---|---|
SYSTEM | ||||||||
Amazon | 9373.464024 | 9619.902769 | 9987.801447 | 10839.402030 | 9870.020232 | 10586.985788 | 8867.621633 | 11335.110218 |
Parana | 9373.473877 | 9619.912262 | 9987.812192 | 10839.411834 | 9870.030130 | 10587.000384 | 8867.631318 | 11335.122740 |
From here, you can compute distances between other kinds of geometries.
II. Clipping¶
Clipping uses a GDF geometry as a MASK to cut another GDF which suppossedly is bigger and needes to be clipped.
Pay attention to the world rivers again:
world_rivers
RIVER | SYSTEM | geometry | |
---|---|---|---|
0 | Aldan | Lena | MULTILINESTRING ((124.00678 56.47258, 123.2595... |
1 | Amazon | Amazon | MULTILINESTRING ((-61.2773 -3.60706, -60.68466... |
2 | Amu Darya | None | MULTILINESTRING ((73.98818 37.49952, 73.52595 ... |
3 | Amur | None | MULTILINESTRING ((122.63956 49.9973, 120.47874... |
4 | Angara | None | MULTILINESTRING ((105.07841 51.93053, 103.9295... |
... | ... | ... | ... |
93 | Yangtze | None | MULTILINESTRING ((119.82609 32.24864, 118.9707... |
94 | Yenisey | None | MULTILINESTRING ((98.94706 52.57675, 98.12095 ... |
95 | Yukon | None | MULTILINESTRING ((-130.89319 59.2448, -131.065... |
96 | Zambezi | None | MULTILINESTRING ((35.52866 -17.66773, 36.28055... |
97 | Rio Grande, South America | Parana | MULTILINESTRING ((-44.37152 -22.17241, -43.860... |
98 rows × 3 columns
As you see, this GDF has no Country column. But since it has geometry, you can keep the rivers, or their sections, that serve a country:
rivers_brazil5880 = gpd.clip(gdf=world_rivers.to_crs(5880),
mask=brazil5880)
See differences:
- input 1: Rivers
world_rivers.plot()
<Axes: >
- input 2: mask (Brazil)
brazil5880.plot(facecolor="greenyellow")
<Axes: >
Output: clipped rivers
rivers_brazil5880.plot()
<Axes: >
base = brazil5880.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
rivers_brazil5880.plot(edgecolor='blue', linewidth=0.5,
ax=base)
<Axes: >
We can create our own mask for clipping:
Let me get the bounding box of the map (the smallest possible rectangle that completely encloses a geometric shape or set of shapes):
brazil5880.total_bounds #[minx, miny, maxx, maxy]
array([ 2793074.63914733, 6264891.06203913, 7120881.08835731, 10586462.14322563])
# saving the output
minx, miny, maxx, maxy=brazil5880.total_bounds
minx, miny, maxx, maxy
(np.float64(2793074.6391473296), np.float64(6264891.0620391285), np.float64(7120881.088357305), np.float64(10586462.143225627))
I will combine those coordinates with the centroid (mid_x,mid_y)to create a BOX of the north and south of Brazil:
north_mask = [minx, mid_y, maxx, maxy]
south_mask = [minx, minx, maxx, mid_y]
# split Brazil
states_brazil5880.clip(north_mask).plot(edgecolor="yellow")
<Axes: >
states_brazil5880.clip(south_mask).plot(edgecolor="yellow")
<Axes: >
As you see, with clip we can cut polygons (not respecting the borders).
III. Spatial Joins¶
We’re familiar with merging, which joins tables using common keys. Spatial joins, by contrast, rely solely on geometry columns to perform various types of filtering.
Let me use the brazilian large airports and states:
large_airports.head()
airport_type | elevation_ft | region | municipality | geometry | |
---|---|---|---|---|---|
airport_name | |||||
Guarulhos - Governador André Franco Montoro International Airport | large_airport | 2461.0 | São Paulo | São Paulo | POINT (5769392.959 7387510.487) |
Rio Galeão – Tom Jobim International Airport | large_airport | 28.0 | Rio de Janeiro | Rio De Janeiro | POINT (6102623.812 7436387.271) |
Presidente Juscelino Kubitschek International Airport | large_airport | 3497.0 | Distrito Federal | Brasília | POINT (5651010.181 8235390.09) |
Deputado Luiz Eduardo Magalhães International Airport | large_airport | 64.0 | Bahia | Salvador | POINT (6700330.807 8520415.591) |
Tancredo Neves International Airport | large_airport | 2721.0 | Minas Gerais | Belo Horizonte | POINT (6051718.72 7797008.905) |
...and:
states_brazil5880.head()
state_name | state_code | geometry | |
---|---|---|---|
0 | Acre | BR12 | MULTIPOLYGON (((3374854.317 8740996.704, 33745... |
1 | Alagoas | BR27 | MULTIPOLYGON (((7038407.237 8973545.06, 703823... |
2 | Amapá | BR16 | MULTIPOLYGON (((5393641.625 10233907.833, 5394... |
3 | Amazonas | BR13 | MULTIPOLYGON (((4499820.936 9906611.648, 45003... |
4 | Bahia | BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... |
# as before set index for 'states'
states_brazil5880.set_index('state_name',inplace=True)
states_brazil5880.head()
state_code | geometry | |
---|---|---|
state_name | ||
Acre | BR12 | MULTIPOLYGON (((3374854.317 8740996.704, 33745... |
Alagoas | BR27 | MULTIPOLYGON (((7038407.237 8973545.06, 703823... |
Amapá | BR16 | MULTIPOLYGON (((5393641.625 10233907.833, 5394... |
Amazonas | BR13 | MULTIPOLYGON (((4499820.936 9906611.648, 45003... |
Bahia | BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... |
a. Within¶
Let's ask:
The large airports whose geometries are within the borders of a state in Brazil.
airports_within_states = gpd.sjoin(
large_airports, # LEFT: airports we want to filter/keep
states_brazil5880, # RIGHT: spatial boundaries to check against
how='inner', # return geometries that match in both LEFT/RIGHT (jointype)
predicate='within' # spatial condition: LEFT geometry within RIGHT geometry
)
# these are:
airports_within_states
airport_type | elevation_ft | region | municipality | geometry | state_name | state_code | |
---|---|---|---|---|---|---|---|
airport_name | |||||||
Guarulhos - Governador André Franco Montoro International Airport | large_airport | 2461.0 | São Paulo | São Paulo | POINT (5769392.959 7387510.487) | São Paulo | BR35 |
Rio Galeão – Tom Jobim International Airport | large_airport | 28.0 | Rio de Janeiro | Rio De Janeiro | POINT (6102623.812 7436387.271) | Rio de Janeiro | BR33 |
Presidente Juscelino Kubitschek International Airport | large_airport | 3497.0 | Distrito Federal | Brasília | POINT (5651010.181 8235390.09) | Distrito Federal | BR53 |
Deputado Luiz Eduardo Magalhães International Airport | large_airport | 64.0 | Bahia | Salvador | POINT (6700330.807 8520415.591) | Bahia | BR29 |
Tancredo Neves International Airport | large_airport | 2721.0 | Minas Gerais | Belo Horizonte | POINT (6051718.72 7797008.905) | Minas Gerais | BR31 |
Eduardo Gomes International Airport | large_airport | 264.0 | Amazonas | Manaus | POINT (4327494.283 9662122.706) | Amazonas | BR13 |
Hercílio Luz International Airport | large_airport | 16.0 | Santa Catarina | Florianópolis | POINT (5537273.577 6926234.097) | Santa Catarina | BR42 |
Val de Cans/Júlio Cezar Ribeiro International Airport | large_airport | 54.0 | Pará | Belém | POINT (5614728.302 9846773.655) | Pará | BR15 |
We just performed a point-to-polygon spatial join. Notice that the result preserves the original geometries from the LEFT GeoDataFrame — specifically, only those features whose spatial relationship satisfied both the predicate (e.g., 'within') and the join type ('inner'). The non-geometric attributes (columns) from the RIGHT GeoDataFrame are joined to the matching rows.
Notice that, the request gave you points. I will add all the states below:
base=states_brazil5880.explore(edgecolor='black',style_kwds={'fillOpacity': 0.1,'color':'grey'})
airports_within_states.explore(color='red',m=base)
b. Contains¶
Importantly, if the LEFT GeoDataFrame geometries are polygons and the RIGHT one points (a polygon-to-point join), you’ll typically need to use a different predicate — such as 'contains' — to express the spatial relationship correctly.
Brazilian states that house large airports
states_containing_LargeAirports = gpd.sjoin(states_brazil5880,large_airports,how='inner',
predicate='contains')
states_containing_LargeAirports
state_code | geometry | airport_name | airport_type | elevation_ft | region | municipality | |
---|---|---|---|---|---|---|---|
state_name | |||||||
Amazonas | BR13 | MULTIPOLYGON (((4499820.936 9906611.648, 45003... | Eduardo Gomes International Airport | large_airport | 264.0 | Amazonas | Manaus |
Bahia | BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... | Deputado Luiz Eduardo Magalhães International ... | large_airport | 64.0 | Bahia | Salvador |
Distrito Federal | BR53 | MULTIPOLYGON (((5709200.339 8232316.484, 57090... | Presidente Juscelino Kubitschek International ... | large_airport | 3497.0 | Distrito Federal | Brasília |
Minas Gerais | BR31 | MULTIPOLYGON (((5730280.854 8106234.254, 57299... | Tancredo Neves International Airport | large_airport | 2721.0 | Minas Gerais | Belo Horizonte |
Pará | BR15 | MULTIPOLYGON (((5842059.053 9886741.08, 584156... | Val de Cans/Júlio Cezar Ribeiro International ... | large_airport | 54.0 | Pará | Belém |
Rio de Janeiro | BR33 | MULTIPOLYGON (((5957764.631 7386508.56, 595773... | Rio Galeão – Tom Jobim International Airport | large_airport | 28.0 | Rio de Janeiro | Rio De Janeiro |
Santa Catarina | BR42 | MULTIPOLYGON (((5384989.919 6745543.64, 538466... | Hercílio Luz International Airport | large_airport | 16.0 | Santa Catarina | Florianópolis |
São Paulo | BR35 | MULTIPOLYGON (((5619062.728 7205958.887, 56186... | Guarulhos - Governador André Franco Montoro In... | large_airport | 2461.0 | São Paulo | São Paulo |
As requested, you got polygons (not the airports). I will add airports for reference.
base=states_containing_LargeAirports.explore(color='red')
large_airports.explore(color='white',m=base)
c. Intersects¶
'Contains' and 'within' are literally strict: Any airport located exactly on a state boundary — whether due to data precision, snapping, or real geography — will be excluded, even if it’s “practically” inside the state. More flexibility is achieved with intersects.
A point on the border of Rio de Janeiro, can be detected if I use intersect!
# this is RDJ:
Rio=states_brazil5880.loc[['Rio de Janeiro'],:]
#this a point on its border
from shapely.geometry import Point
coordinates=Rio.geometry.iloc[0].boundary.geoms[0].coords[1]
aRDJ_PointInBorder=Point(coordinates)
aRDJ_borderPoint = gpd.GeoDataFrame({'name': ['RioJaneiro Border Point']},
geometry=[aRDJ_PointInBorder],
crs=states_brazil5880.crs)
## Intersects result
gpd.sjoin(aRDJ_borderPoint,Rio,
how='inner', predicate='intersects')
name | geometry | state_name | state_code | |
---|---|---|---|---|
0 | RioJaneiro Border Point | POINT (5957732.828 7386016.767) | Rio de Janeiro | BR33 |
A point on the border of Rio de Janeiro, can NOT be detected if I use within / contains
## within should return no rows
gpd.sjoin(aRDJ_borderPoint,Rio,
how='inner', predicate='within')
name | geometry | state_name | state_code |
---|
## contains should return no rows
gpd.sjoin(Rio,aRDJ_borderPoint,
how='inner', predicate='contains')
state_code | geometry | index_right | name | |
---|---|---|---|---|
state_name |
We knew this:
# we need crs4326 for explore:
aRDJ_borderPoint_latlon = aRDJ_borderPoint.to_crs(4326).geometry.iloc[0]
base=Rio.explore(zoom_start=20,location=[aRDJ_borderPoint_latlon.y, aRDJ_borderPoint_latlon.x])
aRDJ_borderPoint.explore(m=base, color='red')
d. Touches¶
We also have 'touches', a more stringent predicate than 'intersects'. It returns geometries that:
- Share a only a border (for polygons or lines), or
- Have only one tangent point in common.
Which states are neighbors of 'BAHIA", including BAHIA
# Neighbors of Bahia?
gpd.sjoin(N_brazil.loc[N_brazil.state_name=='Bahia',:],N_brazil,how='inner', predicate='intersects').shape
(6, 6)
That is, Bahia seems to share borders with 5 states:
base=gpd.sjoin(N_brazil,N_brazil.loc[N_brazil.state_name=='Bahia',:],
how='inner',
predicate='intersects').plot(color='yellow',edgecolor='red')
N_brazil.loc[N_brazil.state_name=='Bahia',:].plot(ax=base, color='red')
<Axes: >
However, because many free GeoDataFrames — especially those sourced as Shapefiles — contain topological imperfections like gaps, overlaps, or misaligned vertices, 'touches' often fails to detect what should be adjacent features. Ironically, this “failure” can be useful: 'touches' acts as a diagnostic tool — highlighting where boundaries are not perfectly aligned.
gpd.sjoin(N_brazil.loc[N_brazil.state_name=='Bahia',:],N_brazil,how='inner', predicate='touches').shape
(4, 6)
See the neighbor that disappears:
base=gpd.sjoin(N_brazil,N_brazil.loc[N_brazil.state_name=='Bahia',:],
how='inner',
predicate='touches').plot(color='yellow',edgecolor='red')
N_brazil.loc[N_brazil.state_name=='Bahia',:].plot(ax=base, color='red')
<Axes: >
e. Crosses¶
When we have lines, we may need crosses. Let me subset our rivers:
amazonSystem=rivers_brazil5880[rivers_brazil5880.SYSTEM=='Amazon']
amazonSystem.set_index('RIVER',inplace=True)
Then,
Which rivers from the Amazon system are intersecting states?
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='intersects')
SYSTEM | geometry | state_name | state_code | |
---|---|---|---|---|
RIVER | ||||
Madre de Dios | Amazon | LINESTRING (3747782.967 8816805.526, 3784022.8... | Rondônia | BR11 |
Guapore | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... | Rondônia | BR11 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Mato Grosso | BR51 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Pará | BR15 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Mato Grosso | BR51 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Pará | BR15 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Mato Grosso | BR51 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Pará | BR15 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Acre | BR12 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Amazonas | BR13 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Rondônia | BR11 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Amazonas | BR13 |
Tapajos | Amazon | LINESTRING (4956154.209 9738475.606, 4908229.1... | Pará | BR15 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Pará | BR15 |
Putamayo | Amazon | LINESTRING (3255840.239 9675268.883, 3294855.9... | Amazonas | BR13 |
Japura | Amazon | MULTILINESTRING ((3782432.967 9748590.005, 382... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Roraima | BR14 |
Rio Branco | Amazon | LINESTRING (4352974.168 10393701.397, 4312100.... | Roraima | BR14 |
A count of the result:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='intersects').shape
(20, 4)
Alternatively,
Which rivers from the Amazon system are crossing states?
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='crosses')
SYSTEM | geometry | state_name | state_code | |
---|---|---|---|---|
RIVER | ||||
Madre de Dios | Amazon | LINESTRING (3747782.967 8816805.526, 3784022.8... | Rondônia | BR11 |
Guapore | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... | Rondônia | BR11 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Mato Grosso | BR51 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Pará | BR15 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Mato Grosso | BR51 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Pará | BR15 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Mato Grosso | BR51 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Pará | BR15 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Acre | BR12 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Amazonas | BR13 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Rondônia | BR11 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Pará | BR15 |
Putamayo | Amazon | LINESTRING (3255840.239 9675268.883, 3294855.9... | Amazonas | BR13 |
Japura | Amazon | MULTILINESTRING ((3782432.967 9748590.005, 382... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Roraima | BR14 |
Rio Branco | Amazon | LINESTRING (4352974.168 10393701.397, 4312100.... | Roraima | BR14 |
You got one less:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='crosses').shape
(19, 4)
Again, intersects means both geometries have some 'space' in common. But crosses is an intersection that has to cross the spatial object. From the result above, there is one river that shares space with the state, but is not crossing its border:
# Get intersects result
intersects_result = gpd.sjoin(amazonSystem,states_brazil5880, how='inner', predicate='intersects')
intersects_result
SYSTEM | geometry | state_name | state_code | |
---|---|---|---|---|
RIVER | ||||
Madre de Dios | Amazon | LINESTRING (3747782.967 8816805.526, 3784022.8... | Rondônia | BR11 |
Guapore | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... | Rondônia | BR11 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Mato Grosso | BR51 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Pará | BR15 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Mato Grosso | BR51 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Pará | BR15 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Mato Grosso | BR51 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Pará | BR15 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Acre | BR12 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Amazonas | BR13 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Rondônia | BR11 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Amazonas | BR13 |
Tapajos | Amazon | LINESTRING (4956154.209 9738475.606, 4908229.1... | Pará | BR15 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Pará | BR15 |
Putamayo | Amazon | LINESTRING (3255840.239 9675268.883, 3294855.9... | Amazonas | BR13 |
Japura | Amazon | MULTILINESTRING ((3782432.967 9748590.005, 382... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Roraima | BR14 |
Rio Branco | Amazon | LINESTRING (4352974.168 10393701.397, 4312100.... | Roraima | BR14 |
# Get crosses result
crosses_result = gpd.sjoin(amazonSystem,states_brazil5880, how='inner', predicate='crosses')
crosses_result
SYSTEM | geometry | state_name | state_code | |
---|---|---|---|---|
RIVER | ||||
Madre de Dios | Amazon | LINESTRING (3747782.967 8816805.526, 3784022.8... | Rondônia | BR11 |
Guapore | Amazon | MULTILINESTRING ((3784022.891 8842606.845, 378... | Rondônia | BR11 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Mato Grosso | BR51 |
Rio Juruena | Amazon | LINESTRING (4488340.769 8382801.957, 4561243.9... | Pará | BR15 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Mato Grosso | BR51 |
Rio Teles Pires | Amazon | LINESTRING (4890008.905 8388474.8, 4929569.193... | Pará | BR15 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Mato Grosso | BR51 |
Xingu | Amazon | MULTILINESTRING ((4921614.129 8334835.678, 503... | Pará | BR15 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Acre | BR12 |
Purus | Amazon | LINESTRING (3179817.343 8884480.014, 3233812.4... | Amazonas | BR13 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Rondônia | BR11 |
Madeira | Amazon | LINESTRING (3784022.891 8842606.845, 3785447.5... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Amazonas | BR13 |
Amazon | Amazon | MULTILINESTRING ((4191497.619 9597916.381, 425... | Pará | BR15 |
Putamayo | Amazon | LINESTRING (3255840.239 9675268.883, 3294855.9... | Amazonas | BR13 |
Japura | Amazon | MULTILINESTRING ((3782432.967 9748590.005, 382... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Amazonas | BR13 |
Rio Negro | Amazon | MULTILINESTRING ((3592372.872 10116670.869, 35... | Roraima | BR14 |
Rio Branco | Amazon | LINESTRING (4352974.168 10393701.397, 4312100.... | Roraima | BR14 |
river_notCrossing=list(set(intersects_result.index)-set(crosses_result.index))
river_notCrossing
['Tapajos']
# Find indexes/columns
state_notCrossed=intersects_result.loc[river_notCrossing,'state_name'].to_list()
state_notCrossed
['Pará']
Now we know the river that is not crossing an state, and the name of that state.
base=states_brazil5880.loc[state_notCrossed,:].plot(color='w',edgecolor='k',figsize=(12, 8))
amazonSystem.plot(ax=base)
amazonSystem.loc[river_notCrossing,:].plot(color='red',ax=base)
<Axes: >
IV. Spatial Overlay¶
As the name implies, you need two inputs. We might need to create or find some geometries from the geometries we already have. Using a set theory approach, we will see the use of intersection, union, difference, and symmetric difference. Let's remember these results:
# Create a figure and a 2x2 grid of axes
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
axes = axes.flatten()
N_brazil.plot(color='pink',edgecolor='k',ax=axes[0])
axes[0].set_title("1. North")
S_brazil.plot(color='pink',edgecolor='k',ax=axes[1])
axes[1].set_title("2. South")
W_brazil.plot(color='pink',edgecolor='k',ax=axes[2])
axes[2].set_title("3. West")
E_brazil.plot(color='pink',edgecolor='k',ax=axes[3])
axes[3].set_title("4. East")
plt.show()
Let's play with these GDFs, keep in mind the position of the GDFs:
GDFleft.overlay(GDFright, ...)
NS_brazil=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=True)
NS_brazil.plot(color='pink',edgecolor='k')
<Axes: >
Intersection of West and East
WE_brazil=W_brazil.overlay(E_brazil, how="intersection",keep_geom_type=True)
WE_brazil.plot(color='pink',edgecolor='k')
<Axes: >
b. Union¶
In the union overlay, we do not return dissolved geometries. Instead, we return a complete set of new geometries that represent every unique spatial combination of the two input GeoDataFrames. The resulting GeoDataFrame will contain the columns from both GDFs, and the row count will be the sum of all parts of A, all parts of B, and the newly created intersection parts, potentially inflated by topological flaws
Unite the West/East intersection with the North/South intersection
mid_Brazil=NS_brazil.overlay(WE_brazil,how="union",keep_geom_type=True)
mid_Brazil.plot(color='pink',edgecolor='k')
<Axes: >
c. Difference¶
Here, you keep what belongs to the left GDF that is not in the right GDF.
Keep nothern states that are not in the southern ones
NS_diff_brazil=N_brazil.overlay(S_brazil, how='difference')
NS_diff_brazil.plot(color='pink',edgecolor='k')
<Axes: >
d. Symmetric Difference¶
Here, we keep what is not in the intersection but in both GDFs.
Unite Northern and Southern states, but keep states that are not in their intersection.
NS_simdiff_brazil=N_brazil.overlay(S_brazil, how='symmetric_difference')
NS_simdiff_brazil.plot(color='pink',edgecolor='k')
<Axes: >
e. Cleaning and Overlay¶
Overlay and SJoin differ in that overlay, as it creates geometries, may not output clean results. Not because overlay fault, but due to the quality of the original maps.
Let me use the intersection as our topological detective.
- Using set operations: What are the common states between northern and souther states.
setIntersection=set(N_brazil.state_name) & set(S_brazil.state_name)
# how many, which are they
len(setIntersection),setIntersection
(8, {'Acre', 'Alagoas', 'Bahia', 'Mato Grosso', 'Piauí', 'Rondônia', 'Sergipe', 'Tocantins'})
- The spatial overlay with keep_geom_type as True:
NS_brazil_T=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=True)
# different from 8 (see above)
NS_brazil_T.shape[0]
11
- The spatial overlay with keep_geom_type as False:
NS_brazil_F=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=False)
# different from 8 and 11 (see above)
NS_brazil_F.shape[0]
43
As you see, the intersection that we plotted before had several topological issues we could not see with bare eyes. There were several additional geometries that represent flaws. When we used "keep_geom_type=True" the results were polygons (left GDF geometries were polygons); but when we asked "keep_geom_type=False" the result may include polygons and more (even collections). Let's see this last case:
NS_brazil_F.geometry.geom_type.value_counts()
MultiLineString 32 MultiPolygon 4 Polygon 4 GeometryCollection 3 Name: count, dtype: int64
Can this be fixed?
Let's see what can be done:
- review the previous intersection:
NS_brazil_T
state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|
0 | Acre | BR12 | Acre | BR12 | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
1 | Alagoas | BR27 | Alagoas | BR27 | POLYGON ((7038233.546 8973146.179, 7038561.661... |
2 | Bahia | BR29 | Bahia | BR29 | MULTIPOLYGON (((6617883.775 7945834.225, 66175... |
3 | Bahia | BR29 | Minas Gerais | BR31 | POLYGON ((6473422.295 8203724.413, 6472353.598... |
4 | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718500.714 8080705.175, 47182... |
5 | Pernambuco | BR26 | Alagoas | BR27 | POLYGON ((6965686.284 8954189.214, 6965358.315... |
6 | Pernambuco | BR26 | Bahia | BR29 | POLYGON ((6611335.359 9024941.961, 6611714.257... |
7 | Piauí | BR22 | Piauí | BR22 | POLYGON ((6358809.561 9687374.554, 6358994.184... |
8 | Rondônia | BR11 | Rondônia | BR11 | MULTIPOLYGON (((4336668.469 8680873.21, 433691... |
9 | Sergipe | BR28 | Sergipe | BR28 | POLYGON ((6758965.277 8905155.665, 6759481.274... |
10 | Tocantins | BR17 | Tocantins | BR17 | POLYGON ((5765755.273 8998397.361, 5765506.449... |
- Detect problematic rows
NS_brazil_T[NS_brazil_T.state_name_1!=NS_brazil_T.state_name_2]
state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|
3 | Bahia | BR29 | Minas Gerais | BR31 | POLYGON ((6473422.295 8203724.413, 6472353.598... |
5 | Pernambuco | BR26 | Alagoas | BR27 | POLYGON ((6965686.284 8954189.214, 6965358.315... |
6 | Pernambuco | BR26 | Bahia | BR29 | POLYGON ((6611335.359 9024941.961, 6611714.257... |
Pernambuco and Minas Gerais must have some of their points "invading" Bahia and Alagoas polygon. The strategy here is that Bahia and Aalagoas will not push away the invading points, but to 'abandon' the area in conflict (difference) for the sake of clean limits (dissolve).
- We need to work with the original states. Let's make a copy of them before cleaning:
states_brazil_clean=states_brazil5880.copy()
states_brazil_clean.head()
state_code | geometry | |
---|---|---|
state_name | ||
Acre | BR12 | MULTIPOLYGON (((3374854.317 8740996.704, 33745... |
Alagoas | BR27 | MULTIPOLYGON (((7038407.237 8973545.06, 703823... |
Amapá | BR16 | MULTIPOLYGON (((5393641.625 10233907.833, 5394... |
Amazonas | BR13 | MULTIPOLYGON (((4499820.936 9906611.648, 45003... |
Bahia | BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... |
- Let's dissolve Pernambuco and Minas Gerais, the invaders:
alienStates=["Minas Gerais","Pernambuco"]
alienUnion_GDF=states_brazil_clean[states_brazil_clean.index.isin(alienStates)].dissolve()
alienUnion_GDF
geometry | state_code | |
---|---|---|
0 | MULTIPOLYGON (((5715427.146 8093800.386, 57154... | BR31 |
- Keep the states to be cleaned from the 'alienStates', no dissolving - just filtering:
forCleaning=["Alagoas","Bahia"] #order matters
forCleaning_GDF=states_brazil_clean[states_brazil_clean.index.isin(forCleaning)]
forCleaning_GDF
state_code | geometry | |
---|---|---|
state_name | ||
Alagoas | BR27 | MULTIPOLYGON (((7038407.237 8973545.06, 703823... |
Bahia | BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... |
- Recreate boundaries, so that no intersection occurs... do you see different values in geometry column now?
recreatedPolygons=forCleaning_GDF.overlay(alienUnion_GDF,how="difference",keep_geom_type=False).dissolve(by="state_code")
recreatedPolygons
geometry | |
---|---|
state_code | |
BR27 | POLYGON ((7038233.546 8973146.179, 7038561.661... |
BR29 | MULTIPOLYGON (((6618113.779 7946308.264, 66178... |
- Replace old geometries with recent values.
This is a sensible moment: You need to change just the geometry cells, not the whole rows:
#old values
states_brazil_clean.loc[forCleaning,'geometry']
state_name Alagoas MULTIPOLYGON (((7038407.237 8973545.06, 703823... Bahia MULTIPOLYGON (((6618113.779 7946308.264, 66178... Name: geometry, dtype: geometry
#new values - notice order
recreatedPolygons.geometry
state_code BR27 POLYGON ((7038233.546 8973146.179, 7038561.661... BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178... Name: geometry, dtype: geometry
# but use this for replacement
recreatedPolygons.geometry.values
<GeometryArray> [<POLYGON ((7038233.546 8973146.179, 7038561.661 8972873.722, 7038666.047 897...>, <MULTIPOLYGON (((6618113.779 7946308.264, 6617883.775 7945834.225, 6617580.6...>] Length: 2, dtype: geometry
Then,
states_brazil_clean.loc[forCleaning,'geometry']=recreatedPolygons.geometry.values
#see
states_brazil_clean.loc[forCleaning,'geometry']
state_name Alagoas POLYGON ((7038233.546 8973146.179, 7038561.661... Bahia MULTIPOLYGON (((6618113.779 7946308.264, 66178... Name: geometry, dtype: geometry
- We need to recreate northern and southern GDFs, let's redo the four ones at once:
# the north
N_brazil_clean=states_brazil_clean.reset_index().cx[:,mid_y:]
# the south
S_brazil_clean=states_brazil_clean.reset_index().cx[:,:mid_y]
# the west
W_brazil_clean=states_brazil_clean.reset_index().cx[:mid_x,:]
# the east
E_brazil_clean=states_brazil_clean.reset_index().cx[mid_x:,:]
- Confirm you have improved the intersection with keep_geom_type=True:
N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=True)
state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|
0 | Acre | BR12 | Acre | BR12 | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
1 | Alagoas | BR27 | Alagoas | BR27 | POLYGON ((7038561.661 8972873.722, 7038666.047... |
2 | Bahia | BR29 | Bahia | BR29 | MULTIPOLYGON (((6617883.775 7945834.225, 66175... |
3 | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718500.714 8080705.175, 47182... |
4 | Piauí | BR22 | Piauí | BR22 | POLYGON ((6358809.561 9687374.554, 6358994.184... |
5 | Rondônia | BR11 | Rondônia | BR11 | MULTIPOLYGON (((4336668.469 8680873.21, 433691... |
6 | Sergipe | BR28 | Sergipe | BR28 | POLYGON ((6758965.277 8905155.665, 6759481.274... |
7 | Tocantins | BR17 | Tocantins | BR17 | POLYGON ((5765755.273 8998397.361, 5765506.449... |
- Confirm you have improved the intersection with keep_geom_type=False:
N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=False).geometry.geom_type.value_counts()
MultiLineString 35 MultiPolygon 4 Polygon 4 Name: count, dtype: int64
All this work did help. But remember while we cleaned the neighborhood issues between 4 states, cleaning 2 of them there might be more cleaning remaining.
What about union?
The mid_Brazil had these many geometries (rows):
len(mid_Brazil)
22
Just using sets, this is the target count:
interNS=set(N_brazil_clean.state_name)&set(S_brazil_clean.state_name)
interEW=set(E_brazil_clean.state_name)&set(W_brazil_clean.state_name)
union_interNSEW=interNS|interEW
len(union_interNSEW)
15
Let's recreate intersection to recreate a clean mid Brazil:
NS_brazil_clean=N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=True)
EW_brazil_clean=E_brazil_clean.overlay(W_brazil_clean, how="intersection",keep_geom_type=True)
# then
mid_Brazil_clean=NS_brazil_clean.overlay(EW_brazil_clean,how="union",keep_geom_type=True)
# count
len(mid_Brazil_clean)
19
Again, we had improved, but you realize there is still more cleaning to do.
Finally, let's check the recent result:
mid_Brazil_clean
state_name_1_1 | state_code_1_1 | state_name_2_1 | state_code_2_1 | state_name_1_2 | state_code_1_2 | state_name_2_2 | state_code_2_2 | geometry | |
---|---|---|---|---|---|---|---|---|---|
0 | Mato Grosso | BR51 | Mato Grosso | BR51 | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718274.258 8081710.261, 47189... |
1 | Acre | BR12 | Acre | BR12 | NaN | NaN | NaN | NaN | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
2 | Alagoas | BR27 | Alagoas | BR27 | NaN | NaN | NaN | NaN | POLYGON ((7038561.661 8972873.722, 7038666.047... |
3 | Bahia | BR29 | Bahia | BR29 | NaN | NaN | NaN | NaN | MULTIPOLYGON (((6617580.695 7946157.331, 66181... |
4 | Piauí | BR22 | Piauí | BR22 | NaN | NaN | NaN | NaN | POLYGON ((6358809.561 9687374.554, 6358994.184... |
5 | Rondônia | BR11 | Rondônia | BR11 | NaN | NaN | NaN | NaN | MULTIPOLYGON (((4336911.983 8681201.152, 43372... |
6 | Sergipe | BR28 | Sergipe | BR28 | NaN | NaN | NaN | NaN | POLYGON ((6758965.277 8905155.665, 6759481.274... |
7 | Tocantins | BR17 | Tocantins | BR17 | NaN | NaN | NaN | NaN | POLYGON ((5764722.95 8996734.494, 5764264.595 ... |
8 | NaN | NaN | NaN | NaN | Amapá | BR16 | Amapá | BR16 | MULTIPOLYGON (((5394219.767 10234053.175, 5394... |
9 | NaN | NaN | NaN | NaN | Distrito Federal | BR53 | Goiás | BR52 | MULTIPOLYGON (((5709441.711 8230228.217, 57094... |
10 | NaN | NaN | NaN | NaN | Goiás | BR52 | Goiás | BR52 | MULTIPOLYGON (((5715352.084 8093167.787, 57150... |
11 | NaN | NaN | NaN | NaN | Mato Grosso do Sul | BR50 | Mato Grosso do Sul | BR50 | MULTIPOLYGON (((4643554.93 7541102.839, 464377... |
12 | NaN | NaN | NaN | NaN | Minas Gerais | BR31 | Goiás | BR52 | MULTIPOLYGON (((5488048.073 7931930.787, 54880... |
13 | NaN | NaN | NaN | NaN | Pará | BR15 | Pará | BR15 | MULTIPOLYGON (((5670588.737 9923682.938, 56706... |
14 | NaN | NaN | NaN | NaN | Paraná | BR41 | Paraná | BR41 | MULTIPOLYGON (((5445907.946 7090332.337, 54456... |
15 | NaN | NaN | NaN | NaN | Paraná | BR41 | Santa Catarina | BR42 | MULTIPOLYGON (((5247750.046 7056248.157, 52477... |
16 | NaN | NaN | NaN | NaN | Rio Grande do Sul | BR43 | Rio Grande do Sul | BR43 | MULTIPOLYGON (((5390972.988 6769017.466, 53912... |
17 | NaN | NaN | NaN | NaN | Santa Catarina | BR42 | Paraná | BR41 | MULTIPOLYGON (((5247761.665 7056252.309, 52476... |
18 | NaN | NaN | NaN | NaN | Santa Catarina | BR42 | Santa Catarina | BR42 | MULTIPOLYGON (((5384989.919 6745543.64, 538466... |
The UNION overlay is a simple idea. But the result above does not seem friendly.
Let's recreat this GDF from previous steps:
NS_brazil_clean
state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|
0 | Acre | BR12 | Acre | BR12 | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
1 | Alagoas | BR27 | Alagoas | BR27 | POLYGON ((7038561.661 8972873.722, 7038666.047... |
2 | Bahia | BR29 | Bahia | BR29 | MULTIPOLYGON (((6617883.775 7945834.225, 66175... |
3 | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718500.714 8080705.175, 47182... |
4 | Piauí | BR22 | Piauí | BR22 | POLYGON ((6358809.561 9687374.554, 6358994.184... |
5 | Rondônia | BR11 | Rondônia | BR11 | MULTIPOLYGON (((4336668.469 8680873.21, 433691... |
6 | Sergipe | BR28 | Sergipe | BR28 | POLYGON ((6758965.277 8905155.665, 6759481.274... |
7 | Tocantins | BR17 | Tocantins | BR17 | POLYGON ((5765755.273 8998397.361, 5765506.449... |
# better format
NS_brazil_clean.drop(columns=['state_code_1','state_name_2','state_code_2'],inplace=True)
NS_brazil_clean.rename(columns={'state_name_1':'state_name_clean'},inplace=True)
# then
NS_brazil_clean
state_name_clean | geometry | |
---|---|---|
0 | Acre | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
1 | Alagoas | POLYGON ((7038561.661 8972873.722, 7038666.047... |
2 | Bahia | MULTIPOLYGON (((6617883.775 7945834.225, 66175... |
3 | Mato Grosso | MULTIPOLYGON (((4718500.714 8080705.175, 47182... |
4 | Piauí | POLYGON ((6358809.561 9687374.554, 6358994.184... |
5 | Rondônia | MULTIPOLYGON (((4336668.469 8680873.21, 433691... |
6 | Sergipe | POLYGON ((6758965.277 8905155.665, 6759481.274... |
7 | Tocantins | POLYGON ((5765755.273 8998397.361, 5765506.449... |
Notice this one is not cleaned:
EW_brazil_clean
state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|
0 | Amapá | BR16 | Amapá | BR16 | MULTIPOLYGON (((5394219.767 10234053.175, 5394... |
1 | Distrito Federal | BR53 | Goiás | BR52 | MULTIPOLYGON (((5709441.711 8230228.217, 57094... |
2 | Goiás | BR52 | Goiás | BR52 | MULTIPOLYGON (((5715352.084 8093167.787, 57150... |
3 | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718500.714 8080705.175, 47182... |
4 | Mato Grosso do Sul | BR50 | Mato Grosso do Sul | BR50 | MULTIPOLYGON (((4643440.815 7540678.682, 46435... |
5 | Minas Gerais | BR31 | Goiás | BR52 | MULTIPOLYGON (((5488048.073 7931930.787, 54880... |
6 | Pará | BR15 | Pará | BR15 | MULTIPOLYGON (((5669998.17 9923194.452, 567022... |
7 | Paraná | BR41 | Paraná | BR41 | MULTIPOLYGON (((5445907.946 7090332.337, 54456... |
8 | Paraná | BR41 | Santa Catarina | BR42 | MULTIPOLYGON (((5247750.046 7056248.157, 52477... |
9 | Rio Grande do Sul | BR43 | Rio Grande do Sul | BR43 | MULTIPOLYGON (((5390972.988 6769017.466, 53912... |
10 | Santa Catarina | BR42 | Paraná | BR41 | MULTIPOLYGON (((5247761.665 7056252.309, 52476... |
11 | Santa Catarina | BR42 | Santa Catarina | BR42 | MULTIPOLYGON (((5384989.919 6745543.64, 538466... |
We will not do the any reformatting as before, since we can not erase columns.
Let's see the 'mid' again:
mid_Brazil_clean=NS_brazil_clean.overlay(EW_brazil_clean,how="union",keep_geom_type=True)
mid_Brazil_clean
state_name_clean | state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|---|
0 | Mato Grosso | Mato Grosso | BR51 | Mato Grosso | BR51 | MULTIPOLYGON (((4718274.258 8081710.261, 47189... |
1 | Acre | NaN | NaN | NaN | NaN | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
2 | Alagoas | NaN | NaN | NaN | NaN | POLYGON ((7038561.661 8972873.722, 7038666.047... |
3 | Bahia | NaN | NaN | NaN | NaN | MULTIPOLYGON (((6617580.695 7946157.331, 66181... |
4 | Piauí | NaN | NaN | NaN | NaN | POLYGON ((6358809.561 9687374.554, 6358994.184... |
5 | Rondônia | NaN | NaN | NaN | NaN | MULTIPOLYGON (((4336911.983 8681201.152, 43372... |
6 | Sergipe | NaN | NaN | NaN | NaN | POLYGON ((6758965.277 8905155.665, 6759481.274... |
7 | Tocantins | NaN | NaN | NaN | NaN | POLYGON ((5764722.95 8996734.494, 5764264.595 ... |
8 | NaN | Amapá | BR16 | Amapá | BR16 | MULTIPOLYGON (((5394219.767 10234053.175, 5394... |
9 | NaN | Distrito Federal | BR53 | Goiás | BR52 | MULTIPOLYGON (((5709441.711 8230228.217, 57094... |
10 | NaN | Goiás | BR52 | Goiás | BR52 | MULTIPOLYGON (((5715352.084 8093167.787, 57150... |
11 | NaN | Mato Grosso do Sul | BR50 | Mato Grosso do Sul | BR50 | MULTIPOLYGON (((4643554.93 7541102.839, 464377... |
12 | NaN | Minas Gerais | BR31 | Goiás | BR52 | MULTIPOLYGON (((5488048.073 7931930.787, 54880... |
13 | NaN | Pará | BR15 | Pará | BR15 | MULTIPOLYGON (((5670588.737 9923682.938, 56706... |
14 | NaN | Paraná | BR41 | Paraná | BR41 | MULTIPOLYGON (((5445907.946 7090332.337, 54456... |
15 | NaN | Paraná | BR41 | Santa Catarina | BR42 | MULTIPOLYGON (((5247750.046 7056248.157, 52477... |
16 | NaN | Rio Grande do Sul | BR43 | Rio Grande do Sul | BR43 | MULTIPOLYGON (((5390972.988 6769017.466, 53912... |
17 | NaN | Santa Catarina | BR42 | Paraná | BR41 | MULTIPOLYGON (((5247761.665 7056252.309, 52476... |
18 | NaN | Santa Catarina | BR42 | Santa Catarina | BR42 | MULTIPOLYGON (((5384989.919 6745543.64, 538466... |
Notice this UNION operation identifies the intersection in the result. MATO GROSSO is present in both.
However, missing values were created where no intersection exists. Remember, whatever is not in the intersection of these two, would be the result of symmetric difference:
NS_brazil_clean.overlay(EW_brazil_clean,how="symmetric_difference",keep_geom_type=True)
state_name_clean | state_name_1 | state_code_1 | state_name_2 | state_code_2 | geometry | |
---|---|---|---|---|---|---|
0 | Acre | NaN | NaN | NaN | NaN | MULTIPOLYGON (((3374472.826 8741412.368, 33746... |
1 | Alagoas | NaN | NaN | NaN | NaN | POLYGON ((7038561.661 8972873.722, 7038666.047... |
2 | Bahia | NaN | NaN | NaN | NaN | MULTIPOLYGON (((6617580.695 7946157.331, 66181... |
3 | Piauí | NaN | NaN | NaN | NaN | POLYGON ((6358809.561 9687374.554, 6358994.184... |
4 | Rondônia | NaN | NaN | NaN | NaN | MULTIPOLYGON (((4336911.983 8681201.152, 43372... |
5 | Sergipe | NaN | NaN | NaN | NaN | POLYGON ((6758965.277 8905155.665, 6759481.274... |
6 | Tocantins | NaN | NaN | NaN | NaN | POLYGON ((5764722.95 8996734.494, 5764264.595 ... |
7 | NaN | Amapá | BR16 | Amapá | BR16 | MULTIPOLYGON (((5394219.767 10234053.175, 5394... |
8 | NaN | Distrito Federal | BR53 | Goiás | BR52 | MULTIPOLYGON (((5709441.711 8230228.217, 57094... |
9 | NaN | Goiás | BR52 | Goiás | BR52 | MULTIPOLYGON (((5715352.084 8093167.787, 57150... |
10 | NaN | Mato Grosso do Sul | BR50 | Mato Grosso do Sul | BR50 | MULTIPOLYGON (((4643554.93 7541102.839, 464377... |
11 | NaN | Minas Gerais | BR31 | Goiás | BR52 | MULTIPOLYGON (((5488048.073 7931930.787, 54880... |
12 | NaN | Pará | BR15 | Pará | BR15 | MULTIPOLYGON (((5670588.737 9923682.938, 56706... |
13 | NaN | Paraná | BR41 | Paraná | BR41 | MULTIPOLYGON (((5445907.946 7090332.337, 54456... |
14 | NaN | Paraná | BR41 | Santa Catarina | BR42 | MULTIPOLYGON (((5247750.046 7056248.157, 52477... |
15 | NaN | Rio Grande do Sul | BR43 | Rio Grande do Sul | BR43 | MULTIPOLYGON (((5390972.988 6769017.466, 53912... |
16 | NaN | Santa Catarina | BR42 | Paraná | BR41 | MULTIPOLYGON (((5247761.665 7056252.309, 52476... |
17 | NaN | Santa Catarina | BR42 | Santa Catarina | BR42 | MULTIPOLYGON (((5384989.919 6745543.64, 538466... |
As you see, the fact that we have not clean all the geometries affects all these overlay operations.
NEXT STEPS
- Cleaning spatial data is not trivial. It is time consuming. We could keep working here to improve this.
- We may need to take this fight to QGis or ArcGis to edit the points manually, it gives you more control, but it will affect replicability.
- A smart decision would be to get a new map with a better quality - which might also require some money investment.