No description has been provided for this image
Open In Colab

Binary Spatial operations on Geo Dataframes¶

This time the spatial operations will be applied when we have as input two GDFs.

Getting ready¶

The links to the our maps on GitHub are here:

In [1]:
linkWorldMap="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/worldMaps.gpkg"
linkBrazil="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/BRAZIL/brazil_5880.gpkg"
linkIndicators="https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/worldindicators.json"

Let me introduce another data source on seaports. Their locations in long/lat is stored in the file seaports.csv which we already have on GitHub:

In [2]:
linkSeaPorts='https://github.com/CienciaDeDatosEspacial/dataSets/raw/refs/heads/main/WORLD/seaports.csv'

Let's get some maps:

In [3]:
import geopandas as gpd

#world 
world_rivers=gpd.read_file(linkWorldMap,layer='rivers')
#brazil 
brazil5880=gpd.read_file(linkBrazil,layer='country')
airports_brazil5880=gpd.read_file(linkBrazil,layer='airports')
states_brazil5880=gpd.read_file(linkBrazil,layer='states')
municipalities_brazil5880=gpd.read_file(linkBrazil,layer='municipalities')
#some indicators
indicators=gpd.read_file(linkIndicators)

# the seaports
import pandas as pd 
infoseaports=pd.read_csv(linkSeaPorts)
In [4]:
# the sesports data has too manny columns:
len(infoseaports.columns)
Out[4]:
108

Let me keep some columns, and turn the DF into a GDF:

In [5]:
#rename
infoseaports.rename(columns={'Main Port Name':'seaport_name','Country Code':'country_name'},inplace=True)

#keep few columns
infoseaports=infoseaports.loc[:,['seaport_name', 'country_name','Latitude', 'Longitude']]

#spatial points (unprojected)
seaports=gpd.GeoDataFrame(data=infoseaports.copy(),
                           geometry=gpd.points_from_xy(infoseaports.Longitude,
                                                       infoseaports.Latitude), 
                          crs=4326)# notice it is unprojected

# keep Brazil
seaports_bra=seaports[seaports['country_name']=='Brazil'].copy()

# reset indexes
seaports_bra.reset_index(drop=True, inplace=True)

# reprojecting
seaports_brazil5880=seaports_bra.to_crs(5880) # projected crs

Before proceeding, let me compute some GDFs we created the previous session:

  • The centroid of Brazil
In [6]:
brazil5880_cen=brazil5880.centroid
  • Large Brazilian airports
In [7]:
large_airports=airports_brazil5880.query("airport_type=='large_airport'")
  • Amazon Rivers
In [8]:
AmazonSystem=world_rivers.query("SYSTEM=='Amazon'")
AmazonSystem_5880=AmazonSystem.to_crs(5880)
  • The states in the East, West, South, and North
In [9]:
mid_x,mid_y=brazil5880_cen.x[0],brazil5880_cen.y[0]

# the north
N_brazil=states_brazil5880.cx[:,mid_y:]
# the south
S_brazil=states_brazil5880.cx[:,:mid_y]
# the west
W_brazil=states_brazil5880.cx[:mid_x,:]
# the east
E_brazil=states_brazil5880.cx[mid_x:,:]
  • The mean fragility by region of the world
In [10]:
indicatorsByRegion=indicators.dissolve(
    by="region", #groupby()
    aggfunc={"fragility": "mean"}, #agg()
    )
In [11]:
# You may need to install if using Colab
# !pip install mapclassify

I. Distance¶

Distance is a key binary operation as so many practical policy matters depend on knowing distances between objects in space.

Any pair of rightly projected GDFs have a distance between them. Below we can make query using distances:

Which are the airports whose distance to Brazil centroid is > 2500000?

In [12]:
# this is the centroid we have:
brazil5880_cen
Out[12]:
0    POINT (5085264.134 8827720.201)
dtype: geometry

Then,

In [13]:
airports_brazil5880[airports_brazil5880.distance(brazil5880_cen[0]) > 2500000]
Out[13]:
airport_name airport_type elevation_ft region municipality geometry
4104 Viatec Aviação Agrícola Airport small_airport 92.0 Rio Grande do Sul Santa Vitória do Palmar POINT (5068397.782 6323060.053)
4106 Santa Vitória do Palmar Airport small_airport 82.0 Rio Grande do Sul Santa Vitória Do Palmar POINT (5060939.818 6291358.318)
6662 Trindade Heliport heliport NaN Espírito Santo Vitória POINT (7565615.936 7537143.55)

The results can be confirmed visually:

In [14]:
base=airports_brazil5880[airports_brazil5880.distance(brazil5880_cen.iloc[0]) > 2500000].plot(marker='+',markersize=100)
airports_brazil5880.plot(ax=base,color='grey', markersize=0.1)
brazil5880_cen.plot(ax=base,color='red')
Out[14]:
<Axes: >
No description has been provided for this image

Let me review how distances work between different kinds of geometries:

a. Distance between points¶

We have these points:

In [15]:
seaports_brazil5880.head()
Out[15]:
seaport_name country_name Latitude Longitude geometry
0 Dtse / Gegua Oil Terminal Brazil -22.816667 -43.150000 POINT (6112865.27 7434885.819)
1 Porto De Mucuripe Brazil -3.716667 -38.483333 POINT (6723610.313 9573897.029)
2 Portocel Brazil -19.850000 -40.050000 POINT (6459540.7 7743872.147)
3 Santa Clara Brazil -20.883333 -51.366667 POINT (5273988.618 7687595.596)
4 Aratu Brazil -12.783333 -38.500000 POINT (6681955.632 8535904.773)
In [16]:
large_airports.head()
Out[16]:
airport_name airport_type elevation_ft region municipality geometry
0 Guarulhos - Governador André Franco Montoro In... large_airport 2461.0 São Paulo São Paulo POINT (5769392.959 7387510.487)
1 Rio Galeão – Tom Jobim International Airport large_airport 28.0 Rio de Janeiro Rio De Janeiro POINT (6102623.812 7436387.271)
4 Presidente Juscelino Kubitschek International ... large_airport 3497.0 Distrito Federal Brasília POINT (5651010.181 8235390.09)
5 Deputado Luiz Eduardo Magalhães International ... large_airport 64.0 Bahia Salvador POINT (6700330.807 8520415.591)
9 Tancredo Neves International Airport large_airport 2721.0 Minas Gerais Belo Horizonte POINT (6051718.72 7797008.905)

For results of distance operation to be more insightful, we may do this:

In [17]:
# index will be names of airports/seaports instead of numbers
large_airports.set_index('airport_name',inplace=True)
seaports_brazil5880.set_index('seaport_name',inplace=True)

Notice:

In [18]:
seaports_brazil5880.head()
Out[18]:
country_name Latitude Longitude geometry
seaport_name
Dtse / Gegua Oil Terminal Brazil -22.816667 -43.150000 POINT (6112865.27 7434885.819)
Porto De Mucuripe Brazil -3.716667 -38.483333 POINT (6723610.313 9573897.029)
Portocel Brazil -19.850000 -40.050000 POINT (6459540.7 7743872.147)
Santa Clara Brazil -20.883333 -51.366667 POINT (5273988.618 7687595.596)
Aratu Brazil -12.783333 -38.500000 POINT (6681955.632 8535904.773)
In [19]:
large_airports.head()
Out[19]:
airport_type elevation_ft region municipality geometry
airport_name
Guarulhos - Governador André Franco Montoro International Airport large_airport 2461.0 São Paulo São Paulo POINT (5769392.959 7387510.487)
Rio Galeão – Tom Jobim International Airport large_airport 28.0 Rio de Janeiro Rio De Janeiro POINT (6102623.812 7436387.271)
Presidente Juscelino Kubitschek International Airport large_airport 3497.0 Distrito Federal Brasília POINT (5651010.181 8235390.09)
Deputado Luiz Eduardo Magalhães International Airport large_airport 64.0 Bahia Salvador POINT (6700330.807 8520415.591)
Tancredo Neves International Airport large_airport 2721.0 Minas Gerais Belo Horizonte POINT (6051718.72 7797008.905)

The distance from each airport to the "Dtse / Gegua Oil Terminal"

In [20]:
large_airports.distance(seaports_brazil5880.geometry.loc['Dtse / Gegua Oil Terminal'])
Out[20]:
airport_name
Guarulhos - Governador André Franco Montoro International Airport    3.467242e+05
Rio Galeão – Tom Jobim International Airport                         1.035093e+04
Presidente Juscelino Kubitschek International Airport                9.241846e+05
Deputado Luiz Eduardo Magalhães International Airport                1.234298e+06
Tancredo Neves International Airport                                 3.672493e+05
Eduardo Gomes International Airport                                  2.854494e+06
Hercílio Luz International Airport                                   7.681356e+05
Val de Cans/Júlio Cezar Ribeiro International Airport                2.462792e+06
dtype: float64

What about computing...

All the distances between large aiports and seaports (in km)

In [21]:
# apply creates a LOOP, computes distances from each seaport to all large airports
seaports_brazil5880.geometry.apply\
(lambda seaport: large_airports.geometry.distance(seaport)/1000)
Out[21]:
airport_name Guarulhos - Governador André Franco Montoro International Airport Rio Galeão – Tom Jobim International Airport Presidente Juscelino Kubitschek International Airport Deputado Luiz Eduardo Magalhães International Airport Tancredo Neves International Airport Eduardo Gomes International Airport Hercílio Luz International Airport Val de Cans/Júlio Cezar Ribeiro International Airport
seaport_name
Dtse / Gegua Oil Terminal 346.724171 10.350933 924.184620 1234.297632 367.249276 2854.493600 768.135646 2462.791784
Porto De Mucuripe 2385.543307 2225.886834 1715.246883 1053.738618 1899.676213 2397.739727 2901.295202 1141.963646
Portocel 776.722303 471.101490 946.209008 813.018817 411.269111 2867.979624 1232.521248 2266.253017
Santa Clara 579.203362 865.876382 664.999298 1651.678293 785.388683 2189.659524 805.599343 2185.898767
Aratu 1466.826530 1242.804930 1073.851757 24.032516 971.167168 2609.953079 1975.180167 1690.370196
Cabedelo 2248.124981 2023.173930 1747.914940 786.022123 1751.230863 2830.123680 2757.957782 1641.263825
Imbituba 578.310920 814.278300 1377.148556 2031.882238 1072.105305 3042.768039 63.612808 2983.998277
Angra Dos Reis 225.003606 111.795735 883.019058 1308.028199 380.901616 2791.717379 671.807826 2455.089996
Natal 2341.056650 2128.327377 1799.131283 891.421964 1843.689307 2774.776071 2853.930050 1556.780342
Guamare Oil Terminal 2371.907633 2177.693211 1779.896593 953.130770 1876.302092 2647.096318 2887.478112 1411.272089
Itaqui 2344.169726 2275.985960 1535.410406 1347.564542 1915.248841 1744.522370 2836.644708 476.002360
Manaus 2682.252079 2836.382779 1935.849306 2625.311641 2529.751943 11.154274 2980.307205 1298.255235
Itacoatiara 2592.193077 2730.217800 1820.822272 2468.851545 2413.939317 178.249697 2913.273479 1126.561239
Belem 2456.456648 2452.107621 1604.006437 1709.617689 2088.711020 1296.677448 2913.649490 8.290566
Macae 496.852182 160.532855 973.748856 1135.432221 383.200281 2918.055893 905.988525 2458.151928
Niteroi 348.597382 16.028283 932.309968 1239.746717 375.348945 2862.315099 765.202458 2470.939163
Pelotas 1093.531165 1343.662618 1822.421018 2557.070047 1590.903559 3287.745109 585.001710 3396.677050
Rio De Janeiro 343.209409 13.287640 930.999018 1243.545205 375.961548 2860.201645 760.258369 2471.340121
Recife 2142.801315 1910.277530 1674.666425 676.860725 1647.057100 2845.133868 2650.142734 1687.882366
Tramandai 814.930209 1054.174075 1584.514525 2272.782881 1310.905328 3164.410125 299.764293 3180.525010
Madre De Deus 1462.725337 1241.121137 1063.232802 36.749390 966.657897 2596.778994 1971.741706 1679.002816
Laguna 610.078831 843.875837 1407.360604 2062.999867 1103.719364 3064.635623 95.009235 3013.520141
Port De Aracaju 1733.070695 1504.384506 1303.001081 267.564965 1237.441529 2687.325788 2240.825988 1659.464068
Porto Santana 2646.507607 2677.012736 1791.151164 2023.853225 2318.125252 1040.555921 3076.376196 335.441624
Porto De Maceio 1934.791716 1699.567552 1497.384909 468.512961 1440.007502 2788.984232 2440.819954 1693.889037
Port De Salvador 1448.644611 1222.959201 1065.257319 22.089660 953.390744 2616.268890 1956.445482 1704.676897
Rio Grande 1106.010760 1349.213774 1845.254974 2567.264855 1603.088647 3322.620918 593.951743 3423.363634
Santarem 2494.223469 2582.870485 1666.149767 2153.673543 2242.151898 596.990455 2873.540176 703.734669
Porto Do Forno 458.612166 127.835449 1007.569738 1207.778202 425.934098 2947.271281 842.325416 2513.571318
Port Of Ilheus 1250.183516 1012.231807 963.853808 227.158223 760.366657 2653.948977 1752.959780 1830.939739
Itajai 445.212169 711.429716 1230.766858 1908.751545 942.378862 2912.927857 84.183089 2837.650203
Cameta 2378.168716 2391.088714 1522.497285 1718.258865 2029.693661 1176.378545 2824.439489 149.389065
Gebig 234.293252 102.687867 890.971113 1306.374359 383.834407 2801.212589 675.394423 2460.997434
Camocim 2386.572249 2263.438786 1649.578153 1174.469354 1918.592061 2136.332613 2896.762517 867.120057
Tutoia 2359.329043 2258.261495 1590.076044 1231.305489 1905.363602 1975.390802 2863.730319 706.603901
Porto De Suape 2103.354422 1869.279589 1643.841933 637.106285 1607.968400 2841.333120 2610.100411 1695.716388
Sao Sebastiao 116.468574 246.255403 922.876460 1439.559506 489.364806 2790.737899 534.907206 2520.013507
Porto Alegre 873.579840 1131.972395 1608.962449 2338.373437 1370.955950 3133.273638 370.518357 3192.817209
Sao Francisco 382.581936 667.068214 1156.757681 1848.757620 879.673853 2846.986805 158.134767 2763.531465
Santos 60.285901 336.840080 916.131756 1503.504871 539.576757 2750.220613 471.909979 2523.542123
Vitoria 724.103309 412.874702 945.370886 871.332211 388.004920 2877.110506 1173.859123 2300.907362
Vila Do Conde 2447.390678 2447.041476 1593.895601 1719.067617 2083.991348 1267.605039 2902.031022 35.886867
Tubarao 733.536010 421.815958 951.016346 865.639116 395.886973 2881.588731 1182.657631 2301.239995
Paranagua 310.122786 613.685944 1072.847448 1775.347212 804.518079 2778.830902 241.366210 2680.325904
Guaiba Island Terminal 253.451432 83.581854 896.378698 1294.218692 379.377843 2810.644009 690.358499 2461.628137

If we save the matrix...

In [22]:
D_Matrix_sea_air=seaports_brazil5880.geometry.apply \
                (lambda seaport: large_airports.geometry.distance(seaport)/1000)

We can compute some distance stats from there:

Some summary of distances from each seaport to all large airports

In [23]:
Stat_sea_air=pd.DataFrame()
Stat_sea_air['mean']=D_Matrix_sea_air.mean(axis=1) # mean D to all airports
Stat_sea_air['min']=D_Matrix_sea_air.min(axis=1)# min D to all airports
Stat_sea_air['max']=D_Matrix_sea_air.max(axis=1)# max D to all airports

# see some
Stat_sea_air.head(10)
Out[23]:
mean min max
seaport_name
Dtse / Gegua Oil Terminal 1121.028458 10.350933 2854.493600
Porto De Mucuripe 1965.136304 1053.738618 2901.295202
Portocel 1223.134327 411.269111 2867.979624
Santa Clara 1216.037957 579.203362 2189.659524
Aratu 1381.773293 24.032516 2609.953079
Cabedelo 1973.226516 786.022123 2830.123680
Imbituba 1495.513055 63.612808 3042.768039
Angra Dos Reis 1103.420427 111.795735 2791.717379
Natal 2023.639130 891.421964 2853.930050
Guamare Oil Terminal 2013.097102 953.130770 2887.478112

Of course, the idmax and idmin could come in handy:

Farthest airport to each seaport

In [24]:
D_Matrix_sea_air.idxmax(axis=1).head()
Out[24]:
seaport_name
Dtse / Gegua Oil Terminal    Eduardo Gomes International Airport
Porto De Mucuripe             Hercílio Luz International Airport
Portocel                     Eduardo Gomes International Airport
Santa Clara                  Eduardo Gomes International Airport
Aratu                        Eduardo Gomes International Airport
dtype: object

Farthest seaport to each airport

In [25]:
D_Matrix_sea_air.idxmax(axis=0).head()
Out[25]:
airport_name
Guarulhos - Governador André Franco Montoro International Airport    Manaus
Rio Galeão – Tom Jobim International Airport                         Manaus
Presidente Juscelino Kubitschek International Airport                Manaus
Deputado Luiz Eduardo Magalhães International Airport                Manaus
Tancredo Neves International Airport                                 Manaus
dtype: object

Closest airport to each seaport

In [26]:
D_Matrix_sea_air.idxmin(axis=1).head()
Out[26]:
seaport_name
Dtse / Gegua Oil Terminal         Rio Galeão – Tom Jobim International Airport
Porto De Mucuripe            Deputado Luiz Eduardo Magalhães International ...
Portocel                                  Tancredo Neves International Airport
Santa Clara                  Guarulhos - Governador André Franco Montoro In...
Aratu                        Deputado Luiz Eduardo Magalhães International ...
dtype: object

Closest seaport to each airport

In [27]:
# closest seaport to each airport
D_Matrix_sea_air.idxmin(axis=0).head()
Out[27]:
airport_name
Guarulhos - Governador André Franco Montoro International Airport                       Santos
Rio Galeão – Tom Jobim International Airport                         Dtse / Gegua Oil Terminal
Presidente Juscelino Kubitschek International Airport                              Santa Clara
Deputado Luiz Eduardo Magalhães International Airport                         Port De Salvador
Tancredo Neves International Airport                                 Dtse / Gegua Oil Terminal
dtype: object

b. Distance between line and point¶

Once we know understand how distance and idxmin/idxmax work, we can feel comfortable in this stage.

Let's use these rivers from before:

In [28]:
AmazonSystem_5880.set_index("RIVER",inplace=True)
AmazonSystem_5880
Out[28]:
SYSTEM geometry
RIVER
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425...
Guapore Amazon MULTILINESTRING ((3784022.891 8842606.845, 378...
Japura Amazon MULTILINESTRING ((3782432.967 9748590.005, 382...
Madeira Amazon MULTILINESTRING ((3784022.891 8842606.845, 378...
Madre de Dios Amazon MULTILINESTRING ((3407019.812 8246998.613, 343...
Purus Amazon MULTILINESTRING ((3008899.534 8748725.178, 305...
Putamayo Amazon MULTILINESTRING ((2478636.973 10162886.59, 246...
Rio Branco Amazon MULTILINESTRING ((4372657.932 10403233.815, 43...
Rio Juruena Amazon MULTILINESTRING ((4488340.769 8382801.957, 456...
Rio Maranon Amazon MULTILINESTRING ((2538298.165 8770999.896, 252...
Rio Negro Amazon MULTILINESTRING ((3242119.495 10234498.68, 327...
Rio Teles Pires Amazon MULTILINESTRING ((4890008.905 8388474.8, 49295...
Tapajos Amazon MULTILINESTRING ((4956154.209 9738475.606, 490...
Ucayali Amazon MULTILINESTRING ((3104425.924 8220993.168, 313...
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503...

Then,

Distance from river Tapajos to Guarulhos airport

In [29]:
airName='Guarulhos - Governador André Franco Montoro International Airport'
rivName='Tapajos'
AmazonSystem_5880.geometry.loc[rivName].distance(large_airports.geometry.loc[airName])/1000
Out[29]:
2170.030197547702

We can compute the distance matrix now:

In [30]:
D_Matrix_amazRivs_air=AmazonSystem_5880.geometry.apply \
                (lambda river: large_airports.geometry.distance(river)/1000)
In [31]:
Stat_amz_air=pd.DataFrame()
Stat_amz_air['mean']=D_Matrix_amazRivs_air.mean(axis=1) # mean D to all airports
Stat_amz_air['min']=D_Matrix_amazRivs_air.min(axis=1)# min D to all airports
Stat_amz_air['max']=D_Matrix_amazRivs_air.max(axis=1)# max D to all airports

# see some
Stat_amz_air.head(10)
Out[31]:
mean min max
RIVER
Amazon 1764.132480 25.549158 2865.306190
Guapore 1822.694503 983.345006 2397.410373
Japura 2495.652965 502.953032 3214.712005
Madeira 1949.986161 153.127907 2652.416364
Madre de Dios 2237.270472 983.345006 2934.051647
Purus 2143.371707 150.306441 2848.394230
Putamayo 2784.484656 853.437589 3437.832352
Rio Branco 2353.394115 264.452689 3241.167974
Rio Juruena 1498.931579 525.908802 2139.336751
Rio Maranon 3216.884463 1479.515487 3961.463537
In [32]:
# closest river to each airport
D_Matrix_amazRivs_air.idxmin(axis=0).head()
Out[32]:
airport_name
Guarulhos - Governador André Franco Montoro International Airport    Xingu
Rio Galeão – Tom Jobim International Airport                         Xingu
Presidente Juscelino Kubitschek International Airport                Xingu
Deputado Luiz Eduardo Magalhães International Airport                Xingu
Tancredo Neves International Airport                                 Xingu
dtype: object
In [33]:
# farthest river to each airport
D_Matrix_amazRivs_air.idxmax(axis=0).head()
Out[33]:
airport_name
Guarulhos - Governador André Franco Montoro International Airport    Rio Maranon
Rio Galeão – Tom Jobim International Airport                         Rio Maranon
Presidente Juscelino Kubitschek International Airport                Rio Maranon
Deputado Luiz Eduardo Magalhães International Airport                Rio Maranon
Tancredo Neves International Airport                                 Rio Maranon
dtype: object

c. Between Polygon and Point¶

Let me re use the world rivers to get the rivers in a couple of systems:

In [34]:
river_systems=world_rivers.query("SYSTEM in ['Amazon','Parana']")
river_systems
Out[34]:
RIVER SYSTEM geometry
1 Amazon Amazon MULTILINESTRING ((-61.2773 -3.60706, -60.68466...
24 Guapore Amazon MULTILINESTRING ((-65.1024 -10.27593, -65.1201...
29 Japura Amazon MULTILINESTRING ((-64.94595 -2.23269, -64.5731...
37 Madeira Amazon MULTILINESTRING ((-65.1024 -10.27593, -65.0844...
38 Madre de Dios Amazon MULTILINESTRING ((-68.84791 -15.35547, -68.594...
55 Paraguay Parana MULTILINESTRING ((-53.47152 -16.67963, -54.309...
56 Parana Parana MULTILINESTRING ((-58.48091 -27.30186, -58.402...
59 Purus Amazon MULTILINESTRING ((-72.21624 -10.77936, -71.803...
60 Putamayo Amazon MULTILINESTRING ((-76.65652 1.36565, -76.77346...
62 Rio Branco Amazon MULTILINESTRING ((-59.64679 3.62898, -60.19013...
64 Rio Juruena Amazon MULTILINESTRING ((-58.74846 -14.57408, -58.045...
65 Rio Maranon Amazon MULTILINESTRING ((-76.49432 -10.32964, -76.523...
66 Rio Negro Amazon MULTILINESTRING ((-69.80154 2.04259, -69.48542...
67 Rio Paranaiba Parana MULTILINESTRING ((-46.02263 -19.02853, -46.130...
68 Rio Teles Pires Amazon MULTILINESTRING ((-55.02068 -14.5688, -54.6526...
80 Tapajos Amazon MULTILINESTRING ((-54.39421 -2.36508, -54.8251...
85 Ucayali Amazon MULTILINESTRING ((-71.67654 -15.38325, -71.373...
92 Xingu Amazon MULTILINESTRING ((-54.72902 -15.05463, -53.670...
97 Rio Grande, South America Parana MULTILINESTRING ((-44.37152 -22.17241, -43.860...

Let me combine per system:

In [35]:
ama_para=river_systems.dissolve(by='SYSTEM')
ama_para.drop(columns='RIVER',inplace=True)
ama_para
Out[35]:
geometry
SYSTEM
Amazon MULTILINESTRING ((-61.2773 -3.60706, -60.68466...
Parana MULTILINESTRING ((-53.47152 -16.67963, -54.309...

We still have lines:

In [36]:
ama_para.plot(cmap='viridis')
Out[36]:
<Axes: >
No description has been provided for this image

But we will have polygons after this:

In [37]:
ama_para.convex_hull.plot(cmap='viridis')
Out[37]:
<Axes: >
No description has been provided for this image

As we have a geoseries of two geometries...

In [38]:
ama_para.convex_hull,type(ama_para.convex_hull)
Out[38]:
(SYSTEM
 Amazon    POLYGON ((-68.59431 -15.48713, -71.67654 -15.3...
 Parana    POLYGON ((-58.13602 -34.19917, -59.41318 -33.6...
 dtype: geometry,
 geopandas.geoseries.GeoSeries)

Let's turn that into a GDF:

In [39]:
ama_para_hulls=ama_para.convex_hull.to_frame()
ama_para_hulls.rename(columns={0:'geometry'},inplace=True)
ama_para_hulls=ama_para_hulls.set_geometry('geometry')
ama_para_hulls.crs="EPSG:5880"

#voila
ama_para_hulls
Out[39]:
geometry
SYSTEM
Amazon POLYGON ((-68.594 -15.487, -71.677 -15.383, -7...
Parana POLYGON ((-58.136 -34.199, -59.413 -33.614, -6...

And now, the distance matrix:

In [40]:
D_Matrix_SYSHulls_air=ama_para_hulls.geometry.apply \
                (lambda system: large_airports.geometry.distance(system)/1000)
D_Matrix_SYSHulls_air
Out[40]:
airport_name Guarulhos - Governador André Franco Montoro International Airport Rio Galeão – Tom Jobim International Airport Presidente Juscelino Kubitschek International Airport Deputado Luiz Eduardo Magalhães International Airport Tancredo Neves International Airport Eduardo Gomes International Airport Hercílio Luz International Airport Val de Cans/Júlio Cezar Ribeiro International Airport
SYSTEM
Amazon 9373.464024 9619.902769 9987.801447 10839.402030 9870.020232 10586.985788 8867.621633 11335.110218
Parana 9373.473877 9619.912262 9987.812192 10839.411834 9870.030130 10587.000384 8867.631318 11335.122740

From here, you can compute distances between other kinds of geometries.

II. Clipping¶

Clipping uses a GDF geometry as a MASK to cut another GDF which suppossedly is bigger and needes to be clipped.

Pay attention to the world rivers again:

In [41]:
world_rivers
Out[41]:
RIVER SYSTEM geometry
0 Aldan Lena MULTILINESTRING ((124.00678 56.47258, 123.2595...
1 Amazon Amazon MULTILINESTRING ((-61.2773 -3.60706, -60.68466...
2 Amu Darya None MULTILINESTRING ((73.98818 37.49952, 73.52595 ...
3 Amur None MULTILINESTRING ((122.63956 49.9973, 120.47874...
4 Angara None MULTILINESTRING ((105.07841 51.93053, 103.9295...
... ... ... ...
93 Yangtze None MULTILINESTRING ((119.82609 32.24864, 118.9707...
94 Yenisey None MULTILINESTRING ((98.94706 52.57675, 98.12095 ...
95 Yukon None MULTILINESTRING ((-130.89319 59.2448, -131.065...
96 Zambezi None MULTILINESTRING ((35.52866 -17.66773, 36.28055...
97 Rio Grande, South America Parana MULTILINESTRING ((-44.37152 -22.17241, -43.860...

98 rows × 3 columns

As you see, this GDF has no Country column. But since it has geometry, you can keep the rivers, or their sections, that serve a country:

In [42]:
rivers_brazil5880 = gpd.clip(gdf=world_rivers.to_crs(5880),
                             mask=brazil5880)

See differences:

  • input 1: Rivers
In [43]:
world_rivers.plot()
Out[43]:
<Axes: >
No description has been provided for this image
  • input 2: mask (Brazil)
In [44]:
brazil5880.plot(facecolor="greenyellow")
Out[44]:
<Axes: >
No description has been provided for this image

Output: clipped rivers

In [45]:
rivers_brazil5880.plot()
Out[45]:
<Axes: >
No description has been provided for this image
In [46]:
base = brazil5880.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
rivers_brazil5880.plot(edgecolor='blue', linewidth=0.5,
                    ax=base)
Out[46]:
<Axes: >
No description has been provided for this image

We can create our own mask for clipping:

Let me get the bounding box of the map (the smallest possible rectangle that completely encloses a geometric shape or set of shapes):

In [47]:
brazil5880.total_bounds #[minx, miny, maxx, maxy]
Out[47]:
array([ 2793074.63914733,  6264891.06203913,  7120881.08835731,
       10586462.14322563])
In [48]:
# saving the output
minx, miny, maxx, maxy=brazil5880.total_bounds
minx, miny, maxx, maxy
Out[48]:
(np.float64(2793074.6391473296),
 np.float64(6264891.0620391285),
 np.float64(7120881.088357305),
 np.float64(10586462.143225627))

I will combine those coordinates with the centroid (mid_x,mid_y)to create a BOX of the north and south of Brazil:

In [49]:
north_mask = [minx, mid_y, maxx, maxy]
south_mask = [minx, minx, maxx, mid_y]

# split Brazil
states_brazil5880.clip(north_mask).plot(edgecolor="yellow")
Out[49]:
<Axes: >
No description has been provided for this image
In [50]:
states_brazil5880.clip(south_mask).plot(edgecolor="yellow")
Out[50]:
<Axes: >
No description has been provided for this image

As you see, with clip we can cut polygons (not respecting the borders).

III. Spatial Joins¶

We’re familiar with merging, which joins tables using common keys. Spatial joins, by contrast, rely solely on geometry columns to perform various types of filtering.

Let me use the brazilian large airports and states:

In [51]:
large_airports.head()
Out[51]:
airport_type elevation_ft region municipality geometry
airport_name
Guarulhos - Governador André Franco Montoro International Airport large_airport 2461.0 São Paulo São Paulo POINT (5769392.959 7387510.487)
Rio Galeão – Tom Jobim International Airport large_airport 28.0 Rio de Janeiro Rio De Janeiro POINT (6102623.812 7436387.271)
Presidente Juscelino Kubitschek International Airport large_airport 3497.0 Distrito Federal Brasília POINT (5651010.181 8235390.09)
Deputado Luiz Eduardo Magalhães International Airport large_airport 64.0 Bahia Salvador POINT (6700330.807 8520415.591)
Tancredo Neves International Airport large_airport 2721.0 Minas Gerais Belo Horizonte POINT (6051718.72 7797008.905)

...and:

In [52]:
states_brazil5880.head()
Out[52]:
state_name state_code geometry
0 Acre BR12 MULTIPOLYGON (((3374854.317 8740996.704, 33745...
1 Alagoas BR27 MULTIPOLYGON (((7038407.237 8973545.06, 703823...
2 Amapá BR16 MULTIPOLYGON (((5393641.625 10233907.833, 5394...
3 Amazonas BR13 MULTIPOLYGON (((4499820.936 9906611.648, 45003...
4 Bahia BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178...
In [53]:
# as before set index for 'states'
states_brazil5880.set_index('state_name',inplace=True)
states_brazil5880.head()
Out[53]:
state_code geometry
state_name
Acre BR12 MULTIPOLYGON (((3374854.317 8740996.704, 33745...
Alagoas BR27 MULTIPOLYGON (((7038407.237 8973545.06, 703823...
Amapá BR16 MULTIPOLYGON (((5393641.625 10233907.833, 5394...
Amazonas BR13 MULTIPOLYGON (((4499820.936 9906611.648, 45003...
Bahia BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178...

a. Within¶

Let's ask:

The large airports whose geometries are within the borders of a state in Brazil.

In [54]:
airports_within_states = gpd.sjoin(
    large_airports,         # LEFT: airports we want to filter/keep
    states_brazil5880,      # RIGHT: spatial boundaries to check against
    how='inner',            # return geometries that match in both LEFT/RIGHT (jointype)
    predicate='within'      # spatial condition: LEFT geometry within RIGHT geometry
)

# these are:
airports_within_states
Out[54]:
airport_type elevation_ft region municipality geometry state_name state_code
airport_name
Guarulhos - Governador André Franco Montoro International Airport large_airport 2461.0 São Paulo São Paulo POINT (5769392.959 7387510.487) São Paulo BR35
Rio Galeão – Tom Jobim International Airport large_airport 28.0 Rio de Janeiro Rio De Janeiro POINT (6102623.812 7436387.271) Rio de Janeiro BR33
Presidente Juscelino Kubitschek International Airport large_airport 3497.0 Distrito Federal Brasília POINT (5651010.181 8235390.09) Distrito Federal BR53
Deputado Luiz Eduardo Magalhães International Airport large_airport 64.0 Bahia Salvador POINT (6700330.807 8520415.591) Bahia BR29
Tancredo Neves International Airport large_airport 2721.0 Minas Gerais Belo Horizonte POINT (6051718.72 7797008.905) Minas Gerais BR31
Eduardo Gomes International Airport large_airport 264.0 Amazonas Manaus POINT (4327494.283 9662122.706) Amazonas BR13
Hercílio Luz International Airport large_airport 16.0 Santa Catarina Florianópolis POINT (5537273.577 6926234.097) Santa Catarina BR42
Val de Cans/Júlio Cezar Ribeiro International Airport large_airport 54.0 Pará Belém POINT (5614728.302 9846773.655) Pará BR15

We just performed a point-to-polygon spatial join. Notice that the result preserves the original geometries from the LEFT GeoDataFrame — specifically, only those features whose spatial relationship satisfied both the predicate (e.g., 'within') and the join type ('inner'). The non-geometric attributes (columns) from the RIGHT GeoDataFrame are joined to the matching rows.

Notice that, the request gave you points. I will add all the states below:

In [55]:
base=states_brazil5880.explore(edgecolor='black',style_kwds={'fillOpacity': 0.1,'color':'grey'})
airports_within_states.explore(color='red',m=base)
Out[55]:
Make this Notebook Trusted to load map: File -> Trust Notebook

b. Contains¶

Importantly, if the LEFT GeoDataFrame geometries are polygons and the RIGHT one points (a polygon-to-point join), you’ll typically need to use a different predicate — such as 'contains' — to express the spatial relationship correctly.

Brazilian states that house large airports

In [56]:
states_containing_LargeAirports = gpd.sjoin(states_brazil5880,large_airports,how='inner',
                                            predicate='contains')

states_containing_LargeAirports
Out[56]:
state_code geometry airport_name airport_type elevation_ft region municipality
state_name
Amazonas BR13 MULTIPOLYGON (((4499820.936 9906611.648, 45003... Eduardo Gomes International Airport large_airport 264.0 Amazonas Manaus
Bahia BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178... Deputado Luiz Eduardo Magalhães International ... large_airport 64.0 Bahia Salvador
Distrito Federal BR53 MULTIPOLYGON (((5709200.339 8232316.484, 57090... Presidente Juscelino Kubitschek International ... large_airport 3497.0 Distrito Federal Brasília
Minas Gerais BR31 MULTIPOLYGON (((5730280.854 8106234.254, 57299... Tancredo Neves International Airport large_airport 2721.0 Minas Gerais Belo Horizonte
Pará BR15 MULTIPOLYGON (((5842059.053 9886741.08, 584156... Val de Cans/Júlio Cezar Ribeiro International ... large_airport 54.0 Pará Belém
Rio de Janeiro BR33 MULTIPOLYGON (((5957764.631 7386508.56, 595773... Rio Galeão – Tom Jobim International Airport large_airport 28.0 Rio de Janeiro Rio De Janeiro
Santa Catarina BR42 MULTIPOLYGON (((5384989.919 6745543.64, 538466... Hercílio Luz International Airport large_airport 16.0 Santa Catarina Florianópolis
São Paulo BR35 MULTIPOLYGON (((5619062.728 7205958.887, 56186... Guarulhos - Governador André Franco Montoro In... large_airport 2461.0 São Paulo São Paulo

As requested, you got polygons (not the airports). I will add airports for reference.

In [57]:
base=states_containing_LargeAirports.explore(color='red')
large_airports.explore(color='white',m=base)
Out[57]:
Make this Notebook Trusted to load map: File -> Trust Notebook

c. Intersects¶

'Contains' and 'within' are literally strict: Any airport located exactly on a state boundary — whether due to data precision, snapping, or real geography — will be excluded, even if it’s “practically” inside the state. More flexibility is achieved with intersects.

A point on the border of Rio de Janeiro, can be detected if I use intersect!

In [58]:
# this is RDJ:
Rio=states_brazil5880.loc[['Rio de Janeiro'],:]

#this a point on its border
from shapely.geometry import Point

coordinates=Rio.geometry.iloc[0].boundary.geoms[0].coords[1]
aRDJ_PointInBorder=Point(coordinates)
aRDJ_borderPoint = gpd.GeoDataFrame({'name': ['RioJaneiro Border Point']}, 
                                    geometry=[aRDJ_PointInBorder], 
                                    crs=states_brazil5880.crs)
In [59]:
## Intersects result
gpd.sjoin(aRDJ_borderPoint,Rio,
          how='inner', predicate='intersects')
Out[59]:
name geometry state_name state_code
0 RioJaneiro Border Point POINT (5957732.828 7386016.767) Rio de Janeiro BR33

A point on the border of Rio de Janeiro, can NOT be detected if I use within / contains

In [60]:
## within should return no rows
gpd.sjoin(aRDJ_borderPoint,Rio,
          how='inner', predicate='within')
Out[60]:
name geometry state_name state_code
In [61]:
## contains should return no rows
gpd.sjoin(Rio,aRDJ_borderPoint,
          how='inner', predicate='contains')
Out[61]:
state_code geometry index_right name
state_name

We knew this:

In [62]:
# we need crs4326 for explore:
aRDJ_borderPoint_latlon = aRDJ_borderPoint.to_crs(4326).geometry.iloc[0]

base=Rio.explore(zoom_start=20,location=[aRDJ_borderPoint_latlon.y, aRDJ_borderPoint_latlon.x])
aRDJ_borderPoint.explore(m=base, color='red')
Out[62]:
Make this Notebook Trusted to load map: File -> Trust Notebook

d. Touches¶

We also have 'touches', a more stringent predicate than 'intersects'. It returns geometries that:

  • Share a only a border (for polygons or lines), or
  • Have only one tangent point in common.

Which states are neighbors of 'BAHIA", including BAHIA

In [63]:
# Neighbors of Bahia?
gpd.sjoin(N_brazil.loc[N_brazil.state_name=='Bahia',:],N_brazil,how='inner', predicate='intersects').shape
Out[63]:
(6, 6)

That is, Bahia seems to share borders with 5 states:

In [64]:
base=gpd.sjoin(N_brazil,N_brazil.loc[N_brazil.state_name=='Bahia',:],
               how='inner', 
               predicate='intersects').plot(color='yellow',edgecolor='red')
N_brazil.loc[N_brazil.state_name=='Bahia',:].plot(ax=base, color='red')
Out[64]:
<Axes: >
No description has been provided for this image

However, because many free GeoDataFrames — especially those sourced as Shapefiles — contain topological imperfections like gaps, overlaps, or misaligned vertices, 'touches' often fails to detect what should be adjacent features. Ironically, this “failure” can be useful: 'touches' acts as a diagnostic tool — highlighting where boundaries are not perfectly aligned.

In [65]:
gpd.sjoin(N_brazil.loc[N_brazil.state_name=='Bahia',:],N_brazil,how='inner', predicate='touches').shape
Out[65]:
(4, 6)

See the neighbor that disappears:

In [66]:
base=gpd.sjoin(N_brazil,N_brazil.loc[N_brazil.state_name=='Bahia',:],
               how='inner', 
               predicate='touches').plot(color='yellow',edgecolor='red')
N_brazil.loc[N_brazil.state_name=='Bahia',:].plot(ax=base, color='red')
Out[66]:
<Axes: >
No description has been provided for this image

e. Crosses¶

When we have lines, we may need crosses. Let me subset our rivers:

In [67]:
amazonSystem=rivers_brazil5880[rivers_brazil5880.SYSTEM=='Amazon']
amazonSystem.set_index('RIVER',inplace=True)

Then,

Which rivers from the Amazon system are intersecting states?

In [68]:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='intersects')
Out[68]:
SYSTEM geometry state_name state_code
RIVER
Madre de Dios Amazon LINESTRING (3747782.967 8816805.526, 3784022.8... Rondônia BR11
Guapore Amazon MULTILINESTRING ((3784022.891 8842606.845, 378... Rondônia BR11
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Mato Grosso BR51
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Pará BR15
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Mato Grosso BR51
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Pará BR15
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Mato Grosso BR51
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Pará BR15
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Acre BR12
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Amazonas BR13
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Rondônia BR11
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Amazonas BR13
Tapajos Amazon LINESTRING (4956154.209 9738475.606, 4908229.1... Pará BR15
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Pará BR15
Putamayo Amazon LINESTRING (3255840.239 9675268.883, 3294855.9... Amazonas BR13
Japura Amazon MULTILINESTRING ((3782432.967 9748590.005, 382... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Roraima BR14
Rio Branco Amazon LINESTRING (4352974.168 10393701.397, 4312100.... Roraima BR14

A count of the result:

In [69]:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='intersects').shape
Out[69]:
(20, 4)

Alternatively,

Which rivers from the Amazon system are crossing states?

In [70]:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='crosses')
Out[70]:
SYSTEM geometry state_name state_code
RIVER
Madre de Dios Amazon LINESTRING (3747782.967 8816805.526, 3784022.8... Rondônia BR11
Guapore Amazon MULTILINESTRING ((3784022.891 8842606.845, 378... Rondônia BR11
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Mato Grosso BR51
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Pará BR15
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Mato Grosso BR51
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Pará BR15
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Mato Grosso BR51
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Pará BR15
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Acre BR12
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Amazonas BR13
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Rondônia BR11
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Pará BR15
Putamayo Amazon LINESTRING (3255840.239 9675268.883, 3294855.9... Amazonas BR13
Japura Amazon MULTILINESTRING ((3782432.967 9748590.005, 382... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Roraima BR14
Rio Branco Amazon LINESTRING (4352974.168 10393701.397, 4312100.... Roraima BR14

You got one less:

In [71]:
gpd.sjoin(amazonSystem,states_brazil5880,how='inner', predicate='crosses').shape
Out[71]:
(19, 4)

Again, intersects means both geometries have some 'space' in common. But crosses is an intersection that has to cross the spatial object. From the result above, there is one river that shares space with the state, but is not crossing its border:

In [72]:
# Get intersects result
intersects_result = gpd.sjoin(amazonSystem,states_brazil5880, how='inner', predicate='intersects')
intersects_result
Out[72]:
SYSTEM geometry state_name state_code
RIVER
Madre de Dios Amazon LINESTRING (3747782.967 8816805.526, 3784022.8... Rondônia BR11
Guapore Amazon MULTILINESTRING ((3784022.891 8842606.845, 378... Rondônia BR11
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Mato Grosso BR51
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Pará BR15
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Mato Grosso BR51
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Pará BR15
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Mato Grosso BR51
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Pará BR15
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Acre BR12
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Amazonas BR13
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Rondônia BR11
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Amazonas BR13
Tapajos Amazon LINESTRING (4956154.209 9738475.606, 4908229.1... Pará BR15
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Pará BR15
Putamayo Amazon LINESTRING (3255840.239 9675268.883, 3294855.9... Amazonas BR13
Japura Amazon MULTILINESTRING ((3782432.967 9748590.005, 382... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Roraima BR14
Rio Branco Amazon LINESTRING (4352974.168 10393701.397, 4312100.... Roraima BR14
In [73]:
# Get crosses result
crosses_result = gpd.sjoin(amazonSystem,states_brazil5880, how='inner', predicate='crosses')
crosses_result
Out[73]:
SYSTEM geometry state_name state_code
RIVER
Madre de Dios Amazon LINESTRING (3747782.967 8816805.526, 3784022.8... Rondônia BR11
Guapore Amazon MULTILINESTRING ((3784022.891 8842606.845, 378... Rondônia BR11
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Mato Grosso BR51
Rio Juruena Amazon LINESTRING (4488340.769 8382801.957, 4561243.9... Pará BR15
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Mato Grosso BR51
Rio Teles Pires Amazon LINESTRING (4890008.905 8388474.8, 4929569.193... Pará BR15
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Mato Grosso BR51
Xingu Amazon MULTILINESTRING ((4921614.129 8334835.678, 503... Pará BR15
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Acre BR12
Purus Amazon LINESTRING (3179817.343 8884480.014, 3233812.4... Amazonas BR13
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Rondônia BR11
Madeira Amazon LINESTRING (3784022.891 8842606.845, 3785447.5... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Amazonas BR13
Amazon Amazon MULTILINESTRING ((4191497.619 9597916.381, 425... Pará BR15
Putamayo Amazon LINESTRING (3255840.239 9675268.883, 3294855.9... Amazonas BR13
Japura Amazon MULTILINESTRING ((3782432.967 9748590.005, 382... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Amazonas BR13
Rio Negro Amazon MULTILINESTRING ((3592372.872 10116670.869, 35... Roraima BR14
Rio Branco Amazon LINESTRING (4352974.168 10393701.397, 4312100.... Roraima BR14
In [74]:
river_notCrossing=list(set(intersects_result.index)-set(crosses_result.index))
river_notCrossing
Out[74]:
['Tapajos']
In [75]:
# Find indexes/columns
state_notCrossed=intersects_result.loc[river_notCrossing,'state_name'].to_list()
state_notCrossed
Out[75]:
['Pará']

Now we know the river that is not crossing an state, and the name of that state.

In [76]:
base=states_brazil5880.loc[state_notCrossed,:].plot(color='w',edgecolor='k',figsize=(12, 8))
amazonSystem.plot(ax=base)
amazonSystem.loc[river_notCrossing,:].plot(color='red',ax=base)
Out[76]:
<Axes: >
No description has been provided for this image

IV. Spatial Overlay¶

As the name implies, you need two inputs. We might need to create or find some geometries from the geometries we already have. Using a set theory approach, we will see the use of intersection, union, difference, and symmetric difference. Let's remember these results:

In [77]:
# Create a figure and a 2x2 grid of axes
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
axes = axes.flatten()

N_brazil.plot(color='pink',edgecolor='k',ax=axes[0])
axes[0].set_title("1. North")

S_brazil.plot(color='pink',edgecolor='k',ax=axes[1])
axes[1].set_title("2. South")

W_brazil.plot(color='pink',edgecolor='k',ax=axes[2])
axes[2].set_title("3. West")

E_brazil.plot(color='pink',edgecolor='k',ax=axes[3])
axes[3].set_title("4. East")

plt.show()
No description has been provided for this image

Let's play with these GDFs, keep in mind the position of the GDFs:

GDFleft.overlay(GDFright, ...)

a. Intersection¶

We keep what is common between left and right GDFs.

Intersection of North and South

In [78]:
NS_brazil=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=True)
NS_brazil.plot(color='pink',edgecolor='k')
Out[78]:
<Axes: >
No description has been provided for this image

Intersection of West and East

In [79]:
WE_brazil=W_brazil.overlay(E_brazil, how="intersection",keep_geom_type=True)
WE_brazil.plot(color='pink',edgecolor='k')
Out[79]:
<Axes: >
No description has been provided for this image

b. Union¶

In the union overlay, we do not return dissolved geometries. Instead, we return a complete set of new geometries that represent every unique spatial combination of the two input GeoDataFrames. The resulting GeoDataFrame will contain the columns from both GDFs, and the row count will be the sum of all parts of A, all parts of B, and the newly created intersection parts, potentially inflated by topological flaws

Unite the West/East intersection with the North/South intersection

In [80]:
mid_Brazil=NS_brazil.overlay(WE_brazil,how="union",keep_geom_type=True)
mid_Brazil.plot(color='pink',edgecolor='k')
Out[80]:
<Axes: >
No description has been provided for this image

c. Difference¶

Here, you keep what belongs to the left GDF that is not in the right GDF.

Keep nothern states that are not in the southern ones

In [81]:
NS_diff_brazil=N_brazil.overlay(S_brazil, how='difference')
NS_diff_brazil.plot(color='pink',edgecolor='k')
Out[81]:
<Axes: >
No description has been provided for this image

d. Symmetric Difference¶

Here, we keep what is not in the intersection but in both GDFs.

Unite Northern and Southern states, but keep states that are not in their intersection.

In [82]:
NS_simdiff_brazil=N_brazil.overlay(S_brazil, how='symmetric_difference')
NS_simdiff_brazil.plot(color='pink',edgecolor='k')
Out[82]:
<Axes: >
No description has been provided for this image

e. Cleaning and Overlay¶

Overlay and SJoin differ in that overlay, as it creates geometries, may not output clean results. Not because overlay fault, but due to the quality of the original maps.

Let me use the intersection as our topological detective.

  • Using set operations: What are the common states between northern and souther states.
In [83]:
setIntersection=set(N_brazil.state_name) & set(S_brazil.state_name)

# how many, which are they
len(setIntersection),setIntersection
Out[83]:
(8,
 {'Acre',
  'Alagoas',
  'Bahia',
  'Mato Grosso',
  'Piauí',
  'Rondônia',
  'Sergipe',
  'Tocantins'})
  • The spatial overlay with keep_geom_type as True:
In [84]:
NS_brazil_T=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=True)

# different from 8 (see above)
NS_brazil_T.shape[0]
Out[84]:
11
  • The spatial overlay with keep_geom_type as False:
In [85]:
NS_brazil_F=N_brazil.overlay(S_brazil, how="intersection",keep_geom_type=False)

# different from 8 and 11 (see above)
NS_brazil_F.shape[0]
Out[85]:
43

As you see, the intersection that we plotted before had several topological issues we could not see with bare eyes. There were several additional geometries that represent flaws. When we used "keep_geom_type=True" the results were polygons (left GDF geometries were polygons); but when we asked "keep_geom_type=False" the result may include polygons and more (even collections). Let's see this last case:

In [86]:
NS_brazil_F.geometry.geom_type.value_counts()
Out[86]:
MultiLineString       32
MultiPolygon           4
Polygon                4
GeometryCollection     3
Name: count, dtype: int64

Can this be fixed?

Let's see what can be done:

  • review the previous intersection:
In [87]:
NS_brazil_T
Out[87]:
state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Acre BR12 Acre BR12 MULTIPOLYGON (((3374472.826 8741412.368, 33746...
1 Alagoas BR27 Alagoas BR27 POLYGON ((7038233.546 8973146.179, 7038561.661...
2 Bahia BR29 Bahia BR29 MULTIPOLYGON (((6617883.775 7945834.225, 66175...
3 Bahia BR29 Minas Gerais BR31 POLYGON ((6473422.295 8203724.413, 6472353.598...
4 Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718500.714 8080705.175, 47182...
5 Pernambuco BR26 Alagoas BR27 POLYGON ((6965686.284 8954189.214, 6965358.315...
6 Pernambuco BR26 Bahia BR29 POLYGON ((6611335.359 9024941.961, 6611714.257...
7 Piauí BR22 Piauí BR22 POLYGON ((6358809.561 9687374.554, 6358994.184...
8 Rondônia BR11 Rondônia BR11 MULTIPOLYGON (((4336668.469 8680873.21, 433691...
9 Sergipe BR28 Sergipe BR28 POLYGON ((6758965.277 8905155.665, 6759481.274...
10 Tocantins BR17 Tocantins BR17 POLYGON ((5765755.273 8998397.361, 5765506.449...
  • Detect problematic rows
In [88]:
NS_brazil_T[NS_brazil_T.state_name_1!=NS_brazil_T.state_name_2]
Out[88]:
state_name_1 state_code_1 state_name_2 state_code_2 geometry
3 Bahia BR29 Minas Gerais BR31 POLYGON ((6473422.295 8203724.413, 6472353.598...
5 Pernambuco BR26 Alagoas BR27 POLYGON ((6965686.284 8954189.214, 6965358.315...
6 Pernambuco BR26 Bahia BR29 POLYGON ((6611335.359 9024941.961, 6611714.257...

Pernambuco and Minas Gerais must have some of their points "invading" Bahia and Alagoas polygon. The strategy here is that Bahia and Aalagoas will not push away the invading points, but to 'abandon' the area in conflict (difference) for the sake of clean limits (dissolve).

  • We need to work with the original states. Let's make a copy of them before cleaning:
In [89]:
states_brazil_clean=states_brazil5880.copy()
states_brazil_clean.head()
Out[89]:
state_code geometry
state_name
Acre BR12 MULTIPOLYGON (((3374854.317 8740996.704, 33745...
Alagoas BR27 MULTIPOLYGON (((7038407.237 8973545.06, 703823...
Amapá BR16 MULTIPOLYGON (((5393641.625 10233907.833, 5394...
Amazonas BR13 MULTIPOLYGON (((4499820.936 9906611.648, 45003...
Bahia BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178...
  • Let's dissolve Pernambuco and Minas Gerais, the invaders:
In [90]:
alienStates=["Minas Gerais","Pernambuco"]
alienUnion_GDF=states_brazil_clean[states_brazil_clean.index.isin(alienStates)].dissolve()
alienUnion_GDF
Out[90]:
geometry state_code
0 MULTIPOLYGON (((5715427.146 8093800.386, 57154... BR31
  • Keep the states to be cleaned from the 'alienStates', no dissolving - just filtering:
In [91]:
forCleaning=["Alagoas","Bahia"] #order matters
forCleaning_GDF=states_brazil_clean[states_brazil_clean.index.isin(forCleaning)]
forCleaning_GDF
Out[91]:
state_code geometry
state_name
Alagoas BR27 MULTIPOLYGON (((7038407.237 8973545.06, 703823...
Bahia BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178...
  • Recreate boundaries, so that no intersection occurs... do you see different values in geometry column now?
In [92]:
recreatedPolygons=forCleaning_GDF.overlay(alienUnion_GDF,how="difference",keep_geom_type=False).dissolve(by="state_code")
recreatedPolygons
Out[92]:
geometry
state_code
BR27 POLYGON ((7038233.546 8973146.179, 7038561.661...
BR29 MULTIPOLYGON (((6618113.779 7946308.264, 66178...
  • Replace old geometries with recent values.

This is a sensible moment: You need to change just the geometry cells, not the whole rows:

In [93]:
#old values
states_brazil_clean.loc[forCleaning,'geometry']
Out[93]:
state_name
Alagoas    MULTIPOLYGON (((7038407.237 8973545.06, 703823...
Bahia      MULTIPOLYGON (((6618113.779 7946308.264, 66178...
Name: geometry, dtype: geometry
In [94]:
#new values - notice order
recreatedPolygons.geometry
Out[94]:
state_code
BR27    POLYGON ((7038233.546 8973146.179, 7038561.661...
BR29    MULTIPOLYGON (((6618113.779 7946308.264, 66178...
Name: geometry, dtype: geometry
In [95]:
# but use this for replacement
recreatedPolygons.geometry.values
Out[95]:
<GeometryArray>
[<POLYGON ((7038233.546 8973146.179, 7038561.661 8972873.722, 7038666.047 897...>, <MULTIPOLYGON (((6618113.779 7946308.264, 6617883.775 7945834.225, 6617580.6...>]
Length: 2, dtype: geometry

Then,

In [96]:
states_brazil_clean.loc[forCleaning,'geometry']=recreatedPolygons.geometry.values

#see
states_brazil_clean.loc[forCleaning,'geometry']
Out[96]:
state_name
Alagoas    POLYGON ((7038233.546 8973146.179, 7038561.661...
Bahia      MULTIPOLYGON (((6618113.779 7946308.264, 66178...
Name: geometry, dtype: geometry
  • We need to recreate northern and southern GDFs, let's redo the four ones at once:
In [97]:
# the north
N_brazil_clean=states_brazil_clean.reset_index().cx[:,mid_y:]
# the south
S_brazil_clean=states_brazil_clean.reset_index().cx[:,:mid_y]
# the west
W_brazil_clean=states_brazil_clean.reset_index().cx[:mid_x,:]
# the east
E_brazil_clean=states_brazil_clean.reset_index().cx[mid_x:,:]
  • Confirm you have improved the intersection with keep_geom_type=True:
In [98]:
N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=True)
Out[98]:
state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Acre BR12 Acre BR12 MULTIPOLYGON (((3374472.826 8741412.368, 33746...
1 Alagoas BR27 Alagoas BR27 POLYGON ((7038561.661 8972873.722, 7038666.047...
2 Bahia BR29 Bahia BR29 MULTIPOLYGON (((6617883.775 7945834.225, 66175...
3 Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718500.714 8080705.175, 47182...
4 Piauí BR22 Piauí BR22 POLYGON ((6358809.561 9687374.554, 6358994.184...
5 Rondônia BR11 Rondônia BR11 MULTIPOLYGON (((4336668.469 8680873.21, 433691...
6 Sergipe BR28 Sergipe BR28 POLYGON ((6758965.277 8905155.665, 6759481.274...
7 Tocantins BR17 Tocantins BR17 POLYGON ((5765755.273 8998397.361, 5765506.449...
  • Confirm you have improved the intersection with keep_geom_type=False:
In [99]:
N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=False).geometry.geom_type.value_counts()
Out[99]:
MultiLineString    35
MultiPolygon        4
Polygon             4
Name: count, dtype: int64

All this work did help. But remember while we cleaned the neighborhood issues between 4 states, cleaning 2 of them there might be more cleaning remaining.

What about union?

The mid_Brazil had these many geometries (rows):

In [100]:
len(mid_Brazil)
Out[100]:
22

Just using sets, this is the target count:

In [101]:
interNS=set(N_brazil_clean.state_name)&set(S_brazil_clean.state_name)
interEW=set(E_brazil_clean.state_name)&set(W_brazil_clean.state_name)
union_interNSEW=interNS|interEW
len(union_interNSEW)
Out[101]:
15

Let's recreate intersection to recreate a clean mid Brazil:

In [102]:
NS_brazil_clean=N_brazil_clean.overlay(S_brazil_clean, how="intersection",keep_geom_type=True)
EW_brazil_clean=E_brazil_clean.overlay(W_brazil_clean, how="intersection",keep_geom_type=True)

# then
mid_Brazil_clean=NS_brazil_clean.overlay(EW_brazil_clean,how="union",keep_geom_type=True)

# count
len(mid_Brazil_clean)
Out[102]:
19

Again, we had improved, but you realize there is still more cleaning to do.

Finally, let's check the recent result:

In [103]:
mid_Brazil_clean
Out[103]:
state_name_1_1 state_code_1_1 state_name_2_1 state_code_2_1 state_name_1_2 state_code_1_2 state_name_2_2 state_code_2_2 geometry
0 Mato Grosso BR51 Mato Grosso BR51 Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718274.258 8081710.261, 47189...
1 Acre BR12 Acre BR12 NaN NaN NaN NaN MULTIPOLYGON (((3374472.826 8741412.368, 33746...
2 Alagoas BR27 Alagoas BR27 NaN NaN NaN NaN POLYGON ((7038561.661 8972873.722, 7038666.047...
3 Bahia BR29 Bahia BR29 NaN NaN NaN NaN MULTIPOLYGON (((6617580.695 7946157.331, 66181...
4 Piauí BR22 Piauí BR22 NaN NaN NaN NaN POLYGON ((6358809.561 9687374.554, 6358994.184...
5 Rondônia BR11 Rondônia BR11 NaN NaN NaN NaN MULTIPOLYGON (((4336911.983 8681201.152, 43372...
6 Sergipe BR28 Sergipe BR28 NaN NaN NaN NaN POLYGON ((6758965.277 8905155.665, 6759481.274...
7 Tocantins BR17 Tocantins BR17 NaN NaN NaN NaN POLYGON ((5764722.95 8996734.494, 5764264.595 ...
8 NaN NaN NaN NaN Amapá BR16 Amapá BR16 MULTIPOLYGON (((5394219.767 10234053.175, 5394...
9 NaN NaN NaN NaN Distrito Federal BR53 Goiás BR52 MULTIPOLYGON (((5709441.711 8230228.217, 57094...
10 NaN NaN NaN NaN Goiás BR52 Goiás BR52 MULTIPOLYGON (((5715352.084 8093167.787, 57150...
11 NaN NaN NaN NaN Mato Grosso do Sul BR50 Mato Grosso do Sul BR50 MULTIPOLYGON (((4643554.93 7541102.839, 464377...
12 NaN NaN NaN NaN Minas Gerais BR31 Goiás BR52 MULTIPOLYGON (((5488048.073 7931930.787, 54880...
13 NaN NaN NaN NaN Pará BR15 Pará BR15 MULTIPOLYGON (((5670588.737 9923682.938, 56706...
14 NaN NaN NaN NaN Paraná BR41 Paraná BR41 MULTIPOLYGON (((5445907.946 7090332.337, 54456...
15 NaN NaN NaN NaN Paraná BR41 Santa Catarina BR42 MULTIPOLYGON (((5247750.046 7056248.157, 52477...
16 NaN NaN NaN NaN Rio Grande do Sul BR43 Rio Grande do Sul BR43 MULTIPOLYGON (((5390972.988 6769017.466, 53912...
17 NaN NaN NaN NaN Santa Catarina BR42 Paraná BR41 MULTIPOLYGON (((5247761.665 7056252.309, 52476...
18 NaN NaN NaN NaN Santa Catarina BR42 Santa Catarina BR42 MULTIPOLYGON (((5384989.919 6745543.64, 538466...

The UNION overlay is a simple idea. But the result above does not seem friendly.

Let's recreat this GDF from previous steps:

In [104]:
NS_brazil_clean
Out[104]:
state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Acre BR12 Acre BR12 MULTIPOLYGON (((3374472.826 8741412.368, 33746...
1 Alagoas BR27 Alagoas BR27 POLYGON ((7038561.661 8972873.722, 7038666.047...
2 Bahia BR29 Bahia BR29 MULTIPOLYGON (((6617883.775 7945834.225, 66175...
3 Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718500.714 8080705.175, 47182...
4 Piauí BR22 Piauí BR22 POLYGON ((6358809.561 9687374.554, 6358994.184...
5 Rondônia BR11 Rondônia BR11 MULTIPOLYGON (((4336668.469 8680873.21, 433691...
6 Sergipe BR28 Sergipe BR28 POLYGON ((6758965.277 8905155.665, 6759481.274...
7 Tocantins BR17 Tocantins BR17 POLYGON ((5765755.273 8998397.361, 5765506.449...
In [105]:
# better format
NS_brazil_clean.drop(columns=['state_code_1','state_name_2','state_code_2'],inplace=True)
NS_brazil_clean.rename(columns={'state_name_1':'state_name_clean'},inplace=True)
# then
NS_brazil_clean
Out[105]:
state_name_clean geometry
0 Acre MULTIPOLYGON (((3374472.826 8741412.368, 33746...
1 Alagoas POLYGON ((7038561.661 8972873.722, 7038666.047...
2 Bahia MULTIPOLYGON (((6617883.775 7945834.225, 66175...
3 Mato Grosso MULTIPOLYGON (((4718500.714 8080705.175, 47182...
4 Piauí POLYGON ((6358809.561 9687374.554, 6358994.184...
5 Rondônia MULTIPOLYGON (((4336668.469 8680873.21, 433691...
6 Sergipe POLYGON ((6758965.277 8905155.665, 6759481.274...
7 Tocantins POLYGON ((5765755.273 8998397.361, 5765506.449...

Notice this one is not cleaned:

In [106]:
EW_brazil_clean
Out[106]:
state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Amapá BR16 Amapá BR16 MULTIPOLYGON (((5394219.767 10234053.175, 5394...
1 Distrito Federal BR53 Goiás BR52 MULTIPOLYGON (((5709441.711 8230228.217, 57094...
2 Goiás BR52 Goiás BR52 MULTIPOLYGON (((5715352.084 8093167.787, 57150...
3 Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718500.714 8080705.175, 47182...
4 Mato Grosso do Sul BR50 Mato Grosso do Sul BR50 MULTIPOLYGON (((4643440.815 7540678.682, 46435...
5 Minas Gerais BR31 Goiás BR52 MULTIPOLYGON (((5488048.073 7931930.787, 54880...
6 Pará BR15 Pará BR15 MULTIPOLYGON (((5669998.17 9923194.452, 567022...
7 Paraná BR41 Paraná BR41 MULTIPOLYGON (((5445907.946 7090332.337, 54456...
8 Paraná BR41 Santa Catarina BR42 MULTIPOLYGON (((5247750.046 7056248.157, 52477...
9 Rio Grande do Sul BR43 Rio Grande do Sul BR43 MULTIPOLYGON (((5390972.988 6769017.466, 53912...
10 Santa Catarina BR42 Paraná BR41 MULTIPOLYGON (((5247761.665 7056252.309, 52476...
11 Santa Catarina BR42 Santa Catarina BR42 MULTIPOLYGON (((5384989.919 6745543.64, 538466...

We will not do the any reformatting as before, since we can not erase columns.

Let's see the 'mid' again:

In [107]:
mid_Brazil_clean=NS_brazil_clean.overlay(EW_brazil_clean,how="union",keep_geom_type=True)
mid_Brazil_clean
Out[107]:
state_name_clean state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Mato Grosso Mato Grosso BR51 Mato Grosso BR51 MULTIPOLYGON (((4718274.258 8081710.261, 47189...
1 Acre NaN NaN NaN NaN MULTIPOLYGON (((3374472.826 8741412.368, 33746...
2 Alagoas NaN NaN NaN NaN POLYGON ((7038561.661 8972873.722, 7038666.047...
3 Bahia NaN NaN NaN NaN MULTIPOLYGON (((6617580.695 7946157.331, 66181...
4 Piauí NaN NaN NaN NaN POLYGON ((6358809.561 9687374.554, 6358994.184...
5 Rondônia NaN NaN NaN NaN MULTIPOLYGON (((4336911.983 8681201.152, 43372...
6 Sergipe NaN NaN NaN NaN POLYGON ((6758965.277 8905155.665, 6759481.274...
7 Tocantins NaN NaN NaN NaN POLYGON ((5764722.95 8996734.494, 5764264.595 ...
8 NaN Amapá BR16 Amapá BR16 MULTIPOLYGON (((5394219.767 10234053.175, 5394...
9 NaN Distrito Federal BR53 Goiás BR52 MULTIPOLYGON (((5709441.711 8230228.217, 57094...
10 NaN Goiás BR52 Goiás BR52 MULTIPOLYGON (((5715352.084 8093167.787, 57150...
11 NaN Mato Grosso do Sul BR50 Mato Grosso do Sul BR50 MULTIPOLYGON (((4643554.93 7541102.839, 464377...
12 NaN Minas Gerais BR31 Goiás BR52 MULTIPOLYGON (((5488048.073 7931930.787, 54880...
13 NaN Pará BR15 Pará BR15 MULTIPOLYGON (((5670588.737 9923682.938, 56706...
14 NaN Paraná BR41 Paraná BR41 MULTIPOLYGON (((5445907.946 7090332.337, 54456...
15 NaN Paraná BR41 Santa Catarina BR42 MULTIPOLYGON (((5247750.046 7056248.157, 52477...
16 NaN Rio Grande do Sul BR43 Rio Grande do Sul BR43 MULTIPOLYGON (((5390972.988 6769017.466, 53912...
17 NaN Santa Catarina BR42 Paraná BR41 MULTIPOLYGON (((5247761.665 7056252.309, 52476...
18 NaN Santa Catarina BR42 Santa Catarina BR42 MULTIPOLYGON (((5384989.919 6745543.64, 538466...

Notice this UNION operation identifies the intersection in the result. MATO GROSSO is present in both.

However, missing values were created where no intersection exists. Remember, whatever is not in the intersection of these two, would be the result of symmetric difference:

In [108]:
NS_brazil_clean.overlay(EW_brazil_clean,how="symmetric_difference",keep_geom_type=True)
Out[108]:
state_name_clean state_name_1 state_code_1 state_name_2 state_code_2 geometry
0 Acre NaN NaN NaN NaN MULTIPOLYGON (((3374472.826 8741412.368, 33746...
1 Alagoas NaN NaN NaN NaN POLYGON ((7038561.661 8972873.722, 7038666.047...
2 Bahia NaN NaN NaN NaN MULTIPOLYGON (((6617580.695 7946157.331, 66181...
3 Piauí NaN NaN NaN NaN POLYGON ((6358809.561 9687374.554, 6358994.184...
4 Rondônia NaN NaN NaN NaN MULTIPOLYGON (((4336911.983 8681201.152, 43372...
5 Sergipe NaN NaN NaN NaN POLYGON ((6758965.277 8905155.665, 6759481.274...
6 Tocantins NaN NaN NaN NaN POLYGON ((5764722.95 8996734.494, 5764264.595 ...
7 NaN Amapá BR16 Amapá BR16 MULTIPOLYGON (((5394219.767 10234053.175, 5394...
8 NaN Distrito Federal BR53 Goiás BR52 MULTIPOLYGON (((5709441.711 8230228.217, 57094...
9 NaN Goiás BR52 Goiás BR52 MULTIPOLYGON (((5715352.084 8093167.787, 57150...
10 NaN Mato Grosso do Sul BR50 Mato Grosso do Sul BR50 MULTIPOLYGON (((4643554.93 7541102.839, 464377...
11 NaN Minas Gerais BR31 Goiás BR52 MULTIPOLYGON (((5488048.073 7931930.787, 54880...
12 NaN Pará BR15 Pará BR15 MULTIPOLYGON (((5670588.737 9923682.938, 56706...
13 NaN Paraná BR41 Paraná BR41 MULTIPOLYGON (((5445907.946 7090332.337, 54456...
14 NaN Paraná BR41 Santa Catarina BR42 MULTIPOLYGON (((5247750.046 7056248.157, 52477...
15 NaN Rio Grande do Sul BR43 Rio Grande do Sul BR43 MULTIPOLYGON (((5390972.988 6769017.466, 53912...
16 NaN Santa Catarina BR42 Paraná BR41 MULTIPOLYGON (((5247761.665 7056252.309, 52476...
17 NaN Santa Catarina BR42 Santa Catarina BR42 MULTIPOLYGON (((5384989.919 6745543.64, 538466...

As you see, the fact that we have not clean all the geometries affects all these overlay operations.

NEXT STEPS

  • Cleaning spatial data is not trivial. It is time consuming. We could keep working here to improve this.
  • We may need to take this fight to QGis or ArcGis to edit the points manually, it gives you more control, but it will affect replicability.
  • A smart decision would be to get a new map with a better quality - which might also require some money investment.