Processing tutorial
The following tutorial steps make up the full workflow for post-processing of raster classifications to a compiled, production-ready inventory of ice marginal lakes.
This tutorial is available as both a python script and jupyter notebook. We also provide a Binder environment within which you can run the notebook (with no need to set up the requirements yourself locally).
First we will begin by importing the GrIML functions we will be using.
[1]:
# Import GrIML functions for processing
from griml.convert.convert import convert
from griml.filter.filter_vectors import filter_vectors
from griml.merge.merge_vectors import merge_vectors
from griml.metadata.add_metadata import add_metadata
Converting raster classifications to vectors
GrIML’s convert module is used to convert binary raster classifications of water bodies to vectors; where values of 1 in the raster classification denote water has been classified, and zero values denote no water. We need the convert function to perform this.
We then need to define some input variables - the projection, band information and date range of the input raster. If you have followed the GEE script for classifying lakes available through the GrIMl repo, then each outputted raster band represents classifications using one of three approaches:
Multi-spectral classification from Sentinel-2
Backscatter threshold classification from Sentinel-1
Sink detection from ArcticDEM
And the default outputted projection is Polar Stereographic. The start and end date should match the defined date range used for the raster classifications.
Then we need to define the location of our raster file. A test raster file is provided with GrIML, which can be located in the top level test directory in the repository.
[2]:
# Define projection of input binary raster
proj = 'EPSG:3413'
# Define band information of binary raster
band_info = [{'b_number':1, 'method':'VIS', 'source':'S2'},
{'b_number':2, 'method':'SAR', 'source':'S1'},
{'b_number':3, 'method':'DEM', 'source':'ARCTICDEM'}]
# Define start and end dates of acquisitions from which rasters are created
start='20170701'
end='20170831'
# Define input binary raster
infile = '../test/test_data/test_north_greenland.tif'
And then the file can be converted from raster to vector classifications using the convert function and the input variables.
If an output directory is not provided then converted vectors will not be written to file.
Note that a single raster file is wrapped as a list, as the convert function expects a list of rasters normally. We recommend using glob to generate a list of rasters from converting, if you wish to convert multiple classifications.
[3]:
# Convert binary raster to vectors
out1 = convert([infile], proj, band_info, start, end)
print(out1)
1. Converting test_north_greenland.tif
Retrieving vectors from band S2...
Retrieving vectors from band S1...
Retrieving vectors from band ARCTICDEM...
[ geometry method source \
row_id
1 POLYGON ((-240140.000 -742150.000, -240140.000... VIS S2
2 POLYGON ((-240300.000 -742190.000, -240300.000... VIS S2
3 POLYGON ((-240170.000 -742190.000, -240170.000... VIS S2
4 POLYGON ((-241180.000 -742240.000, -241180.000... VIS S2
5 POLYGON ((-239900.000 -742250.000, -239900.000... VIS S2
... ... ... ...
702096 POLYGON ((-356570.000 -974130.000, -356560.000... DEM ARCTICDEM
702097 POLYGON ((-352970.000 -974040.000, -352960.000... DEM ARCTICDEM
702098 POLYGON ((-309630.000 -974140.000, -309630.000... DEM ARCTICDEM
702099 POLYGON ((-290390.000 -974120.000, -290370.000... DEM ARCTICDEM
702100 POLYGON ((-263360.000 -974130.000, -263310.000... DEM ARCTICDEM
startdate enddate
row_id
1 20170701 20170831
2 20170701 20170831
3 20170701 20170831
4 20170701 20170831
5 20170701 20170831
... ... ...
702096 20170701 20170831
702097 20170701 20170831
702098 20170701 20170831
702099 20170701 20170831
702100 20170701 20170831
[702100 rows x 5 columns]]
Filtering vector classifications
Vector classifications can be filtered by size and proximity to an inputted mask. In our case, we remove water bodies below a size of 0.05 sq km and further than 1 km from the ice margin (as we want to retain larger ice marginal lakes) with GrIML’s filter_vectors function.
GrIML is provided with a test vector file for filtering, and a test ice mask which is an ice margin polygon object with a 1 km buffer.
The size threshold is 0.05 sq km by default, but this can be altered with the optional min_area input variable.
[4]:
# Define input vector dataset
infile1 = '../test/test_data/test_filter.gpkg'
# Define ice mask for spatial filtering
infile2 = '../test/test_data/test_icemask.gpkg'
# Filter vectors by ice mask
out2 = filter_vectors([infile1], infile2)
print(out2)
1/1: Filtering vectors in test_filter.gpkg
1760 features over 0.05 sq km
0 features within 500 m of margin
No vectors present after filter. Moving to next file.
[]
Merging
When covering large areas, the classifications are usually split into different files. At this stage, we will merge all files together using the merge_vectors function, to form a complete inventory of ice marginal lake classifications. Test files are provided with GrIML to perform this. In this case, we will merge all vectors from the two files defined previously.
An output file can be defined in order to write the merged vectors to file if needed. If an output directory is not provided then the merged vectors will not be written to file. To retain the merged vectors in memory, make sure that an output variable is defined
[5]:
# Define vector datasets to merge together
infile1 = '../test/test_data/test_merge_1.gpkg'
infile2 = '../test/test_data/test_merge_2.gpkg'
# Merge vector datasets
out3 = merge_vectors([infile1,infile2])
print(out3)
geometry method source \
row_id
1 POLYGON ((173330.000 -2868710.000, 173340.000 ... VIS S2
2 POLYGON ((166430.000 -2869010.000, 166440.000 ... VIS S2
3 POLYGON ((220110.000 -2869650.000, 220140.000 ... VIS S2
4 POLYGON ((173160.000 -2871100.000, 173190.000 ... VIS S2
5 POLYGON ((168200.000 -2871450.000, 168270.000 ... VIS S2
... ... ... ...
935 POLYGON ((99650.000 -3306720.000, 99730.000 -3... DEM ARCTICDEM
936 POLYGON ((100990.000 -3307380.000, 101050.000 ... DEM ARCTICDEM
937 POLYGON ((93900.000 -3307800.000, 93980.000 -3... DEM ARCTICDEM
938 POLYGON ((85630.000 -3308390.000, 85650.000 -3... DEM ARCTICDEM
939 POLYGON ((86200.000 -3308290.000, 86230.000 -3... DEM ARCTICDEM
startdate enddate area_sqkm length_km
row_id
1 20170701 20170831 0.0531 2.96
2 20170701 20170831 0.0943 2.94
3 20170701 20170831 0.1139 4.18
4 20170701 20170831 0.1597 2.18
5 20170701 20170831 0.0557 1.26
... ... ... ... ...
935 20170701 20170831 0.0717 1.24
936 20170701 20170831 0.0514 1.16
937 20170701 20170831 0.7569 5.46
938 20170701 20170831 0.1487 2.52
939 20170701 20170831 0.3802 4.20
[4661 rows x 7 columns]
Adding metadata
Metadata can be added to the inventory with GrIML’s metadata.add_metadata function. This includes:
Adding a classification certainty value
Assigning an identification number per lake
Assigning a placename to each lake
Assigning a region to each lake
Assigning a list of all classification sources to each lake
Input files are needed for assigning a placename and a region to each lake. The placename file is a point vector file containing all placenames for a region. We use the placename database from Oqaasileriffik, the Language Secretariat of Greenland, for which there is an example data subset provided with GrIML. The region file is a polygon vector file containing all regions and their names. We use the Greenland Ice Sheet drainage basin regions as defined by Mouginot and Rignot, (2019), a dataset which is provided with GrIML.
[6]:
# Define vector dataset
infile1 = '../test/test_data/test_merge_2.gpkg'
# Define placenames vector dataset
infile2 = '../test/test_data/test_placenames.gpkg'
# Define ice sheet basins dataset
infile3 = '../test/test_data/greenland_basins_polarstereo.gpkg'
# Add metadata to vector dataset
out4 = add_metadata(infile1, infile2, infile3)
print(out4)
Assigning ID...
Assigning sources...
Assigning certainty scores...
Assigning regions...
Assigning placenames...
row_id method source startdate enddate area_sqkm length_km \
0 284 VIS S2 20170701 20170831 0.1738 3.10
1 216489 SAR S1 20170701 20170831 0.1468 2.26
2 216484 SAR S1 20170701 20170831 0.2597 2.88
3 386 VIS S2 20170701 20170831 0.3158 3.28
4 390 VIS S2 20170701 20170831 0.1271 2.64
.. ... ... ... ... ... ... ...
934 354174 DEM ARCTICDEM 20170701 20170831 0.1529 2.34
935 354221 DEM ARCTICDEM 20170701 20170831 0.1630 2.04
936 354335 DEM ARCTICDEM 20170701 20170831 0.0717 1.24
937 354362 DEM ARCTICDEM 20170701 20170831 0.0514 1.16
938 354409 DEM ARCTICDEM 20170701 20170831 0.7569 5.46
MaxSimpTol geometry lake_id \
0 10.0 POLYGON ((46610.000 -3247430.000, 46770.000 -3... 1
1 10.0 POLYGON ((47060.000 -3247430.000, 47440.000 -3... 1
2 10.0 POLYGON ((99340.000 -3247430.000, 100370.000 -... 2
3 10.0 POLYGON ((99310.000 -3247430.000, 100410.000 -... 2
4 10.0 POLYGON ((87580.000 -3247540.000, 87590.000 -3... 3
.. ... ... ...
934 10.0 POLYGON ((79080.000 -3301490.000, 79160.000 -3... 660
935 10.0 POLYGON ((60690.000 -3303050.000, 60760.000 -3... 661
936 10.0 POLYGON ((99650.000 -3306720.000, 99730.000 -3... 662
937 10.0 POLYGON ((100990.000 -3307380.000, 101050.000 ... 663
938 10.0 POLYGON ((93900.000 -3307800.000, 93980.000 -3... 664
all_src num_src certainty subregion placename
0 S1, ARCTICDEM, S2 3 1.0 SE Unknown
1 S1, ARCTICDEM, S2 3 1.0 SE Unknown
2 S1, ARCTICDEM, S2 3 1.0 SE Unknown
3 S1, ARCTICDEM, S2 3 1.0 SE Unknown
4 S1, ARCTICDEM, S2 3 1.0 SE Unknown
.. ... ... ... ... ...
934 S1, ARCTICDEM, S2 3 1.0 SE Unknown
935 S1, ARCTICDEM, S2 3 1.0 SE Sermeerunnerit
936 S1, ARCTICDEM, S2 3 1.0 SE Unknown
937 S1, ARCTICDEM, S2 3 1.0 SE Unknown
938 S1, ARCTICDEM, S2 3 1.0 SE Sermeq Kujalleq
[939 rows x 15 columns]