Processing tutorial

The following tutorial steps make up the full workflow for post-processing of raster classifications to a compiled, production-ready inventory of ice marginal lakes.

This tutorial is available as both a python script and jupyter notebook. We also provide a Binder environment within which you can run the notebook (with no need to set up the requirements yourself locally).

First we will begin by importing the GrIML functions we will be using.

[1]:
# Import GrIML functions for processing
from griml.convert.convert import convert
from griml.filter.filter_vectors import filter_vectors
from griml.merge.merge_vectors import merge_vectors
from griml.metadata.add_metadata import add_metadata

Converting raster classifications to vectors

GrIML’s convert module is used to convert binary raster classifications of water bodies to vectors; where values of 1 in the raster classification denote water has been classified, and zero values denote no water. We need the convert function to perform this.

We then need to define some input variables - the projection, band information and date range of the input raster. If you have followed the GEE script for classifying lakes available through the GrIMl repo, then each outputted raster band represents classifications using one of three approaches:

  1. Multi-spectral classification from Sentinel-2

  2. Backscatter threshold classification from Sentinel-1

  3. Sink detection from ArcticDEM

And the default outputted projection is Polar Stereographic. The start and end date should match the defined date range used for the raster classifications.

Then we need to define the location of our raster file. A test raster file is provided with GrIML, which can be located in the top level test directory in the repository.

[2]:
# Define projection of input binary raster
proj = 'EPSG:3413'

# Define band information of binary raster
band_info = [{'b_number':1, 'method':'VIS', 'source':'S2'},
            {'b_number':2, 'method':'SAR', 'source':'S1'},
            {'b_number':3, 'method':'DEM', 'source':'ARCTICDEM'}]

# Define start and end dates of acquisitions from which rasters are created
start='20170701'
end='20170831'

# Define input binary raster
infile = '../test/test_data/test_north_greenland.tif'

And then the file can be converted from raster to vector classifications using the convert function and the input variables.

If an output directory is not provided then converted vectors will not be written to file.

Note that a single raster file is wrapped as a list, as the convert function expects a list of rasters normally. We recommend using glob to generate a list of rasters from converting, if you wish to convert multiple classifications.

[3]:
# Convert binary raster to vectors
out1 = convert([infile], proj, band_info, start, end)
print(out1)

1. Converting test_north_greenland.tif
Retrieving vectors from band S2...
Retrieving vectors from band S1...
Retrieving vectors from band ARCTICDEM...
[                                                 geometry method     source  \
row_id
1       POLYGON ((-240140.000 -742150.000, -240140.000...    VIS         S2
2       POLYGON ((-240300.000 -742190.000, -240300.000...    VIS         S2
3       POLYGON ((-240170.000 -742190.000, -240170.000...    VIS         S2
4       POLYGON ((-241180.000 -742240.000, -241180.000...    VIS         S2
5       POLYGON ((-239900.000 -742250.000, -239900.000...    VIS         S2
...                                                   ...    ...        ...
702096  POLYGON ((-356570.000 -974130.000, -356560.000...    DEM  ARCTICDEM
702097  POLYGON ((-352970.000 -974040.000, -352960.000...    DEM  ARCTICDEM
702098  POLYGON ((-309630.000 -974140.000, -309630.000...    DEM  ARCTICDEM
702099  POLYGON ((-290390.000 -974120.000, -290370.000...    DEM  ARCTICDEM
702100  POLYGON ((-263360.000 -974130.000, -263310.000...    DEM  ARCTICDEM

       startdate   enddate
row_id
1       20170701  20170831
2       20170701  20170831
3       20170701  20170831
4       20170701  20170831
5       20170701  20170831
...          ...       ...
702096  20170701  20170831
702097  20170701  20170831
702098  20170701  20170831
702099  20170701  20170831
702100  20170701  20170831

[702100 rows x 5 columns]]

Filtering vector classifications

Vector classifications can be filtered by size and proximity to an inputted mask. In our case, we remove water bodies below a size of 0.05 sq km and further than 1 km from the ice margin (as we want to retain larger ice marginal lakes) with GrIML’s filter_vectors function.

GrIML is provided with a test vector file for filtering, and a test ice mask which is an ice margin polygon object with a 1 km buffer.

The size threshold is 0.05 sq km by default, but this can be altered with the optional min_area input variable.

[4]:
# Define input vector dataset
infile1 = '../test/test_data/test_filter.gpkg'

# Define ice mask for spatial filtering
infile2 = '../test/test_data/test_icemask.gpkg'

# Filter vectors by ice mask
out2 = filter_vectors([infile1], infile2)
print(out2)

1/1: Filtering vectors in test_filter.gpkg
1760 features over 0.05 sq km
0 features within 500 m of margin
No vectors present after filter. Moving to next file.
[]

Merging

When covering large areas, the classifications are usually split into different files. At this stage, we will merge all files together using the merge_vectors function, to form a complete inventory of ice marginal lake classifications. Test files are provided with GrIML to perform this. In this case, we will merge all vectors from the two files defined previously.

An output file can be defined in order to write the merged vectors to file if needed. If an output directory is not provided then the merged vectors will not be written to file. To retain the merged vectors in memory, make sure that an output variable is defined

[5]:
# Define vector datasets to merge together
infile1 = '../test/test_data/test_merge_1.gpkg'
infile2 = '../test/test_data/test_merge_2.gpkg'

# Merge vector datasets
out3 = merge_vectors([infile1,infile2])
print(out3)
                                                 geometry method     source  \
row_id
1       POLYGON ((173330.000 -2868710.000, 173340.000 ...    VIS         S2
2       POLYGON ((166430.000 -2869010.000, 166440.000 ...    VIS         S2
3       POLYGON ((220110.000 -2869650.000, 220140.000 ...    VIS         S2
4       POLYGON ((173160.000 -2871100.000, 173190.000 ...    VIS         S2
5       POLYGON ((168200.000 -2871450.000, 168270.000 ...    VIS         S2
...                                                   ...    ...        ...
935     POLYGON ((99650.000 -3306720.000, 99730.000 -3...    DEM  ARCTICDEM
936     POLYGON ((100990.000 -3307380.000, 101050.000 ...    DEM  ARCTICDEM
937     POLYGON ((93900.000 -3307800.000, 93980.000 -3...    DEM  ARCTICDEM
938     POLYGON ((85630.000 -3308390.000, 85650.000 -3...    DEM  ARCTICDEM
939     POLYGON ((86200.000 -3308290.000, 86230.000 -3...    DEM  ARCTICDEM

       startdate   enddate  area_sqkm  length_km
row_id
1       20170701  20170831     0.0531       2.96
2       20170701  20170831     0.0943       2.94
3       20170701  20170831     0.1139       4.18
4       20170701  20170831     0.1597       2.18
5       20170701  20170831     0.0557       1.26
...          ...       ...        ...        ...
935     20170701  20170831     0.0717       1.24
936     20170701  20170831     0.0514       1.16
937     20170701  20170831     0.7569       5.46
938     20170701  20170831     0.1487       2.52
939     20170701  20170831     0.3802       4.20

[4661 rows x 7 columns]

Adding metadata

Metadata can be added to the inventory with GrIML’s metadata.add_metadata function. This includes:

  • Adding a classification certainty value

  • Assigning an identification number per lake

  • Assigning a placename to each lake

  • Assigning a region to each lake

  • Assigning a list of all classification sources to each lake

Input files are needed for assigning a placename and a region to each lake. The placename file is a point vector file containing all placenames for a region. We use the placename database from Oqaasileriffik, the Language Secretariat of Greenland, for which there is an example data subset provided with GrIML. The region file is a polygon vector file containing all regions and their names. We use the Greenland Ice Sheet drainage basin regions as defined by Mouginot and Rignot, (2019), a dataset which is provided with GrIML.

[6]:
# Define vector dataset
infile1 = '../test/test_data/test_merge_2.gpkg'

# Define placenames vector dataset
infile2 = '../test/test_data/test_placenames.gpkg'

# Define ice sheet basins dataset
infile3 = '../test/test_data/greenland_basins_polarstereo.gpkg'

# Add metadata to vector dataset
out4 = add_metadata(infile1, infile2, infile3)
print(out4)
Assigning ID...
Assigning sources...
Assigning certainty scores...
Assigning regions...
Assigning placenames...
     row_id method     source startdate   enddate  area_sqkm  length_km  \
0       284    VIS         S2  20170701  20170831     0.1738       3.10
1    216489    SAR         S1  20170701  20170831     0.1468       2.26
2    216484    SAR         S1  20170701  20170831     0.2597       2.88
3       386    VIS         S2  20170701  20170831     0.3158       3.28
4       390    VIS         S2  20170701  20170831     0.1271       2.64
..      ...    ...        ...       ...       ...        ...        ...
934  354174    DEM  ARCTICDEM  20170701  20170831     0.1529       2.34
935  354221    DEM  ARCTICDEM  20170701  20170831     0.1630       2.04
936  354335    DEM  ARCTICDEM  20170701  20170831     0.0717       1.24
937  354362    DEM  ARCTICDEM  20170701  20170831     0.0514       1.16
938  354409    DEM  ARCTICDEM  20170701  20170831     0.7569       5.46

     MaxSimpTol                                           geometry  lake_id  \
0          10.0  POLYGON ((46610.000 -3247430.000, 46770.000 -3...        1
1          10.0  POLYGON ((47060.000 -3247430.000, 47440.000 -3...        1
2          10.0  POLYGON ((99340.000 -3247430.000, 100370.000 -...        2
3          10.0  POLYGON ((99310.000 -3247430.000, 100410.000 -...        2
4          10.0  POLYGON ((87580.000 -3247540.000, 87590.000 -3...        3
..          ...                                                ...      ...
934        10.0  POLYGON ((79080.000 -3301490.000, 79160.000 -3...      660
935        10.0  POLYGON ((60690.000 -3303050.000, 60760.000 -3...      661
936        10.0  POLYGON ((99650.000 -3306720.000, 99730.000 -3...      662
937        10.0  POLYGON ((100990.000 -3307380.000, 101050.000 ...      663
938        10.0  POLYGON ((93900.000 -3307800.000, 93980.000 -3...      664

               all_src  num_src  certainty subregion        placename
0    S1, ARCTICDEM, S2        3        1.0        SE          Unknown
1    S1, ARCTICDEM, S2        3        1.0        SE          Unknown
2    S1, ARCTICDEM, S2        3        1.0        SE          Unknown
3    S1, ARCTICDEM, S2        3        1.0        SE          Unknown
4    S1, ARCTICDEM, S2        3        1.0        SE          Unknown
..                 ...      ...        ...       ...              ...
934  S1, ARCTICDEM, S2        3        1.0        SE          Unknown
935  S1, ARCTICDEM, S2        3        1.0        SE   Sermeerunnerit
936  S1, ARCTICDEM, S2        3        1.0        SE          Unknown
937  S1, ARCTICDEM, S2        3        1.0        SE          Unknown
938  S1, ARCTICDEM, S2        3        1.0        SE  Sermeq Kujalleq

[939 rows x 15 columns]