{ "cells": [ { "cell_type": "markdown", "id": "8b588d8b", "metadata": {}, "source": [ "# Processing tutorial\n", "\n", "The following tutorial steps make up the full workflow for post-processing of raster classifications to a compiled, production-ready inventory of ice marginal lakes.\n", "\n", "This tutorial is available as both a [python script](https://github.com/GEUS-Glaciology-and-Climate/GrIML/blob/main/test/process_with_test_data.py) and [jupyter notebook](https://github.com/GEUS-Glaciology-and-Climate/GrIML/blob/main/tutorial/process_with_test_data.ipynb). We also provide a [Binder environment](https://mybinder.org/v2/gh/GEUS-Glaciology-and-Climate/GrIML/HEAD?urlpath=%2Fdoc%2Ftree%2Ftutorials%2Fdataset_tutorial.ipynb) within which you can run the notebook (with no need to set up the requirements yourself locally).\n", "\n", "First we will begin by importing the GrIML functions we will be using." ] }, { "cell_type": "code", "id": "2f2763bfa65faf2c", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T12:55:39.476357591Z", "start_time": "2025-12-18T12:55:38.947883419Z" } }, "source": [ "# Import GrIML functions for processing\n", "from griml.convert.convert import convert\n", "from griml.filter.filter_vectors import filter_vectors\n", "from griml.merge.merge_vectors import merge_vectors\n", "from griml.metadata.add_metadata import add_metadata" ], "outputs": [], "execution_count": 1 }, { "cell_type": "markdown", "id": "fa12c490", "metadata": {}, "source": [ "## Converting raster classifications to vectors\n", "\n", "GrIML's convert module is used to convert binary raster classifications of water bodies to vectors; where values of 1 in the raster classification denote water has been classified, and zero values denote no water. We need the `convert` function to perform this." ] }, { "cell_type": "markdown", "id": "a60fbcd1", "metadata": {}, "source": [ "We then need to define some input variables - the projection, band information and date range of the input raster. If you have followed the [GEE script for classifying lakes](https://github.com/GEUS-Glaciology-and-Climate/GrIML/blob/main/gee_scripts/lake_classification.js) available through the GrIMl repo, then each outputted raster band represents classifications using one of three approaches:\n", "\n", "1. Multi-spectral classification from Sentinel-2\n", "2. Backscatter threshold classification from Sentinel-1\n", "3. Sink detection from ArcticDEM\n", "\n", "And the default outputted projection is [Polar Stereographic](https://nsidc.org/data/user-resources/help-center/guide-nsidcs-polar-stereographic-projection). The start and end date should match the defined date range used for the raster classifications.\n", "\n", "Then we need to define the location of our raster file. A test raster file is provided with GrIML, which can be located in the top level test directory in the repository." ] }, { "cell_type": "code", "id": "577b5fa8a7529165", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T12:55:39.495710506Z", "start_time": "2025-12-18T12:55:39.478625710Z" } }, "source": [ "# Define projection of input binary raster\n", "proj = 'EPSG:3413'\n", "\n", "# Define band information of binary raster\n", "band_info = [{'b_number':1, 'method':'VIS', 'source':'S2'},\n", " {'b_number':2, 'method':'SAR', 'source':'S1'},\n", " {'b_number':3, 'method':'DEM', 'source':'ARCTICDEM'}]\n", "\n", "# Define start and end dates of acquisitions from which rasters are created\n", "start='20170701'\n", "end='20170831'\n", "\n", "# Define input binary raster\n", "infile = '../test/test_data/test_north_greenland.tif'" ], "outputs": [], "execution_count": 2 }, { "cell_type": "markdown", "id": "e1fa4f33", "metadata": {}, "source": [ "And then the file can be converted from raster to vector classifications using the `convert` function and the input variables.\n", "\n", "If an output directory is not provided then converted vectors will not be written to file.\n", "\n", "Note that a single raster file is wrapped as a list, as the `convert` function expects a list of rasters normally. We recommend using `glob` to generate a list of rasters from converting, if you wish to convert multiple classifications." ] }, { "cell_type": "code", "id": "55abc077bfbe57e4", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T12:58:00.031413978Z", "start_time": "2025-12-18T12:55:39.497625283Z" } }, "source": [ "# Convert binary raster to vectors\n", "out1 = convert([infile], proj, band_info, start, end)\n", "print(out1)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "1. Converting test_north_greenland.tif\n", "Retrieving vectors from band S2...\n", "Retrieving vectors from band S1...\n", "Retrieving vectors from band ARCTICDEM...\n", "[ geometry method source \\\n", "row_id \n", "1 POLYGON ((-240140.000 -742150.000, -240140.000... VIS S2 \n", "2 POLYGON ((-240300.000 -742190.000, -240300.000... VIS S2 \n", "3 POLYGON ((-240170.000 -742190.000, -240170.000... VIS S2 \n", "4 POLYGON ((-241180.000 -742240.000, -241180.000... VIS S2 \n", "5 POLYGON ((-239900.000 -742250.000, -239900.000... VIS S2 \n", "... ... ... ... \n", "702096 POLYGON ((-356570.000 -974130.000, -356560.000... DEM ARCTICDEM \n", "702097 POLYGON ((-352970.000 -974040.000, -352960.000... DEM ARCTICDEM \n", "702098 POLYGON ((-309630.000 -974140.000, -309630.000... DEM ARCTICDEM \n", "702099 POLYGON ((-290390.000 -974120.000, -290370.000... DEM ARCTICDEM \n", "702100 POLYGON ((-263360.000 -974130.000, -263310.000... DEM ARCTICDEM \n", "\n", " startdate enddate \n", "row_id \n", "1 20170701 20170831 \n", "2 20170701 20170831 \n", "3 20170701 20170831 \n", "4 20170701 20170831 \n", "5 20170701 20170831 \n", "... ... ... \n", "702096 20170701 20170831 \n", "702097 20170701 20170831 \n", "702098 20170701 20170831 \n", "702099 20170701 20170831 \n", "702100 20170701 20170831 \n", "\n", "[702100 rows x 5 columns]]\n" ] } ], "execution_count": 3 }, { "cell_type": "markdown", "id": "6fd88d43", "metadata": {}, "source": [ "## Filtering vector classifications\n", "\n", "Vector classifications can be filtered by size and proximity to an inputted mask. In our case, we remove water bodies below a size of 0.05 sq km and further than 1 km from the ice margin (as we want to retain larger ice marginal lakes) with GrIML's `filter_vectors` function.\n", "\n", "GrIML is provided with a test vector file for filtering, and a test ice mask which is an ice margin polygon object with a 1 km buffer.\n", "\n", "The size threshold is 0.05 sq km by default, but this can be altered with the optional `min_area` input variable." ] }, { "cell_type": "code", "id": "af777aea3655663f", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T14:00:28.109730765Z", "start_time": "2025-12-18T12:58:00.033052642Z" } }, "source": [ "# Define input vector dataset\n", "infile1 = '../test/test_data/test_filter.gpkg'\n", "\n", "# Define ice mask for spatial filtering\n", "infile2 = '../test/test_data/test_icemask.gpkg'\n", "\n", "# Filter vectors by ice mask\n", "out2 = filter_vectors([infile1], infile2)\n", "print(out2)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "1/1: Filtering vectors in test_filter.gpkg\n", "1760 features over 0.05 sq km\n", "0 features within 500 m of margin\n", "No vectors present after filter. Moving to next file.\n", "[]\n" ] } ], "execution_count": 4 }, { "cell_type": "markdown", "id": "406aa5ec", "metadata": {}, "source": [ "## Merging\n", "\n", "When covering large areas, the classifications are usually split into different files. At this stage, we will merge all files together using the `merge_vectors` function, to form a complete inventory of ice marginal lake classifications. Test files are provided with GrIML to perform this. In this case, we will merge all vectors from the two files defined previously.\n", "\n", "An output file can be defined in order to write the merged vectors to file if needed. If an output directory is not provided then the merged vectors will not be written to file. To retain the merged vectors in memory, make sure that an output variable is defined" ] }, { "cell_type": "code", "id": "5f5079f57f16711d", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T14:00:31.308856469Z", "start_time": "2025-12-18T14:00:28.123426673Z" } }, "source": [ "# Define vector datasets to merge together\n", "infile1 = '../test/test_data/test_merge_1.gpkg'\n", "infile2 = '../test/test_data/test_merge_2.gpkg'\n", "\n", "# Merge vector datasets\n", "out3 = merge_vectors([infile1,infile2])\n", "print(out3)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " geometry method source \\\n", "row_id \n", "1 POLYGON ((173330.000 -2868710.000, 173340.000 ... VIS S2 \n", "2 POLYGON ((166430.000 -2869010.000, 166440.000 ... VIS S2 \n", "3 POLYGON ((220110.000 -2869650.000, 220140.000 ... VIS S2 \n", "4 POLYGON ((173160.000 -2871100.000, 173190.000 ... VIS S2 \n", "5 POLYGON ((168200.000 -2871450.000, 168270.000 ... VIS S2 \n", "... ... ... ... \n", "935 POLYGON ((99650.000 -3306720.000, 99730.000 -3... DEM ARCTICDEM \n", "936 POLYGON ((100990.000 -3307380.000, 101050.000 ... DEM ARCTICDEM \n", "937 POLYGON ((93900.000 -3307800.000, 93980.000 -3... DEM ARCTICDEM \n", "938 POLYGON ((85630.000 -3308390.000, 85650.000 -3... DEM ARCTICDEM \n", "939 POLYGON ((86200.000 -3308290.000, 86230.000 -3... DEM ARCTICDEM \n", "\n", " startdate enddate area_sqkm length_km \n", "row_id \n", "1 20170701 20170831 0.0531 2.96 \n", "2 20170701 20170831 0.0943 2.94 \n", "3 20170701 20170831 0.1139 4.18 \n", "4 20170701 20170831 0.1597 2.18 \n", "5 20170701 20170831 0.0557 1.26 \n", "... ... ... ... ... \n", "935 20170701 20170831 0.0717 1.24 \n", "936 20170701 20170831 0.0514 1.16 \n", "937 20170701 20170831 0.7569 5.46 \n", "938 20170701 20170831 0.1487 2.52 \n", "939 20170701 20170831 0.3802 4.20 \n", "\n", "[4661 rows x 7 columns]\n" ] } ], "execution_count": 5 }, { "cell_type": "markdown", "id": "bb3ae5ef", "metadata": {}, "source": [ "## Adding metadata\n", "\n", "Metadata can be added to the inventory with GrIML's `metadata.add_metadata` function. This includes:\n", "\n", "- Adding a classification certainty value\n", "- Assigning an identification number per lake\n", "- Assigning a placename to each lake\n", "- Assigning a region to each lake\n", "- Assigning a list of all classification sources to each lake\n", "\n", "Input files are needed for assigning a placename and a region to each lake. The placename file is a point vector file containing all placenames for a region. We use the placename database from [Oqaasileriffik](https://oqaasileriffik.gl/en/), the Language Secretariat of Greenland, for which there is an example data subset provided with GrIML. The region file is a polygon vector file containing all regions and their names. We use the Greenland Ice Sheet drainage basin regions as defined by [Mouginot and Rignot, (2019)](https://doi.org/10.7280/D1WT11), a dataset which is provided with GrIML." ] }, { "cell_type": "code", "id": "49b3b901d44b2c97", "metadata": { "ExecuteTime": { "end_time": "2025-12-18T14:00:43.943926991Z", "start_time": "2025-12-18T14:00:31.313514707Z" } }, "source": [ "# Define vector dataset\n", "infile1 = '../test/test_data/test_merge_2.gpkg'\n", "\n", "# Define placenames vector dataset\n", "infile2 = '../test/test_data/test_placenames.gpkg'\n", "\n", "# Define ice sheet basins dataset\n", "infile3 = '../test/test_data/greenland_basins_polarstereo.gpkg'\n", "\n", "# Add metadata to vector dataset\n", "out4 = add_metadata(infile1, infile2, infile3)\n", "print(out4)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assigning ID...\n", "Assigning sources...\n", "Assigning certainty scores...\n", "Assigning regions...\n", "Assigning placenames...\n", " row_id method source startdate enddate area_sqkm length_km \\\n", "0 284 VIS S2 20170701 20170831 0.1738 3.10 \n", "1 216489 SAR S1 20170701 20170831 0.1468 2.26 \n", "2 216484 SAR S1 20170701 20170831 0.2597 2.88 \n", "3 386 VIS S2 20170701 20170831 0.3158 3.28 \n", "4 390 VIS S2 20170701 20170831 0.1271 2.64 \n", ".. ... ... ... ... ... ... ... \n", "934 354174 DEM ARCTICDEM 20170701 20170831 0.1529 2.34 \n", "935 354221 DEM ARCTICDEM 20170701 20170831 0.1630 2.04 \n", "936 354335 DEM ARCTICDEM 20170701 20170831 0.0717 1.24 \n", "937 354362 DEM ARCTICDEM 20170701 20170831 0.0514 1.16 \n", "938 354409 DEM ARCTICDEM 20170701 20170831 0.7569 5.46 \n", "\n", " MaxSimpTol geometry lake_id \\\n", "0 10.0 POLYGON ((46610.000 -3247430.000, 46770.000 -3... 1 \n", "1 10.0 POLYGON ((47060.000 -3247430.000, 47440.000 -3... 1 \n", "2 10.0 POLYGON ((99340.000 -3247430.000, 100370.000 -... 2 \n", "3 10.0 POLYGON ((99310.000 -3247430.000, 100410.000 -... 2 \n", "4 10.0 POLYGON ((87580.000 -3247540.000, 87590.000 -3... 3 \n", ".. ... ... ... \n", "934 10.0 POLYGON ((79080.000 -3301490.000, 79160.000 -3... 660 \n", "935 10.0 POLYGON ((60690.000 -3303050.000, 60760.000 -3... 661 \n", "936 10.0 POLYGON ((99650.000 -3306720.000, 99730.000 -3... 662 \n", "937 10.0 POLYGON ((100990.000 -3307380.000, 101050.000 ... 663 \n", "938 10.0 POLYGON ((93900.000 -3307800.000, 93980.000 -3... 664 \n", "\n", " all_src num_src certainty subregion placename \n", "0 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "1 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "2 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "3 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "4 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", ".. ... ... ... ... ... \n", "934 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "935 S1, ARCTICDEM, S2 3 1.0 SE Sermeerunnerit \n", "936 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "937 S1, ARCTICDEM, S2 3 1.0 SE Unknown \n", "938 S1, ARCTICDEM, S2 3 1.0 SE Sermeq Kujalleq \n", "\n", "[939 rows x 15 columns]\n" ] } ], "execution_count": 6 } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.2" } }, "nbformat": 4, "nbformat_minor": 5 }