Using custom input data

The GrIML package can be used with custom input data. This guide will outline what type of data can be used, and how it can be integrated.

What type of features can be classified using GrIML?

GrIML has primarily been created to classify ice-marginal lakes across Greenland from satellite imagery. An ice-marginal lake is a water body in contact with an ice sheet, ice cap or glacier. It can be classified based on the following criteria:

  • Size (in this case, an areal footprint extent)

  • Proximity to an ice margin

GrIML has primarily been developed so it can continue to be used for producing the Greenland ice-marginal lake inventory. In addition, a big motivation for its design and documentation is so that others can use its functionality to produce ice-marginal lakes across other regions of the world.

Other cryosphere features may be classified using the criteria defined in GrIML, such as supraglacial lakes (which are water bodies present on the ice surface). Therefore, GrIML is not just limited to classifying one type of feature and we encourage users to explore GrIML’s functionality and classification criteria to assess whether it is suitable for your needs.

What type of input data can be used?

Input data has to be a raster format file, such as a .tif. This format should be compatible with rasterio’s open functionality, which is used in GrIML to open the dataset. The function can be used to test that the input data is valid. Here, we will open a test data file over a section of North Greenland, which is provided in the GrIML repo.

[7]:
import rasterio

infile = '../test/test_data/test_north_greenland.tif'

with rasterio.open(infile) as dataset:
    print(dataset.profile)
{'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 37888, 'height': 37888, 'count': 3, 'crs': CRS.from_wkt('PROJCS["WGS 84 / NSIDC Sea Ice Polar Stereographic North",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",70],PARAMETER["central_meridian",-45],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",SOUTH],AXIS["Northing",SOUTH],AUTHORITY["EPSG","3413"]]'), 'transform': Affine(10.0, 0.0, -406590.0,
       0.0, -10.0, -595270.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}

The input data should be mapped across a uniform projection, such as a coordinate system associated with a satellite image or orthophoto. This coordinate system should be a standardised reference system. The rasterio package can be used to check that a defined projection string is valid. In the example here, we will check to see if EPSG:3005 is a valid projection name.

[8]:
from rasterio.crs import CRS

crs = CRS.from_string("EPSG:3005")
print(crs)
EPSG:3005

The input data should contain a binary classification of features, with values of 0 signifying no classification and 1 signifying a positive classification.

[9]:
infile = '../test/test_data/test_north_greenland.tif'

src = rasterio.open(infile)
print(src)
<open DatasetReader name='../test/test_data/test_north_greenland.tif' mode='r'>

You can plot the binary dataset using matplotlib. In this example, the test file is too big to plot and show so we will first clip the dataset to a defined window and plot this instead. This is purely for a visual check of the dataset, if needed.

[10]:
import matplotlib.pyplot as plt
from rasterio.windows import Window

with rasterio.open(infile) as src:

    # Define a window (start_x, start_y, width, height)
    window = Window(col_off=10000, row_off=10000, width=1000, height=1000)

    # Read band 1 from the window
    data = src.read(1, window=window)

# Plot the clipped data
plt.imshow(data, cmap='gray')
plt.title("Clipped Window of Band 1")
plt.colorbar()
plt.show()
../_images/tutorials_using_custom_input_data_9_0.png

Example input data

If you have followed the GEE script for classifying ice-marginal lakes available through the GrIML repo, then the returned dataset is a .tif raster with coordinates in a defined projection. This dataset is from Greenland, with a projection in EPSG:3413 Polar Stereographic. The returned dataset consists of three bands representing binary classification using the following approaches:

  • Multi-spectral classification from Sentinel-2

  • Backscatter threshold classification from Sentinel-1

  • Sink detection from ArcticDEM

Here we will open a test data file over a section of North Greenland, processed using the GEE script mentioned above. This test data file is provided in the GrIML repo. Let’s first define the location of this raster file, which is located in the top level test directory in the repository.

[14]:
# Define input binary raster
infile = '../test/test_data/test_north_greenland.tif'
print(infile)
../test/test_data/test_north_greenland.tif

The GrIML processing workflow starts by converting a binary raster classification to vectors, so first let’s import this function.

[15]:
from griml.convert import convert

The projection and band information is needed to perform the vector conversion. The projection is defined as a string with the EPSG reference name. The band information is provided as a list of dictionaries. In this example, each band signifies a binary classification using each of the classification methods. If the image only consists of one band then a list still needs to be provided, but should only contain one dictionary with the relevant information.

[16]:
# Define projection of input binary raster
proj = 'EPSG:3413'

# Define band information of binary raster
band_info = [{'b_number':1, 'method':'VIS', 'source':'S2'},
            {'b_number':2, 'method':'SAR', 'source':'S1'},
            {'b_number':3, 'method':'DEM', 'source':'ARCTICDEM'}]

print(proj)
print(band_info)
EPSG:3413
[{'b_number': 1, 'method': 'VIS', 'source': 'S2'}, {'b_number': 2, 'method': 'SAR', 'source': 'S1'}, {'b_number': 3, 'method': 'DEM', 'source': 'ARCTICDEM'}]

Along with this, the acquisition time range is needed. In this example, classifications were made from averaged satellite imagery acquired between 01-07-2017 and 31-08-2017. In the case that classifications are defined from one image/scene, the start and end time range should be defined as the same date.

[17]:
# Define start and end dates of acquisitions from which rasters are created
start='20170701'
end='20170831'

print(start)
print(end)
20170701
20170831

The file can be converted from raster to vector classifications using the convert function and the input variables.

Note that a single raster file is wrapped as a list, as the convert function expects a list of rasters normally. We recommend using glob to generate a list of rasters from converting, if you wish to convert multiple classifications.

[18]:
# Convert binary raster to vectors
out1 = convert([infile], proj, band_info, start, end)
print(out1)

1. Converting test_north_greenland.tif
Retrieving vectors from band S2...
Retrieving vectors from band S1...
Retrieving vectors from band ARCTICDEM...
[                                                 geometry method     source  \
row_id
1       POLYGON ((-240140 -742150, -240140 -742190, -2...    VIS         S2
2       POLYGON ((-240300 -742190, -240300 -742200, -2...    VIS         S2
3       POLYGON ((-240170 -742190, -240170 -742220, -2...    VIS         S2
4       POLYGON ((-241180 -742240, -241180 -742260, -2...    VIS         S2
5       POLYGON ((-239900 -742250, -239900 -742260, -2...    VIS         S2
...                                                   ...    ...        ...
702096  POLYGON ((-356570 -974130, -356560 -974130, -3...    DEM  ARCTICDEM
702097  POLYGON ((-352970 -974040, -352960 -974040, -3...    DEM  ARCTICDEM
702098  POLYGON ((-309630 -974140, -309630 -974150, -3...    DEM  ARCTICDEM
702099  POLYGON ((-290390 -974120, -290370 -974120, -2...    DEM  ARCTICDEM
702100  POLYGON ((-263360 -974130, -263310 -974130, -2...    DEM  ARCTICDEM

       startdate   enddate
row_id
1       20170701  20170831
2       20170701  20170831
3       20170701  20170831
4       20170701  20170831
5       20170701  20170831
...          ...       ...
702096  20170701  20170831
702097  20170701  20170831
702098  20170701  20170831
702099  20170701  20170831
702100  20170701  20170831

[702100 rows x 5 columns]]