1 Background information
The Irish Space Agency has launched Aigean, an Earth observation satellite, to monitor an area around Lough Ree. Recently, rainfall has decreased in the area, and over the last few years droughts have become more frequent and more severe. With the instruments on board Aigean, the scientific community will be able to obtain better data about water levels and land erosion, and will therefore be able to generate more accurate predictions.
However, the Irish Space Agency sadly hasn’t provided any software tools to do this analysis!
Thankfully, Aoife O’Callaghan, a geology PhD student at the Athlone City Institute, has set the objective to solve this problem by creating an open-source package to analyse Aigean data. Aoife has some ideas of what she would like the package to do, but she doesn’t have a research software development background beyond how to install and use Python libraries. That’s why Aoife has contacted you!
You and your group members agree this is a great tool to offer to the community and have decided to put all your brains together to come up with an easy-to-use Python library to analyse and visualise Aigean satellite data.
What do we know? What do we have? What do we want?
- Aigean has multiple instruments. As a starting point we only need to focus on the imagers and the radar.
- There are three imagers on board the spacecraft. Their only differences are in their resolution (how much area they cover per pixel) and their field-of-view (how much they can see in a single image).
- The three imagers are called: Lir, Manannan and Fand.
- Lir has the largest field-of-view, but the lowest resolution, with a pixel size of 20 m per pixel;
- Manannan provides a smaller field-of-view with a better resolution of 10 m per pixel; and
- Fand has the smallest field-of-view but a very high resolution of 1 m per pixel.
- The radar is called Ecne and it provides three measurements for the deepest areas in the region.
- Each instrument provides data in a different format, but the imagers share a common set of metadata.
- A number of images are taken every day. However, not all the land is fully covered in a single day; the coverage depends on the satellite orbits. Ecne, in contrast, always takes measurements of the same points.
- All the data is available at the Irish Space Agency webservice archive.
- The Python library – aigeanpy – should be able to query, download, open, process and visualise the satellite images.
- We want to create three command line tools to provide access to some functionality from outside Python.
- We have a script from a post-doc in Aoife’s group that implements the so-called k-means algorithm for clustering data points. We want to include it in our library too! It will help people to analyse different land areas based on their parameters.
- We are also interested in how to make our code, specifically the k-means algorithm, more efficient.
This will be used to analyse Ecne’s data.
- We want this tool to be used by any researcher, so it needs to be easy to install and use. This includes having good documentation about how to use it and how to acknowledge it in the publications that benefit from it.
- We also want to make it easier for others to contribute, so we need to provide information about how we would like them to do so.
Let’s look at what we’ve got access to already:
1.1 The data archive webservice
The Irish Space Agency data archive is located at https://dokku-app.dokku.arc.ucl.ac.uk/isa-archive/ and its main page provides some information about how to query this service.
The website offers two services. One is used to query the catalogue, and the other to download a file from the archive.
The results from the query service are provided as JSON files with the properties of the observations found in the specified time range (and instruments). These files include information about the date and time of the observations, the instrument used, the field of view observed and the filename where each observation is stored. We can download those files by passing the filename as an argument to the download service. The format of the observation files varies depending on the instrument (as specified in the following section).
Read the information on the archive website to understand how to query the service, what parameters are accepted and what are the defaults.
We need to create a set of tools within the Python package to query and download the files. They need to be available as aigeanpy.net.query_isa and aigeanpy.net.download_isa. They must accept all the parameters listed on the website. Additionally, download_isa needs to allow the user to specify where to download the file (save_dir).
Path objects from pathlib have a write_bytes method that writes bytes into a file. Check the documentation of Path and of the requests library’s Response class to see how you could write the content of a requests.Response into a file.
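As a minimal sketch of that last hint, the helper below writes the body of a response into a chosen directory. The name save_response is hypothetical and is not part of the required aigeanpy.net interface. It only assumes that the object passed in exposes its body as bytes in a content attribute, as requests.Response does. The archive's URL parameters are deliberately left out, since they are described on the website.

```python
from pathlib import Path


def save_response(response, filename, save_dir="."):
    """Write the body of an HTTP response to save_dir/filename.

    `response` is anything exposing its raw body as bytes in
    `response.content` (requests.Response does). Hypothetical helper.
    """
    target_dir = Path(save_dir)
    # Create the destination directory (and any parents) if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Path.write_bytes opens the file in binary mode, writes and closes it.
    target.write_bytes(response.content)
    return target
```

A function such as download_isa could call a helper like this with the object returned by requests.get, passing the user's save_dir through.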
1.2 Different instruments, different file types
Data from each instrument is provided in a different type of file.
Lir uses the Advanced Scientific Data Format (ASDF). The asdf Python library can read them and extract the data and metadata from these files.
Manannan uses Hierarchical Data Format 5 (HDF5). As with ASDF, this type of file contains the data and the metadata together. The h5py Python library can load them.
The Fand instrument stores the data in npy format and the metadata in JSON files. npy files can be read with NumPy’s load function, and the Python Standard Library provides support for loading JSON files. The archive provides each pair of files in a single zip file (which the Python Standard Library can also open, using the zipfile module).
Finally, the Ecne instrument doesn’t take images, but infers some measurements of the 300 deepest areas in the region. The measurements are turbulence, salinity and algal density for these points. They are stored in CSV.
Ideally, a user shouldn’t need to unzip the file before loading it with the library. The io.BytesIO class can help you to load the file in memory. Take a look at how it’s used in the exemplar at the beginning of our course notes.
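The in-memory approach can be sketched with the standard library alone. The function name open_zip_in_memory is hypothetical, and the sketch assumes the archive holds exactly one .json member and one .npy member; it does not assume any particular member names. It returns the parsed metadata and a BytesIO buffer holding the npy bytes, so nothing is written to disk.

```python
import io
import json
import zipfile


def open_zip_in_memory(raw_bytes):
    """Return (metadata, npy_buffer) from a zip archive held in memory.

    Assumes one .json and one .npy member (an assumption of this
    sketch, not something stated by the archive).
    """
    # Wrap the raw bytes so zipfile can treat them like an open file.
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        names = archive.namelist()
        json_name = next(n for n in names if n.endswith(".json"))
        npy_name = next(n for n in names if n.endswith(".npy"))
        # archive.read returns the member's contents as bytes.
        metadata = json.loads(archive.read(json_name))
        npy_buffer = io.BytesIO(archive.read(npy_name))
    return metadata, npy_buffer
```

The returned buffer is a file-like object, so it can be passed to NumPy's load function to obtain the image array.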
1.2.1 Getting the coordinates right
Arrays are stored in Python as (rows, columns). However, we normally refer to places on a map by (x, y) coordinates (with x running from left to right, and y running from bottom to top). Also, when displaying an image in matplotlib with imshow, by default the origin of the axes is in the top-left corner, with positive y-values running downwards. For this library, we will need to manage two types of coordinate systems: pixels and earth.
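To illustrate the mismatch between array indexing and map-style coordinates, here is a small sketch that converts between the two. The function names are hypothetical, and it assumes whole-pixel positions counted from zero, with x equal to the column index and y counted upwards from the bottom row. It does not cover earth coordinates.

```python
def rowcol_to_xy(row, col, n_rows):
    """Map an array index (row, col) to map-style pixel (x, y).

    x increases to the right; y increases upwards, so the last
    array row has y = 0.
    """
    return col, n_rows - 1 - row


def xy_to_rowcol(x, y, n_rows):
    """Inverse of rowcol_to_xy: map-style (x, y) back to (row, col)."""
    return n_rows - 1 - y, x
```

For an array with 3 rows, the top-left element at index (0, 0) corresponds to (x, y) = (0, 2). When plotting, matplotlib's imshow also accepts origin="lower", which places array row 0 at the bottom of the figure instead of the top.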