FDL 2019: CUMULO - A dataset for learning about clouds

 

Image Credit: NASA

Climate and Clouds

One of the greatest sources of uncertainty in future climate projections comes from limitations in modelling clouds and in understanding how different cloud types interact with the climate system. Clouds can be made up of liquid droplets or ice crystals, and can be dense and opaque, or thin and flocculent. They transition between states as they evolve, and the lifetime of each state is a key parameter of climate models. A change of only a few percent in the amount of thick, reflective cloud cover would cancel out the heating due to human activity!

A key first step in reducing the uncertainty in the effect of clouds is to accurately map and classify cloud types at high spatial and temporal resolution - we can’t understand what we don’t measure.

How to enhance cloud measurements using ML

During FDL Europe 2019, the Atmospheric Phenomena team created a unique new dataset of labelled cloud types covering the entire globe over the course of a year. The CUMULO dataset merges hyperspectral images from the NASA MODIS instrument with pixel-width tracks of cloud type labels from the CloudSat Cloud Profiling Radar. The images contain 36 spectral bands and each pixel is 1 km on a side, but only one out of every 1354 pixels carries a label. A scene from a single day is shown below.
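To make the label sparsity concrete, here is a minimal sketch using synthetic data. It assumes a swath stored as a (rows, 1354, 36) array and a label track that runs straight down one column; the array layout, the `UNLABELLED` sentinel and all variable names are illustrative assumptions, not the actual CUMULO file format.

```python
import numpy as np

N_BANDS = 36        # spectral bands per pixel (from the text)
SWATH_WIDTH = 1354  # pixels across the swath (from the text)
N_ROWS = 20         # hypothetical along-track length, kept small for the demo
UNLABELLED = -1     # hypothetical sentinel for pixels without a label

rng = np.random.default_rng(0)

# Synthetic stand-in for one image tile: random values, not real radiances.
radiances = rng.random((N_ROWS, SWATH_WIDTH, N_BANDS), dtype=np.float32)

# Labels exist only along a one-pixel-wide track; everything else is unlabelled.
labels = np.full((N_ROWS, SWATH_WIDTH), UNLABELLED, dtype=np.int8)
track_column = SWATH_WIDTH // 2  # assumption: the track is a straight column
labels[:, track_column] = rng.integers(0, 8, size=N_ROWS)  # 8 cloud types

# Boolean mask selecting the labelled pixels.
mask = labels != UNLABELLED
labelled_fraction = mask.mean()

# Only these pixels can be used for supervised training.
X_labelled = radiances[mask]  # shape: (number of labelled pixels, N_BANDS)
y_labelled = labels[mask]

print(f"labelled fraction: {labelled_fraction:.6f}")
print("training matrix shape:", X_labelled.shape)
```

In this toy layout every row contributes exactly one labelled pixel, so the labelled fraction is 1/1354, which is why a model that can also learn from the unlabelled pixels is attractive here.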

The team applied a hybrid Invertible Residual Network to the data, expanding the cloud-type labels to the global scale (see below) and revealing evidence for new fine-grained classes of cloud. Indeed, cloud types are not limited to the well-studied World Meteorological Organisation genera: identifying more species of cloud remains an open question in the cloud community.

How well does the ML algorithm work?

Before training, twenty percent of the data were split off as an isolated testing set and used to assess robustness of the best predicted labels. The table below shows the results for eight different cloud types.
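The hold-out procedure described above can be sketched as follows. The post does not say how the twenty percent were selected, so the random shuffle here is an assumption, and the helper names `holdout_split` and `per_class_accuracy` are hypothetical. The labels and predictions are synthetic placeholders, not CUMULO results.

```python
import numpy as np

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) after a random shuffle of sample indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

def per_class_accuracy(y_true, y_pred, n_classes=8):
    """Fraction of correctly predicted samples for each class (NaN if absent)."""
    acc = np.full(n_classes, np.nan)
    for c in range(n_classes):
        members = y_true == c
        if members.any():
            acc[c] = np.mean(y_pred[members] == c)
    return acc

# Synthetic labels for 1000 labelled pixels and 8 cloud types.
rng = np.random.default_rng(1)
y = rng.integers(0, 8, size=1000)

train_idx, test_idx = holdout_split(len(y), test_fraction=0.2)

# Placeholder "model": copy the true test labels, then corrupt every tenth one.
y_test = y[test_idx]
y_pred = y_test.copy()
y_pred[::10] = (y_pred[::10] + 1) % 8

acc = per_class_accuracy(y_test, y_pred)
print("test set size:", len(test_idx))
print("per-class accuracy:", np.round(acc, 2))
```

Because neighbouring pixels along a track are strongly correlated, a purely random split may overestimate performance; splitting by day or by region is a common alternative, though the post does not state which approach was used.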

As a sanity check, the results for some adjacent days were inspected by eye and compared to known cloud properties. Encouragingly, predicted classes are spatially coherent, with each type localised to expected regions (e.g., Sc clouds occur mostly over upwelling regions of major oceans), and clouds are temporally contiguous across days. In addition, the predicted distributions of liquid water path (LWP - the total amount of liquid water in a vertical column of the atmosphere above a given position) and cloud optical thickness (COT - a measure of cloud thickness and opacity) are almost identical to the distributions measured directly by CloudSat.
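One way to put a number on "almost identical" distributions is a two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distributions, where 0 means identical samples and 1 means no overlap. The post does not say which comparison the team used, so this is only an illustrative sketch; the function name `ks_statistic` and the synthetic log-normal samples are assumptions.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic between samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)

# Synthetic stand-ins: values along the labelled track for one cloud type,
# and values over pixels where the model predicted that same type.
track_lwp = rng.lognormal(mean=0.0, sigma=1.0, size=500)
predicted_lwp = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# A deliberately shifted sample, to show what a mismatch looks like.
shifted_lwp = predicted_lwp * 3.0

ks_same = ks_statistic(track_lwp, predicted_lwp)
ks_shifted = ks_statistic(track_lwp, shifted_lwp)
print(f"KS, same distribution:    {ks_same:.3f}")
print(f"KS, shifted distribution: {ks_shifted:.3f}")
```

A small statistic for each cloud type would support the visual impression that predicted and measured distributions agree; a large one would flag a class worth inspecting more closely.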



Summary and next steps

The CUMULO dataset represents the first high spatial resolution prediction of cloud types covering the whole globe for a full year. As an excellent baseline, it issues a challenge to the machine learning community to engage with climate scientists in developing innovative solutions to climate challenges. We need these now more than ever.

Global maps of different cloud types identified during one day of CUMULO data.


Resources:

FDL Tech Memo [LINK]

NeurIPS 2019 Paper [LINK] (Best Paper Award)



Acknowledgments

This work is the result of the 2019 ESA Frontier Development Lab (FDL) Atmospheric Phenomena and Climate Variability challenge. We are grateful to all organisers, mentors and sponsors for providing us with this opportunity. We thank Google Cloud for providing computing and storage resources to complete this work. Finally, we thank Yarin Gal for helpful discussions and Sylvester Kaczmarek for his help and support in coordinating the work.



 