Informal Settlement Detection: The Results
Informal Settlement Detection
Tracking down a better path to mapping informal settlements using machine learning.
Note: FDL Europe focused on informal settlement detection in 2018, with key support from partners such as the European Space Agency, University of Oxford, the Satellite Applications Catapult, NVIDIA, and the additional support of UNICEF, Digital Globe, Afrobarometer and AIDData. This article follows a team of machine learning and Earth observation researchers during an eight-week acceleration program hosted at the University of Oxford. For a more in-depth look at the need for informal settlement detection you can read part one of this article here: Informal Settlements: Hiding In Plain Sight.
The following takes place at FDL Europe, during the Summer of 2018
[Scene One: Kellogg College at Oxford University, England]
“I can’t wait to hear what you’re working on.”
After brief introductions, the senior executive from the European Space Agency (ESA) was ready to get down to business. No one present would have been surprised if the tall, square-jawed man in a sharp navy sport coat had introduced himself as a former astronaut. His voice commanded respect. Dr Iarla Kilbane-Dawe was the new Head of ESA’s Φ-lab, a special group based in Esrin, Italy, tasked with accelerating research and discovery in Earth observation at ESA. ESA’s Φ-lab is the lead partner in FDL Europe, and Iarla was here to learn more about how the FDL Europe process worked, and in what direction this specific team was headed.
It was unusually warm that day at Oxford University’s Kellogg College. The heatwave which stretched across Western Europe in the summer of 2018 turned the Victorian university building into a sauna, but the people gathered around the working table that day barely noticed. Iarla knew that for the last three weeks, this team had been given the charter to find a way to do “something cool” with artificial intelligence in the area of Earth observation. He just didn’t know what the cool thing was going to be.
This four-person FDL Europe team was made up of both machine learning and Earth observation researchers from around the world. Bradley Gram-Hansen was a PhD candidate at Oxford University with a focus on machine learning. The other machine learning specialist, Patrick Helber, had already been combining the fields of machine learning and Earth observation as a PhD candidate at Technical University of Kaiserslautern in Germany. Indhu Varatharajan (PhD candidate at German Aerospace Center, Berlin) and Faiza Azam (former post-doc researcher, University of Bremen, Germany) brought complementary backgrounds in remote sensing. While Bradley had never worked in Earth observation before, Indhu and Faiza had never worked with machine learning. The first couple of weeks had been an intense learning exercise for all of them; however, having Patrick on the team helped quickly connect the dots between new concepts and disparate terminology. By the time of their meeting with Iarla, the team had a clear focus and an approach and were ready to move forward.
As Bradley and the team began taking Iarla through an overview of their direction and their technical approach, Iarla stopped him to offer them a suggestion.
“When talking about your work - whether it’s with the public, with colleagues or with potential funders and partners, it’s always best to start with the WHY?”
Bradley began nodding his head.
Iarla continued, “I understand that you are looking to map informal settlements using AI. And it’s intriguing. But to really pull me into wanting to know more about your project, it would be best to help me understand WHY we need to map informal settlements in the first place.”
The team took note and quickly jumped ahead to a different section of their presentation. They explained that during the first week of the program, the team had met with an advisor from UNICEF who helped explain why the location and scale of informal settlements are not easily understood. Indhu chimed in to explain why mapping these settlements is a key part of better understanding the challenges that need to be addressed to offer billions of residents a path to a better life. (You can read more about the “WHY” behind informal settlement detection in Part-1 of this two-part article here).
Next, Patrick and Faiza explained certain requirements that became apparent during additional consultation with UNICEF. To be broadly used by a wide range of NGOs and other stakeholders, any solution would need to deliver:
Global Coverage - While informal settlements are growing fastest across the African continent, the same issues are greatly impacting the rest of the world including Asia, Latin America and Southern/Eastern Europe. Furthermore, the variation in these settlements from one region to the next increases the detection challenge.
Low Cost - One of the reasons so little is known about informal settlements is that NGOs, and even governments, of developing countries rarely have the resources or budget required to gather information on the ground, or purchase data via private providers. Any viable solution would need to minimise the need for additional budget and large-scale computational power.
Updateable - In addition, the very nature of these settlements means they change in place and size over time, sometimes quickly. Large-scale, point-in-time mapping efforts would be helpful, but could quickly lose value as these communities shift in size and location.
“Got it. And you believe you can meet these requirements to automatically create accurate maps of informal settlements using artificial intelligence?” Iarla asked.
“Yes,” Indhu responded confidently.
“Show me how.”
The team then went on to describe how they were pursuing a combination of different approaches to detecting informal settlements from above - high above with Earth observation satellites in low-Earth orbit.
First, for the most part, informal settlements tend to have very different roof materials vs formal settlements. The team believed that they could train a neural network to learn to recognize informal settlements by identifying the specific combination of materials used in slums vs formal settlements and other land use such as industry, rural, farming, etc.
However, there are areas where roof material alone won’t work. For instance, the team knew that in areas of Sudan, both the formal and informal settlements used roofs made of concrete. There needed to be a secondary approach for these exceptions. For this, the team decided to look at building density. Informal settlements are considerably more compact and dense than planned development. While it may require using higher resolution satellite imagery, they planned a secondary machine learning approach to support the primary method.
In the end, they hoped their work would eventually lead to an application that would not only create current, up-to-date maps of informal settlements around the world, but also show how these settlements have changed and grown over time by applying the same techniques to historical satellite data. If successful, the results could become a game changer.
Once they finished the presentation, Iarla sat further back in his chair to think for a moment. All four of the team members looked at him trying to guess what he was going to say next. Did he like what they were doing? Did he think they were on the right track? Or would he advise them to go back to the drawing board?
“I like it,” he finally said. “But you’re going to need some help.” Iarla was already scrolling through his mental rolodex for people he knew who may be able to provide additional support to this team. While not an AI expert himself, he knew enough to appreciate the significant challenges this team was about to face over the next several weeks in pursuit of this goal.
[Scene Two: Same Room, Two Weeks Later]
“Guys, come look at this.”
Bradley pushed back his chair from his laptop with a look of surprise mixed with a healthy dash of “I knew it all along.” The past 20+ days had been far from easy. At times it looked like they might not be able to pull it off. But right there in front of him were the results. Their networks were detecting informal settlements with high accuracy rates. A few more tweaks and the team knew they could push those numbers even higher.
To appreciate this, let’s look at the various elements that needed to come together to make this possible.
High Resolution Satellite Images (Public). Iarla’s lab at ESA had helped the team access high resolution imagery of the whole planet’s land mass via the Copernicus Programme. Specifically, the Sentinel-2 satellite constellation has several features that make it ideal for a project like this. First, it maps the entire globe’s land mass in 10 metre by 10 metre pixels (although some Sentinel-2 bands use coarser pixels of 20 or 60 metres, depending on the spectral band). Importantly, these images are refreshed EVERY FIVE DAYS. Furthermore, Sentinel-2 captures not only images of what we can see (RGB frequencies) but also many spectral bands we can’t see. This additional spectral data holds a specific signature for each pixel that is a direct result of the materials on the ground. The neural network led by Bradley would analyze this spectral information to match pixels that look like those from informal settlements. What’s more, data from ESA’s Copernicus Programme is free and open to the public, potentially making any resulting tool more accessible to a wider range of stakeholders.
Very High Resolution Satellite Images (Private). Standing behind Bradley that day was Adrien Muller from the Satellite Applications Catapult, another FDL Europe partner. His organisation was able to coordinate access to a second source of satellite data. Digital Globe is a private satellite company that provides Very High Resolution data (with pixel size down to 30cm x 30cm) to companies and organisations around the world. With this higher definition, the team could build a second neural network, led by Patrick, that would be able to see the ground “texture” much more clearly and recognise the tightly packed structures that typify informal settlements.
What Are Training Data and Test Data?
If you were going to “train” a neural network to be able to tell the difference between a cat and a dog, you would need to show the network a large number of pictures already labeled as either a “cat” or a “dog.”
Once the network was able to understand differences between these groups, you could then start showing it unlabeled pictures and it would then be able to predict whether or not the unlabeled image was more likely a dog or a cat. This same concept applies for a neural network looking to understand the difference between an informal settlement and a formal settlement - or for that matter anything else that isn’t an informal settlement. The more “training” data the team could get access to, the more confident they could be in their predictions.
In addition, the team needed even more pre-labeled data to verify the accuracy of their predictions. How often did the network get the right answer? How often did it generate a false positive? How often was it missing an area of informal settlement? Without this additional pre-labeled data, none of this would be possible.
In the end, the team would randomly allocate 80% of the annotated data they were able to collect for use as training data and reserve the other 20% as test data.
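The split and the accuracy questions described above can be illustrated with a minimal sketch. It is not the team's code: the function names `split_train_test` and `evaluate`, the fixed random seed, and the toy samples are all assumptions made for illustration, with the label 1 standing for "informal settlement" and 0 for everything else.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle labelled samples and hold out a fraction as test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: an assumption, for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(true_labels, predicted_labels):
    """Answer the three questions: how often right, false positives, misses."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (true_pos + true_neg) / len(pairs),
        "false_positives": false_pos,      # flagged, but not a settlement
        "missed_settlements": false_neg,   # a settlement the model missed
    }


# Toy data: 100 hypothetical labelled pixels as (pixel_id, label) pairs.
samples = [(i, i % 2) for i in range(100)]
train, test = split_train_test(samples)
print(len(train), len(test))

# Hypothetical predictions for five test pixels.
report = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(report)
```

Keeping the test pixels out of training is what makes the reported accuracy meaningful: the model is scored only on labelled examples it has never seen.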
Training Data. The acquisition of training and test data proved to be one of the key challenges for the team. There simply wasn’t enough accurate data about the exact locations of informal settlements available. Sources did exist, but they needed to be tracked down and accessed.
Once again, the Satellite Applications Catapult proved to be invaluable by connecting the team with annotated satellite imagery for the locations of informal settlements in parts of Kenya, South Africa, Nigeria, Sudan, Colombia and Mumbai. The team was able to isolate these areas in the Sentinel-2 images and extract the necessary spectral information at those specific points.
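As a rough illustration of what "extracting spectral information at specific points" can mean, the sketch below builds one feature vector per labelled pixel by reading that pixel's value in every band. The function name `extract_spectra`, the data layout, and the reflectance values are hypothetical; only the band names (B02, B03 and B08 are Sentinel-2's blue, green and near-infrared bands) come from the real sensor.

```python
def extract_spectra(bands, locations):
    """Return one feature vector per labelled pixel: its value in every band.

    bands: dict mapping band name -> 2D grid (list of rows), all the same shape.
    locations: list of (row, col) pixel coordinates of labelled points.
    """
    band_names = sorted(bands)  # fixed band order so every vector lines up
    return [
        [bands[name][row][col] for name in band_names]
        for row, col in locations
    ]


# Toy 2x2 scene with made-up reflectance values.
bands = {
    "B02": [[0.10, 0.12], [0.30, 0.31]],
    "B03": [[0.11, 0.13], [0.28, 0.29]],
    "B08": [[0.40, 0.42], [0.15, 0.16]],
}
labelled_pixels = [(0, 0), (1, 1)]  # hypothetical annotated locations
features = extract_spectra(bands, labelled_pixels)
print(features)
```

Each resulting vector is the "spectral signature" of one pixel, which is the kind of input a classifier can then be trained on.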
Another win for the team was getting labeled data with geo-located roof materials identified within 36 African countries. This data came from partners AIDData and Afrobarometer and was based on a survey which asked respondents what the roof of their home or shelter was made of. While tremendously useful, the team found this data was not always reliable, as in many cases the location of survey data corresponded to an area that didn’t contain any structures. Part of this was most likely human error. Another part of this was due to purposeful distortion of geo-located coordinates added to protect the privacy of respondents.
While none of the data sources were perfect, the team needed to find ways around these obstacles. For instance, the team needed to perform multiple steps to calibrate various pixel size resolutions from various bands of satellite data. They needed to create several pre-processing steps to reduce noise and balance data in the Afrobarometer survey data. During an interview toward the end of the eight-week session, Patrick Helber would note that one of the biggest differences between working with theoretical problems in an academic setting and working on real-world solutions was the need to deal with messy data. In academic theory, all data sets are clean and organized. In the real world, the data is almost never perfect.
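One of the calibration steps mentioned above, bringing bands with different pixel sizes onto a common grid, might be sketched as follows. The article does not say which resampling method the team used; nearest-neighbour upsampling is assumed here purely because it is the simplest option, and the function name `upsample_nearest` and the toy grid are illustrative.

```python
def upsample_nearest(grid, factor):
    """Repeat each pixel factor x factor times (nearest-neighbour upsampling)."""
    out = []
    for row in grid:
        # Stretch the row horizontally, then repeat it vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# A toy 2x2 band at 20 m resolution, resampled onto the 10 m grid (factor 2).
band_20m = [[1, 2], [3, 4]]
band_on_10m_grid = upsample_nearest(band_20m, 2)
for row in band_on_10m_grid:
    print(row)
```

Once every band shares the same grid, the values for a given pixel can be stacked into a single feature vector regardless of each band's native resolution.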
The Machine Learning Networks. As the state of the art pushes further and further in the field of artificial intelligence, more and more specialized types of machine learning networks are being developed and becoming available to researchers. Sometimes, an existing network will work perfectly for a problem “off the shelf”. Sometimes, modifications need to be made to existing methods to adapt them for the particular use. Other times, multiple methods need to be used and combined in new ways to accomplish the task. This last path turned out to be the required method for the FDL Europe team.
For the primary network that would carry the load in processing Sentinel-2 images and analyzing the spectral signals, the team used a Canonical Correlation Forest (CCF) method. This new method was chosen because it fit a number of the team’s very specific requirements. First, it’s designed to classify information (such as informal settlement vs. not informal settlement). CCFs are also particularly good at providing accurate results in situations where you don’t have a lot of training data to work with. Finally, this method is so simple to use and computationally efficient that it offers many advantages to projected users, such as NGOs.
For the secondary network that would analyze the optical data from the private, very high resolution imagery, the team needed to go in a different direction. For this task they used a Convolutional Neural Network (CNN). While more computationally intense, a CNN is ideal when analyzing / classifying visual images. The work developed by FDL Europe leverages both types of networks (CCF and CNN) in order to provide multiple detection methods depending on the restrictions and specific characteristics of the region.
Compute Resources. For years, FDL Europe partner NVIDIA has been developing technologies to make video games run faster. A key result has been the GPU or Graphics Processing Unit. It just so happens that the GPU has proven to be substantially better at simultaneously processing large blocks of data than the traditional CPU found in most computers. And when you’re trying to train neural networks by analyzing large blocks of very high resolution satellite images, NVIDIA’s GPUs are massively helpful. NVIDIA enabled the FDL Europe team to remotely use eight GPUs (with 16 GB of memory each) to power through both the training data and the test data.
ESA, Satellite Applications Catapult, University of Oxford, UNICEF, NVIDIA, Afrobarometer, AIDData, and many others all played key roles in enabling these four researchers to create a breakthrough. And yet, this was only the first big step.
It was these test results that were putting smiles on the team’s faces that morning.
[Scene Three: London, England, August 16th, 2018]
“We would love to talk to you.”
The team’s presentation to a packed audience at the venerable British Interplanetary Society in London had been a great success so far. Indhu had begun by vividly providing the background on the growing issues of informal settlements and the need for accurate mapping.
They had shown that for the first time, informal settlements can be detected effectively using only freely and openly accessible multi-spectral satellite imagery.
But once Patrick had walked through his portion of the presentation and acknowledged the many partners that had helped get them this far, he changed the tone of the evening. Instead of speaking to slides, he now began speaking directly to the audience, asking them to engage with the team members present to talk about extending the work they had done.
Now that they had achieved what amounted to a proof-of-concept, the team knew there was much more to be done to bridge the gap between what they had built so far and a working, usable application that would help drive more informed urban planning across both public and private spheres.
The ending slide of the presentation included a call to action: “For collaboration, suggestions, comments, please contact us.”
The team had worked with only a small number of test locations in certain regions, and this work needs to be extended to cover the entire globe. This means acquiring significantly more labeled data from many more locations. Next, this global coverage would ideally be applied to historical data to create models that illustrate changes over time and make predictions about the future. Finally, this work would need to be packaged and delivered in such a way that non-technical stakeholders can use these tools in real time to impact decision making.
Accomplishing these next steps will require ongoing support from not only FDL Europe, Oxford University, Satellite Applications Catapult, UNICEF and NVIDIA, but also from new partners that would need to be brought on board. Perhaps some of these new partners were represented in the audience that night. Perhaps that night they were only two or three degrees of separation away. Perhaps right now one of those important new partners is reading.
To get in touch with the FDL Europe Program, please reach out directly to