Extracting raster data

Authors: Michael T. Hallworth
Contributors: Clark S. Rushing

R Skill Level: Intermediate - this activity assumes you have a working knowledge of R

This activity incorporates the skills learned in the Points, Polygons, Raster & Projections activities.

In case you missed it

SpatialPoints
SpatialPolygons
Rasters
Projections

Objectives & Goals

Upon completion of this activity, you will:

know how to extract raster data to spatial layers
know how to summarize extracted data

Required packages

To complete the following activity you will need the following packages installed: raster velox sp rgeos

Installing the packages

If installing these packages for the first time consider adding dependencies=TRUE install.packages("raster") install.packages("velox") install.packages("sp") install.packages("rgeos")

Download R script Last modified: 2019-09-20 18:26:28

library(raster)
library(velox)
library(sp)
library(rgeos)

Extracting raster data to an area or location of interest is a common GIS task. There are a few things to keep in mind while extracting raster data in R. First, as we’ve seen in Projections it’s best to use projected data. Second, make sure the two layers have the same coordinate reference system. Lastly, make sure you know the data class you’re extracting (i.e., continuous, factor, etc.)

Extracting Raster to SpatialPoints

Extracting raster data to SpatialLayers can be done using the extract function in the raster package. See extract function documentation for details regarding options.

Gathering spatial data

For this first example, we’ll use a digital elevation model (DEM) of Hubbard Brook Experimental Forest in New Hampshire. The dem is of class numerical. Which means we can perform numerical calculations like finding the mean and range of the data. Below, we’ll use a data set where the data are factors.

Reading in the DEM

# read in DEM of HBEF
DEM <- raster::raster("../Spatial_Layers/hb10mdem.txt")

Each summer, the Smithsonian Conservation Biology Institute’s Migratory Bird Center conducts point counts at nearly 400 locations throughout Hubbard Brook. The point count locations have been georeferenced. Let’s extract the elevation of each survey location. First, we’ll need to read in the in the survey locations.

SurveyLocs <- raster::shapefile("../Spatial_Layers/hbef_valleywideplots.shp")

## class       : SpatialPointsDataFrame 
## features    : 395 
## extent      : 275500, 283000, 4866300, 4871200  (xmin, xmax, ymin, ymax)
## crs         : +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
## variables   : 4
## names       : X8.digit.Pl, Plot.Tag, UTM.Eastin, UTM.Northi 
## min values  :    75567500,        1,     275500,    4866300 
## max values  :    83070200,       99,     283000,    4871200

Before we proceed, let’s make sure the CRS is the same in for both the raster layer (DEM) and the SpatialPoints layer.

crs(SurveyLocs)

## CRS arguments:
##  +proj=utm +zone=19 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0

crs(DEM)

## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

The two layers have different coordinate reference systems. We’ll need to transform the SpatialPoints layer to match the raster (DEM). See caution projecting rasters for details on why we’ll project the SpatialPoints instead of the raster.

# Transform points to match CRS of the DEM
SurveyLocs_proj <- sp::spTransform(x = SurveyLocs,
                                   CRSobj = crs(DEM))

Now that the two layers have the same CRS we can extract the elevation to survey points. We’ll use the extract function available in the raster package. The first argument is the raster values you want to extract. The second argument (y) can be a SpatialObject or matrix.

# Extract elevation to SpatialPoints
surveyElev <- raster::extract(x = DEM,
                              y = SurveyLocs_proj)

The resulting object is a vector with the same length as SurveyLocs_proj.

##  num [1:395] 475 494 555 588 575 574 574 588 602 630 ...

We can add the data directly into the SpatialPointsDataFrame with the following code.

SurveyLocs_proj$Elev_m <- raster::extract(x = DEM,
                                          y = SurveyLocs_proj)

## class       : SpatialPointsDataFrame 
## features    : 395 
## extent      : -71.79809, -71.70358, 43.91606, 43.95997  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 5
## names       : X8.digit.Pl, Plot.Tag, UTM.Eastin, UTM.Northi, Elev_m 
## min values  :    75567500,        1,     275500,    4866300,    240 
## max values  :    83070200,       99,     283000,    4871200,    965

plot of chunk unnamed-chunk-12

Extract raster to polygon/s

Below, we’ll determine the proportion of landcover types within Alaska. In order to complete this task, some new functions will be introduced but we’ll also use some skills we learned in other activities.

Gathering spatial data

We’re interested in quantifying landcover around Anchorage, AK or more broadly the landcover types found in Alaska.

First, we’ll read in the landcover classes. In this example, we’ll use landcover data with a relatively coarse resolution for simplicity and speed.

# Read in landcover data
LandCover <- raster::raster("../Spatial_Layers/MCD12C1_T1_2011-01-01_rgb_3600x1800.TIFF")

## class      : RasterLayer 
## dimensions : 1800, 3600, 6480000  (nrow, ncol, ncell)
## resolution : 0.1, 0.1  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : /home/travis/build/mhallwor/mhallwor.github.io/Spatial_Layers/MCD12C1_T1_2011-01-01_rgb_3600x1800.TIFF 
## names      : MCD12C1_T1_2011.01.01_rgb_3600x1800 
## values     : 0, 255  (min, max)

Code	Land Cover Class
01	Evergreen Needleleaf forest
02	Evergreen Broadleaf forest
03	Deciduous Needleleaf forest
04	Deciduous Broadleaf forest
05	Mixed forest
06	Closed shrublands
07	Open shrublands
08	Woody savannas
09	Savannas
10	Grasslands
11	Permanent wetlands
12	Croplands
13	Urban and built-up
14	Cropland/Natural vegetation mosaic
15	Snow and ice
16	Barren or sparsely vegetated
17	Water bodies
18	Tundra

*National Center for Atmospheric Research Staff (Eds). Last modified 10 Feb 2017. “The Climate Data Guide: CERES: IGBP Land Classification.” Retrieved from https://climatedataguide.ucar.edu/climate-data/ceres-igbp-land-classification.

Now that we have a basic Landcover layer, we’ll can gather spatial data for Alaska. We’ll grab the entire United States and then subset out Alaska.

States <- raster::getData("GADM", country = "United States", level = 1)

AK <- States[States$NAME_1=="Alaska",]

## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : -179.1506, 179.7734, 51.20972, 72.6875  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 10
## names       : GID_0,        NAME_0,   GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## value       :   USA, United States, USA.2_1, Alaska, AK|Alaska,        NA,  State,     State,   NA,  US.AK

Great - now we have a SpatialPolygonsDataFrame of Alaska. Take note of the CRS.

Landcover types are factors since they’re categorical. The reason that’s important is because when extracting raster data there are options to apply a function to the underlying raster data. For example, the extract function allows users to specify a function like mean and the function will return the mean raster value under the area of interest. However, because landcover types are factors, returning the mean landcover type is meaningless. Instead, as a final result were interested in having a table giving the percent of each landcover type in Alaska. We’ll work our way up to that. We’ll start by first extracting the landcover types in Alaska using the extract function available in the raster package.

The velox package is much faster for extracting raster values, especially with large data sets. See what else velox can do velox documentation

# Extract using the raster package 

# x = raster layer
# y = SpatialLayer

a <- Sys.time()
AK_lc <- extract(x = LandCover,
                 y = AK)
Sys.time()-a

## Time difference of 2.391696 mins

# Extract using velox 
LandCover_velox <- velox::velox(LandCover)

b <- Sys.time()
AK_lc_velox <- LandCover_velox$extract(AK)
Sys.time() - b

## Time difference of 25.34263 secs

On your own:

What does the extract function return? How are the data stored? How do you think the resulting object would differ if we had more than one SpatialPolygon?

## List of 1
##  $ : num [1:27934] NA 10 10 10 10 10 10 10 10 10 ...

The extract function returns a list with the raster values that fall within our area of interest. Let’s summarize the data a little bit. We’ll use the table function to briefly summarize the landcover types within Alaska.

table(AK_lc)

## AK_lc
##     1     3     4     5     7     8     9    10    11    13    14    15 
##   877     5     3   426 13745  6333    59  4264    25     2     5  1349 
##    16 
##     6

plot of chunk unnamed-chunk-21

Custom function to summarize landcover data

Let’s write a simple function that returns the landcover data as a percentage. The first thing we’ll do is make a sorted vector that has all the unique landcover types that are found in the landcover data set.

# sort the unique values in LandCover
z <- sort(unique(raster::values(LandCover)))

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

Now that we have a sorted vector of all the landcover types in the LandCover raster - let’s summarize the landcover types in Alaska. To do that we’ll need to write a custom function. Our function will be called summarizeLC. The summarizeLC function will except three arguments: x = extracted landcover data, LC_classes = land cover classes in the data set & LC_names a vector that contains the names of the landcover types. We’ll set LC_names = NULL so it’s optional.

summarizeLC <- function(x,LC_classes,LC_names = NULL){
               # Find the number of cells 
               y <- length(x)
               # Make a table of the cells
               tx <- table(x)
               # Create an empty array to store landcover data
               LC <- array(NA,c(1,length(LC_classes)))
               # Loop through the landcover types & return 
               # the number of cells within each landcover type
               for(i in seq(LC_classes)){
               LC[1,i] <- ifelse(LC_classes[i] %in% dimnames(tx)[[1]], 
                                 #if true
                                 tx[which(dimnames(tx)[[1]]==LC_classes[i])],
                                 # if false
                                 0) 
               } # end loop
               # Convert to percentage 
               LC <- LC/y
               # 
               if(!is.null(LC_names)){
                 colnames(LC)<-LC_names}
               else{colnames(LC)<-LC_classes}
        
               return(LC)
}

Here we’ll apply our function to the extracted raster data. Remember, the extracted raster data (AZ_lc) are stored as a list. Because of that we’ll use the lapply function to apply our function (summarizeLC) to all elements of a list. This is especially useful when you extract raster values to more than one polygon, as the extracted values are stored in separate lists.

summaryValues <- lapply(AK_lc,FUN = summarizeLC,LC_classes = z)

Here is what the final result may look like.

##                                    Percent
## Evergreen Needleleaf forest          0.031
## Evergreen Broadleaf forest           0.000
## Deciduous Needleleaf forest          0.000
## Deciduous Broadleaf forest           0.000
## Mixed forest                         0.015
## Closed shrublands                    0.000
## Open shrublands                      0.492
## Woody savannas                       0.227
## Savannas                             0.002
## Grasslands                           0.153
## Permanent wetlands                   0.001
## Croplands                            0.000
## Urban and built-up                   0.000
## Cropland/Natural vegetation mosaic   0.000
## Snow and ice                         0.048
## Barren or sparsely vegetated         0.000

On your own:

Find the proportion of each landcover type within your state. Use the extract function and the custom summarizeLC function return the landcover percentages.

Challenge:

Compare and contrast the landcover classes for two or more states.

##                                    New Hampshire West Virginia
## Deciduous Broadleaf forest                 0.111         0.905
## Mixed forest                               0.867         0.012
## Grasslands                                 0.000         0.002
## Urban and built-up                         0.011         0.003
## Cropland/Natural vegetation mosaic         0.004         0.078