Amy Whitehead's Research

the ecological musings of a conservation biologist

Extracting raster data using a shapefile

Leave a comment

I recently had an email from a PhD student in Austria who had a raster showing the distribution of Douglas Fir in Europe and wanted to know what proportion of each European country was covered in this species. They had a raster with presence (1) and absence (0) of Douglas-fir in Europe and wanted to calculate the number of cells with 1 and 0 within each country of the Europe. I’ve put together a dummy example below which shows how to R script to extract the number of raster cells in each country that meet a certain condition.


Douglas Fir (source: Pixabay)

Essentially the script works through the following steps:

  1. Loads the relevant shapefile and raster datasets.
  2. Identifies all of the countries within the shapefile.
  3. Within a loop, masks the presence-absence raster by each country and counts the number of cells that meet the required condition.

The key thing to note is that your raster and shapefile need to be in the same spatial projection or the process won’t give you the answer you were expecting.
I’ve created a dummy presence-absence raster here showing the worldwide distribution of foxes to demonstrate the process.

The function below is used to create a presence-absence raster from point data (see here).

presence.absence.raster <- function (mask.raster,,raster.label="") {

# set the background cells in the raster to 0
mask.raster[!] <- 0

#set the cells that contain points to 1
speciesRaster <- rasterize(,mask.raster,field=1)
speciesRaster <- merge(speciesRaster,mask.raster)

#label the raster
names(speciesRaster) <- raster.label

Then I generate some data for the worldwide distribution of foxes.


# read in species point data and extract data for foxes
mySpecies species <- "VulpesVulpes"

# extract fox data from larger dataset and keep only the x and y coordinates <- mySpecies[,c("X_WGS84&quot", "Y_WGS84",species)] <-[$VulpesVulpes==1,c("X_WGS84", "Y_WGS84")]

# read in a raster of the world
myRaster <- raster(system.file( "external/bioclim/current/bio3.grd",package="biomod2"))

# create presence absence raster for foxes
pa.raster <- presence.absence.raster(mask.raster=myRaster,, raster.label=species)
plot(pa.raster, main=names(pa.raster))
# import world polygon shapefile from package maptools
data(wrld_simpl, package = "maptools")
plot(wrld_simpl, add = T)
create a dummy presence-absence raster-1

The global (natural) distribution of foxes (at a ridiculously large-scale resolution)

Now here is the interesting bit. The code works by looping through each country and
1. masking the original raster by the country,
2. counting the number of cells that equal 1,
3. storing the output in a dataframe,
4. printing the country name to the screen so you know how far through the loop you are.

# identify the countries in your shapefile
countries <- levels(wrld_simpl$NAME)

# create a dataframe to store the output
output <- data.frame(country = countries, presence=NA, absence=NA, total=NA)

for(c in 1:length(output$country)){
r <- pa.raster
# mask the raster
masked.raster <- mask(r,wrld_simpl[wrld_simpl$NAME == output$country[c],])
# count the number of presence and absence cells & store in dataframe
output$presence[c] <- length(masked.raster[masked.raster==1])
output$absence[c] <- length(masked.raster[masked.raster==0])
output$total[c] <- length(masked.raster[!])
# print the country

This table shows the first six countries that have presence-absence data for foxes contained within the raster. Note that some countries have no data in this example as this world raster has a very large resolution that doesn’t represent some countries very well. This shouldn’t be a problem if your raster and shapefile data are well aligned for your question.

##        country presence absence total
## 2  Afghanistan        7       0     7
## 4      Algeria        7      16    23
## 7       Angola        0      12    12
## 11   Argentina        0      29    29
## 14   Australia        0      77    77
## 16  Azerbaijan        1       0     1

This process could easily be adapted to extract any type of data from a raster using a set of overlapping shapefile regions. This is obviously a very simple example with very big raster cells, so it will run very quickly. A raster with a much finer resolution may take some time to run, meaning that it may be more efficient to write this as a function rather than as a loop, but this method should work okay.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s