Amy Whitehead's Research

the ecological musings of a conservation biologist


3 Comments

Santa’s Little Helper – a function for randomly assigning Secret Santa presents

Christmas can be a stressful and expensive time of year, with many gifts to buy for many people. My family have tried to eliminate some of this stress by having a Secret Santa gifting system where we all buy one present for one member of the family on behalf of everybody else, with a cap of $50 per present.

In our family scheme, there are 3 rules:
1. You can’t give a gift to yourself,
2. You can’t give a gift to your partner (we figure that this will happen anyway),
3. You can’t give a gift to someone you gave one to last year.

Somewhere along the line, I’ve become the designated present-assigner. Now I could write names on pieces of paper and put them in a hat but 1) writing out 11 names is tedious, 2) I don’t have a suitable hat and 3) the rules above mean that there are quite often gifting conflicts arise that require the need for a redraw. So I’ve whipped up a little function in R that does the job. (Admittedly it probably would have been faster to use bits of paper but now I’m prepared for next year).

SantasLittleHelper <- function(myFrame,guestList,conflictCols = NULL){
  myTest <- TRUE
  nElves <- 0

  while (myTest == TRUE){
    myOut <- data.frame(gifter = myFrame[,guestList],
                        giftee = sample(myFrame[,guestList],
                                        replace = FALSE,
                                        size=nrow(myFrame))
                        )

    # check that guests haven't drawn themselves
    guestTest<- unlist(lapply(1:nrow(myOut),function(x) {
      myOut$giftee[x] == myFrame[x,guestList]
      }))

    # check for gifting conflicts
    if(!is.null(conflictCols)){

    conflictTest <- unlist(lapply(1:nrow(myOut),function(x) {
      grepl(myOut$giftee[x],myFrame[x,conflictCols])
      }))

    myTest <- any(c(guestTest,conflictTest[!is.na(conflictTest)]))

    } else{
      myTest <- any(guestTest)
    }

    # count the number of iterations needed to avoid conflicts
    nElves <- nElves + 1
  }

  message(paste(nElves,"elves were needed to generate the gift list"))
  return(myOut)
}

SantasLittleHelper takes three arguments: myFrame is a dataframe containing the list of guests to be assigned gifts; GuestList is the character name of the column containing the list of guests; and conflictCols is an optional character vector of column names that identify gifting conflicts that need to be avoided [1]. The function randomly assigns a giftee to each guest and then checks for conflicts. If conflicts exist, the gifts are reassigned and this continues using a while loop until a suitable solution is found. The function returns a dataframe containing detailing the gifter – giftee combinations, as well as outputting the number of elves (or iterations) needed to generate the list.

To run this for the 2016 Whitehead Family Christmas, I generated a dataframe guests where the column guest lists all of the people coming to Christmas, partner identifies everybody’s significant others and presents2013 lists the assigned giftees from 2013 (the last time we did the Secret Santa thing).

guest partner presents2013
John Fay Jo
Sue Simon Fay
Amy Hamish Sue
Jo Phil Hamish
Ashley Naomi Phil
Naomi Ashley John
Phil Jo Simon
Hamish Amy Naomi
Simon Sue Ashley
Fay John Amy
Zac NA NA

These data are fed into SantasLittleHelper and, bingo, the Whitehead Family Christmas is sorted for another year.

giftList <- SantasLittleHelper(myFrame = guests,
                               guestList = "guest",
                               conflictCols  = c("partner","presents2013")
                               )
27 elves were needed to generate the gift list
gifter giftee
John Zac
Sue Ashley
Amy Phil
Jo Amy
Ashley Fay
Naomi Sue
Phil John
Hamish Jo
Simon Naomi
Fay Simon
Zac Hamish

Hope everyone survives the silly season!

[Merry Christmas everyone!](http://www.r-bloggers.com/merry-christmas-5/)

[1] Technically you could use this feature to rig the outcome to avoid getting a present from Great Aunt Myrtle or Bob from Accounts by adding in an additional conflict column. Use your powers wisely or the ghosts of christmas past may come back to bite you!

Advertisements


Leave a comment

Extracting raster data using a shapefile

I recently had an email from a PhD student in Austria who had a raster showing the distribution of Douglas Fir in Europe and wanted to know what proportion of each European country was covered in this species. They had a raster with presence (1) and absence (0) of Douglas-fir in Europe and wanted to calculate the number of cells with 1 and 0 within each country of the Europe. I’ve put together a dummy example below which shows how to R script to extract the number of raster cells in each country that meet a certain condition.

fir-drove-1110793_640

Douglas Fir (source: Pixabay)

Essentially the script works through the following steps:

  1. Loads the relevant shapefile and raster datasets.
  2. Identifies all of the countries within the shapefile.
  3. Within a loop, masks the presence-absence raster by each country and counts the number of cells that meet the required condition.

Continue reading


8 Comments

Copying files with R

Following on from my recent experience with deleting files using R, I found myself needing to copy a large number of raster files from a folder on my computer to a USB drive so that I could post them to a colleague (yes, snail mail – how old and antiquated!).  While this is not typically a difficult task to do manually, I didn’t want to copy all of the files within the folder and there was no way to sort the folder in a sensible manner that meant I could select out the files that I wanted without individually clicking on all 723 files (out of ~4,300) and copying them over.  Not only would this have been incredibly tedious(!), it’s highly likely that I would have made a mistake and missed something important or copied over files that they didn’t need. So enter my foray into copying files using R. Continue reading


18 Comments

Converting shapefiles to rasters in R

I’ve been doing a lot of analyses recently that need rasters representing features in the landscape. In most cases, these data have been supplied as shapefiles, so I needed to quickly extract parts of a shapefile dataset and convert them to a raster in a standardised format. Preferably with as little repetitive coding as possible. So I created a simple and relatively flexible function to do the job for me.

The function requires two main input files: the shapefile (shp) that you want to convert and a raster that represents the background area (mask.raster), with your desired extent and resolution. The value of the background raster should be set to a constant value that will represent the absence of the data in the shapefile (I typically use zero).

The function steps through the following:

  1. Optional: If shp is not in the same projection as the mask.raster, set the current projection (proj.from) and then transform the shapefile to the new projection (proj.to) using transform=TRUE.
  2. Convert shp to a raster based on the specifications of mask.raster (i.e. same extent and resolution).
  3. Set the value of the cells of the raster that represent the polygon to the desired value.
  4. Merge the raster with mask.raster, so that the background values are equal to the value of mask.raster.
  5. Export as a tiff file in the working directory with the label specified in the function call.
  6. If desired, plot the new raster using map=TRUE.
  7. Return as an object in the global R environment.

The function is relatively quick, although is somewhat dependant on how complicated your shapefile is. The more individual polygons that need to filtered through and extracted, the longer it will take. Continue reading


3 Comments

Remotely deleting files from R

Sometimes programs generate a LOT of files while running scripts. Usually these are important (why else would you be running the script?). However, sometimes scripts generate mountains of temporary files to create summary outputs that aren’t really useful in their own right. Manually deleting such temporary files can be a very time consuming and tedious process, particularly if they are mixed in with the important ones. Not to mention the risk of accidentally deleting things you need because you’ve got bored and your mind has wandered off to more exciting things…

...like watching orca swim past from the hut window

…like watching orca swim past the hut window!

I had exactly this problem a few months ago when I had ~65,000 temp files from a modelling process that were no longer needed, inconveniently mixed in with the things I needed to keep. Clearly deleting these files manually wasn’t really going to be an option. There are a number of ways to tackle this problem but R provided a simple two-line solution to the problem. Continue reading


13 Comments

Creating a presence-absence raster from point data

I’m working on generating species distribution models at the moment for a few hundred species. Which means that I’m trying to automate as many steps as possible in R to avoid having to click buttons hundreds of times in ArcView.

One of the tasks that I need to do is to convert presence-only latitude and longitude data into a presence-absence raster for each species. It seems like this would be something that relatively simple but it took me longer than it should have to figure it out. So I’m posting my code here so 1) I don’t forget how I did it; and 2) because I had someone ask me how to exactly this thing this afternoon and it took me ages to hunt through my poorly organised files to find this piece of code! So here it is:

Because I’m a function kinda girl, I wrote this as a function. It basically goes through three steps:

1. Take an existing raster of the area you are interested in mask.raster and set the background cells to zero (absences).

2. rasterize the presence points for your species species.data and set those cells to one (presences).

3. Label the new raster by your species names raster.label and save it as a new raster.

presence.absence.raster <- function (mask.raster,species.data,raster.label="") {
require(raster)

# set the background cells in the raster to 0
mask.raster[!is.na(mask.raster)] <- 0

#set the cells that contain points to 1
speciesRaster <- rasterize(species.data,mask.raster,field=1)
speciesRaster <- merge(speciesRaster,mask.raster)

#label the raster
names(speciesRaster) <- raster.label
return(speciesRaster)
}

Below is an example of how the function works using data on the global distribution of foxes data from the biomod2 package.

library(biomod2)

# read in species point data and extract data for foxes
mySpecies <- read.csv(system.file("external/species/mammals_table.csv", package="biomod2"), row.names = 1)
species <- "VulpesVulpes"

# extract fox data from larger dataset and keep only the x and y coordinates
fox.data <- mySpecies[,c("X_WGS84", "Y_WGS84",species)]
fox.data <- fox.data[fox.data$VulpesVulpes==1,c("X_WGS84", "Y_WGS84")]

# read in a raster of the world
myRaster <- raster(system.file( "external/bioclim/current/bio3.grd",package="biomod2"))

# create presence absence raster for foxes
pa.raster <- presence.absence.raster(mask.raster=myRaster, species.data=fox.data, raster.label=species)
plot(pa.raster, main=names(pa.raster))

run_function

In this plot, the presences (1) are shown in green and the absences (0) in light grey.

Helpful things to remember (or things I learnt the hard way)

  1. Make sure your species point data and raster are in the same projection and that they actually overlap!
  2. Set your desired raster extent and resolution in the mask.raster before you get started.
  3. The species point data that you feed into the function should just be a list of x and y co-ordinates – no species names or abundances or you’ll confuse the poor beast and it won’t work!

And yes, foxes are also present in Australia where they are a pest. I guess this map shows their natural range before people started doing silly things.

Bought to you by the powers of knitr & RWordpress 

(Well it was and then things went a little awry, so I had to do some tinkering!)


35 Comments

Combining dataframes when the columns don’t match

Most of my work recently has involved downloading large datasets of species occurrences from online databases and attempting to smoodge1 them together to create distribution maps for parts of Australia. Online databases typically have a ridiculous number of columns with obscure names which can make the smoodging process quite difficult.

For example, I was trying to combine data from two different regions into one file, where one region had 72 columns of data and another region had 75 columns. If you try and do this using rbind, you get an error but going through and identifying non-matching columns manually would be quite tedious and error-prone.

Here's an example of the function in use with some imaginary data. You'll note that Database One and Two have unequal number of columns (5 versus 6), a number of shared columns (species, latitude, longitude, database) and some unshared columns (method, data.source).

  species latitude longitude        method     database
1       p   -33.87     150.5   camera trap database.one
2       a   -33.71     151.3 live trapping database.one
3       n   -33.79     151.8   camera trap database.one
4       w   -34.35     151.3 live trapping database.one
5       h   -31.78     151.8   camera trap database.one
6       q   -33.17     151.2 live trapping database.one
      database species latitude longitude data.source accuracy
1 database.two       d   -33.95     152.7   herbarium    3.934
2 database.two       f   -32.60     150.2      museum    8.500
3 database.two       z   -32.47     150.7   herbarium    3.259
4 database.two       f   -30.67     150.6      museum    2.756
5 database.two       e   -32.73     149.4   herbarium    4.072
6 database.two       x   -33.49     153.3      museum    8.169
rbind(database.one, database.two)
Error: numbers of columns of arguments do not match

So I created a function that can be used to combine the data from two dataframes, keeping only the columns that have the same names (I don't care about the other ones). I'm sure there are other fancier ways of doing this but here's how my function works.

The basics steps
1. Specify the input dataframes
2. Calculate which dataframe has the greatest number of columns
3. Identify which columns in the smaller dataframe match the columns in the larger dataframe
4. Create a vector of the column names that occur in both dataframes
5. Combine the data from both dataframes matching the listed column names using rbind
6. Return the combined data

rbind.match.columns <- function(input1, input2) {
    n.input1 <- ncol(input1)
    n.input2 <- ncol(input2)

    if (n.input2 < n.input1) {
        TF.names <- which(names(input2) %in% names(input1))
        column.names <- names(input2[, TF.names])
    } else {
        TF.names <- which(names(input1) %in% names(input2))
        column.names <- names(input1[, TF.names])
    }

    return(rbind(input1[, column.names], input2[, column.names]))
}

rbind.match.columns(database.one, database.two)
   species latitude longitude     database
1        p   -33.87     150.5 database.one
2        a   -33.71     151.3 database.one
3        n   -33.79     151.8 database.one
4        w   -34.35     151.3 database.one
5        h   -31.78     151.8 database.one
6        q   -33.17     151.2 database.one
7        d   -33.95     152.7 database.two
8        f   -32.60     150.2 database.two
9        z   -32.47     150.7 database.two
10       f   -30.67     150.6 database.two
11       e   -32.73     149.4 database.two
12       x   -33.49     153.3 database.two

Running the function gives us a new dataframe with the four shared columns and twelve records, reflecting the combined data. Awesome!

Edited to add:

Viri asked a good question in the comments – what if you want to keep all of the columns in both data frames? The easiest solution to this problem is to add dummy columns to each dataframe that represent the columns missing from the other data frame and then use rbind to join them together. Of course, you won't actually have any data for these additional columns, so we simply set the values to NA. I've wrapped this up into a function as well.

rbind.all.columns <- function(x, y) {

    x.diff <- setdiff(colnames(x), colnames(y))
    y.diff <- setdiff(colnames(y), colnames(x))

    x[, c(as.character(y.diff))] <- NA

    y[, c(as.character(x.diff))] <- NA

    return(rbind(x, y))
}
rbind.all.columns(database.one, database.two)

And here you can see that we now have one dataframe containing all seven columns from our two sources, with NA values present where we are missing data from one of the dataframes. Nice!

   species latitude longitude        method     database data.source	accuracy
1        p   -33.87     150.5   camera trap database.one        <NA>	NA
2        a   -33.71     151.3 live trapping database.one        <NA>	NA
3        n   -33.79     151.8   camera trap database.one        <NA>	NA
4        w   -34.35     151.3 live trapping database.one        <NA>	NA	
5        h   -31.78     151.8   camera trap database.one        <NA>	NA
6        q   -33.17     151.2 live trapping database.one        <NA>	NA
7        d   -33.95     152.7          <NA> database.two   herbarium	3.934
8        f   -32.60     150.2          <NA> database.two      museum	8.500
9        z   -32.47     150.7          <NA> database.two   herbarium	3.259
10       f   -30.67     150.6          <NA> database.two      museum	2.756
11       e   -32.73     149.4          <NA> database.two   herbarium	4.072
12       x   -33.49     153.3          <NA> database.two      museum	8.169

Happy merging everyone!

1 A high technical and scientific term!

Bought to you by the powers of knitr & RWordpress