Amy Whitehead's Research

the ecological musings of a conservation biologist

Remotely deleting files from R


Sometimes programs generate a LOT of files while running scripts. Usually these are important (why else would you be running the script?). However, sometimes scripts generate mountains of temporary files on the way to creating summary outputs, and those temporary files aren’t really useful in their own right. Manually deleting them can be a very time-consuming and tedious process, particularly if they are mixed in with the important ones. Not to mention the risk of accidentally deleting things you need because you’ve got bored and your mind has wandered off to more exciting things…

…like watching orca swim past the hut window!

I had exactly this problem a few months ago when I had ~65,000 temp files from a modelling process that were no longer needed, inconveniently mixed in with the things I needed to keep. Clearly, deleting these files manually wasn’t going to be an option. There are a number of ways to tackle this, but R provided a simple two-line solution.

The first step is to identify any pattern in the file names that will let you remove only the files that you want to delete (and not the really important ones!). Then construct a regular expression that matches the pattern. A handy reference guide to regular expressions can be found here. In my case, all the file names to delete contained a text string followed by a ".xxxx" pattern, where each x is a digit (e.g. Iamafiletodelete.1234.csv). Therefore, my regex pattern looked like this: ".[0-9]" (but see the note below)*.
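Before pointing a pattern at a real folder, it can help to try it on a few made-up names first. The sketch below is only an illustration: the file names are hypothetical, and grepl() is used purely to preview which names the pattern from this post would pick up.

```r
# Hypothetical file names, made up for this illustration
candidate.names <- c("Iamafiletodelete.1234.csv",
                     "important_results.csv",
                     "summary2013.csv")

# The pattern used in this post
my.pattern <- ".[0-9]"

# grepl() returns TRUE for every name that the pattern matches
matches <- grepl(my.pattern, candidate.names)
print(data.frame(name = candidate.names, matched = matches))
```

Note that summary2013.csv is matched as well, because an unescaped "." stands for any character. That is the issue the note at the end of the post refers to.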

Then we can simply point R at the appropriate folder using dir(), identify the list of offending files, and delete them using file.remove(). Note that this has the potential to go horribly wrong if you aren’t careful! Make sure that you check very carefully that the pattern recognition selects only those files that you want to delete before you delete anything! This will result in a permanent delete (i.e. no rescuing things back from the recycle bin) and cannot be undone!

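# List every file under the folder (and its subfolders) whose name matches the pattern
files.to.delete <- dir("C:/the folder I want to delete from/", pattern = ".[0-9]", recursive = TRUE, full.names = TRUE)
# Permanently delete the listed files
file.remove(files.to.delete)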

You can also check whether the files exist, either before or after you delete them, to make sure it worked. Before deleting, every value should be TRUE; afterwards, every value should be FALSE.
file.exists(files.to.delete)
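The list, check, delete, verify sequence can be rehearsed safely in a throwaway folder. This is a minimal sketch under stated assumptions: the sandbox folder and file names are hypothetical, and it uses an escaped pattern along the lines of the note at the end of the post rather than the exact pattern I ran. list.files() is the same function as dir().

```r
# Work in a throwaway folder so nothing real is at risk
sandbox <- tempfile("delete-demo-")
dir.create(sandbox)

# Hypothetical files: two temp files to delete and one file to keep
file.create(file.path(sandbox, c("model.0001.csv", "model.0002.csv", "results.csv")))

# Step 1: list the candidates WITHOUT deleting anything
files.to.delete <- list.files(sandbox, pattern = "\\.[0-9]{4}\\.",
                              recursive = TRUE, full.names = TRUE)
print(basename(files.to.delete))  # read this list carefully before going on

# Step 2: only delete once the list looks right
removed <- file.remove(files.to.delete)

# Step 3: confirm the targets are gone and see what is left
print(file.exists(files.to.delete))
remaining <- list.files(sandbox)
print(remaining)
```

file.remove() returns one logical value per file, so all(removed) is a quick way to confirm that every deletion succeeded.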

Go forth and delete things but use at your own peril!

*As Patrick pointed out in the comments, the way that I have written the regex pattern technically isn’t correct: the unescaped "." matches any character, not just a full stop. While it worked, it could also have gone horribly wrong! This is a good example of why checking the selected file names before you actually delete the files is a very good idea. Patrick’s suggestion for the correct pattern is "\.\d{4}\." (inside an R string each backslash has to be doubled, giving "\\.\\d{4}\\."). He also points out that you can test regex code at Rubular, which seems like a very good idea!
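For anyone wanting to try Patrick’s pattern in R, here is a small sketch. The file names are hypothetical, and the only change to his pattern is the doubled backslashes that R strings require.

```r
# Patrick's pattern as an R string: each regex backslash is doubled
safer.pattern <- "\\.\\d{4}\\."

# Hypothetical names for illustration
test.names <- c("Iamafiletodelete.1234.csv",  # dot, four digits, dot
                "run7_output.csv",            # digit, but no ".dddd." block
                "data.12345.csv")             # five digits between the dots

# TRUE only where a dot, exactly four digits and another dot appear in a row
safer.matches <- grepl(safer.pattern, test.names)
print(data.frame(name = test.names, matched = safer.matches))
```

Only the first name matches, so the same string could be passed as the pattern argument to dir() to narrow the list of files to delete.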


3 thoughts on “Remotely deleting files from R”

  1. Wow! You know regex – this is awesome… if only I knew as much as you do about all things conservation biologyish… well.. I’d have to get a bigger head to fit my brain in.
    Now… that said, regex is a _very_ powerful language to learn.. and I think (not that I know R’s exact regex ) that it *might* be able to be improved on. If you wanted to match 4 numbers exactly (which would be a safer delete) then:
    “\.\d{4}\.”
    would be a safer pattern. In fact, in your regex above.. the “.” in your “.[0-9]” .. actually matches *any* character (not a period). you need to escape it (hence the slashes in my one). As I said, I’ve not used R, so your escaping may be different.
    BUT!!! Really really _really_ great work, regex is a _very_ useful thing to get better at. I’m still learning. Some tools I use:
    http://rubular.com (AND! you can permalink regex’s…. e.g. check out http://rubular.com/r/gJMqsmhvBw – a link to the one I was testing)
    Slightly geekier, but very well presented…
    http://www.confreaks.com/videos/2678-gogaruco2013-beneath-the-surface-regular-expressions-in-ruby it shows how regex works beneath the hood. This is in the language Ruby, but it’ll be the same for other things.
    And finally … a game to get better at regex!
    http://regex.alf.nu/
    Ok… I’m off now… but I was just so very impressed I had to comment!

    • Thanks Patrick! Luckily my pattern worked for what I was trying to do and I didn’t delete anything important (at least that I know of!). This was my first foray into regex, so I have a LOT to learn. Thanks for the links – I’ll have to start playing Regex Golf!

