Using R to Analyze and Visualize Actual data

This tutorial details how users can further analyze and visualize tracking and scoring data output from Actual.

This is performed offline (i.e. not via the browser) using R, the de facto standard for statistical computing. R is free to download and use under the GNU General Public License. A fuller introduction to R can be found here: http://cran.r-project.org/doc/manuals/R-intro.html.

This tutorial assumes that the reader has relatively little (or even no) experience with R, or is not very familiar with some of the analysis packages it provides. We show here the possibilities opened up by combining R with the wealth of data generated by Actual's products, in order to analyse experimental results further and understand them better.

The reader must also have some data to work with: tracking data for the trials in an experiment, generated with the ActualTrack product.

Getting Started:

First, an installed version of R is required.

In the examples below we used R version 2.12.0 (the latest release at the time of writing), together with version 1.35 of the Mac R graphical user interface.
For Mac OS X 10.6.5 we downloaded and installed R from here: http://cran.r-project.org/bin/macosx/. Simply select the "R-2.12.0.pkg" file (or the latest version available) and follow the installer's instructions. Start R on the Mac by clicking on the resulting application icon.


A Windows version of R is also available here: http://cran.r-project.org/bin/windows/base/. Click on "Download R 2.12.0 for Windows" and run the downloaded executable, which will walk you through the installation process (the default options are fine). Start R on Windows by double-clicking the resulting desktop icon.


Once R is installed, start it up so that the console window appears (see Figure 1 below for both Mac and Windows). This is where we will type in the commands.

Figure 1: The R console on Windows (RGui) and Mac OS.

Reading in the Data:

We are now ready to start, but first we must download and unzip the CSV files containing the data for the experiment we want to load and analyse. To do this, go to the "Reports" tab of the online Actual Analytics application and click on the "CSV" action link. Note that each trial to be included must have been tracked using an ActualTrack subscription for the entire trial. You can confirm this by looking at the "Locomotor_Results.csv" file in the unzipped experiment folder: trials that have been tracked show values (i.e. distances) greater than zero.

The unzipped folder for the experiment also contains the individual trial "_Data.csv" files, which hold the raw data to be loaded.
In the R console, it is simplest to first navigate to the unzipped directory containing all the *.csv data files. Type in the R console:

> dir()

to list the files and directories in the current working directory on your computer's file system. NOTE: seeing "character(0)" indicates an empty directory.
Use the command:

> setwd("Data")

to change to the directory called "Data" (or whatever the path to your unzipped experiment is). To move "up" one directory in the file system, type:

> setwd("..")

You can confirm your current working directory with:

> getwd()
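Because dir() lists everything in a folder, it can help to ask list.files() for only the per-trial data files, using a regular expression. A minimal sketch; to keep it self-contained it first creates a throwaway folder with empty stand-in files (the folder name is hypothetical, the file names follow those used in this tutorial):

```r
# Build a throwaway folder that mimics an unzipped experiment export
# (stand-in files only; the real folder comes from the "CSV" link)
demo_dir <- file.path(tempdir(), "experiment_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Trial_1_Data.csv",
                                            "Trial_2_Data.csv",
                                            "Locomotor_Results.csv"))))

dir(demo_dir)   # everything in the folder, including Locomotor_Results.csv

# Keep only names ending in "_Data.csv" (the per-trial raw data files);
# the pattern is a regular expression, so the dot is escaped
trial_files <- list.files(demo_dir, pattern = "_Data\\.csv$")
trial_files
```

In a real session, once setwd() has moved you into the experiment folder, list.files(pattern = "_Data\\.csv$") with no path argument does the same for the working directory.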

Having navigated to the unzipped folder containing your experiment data, you can load one of the data files into R using:

> trial <- read.csv(file="Trial_1_Data.csv", header=TRUE)

This line creates an R object (a data frame) called "trial", assigned with the operator <-, which contains the data read in from the CSV file "Trial_1_Data.csv". The header=TRUE option tells R that the first line of the file holds the column headings, which become the column names of the object, as we can see if we then query the object with:

> names(trial)
[1] "Trial.Time..sec." "BodyCenter_X..cm."
[3] "BodyCenter_Y..cm." "NosePoint_X..cm."
[5] "NosePoint_Y..cm." "TailBase_X..cm."
[7] "TailBase_Y..cm." "BodyCenter_HEADING..radians."
[9] "NosePoint_HEADING..radians." "TailBase_HEADING..radians."
[11] "Rodent_ORIENTATION..radians."

As we can see, these correspond to the column headings in the CSV file, except that read.csv() has replaced characters that are not allowed in R names (for example spaces or brackets) with dots. NOTE: this data was generated for a rodent experiment (which enables the NosePoint and TailBase locations); other experiments with tracking may only have the BodyCenter attributes. Furthermore, individual behaviors defined by the user in ActualScore will appear here under the names they were given.
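The dotted names come from read.csv(), which by default (check.names = TRUE) passes each heading through make.names(). A quick sketch on a tiny inline CSV; the headings are an assumption chosen to resemble the export (we have not confirmed their exact spelling in the real files):

```r
# Two-row CSV whose headings contain spaces and brackets
csv_text <- "Trial Time (sec),BodyCenter_X (cm)\n0.1,17\n0.2,17"

demo <- read.csv(text = csv_text, header = TRUE)
names(demo)                       # "Trial.Time..sec." "BodyCenter_X..cm."

# The same conversion applied to a single heading
make.names("Trial Time (sec)")    # "Trial.Time..sec."

# check.names = FALSE keeps the headings exactly as written; such columns
# then need backticks, e.g. demo_raw$`Trial Time (sec)`
demo_raw <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)
names(demo_raw)
```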

Reorganizing the Data:

We can then quickly access a single column of our data and assign it to another object like so:

> times <- trial$Trial.Time..sec.

Now there is a separate object for the time data. Typing:

> length(times)
[1] 6000

shows that times has 6000 values (correct for a 10-minute video sampled every 100 milliseconds: 600 s / 0.1 s = 6000). We can also combine the coordinate values into a single matrix object (for easier computation):

> body <- matrix(c(trial$BodyCenter_X..cm., -trial$BodyCenter_Y..cm.), length(times), 2)

where we combine the x and y values (with the c function) into a new matrix with as many rows as there are time values (6000) and 2 columns. Since matrix() fills column by column, the x values form the first column and the y values the second. Notice that we invert (negate) the y coordinate, since all ActualTrack results are based on an image coordinate system with the origin in the top-left corner, where y increases downwards.
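An alternative way to build the same two-column matrix is cbind(), which places vectors side by side and lets you name the columns, so later output shows x and y rather than V1 and V2. A minimal sketch; the three-row data frame is synthetic and only mimics the column names shown above:

```r
# Synthetic stand-in for the loaded trial data (three samples only)
trial <- data.frame(BodyCenter_X..cm. = c(17, 17, 18),
                    BodyCenter_Y..cm. = c(91, 92, 92))

# cbind() binds the vectors as columns; naming the arguments names the columns
body <- cbind(x = trial$BodyCenter_X..cm.,
              y = -trial$BodyCenter_Y..cm.)   # negate y so it increases upwards

body
summary(body)   # columns are now labelled x and y
body[, "y"]     # columns can be selected by name
```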

We can then check its size as:

> dim(body)
[1] 6000 2

and we can access the values in the matrix very simply with:

> body[1,]
[1] 17 92

for the first row of data (i.e. the first x and y coordinates). If you also type:

> summary(body)
V1 V2
Min. :-7.00 Min. : 5.00
1st Qu.:13.00 1st Qu.:26.00
Median :20.00 Median :57.00
Mean :27.66 Mean :57.29
3rd Qu.:48.00 3rd Qu.:91.00
Max. :72.00 Max. :97.00

you get a summary of the two columns of values (V1 and V2) contained in the matrix. If you type just:

> body
[,1] [,2]
[1,] 17.00 91.00
[2,] 17.00 92.00
[3,] 17.00 92.00
[4,] 17.00 91.00
...
[5998,] 50.00 6.00
[5999,] 50.00 6.00
[6000,] 50.00 6.00

you will get a printout of all the values as well.
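Plotting the path is often the quickest way to check that the coordinates look sensible. A minimal sketch using base R graphics; it uses a synthetic random-walk path in place of the real body matrix and draws into a PDF file (the name trajectory.pdf is arbitrary) so that it also runs non-interactively:

```r
# Synthetic random-walk path standing in for the body matrix built above
set.seed(42)
body <- cbind(x = cumsum(rnorm(200)), y = cumsum(rnorm(200)))

# Send the plot to a PDF file; in an interactive session you can drop the
# pdf() and dev.off() lines and the plot opens in a graphics window instead
out_file <- file.path(tempdir(), "trajectory.pdf")
pdf(out_file, width = 6, height = 6)

# A two-column matrix is plotted as column 1 (x) against column 2 (y);
# type = "l" joins the points and asp = 1 keeps both axes on the same scale
plot(body, type = "l", asp = 1,
     xlab = "x (cm)", ylab = "y (cm)", main = "Body centre trajectory")
points(body[1, 1], body[1, 2], pch = 19, col = "darkgreen")              # start
points(body[nrow(body), 1], body[nrow(body), 2], pch = 19, col = "red")  # end
invisible(dev.off())
```

With the real data the y values have already been negated, so the plotted path has the same orientation as the video image.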

Analyzing the Data:

As an example of what we can then do with the data, we can work out for how much of the total time the animal instantaneously "freezes". We define this as the moments when its speed is zero. First we must calculate the instantaneous speed, for which we need the distance moved between samples.

We can calculate the distances between all pairs of points by converting the result of the distance function (dist) into a matrix (as.matrix). Note that this command can take a while to run and uses a lot of memory, because it stores the distance between every pair of points:

> everydist <- as.matrix(dist(body))

Now, for example, to find the distance between the 50th and 2300th point it is only necessary to do:

> everydist[50,2300]
[1] 68.26419

We want the distance from each point to the next one. These values lie on the diagonal just above the main diagonal of the distance matrix (at [1,2], [2,3], [3,4], etc.). To retrieve them we first shift all the values up one row:

> shiftup <- everydist[2:dim(everydist)[2],]

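This selects rows 2 through to the last row (the matrix is square, so the number of columns, dim(everydist)[2], equals the number of rows). After the shift, the main diagonal holds the values [2,1], [3,2], [4,3], etc., which equal [1,2], [2,3], [3,4] because the distance matrix is symmetric. We then extract this diagonal with the diag function and convert the result to a matrix: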

> distances <- matrix(diag(shiftup))
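The full distance matrix for a trial has 6000 by 6000 entries (about 288 MB as double-precision numbers), even though only 5999 of them are needed. A lighter alternative, swapped in here rather than taken from the method above, is to take differences between consecutive rows with diff() and apply Pythagoras directly. The sketch checks on a small synthetic path that both approaches agree:

```r
set.seed(1)
# Small synthetic path (10 points) standing in for the body matrix
body <- cbind(x = cumsum(rnorm(10)), y = cumsum(rnorm(10)))

# Approach from the text: full distance matrix, shifted, then its diagonal
everydist <- as.matrix(dist(body))
shiftup <- everydist[2:nrow(everydist), ]
distances_dist <- diag(shiftup)

# Lighter alternative: row-to-row differences, then sqrt(dx^2 + dy^2)
steps <- diff(body)                        # (n - 1) x 2 matrix of dx, dy
distances_diff <- sqrt(rowSums(steps^2))   # one distance per step

isTRUE(all.equal(distances_dist, distances_diff))   # TRUE
```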

We also need the time difference between consecutive time stamps. This is obtained by subtracting the values from the first to the second-to-last (1:5999) from the values from the second to the last (2:6000). Again, we convert the result to a matrix:

> timesteps <- matrix(times[2:6000] - times[1:5999])

Notice that, because we shifted the distance matrix up one row and because the time steps are differences between neighbouring values, both distances and timesteps contain 5999 values, one fewer than the number of samples.
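Hard-coding 6000 and 5999 only works for trials of exactly that length. Base R's diff() computes the same neighbouring differences for a vector of any length; a quick check on a short, made-up vector of time stamps:

```r
times <- c(0.1, 0.2, 0.3, 0.4, 0.5)   # synthetic time stamps (seconds)

# Subtracting the shifted vectors, as in the text, but without fixed lengths
n <- length(times)
timesteps_manual <- times[2:n] - times[1:(n - 1)]

# diff() performs exactly the same subtraction
timesteps <- diff(times)

identical(timesteps, timesteps_manual)   # TRUE
length(timesteps)                        # one fewer than length(times)
```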

Now we are able to calculate instantaneous speeds by dividing each distance over each time step:

> speeds <- distances/timesteps

We can then test which speeds are exactly zero (or equal to any other value) with:

> stopped <- speeds==0.0

This returns a matrix of TRUE or FALSE values, one for each time step. Since sum() counts TRUE as 1 and FALSE as 0, we can query the number of TRUE values with:

> sum(stopped)
[1] 2969

And the number of FALSE values (using the negation, or "not", operator "!") with:

> sum(!stopped)
[1] 3030

Finally, dividing the number of zero-speed values by the number of sample points gives the percentage of the total time during which the speed is zero:

> sum(stopped)/length(times)*100.0
[1] 49.48333
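Two small refinements are possible here. mean() of a logical vector gives the proportion of TRUE values directly, and its denominator is the number of speed values (5999 for the trial above) rather than the 6000 samples. Tracking noise may also make a stationary animal appear to move very slightly, so you may prefer a small speed threshold to an exact zero test; the 0.05 cm/s value below is purely an illustrative assumption, not a recommended setting. Sketched on synthetic speeds:

```r
# Synthetic speeds (cm/s) standing in for the speeds computed above
speeds <- c(0, 0, 1.5, 0, 0.02, 3.1, 0, 0.4)

# Counts of FALSE and TRUE in one call
table(speeds == 0)

# Percentage of intervals with exactly zero speed
mean(speeds == 0) * 100                    # 50

# Hypothetical tolerance: treat anything slower than 0.05 cm/s as frozen
freeze_threshold <- 0.05
mean(speeds < freeze_threshold) * 100      # 62.5
```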

Further Notes:

All the commands listed above can be combined into a single script and executed with one call. There are several ways to do this in R, but the simplest is to put the script file in the same directory as the data and then call:

> source("your_script_file.R")

This will then execute the commands in "your_script_file.R" in turn (as if you were typing them into the console).
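As an example of such a script, the sketch below wraps the freezing calculation in a function and applies it to every trial file in a folder. The function name freezing_percent is hypothetical, it assumes every file has the same column names as Trial_1_Data.csv above, and it computes step distances with diff() instead of the full distance matrix so that it stays fast. To keep it self-contained it first writes a tiny synthetic trial file:

```r
# Hypothetical helper: percentage of intervals with zero speed for one file
freezing_percent <- function(file) {
  trial <- read.csv(file = file, header = TRUE)
  times <- trial$Trial.Time..sec.
  body <- cbind(trial$BodyCenter_X..cm., -trial$BodyCenter_Y..cm.)
  distances <- sqrt(rowSums(diff(body)^2))   # distance moved in each interval
  speeds <- distances / diff(times)           # instantaneous speed
  mean(speeds == 0) * 100                     # % of intervals with zero speed
}

# Demo on a synthetic file in a temporary folder; with real data, setwd()
# to the experiment folder and list the files there instead
demo_dir <- file.path(tempdir(), "script_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(`Trial Time (sec)` = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     `BodyCenter_X (cm)` = c(17, 17, 18, 18, 18),
                     `BodyCenter_Y (cm)` = c(91, 91, 91, 92, 92),
                     check.names = FALSE),
          file.path(demo_dir, "Trial_1_Data.csv"), row.names = FALSE)

# Apply the helper to every "_Data.csv" file and label results by file name
files <- list.files(demo_dir, pattern = "_Data\\.csv$", full.names = TRUE)
results <- sapply(files, freezing_percent)
names(results) <- basename(files)
results
```

Saved as a file such as your_script_file.R, the same code runs in one go with source().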
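It is useful to remember that objects you create interactively in the R console (such as trial, times, body and speeds) persist until they are overwritten or deleted. You can always type:

> ls()
[1] "body" "distances" "everydist" "shiftup" "speeds" "stopped"
[7] "times" "timesteps" "trial"

to list the objects that currently exist (the output above is what the commands in this tutorial leave behind when run in a fresh session). You can remove all objects (to start afresh) by typing: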

> rm(list=ls())

More extensive help on all the commands we have described above is always available from the R console by prefixing the command with a question mark, e.g.:

> ?setwd