Data Types and Objects in R

Data are the most basic ingredient of "data analysis". R supports a wide variety of data types, including scalars, vectors, matrices, data frames, and lists. In this tutorial, we go over some commonly used data types and, at the end, briefly cover the idea of an "object".

Scalars


In computer programming, a scalar is an atomic quantity that holds only one value at a time. Scalars are the most basic data types, from which more complex ones are built. (In R, a scalar is simply a vector of length one.) Let's take a look at some common types of scalars with simple R commands.

Number

> x <- 1
> y <- 2.5
> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
> class(x+y)
[1] "numeric"
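
As a small aside, the lines below sketch how R stores numbers. The variable k is introduced here only for illustration; it is not used elsewhere in this tutorial.

```r
x <- 1
y <- 2.5
length(x)          # 1: a "scalar" in R is a vector of length one
typeof(y)          # "double": how the number is stored internally
k <- 3L            # the L suffix asks for an integer (k is an illustrative name)
class(k)           # "integer"
class(k + y)       # "numeric": integer plus double gives a double
is.numeric(k)      # TRUE: integers count as numeric too
```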

Logical value

> m <- x > y      # Is x larger than y?
> n <- x < y      # Is x smaller than y?
> m
[1] FALSE
> n
[1] TRUE
> class(m)
[1] "logical"
> class(NA)       # NA is another logical value: 'Not Available'/Missing Values
[1] "logical"

Here are some logical operators you may want to try.

> m & n           # AND
[1] FALSE
> m | n           # OR
[1] TRUE
> !m              # Negation
[1] TRUE
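
Since NA is itself a logical value, it may help to see how it behaves with these operators. This is a minimal sketch; m and n are set by hand to the values they had above.

```r
m <- FALSE
n <- TRUE
NA & m          # FALSE: the result is FALSE whatever the missing value is
NA | n          # TRUE:  the result is TRUE whatever the missing value is
NA & n          # NA:    the result depends on the missing value
is.na(NA & n)   # TRUE:  use is.na() to test for missing values
xor(m, n)       # TRUE:  exactly one of m and n is TRUE
```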

Character (string)

> a <- "1"; b <- "2.5"       # Are they different from x and y we used earlier?
> a;b
[1] "1"
[1] "2.5"
> a+b                        # a+b=3.5?
Error in a + b : non-numeric argument to binary operator
> class(a)
[1] "character"
> class(as.numeric(a))       # but you can coerce this character into a number
[1] "numeric"
> class(as.character(x))     # vice versa
[1] "character"
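
The following sketch shows what coercion buys you in practice. The name bad is made up for this example, and suppressWarnings() is used only to keep the output quiet.

```r
a <- "1"; b <- "2.5"
as.numeric(a) + as.numeric(b)               # 3.5: arithmetic works after coercion
paste0(a, b)                                # "12.5": characters are joined, not added
bad <- suppressWarnings(as.numeric("abc"))  # R normally warns about this coercion
bad                                         # NA: text that is not a number becomes NA
is.na(bad)                                  # TRUE
```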

Vector


A vector is a sequence of data elements of the same basic type.

> o <- c(1,2,5.3,6,-2,4)                             # Numeric vector
> p <- c("one","two","three","four","five","six")    # Character vector
> q <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)            # Logical vector
> o;p;q
[1]  1.0  2.0  5.3  6.0 -2.0  4.0
[1] "one"   "two"   "three" "four"  "five"  "six"
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
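
Because all elements of a vector must share one basic type, R quietly converts mixed input to a common type. A short sketch (mixed1 and mixed2 are illustrative names):

```r
mixed1 <- c(1, TRUE, FALSE)    # logical values are converted to numbers
mixed1                         # 1 1 0
class(mixed1)                  # "numeric"
mixed2 <- c(1, "two", TRUE)    # everything is converted to character
mixed2                         # "1" "two" "TRUE"
class(mixed2)                  # "character"
```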

We talked about component extraction briefly in our first tutorial. Here are some other fun ways of doing that.

> o[q]                                               # Logical vector can be used to extract vector components
[1] 1 2 6 4
> names(o) <- p                                      # Give each component a name
> o
  one   two three  four  five   six
  1.0   2.0   5.3   6.0  -2.0   4.0
> o["three"]                                         # Extract your components by "calling" their names
three
  5.3
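
A few more extraction idioms you may want to try. The vector o is rebuilt here so that the lines can be pasted on their own.

```r
o <- c(1, 2, 5.3, 6, -2, 4)
names(o) <- c("one", "two", "three", "four", "five", "six")
o[o > 2]              # a condition builds the logical vector for you: 5.3, 6, 4
o[-1]                 # a negative index drops a component: everything but "one"
o[2:3]                # a range of positions: 2.0 and 5.3
o[c("one", "six")]    # several names at once: 1 and 4
```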

Matrix


A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. As with a vector, the components of a matrix must all be of the same basic type. The following is an example of a matrix with 4 rows and 3 columns.

> t <- matrix(
+     1:12,                 # the data components (Don't type "+"!)
+     nrow=4,               # number of rows
+     ncol=3,               # number of columns
+     byrow = FALSE)        # fill matrix by columns
> t                         # print the matrix
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
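
For comparison, here is the same data filled by rows instead of columns. The name u is chosen only for this sketch.

```r
u <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)   # fill matrix by rows
u
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
# [4,]   10   11   12
dim(u)      # 4 3: number of rows, then number of columns
u[2, 3]     # 6 (the column-filled matrix has 10 in this position)
```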

Similar to vectors, matrices also use [] to reference elements.

> t[2,3]                    # component at 2nd row and 3rd column
[1] 10
> t[,3]                     # 3rd column of matrix
[1]  9 10 11 12
> t[4,]                     # 4th row of matrix
[1]  4  8 12
> t[2:4,1:3]                # rows 2,3,4 of columns 1,2,3
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
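
Note that extracting a single row or column returns a plain vector, not a matrix. If you need to keep the two-dimensional shape, drop = FALSE does that. A minimal sketch (col3 and col3m are illustrative names):

```r
t <- matrix(1:12, nrow = 4, ncol = 3)   # the same matrix as above
col3 <- t[, 3]
is.matrix(col3)                         # FALSE: the result is a plain vector
dim(col3)                               # NULL:  a vector has no dimensions
col3m <- t[, 3, drop = FALSE]
is.matrix(col3m)                        # TRUE:  still a matrix
dim(col3m)                              # 4 1:   4 rows and 1 column
```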

Data Frame


A data frame is more general than a matrix, in that different columns can have different basic data types. The data frame is the data type we will use most often in this class.

> d <- c(1,2,3,4)
> e <- c("red", "white", "red", NA)
> f <- c(TRUE,TRUE,TRUE,FALSE)
> mydata <- data.frame(d,e,f)
> names(mydata) <- c("ID","Color","Passed")      # variable names
> mydata
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE
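
To confirm that each column keeps its own type, you can ask for the class of every column. In this sketch the column names are given directly in data.frame(), and stringsAsFactors = FALSE is set explicitly (my assumption, not part of the example above) so that the result is the same in old and new versions of R.

```r
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(ID = d, Color = e, Passed = f, stringsAsFactors = FALSE)
sapply(mydata, class)   # ID "numeric", Color "character", Passed "logical"
dim(mydata)             # 4 3: four records, three variables
str(mydata)             # compact overview of the whole data frame
```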

Extracting components from data frames is somewhat similar to what we did for matrices, but once each column (variable) has a name, it becomes more flexible.

> mydata$ID                       # try mydata["ID"] or mydata[1]
[1] 1 2 3 4
> mydata$ID[3]                    # try mydata[3,"ID"] or mydata[3,1]
[1] 3
> mydata[1:2,]                    # first two records
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
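
The alternatives suggested in the comments above do not all return the same kind of object, and rows can also be selected by a condition. A sketch, with the data frame rebuilt so the lines run on their own (stringsAsFactors = FALSE is my assumption; the names ids, one_col, passed and red_ids are illustrative):

```r
mydata <- data.frame(ID = c(1, 2, 3, 4),
                     Color = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, TRUE, FALSE),
                     stringsAsFactors = FALSE)
ids <- mydata[["ID"]]               # a numeric vector, same as mydata$ID
one_col <- mydata["ID"]             # a data frame with a single column
class(ids)                          # "numeric"
class(one_col)                      # "data.frame"
passed <- mydata[mydata$Passed, ]   # records where Passed is TRUE (the first three)
# IDs of the red records; is.na() guards against the missing colour
red_ids <- mydata[!is.na(mydata$Color) & mydata$Color == "red", "ID"]
red_ids                             # 1 3
```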

List


A list is a generic vector containing other objects. There is no restriction on data types or length of the components. Usually, we work with lists that have named components.

> l <- list(vec=p, mat=t, fra=mydata, count=3)     # a list holding the vector, matrix, and data frame defined earlier, plus a scalar
> l
$vec
[1] "one"   "two"   "three" "four"  "five"  "six" 

$mat
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

$fra
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

$count
[1] 3
> l$vec                                                         # extract components from list
[1] "one"   "two"   "three" "four"  "five"  "six" 
> l$mat[2,3]
[1] 10
> l$fra$Color                                                   # a factor here; since R 4.0, data.frame() keeps strings as character unless stringsAsFactors=TRUE
[1] red   white red   <NA>
Levels: red white
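
Besides $, list components can be extracted with single or double brackets, and the two behave differently. A minimal sketch using a smaller list than the one above (the component name extra is made up for this example):

```r
l <- list(vec = c("one", "two"), count = 3)
class(l["count"])      # "list":    single brackets return a shorter list
class(l[["count"]])    # "numeric": double brackets return the component itself
l[["count"]] + 1       # 4
names(l)               # "vec" "count"
l$extra <- TRUE        # assigning to a new name adds a component
length(l)              # 3
```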

Object


In R, all types of data are treated as objects. However, objects are not simply collections of data. They are particular instances (instantiations) of particular classes. Operations, or functions, are defined for specific classes. Let's try working on something such as a point pattern. 

# This time I will not show the R output alongside the code. Just type or paste these lines into R and see what you get.
x <- rnorm(50, 10, 3)                 # creates 50 random x values from a normal distribution
y <- rnorm(50, 10, 4)                 # creates 50 random y values
mypoints <- as.data.frame(cbind(x,y)) # makes a data frame
class(mypoints)
mypoints
summary(mypoints)
plot(mypoints)                        # Gee, it looks like a point pattern...
box <- bbox(mypoints)                 # Type in library(splancs) first. Bounding Box - did this work? Why not?

Most of the functions above work well with this data frame, but "bbox" does not. See help(bbox). It fails because "bbox" is not defined for objects of class data.frame; it operates on objects of class points (or a matrix of x and y values). Therefore you need to change the class accordingly. The following four approaches all work (try each one separately):

 box <- bbox(cbind(x,y))
 box <- bbox(as.matrix(mypoints))
 box <- bbox(as.points(x,y))
 box <- bbox(as.points(mypoints))
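
The idea that functions are defined for specific classes can also be tried in base R alone, without splancs. In the sketch below, describe is a hypothetical function written only for this illustration (it is not part of R or of any package), and set.seed() is added so that repeated runs give the same numbers.

```r
set.seed(1)                              # fixed seed, added here for repeatability
x <- rnorm(50, 10, 3)
y <- rnorm(50, 10, 4)
mypoints <- as.data.frame(cbind(x, y))
pts <- as.matrix(mypoints)               # same numbers, different class

is.data.frame(mypoints)                  # TRUE
is.matrix(mypoints)                      # FALSE
is.matrix(pts)                           # TRUE
dim(pts)                                 # 50 2

# A hypothetical generic function with one method per class
describe <- function(obj) UseMethod("describe")
describe.data.frame <- function(obj) "a data frame"
describe.matrix     <- function(obj) "a matrix"
describe.default    <- function(obj) "something else"

describe(mypoints)                       # "a data frame"
describe(pts)                            # "a matrix"
describe("hello")                        # "something else"
```

R picks the method from the class of the object it is given, which is why converting the data frame to a matrix changes what a class-specific function will accept.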


Updated 9/12/2013

Written by Xue Li 09/2013
References:
Quick-R: http://www.statmethods.net/input/datatypes.html
R Tutorial: http://www.r-tutor.com/r-introduction
Dr Ashton Shortridge's old course material "Understanding More Complex R Objects"