R, RStudio, and R Markdown are powerful tools that can do far more than we could cover in a single semester, so we will not try. Instead, we will take it slowly and introduce new features over time. By the end of the semester your documents will look nearly professional. This is how you will submit assignments, so you will get lots of practice.
At this point you are expected to have completed the class_setup assignment, which walks you through downloading and installing R, RStudio, and \(\LaTeX\), setting up your class folder, and downloading your homework 1 assignment.
The basis of programming is that we write down instructions for the computer to follow, and then we tell the computer to follow those instructions.
We write, or code, instructions in R because it is a common language that both the computer and we can understand.
We call the instructions commands and we tell the computer to follow the instructions by executing (also called running) those commands.
The console pane is the place where commands written in the R language can be typed and executed immediately by the computer. It is also where the results will be shown for commands that have been executed.
You can type commands directly into the console and press Enter to execute those commands, but they will be forgotten when you close the session.
In these notes, code is displayed like this:
2+2
## [1] 4
where the output or result of the code is displayed with two pound signs (##).
Type the following in the console and hit Enter to run these commands one at a time.
1+1
4-3
3*7
8/3
2^3
pi^2
There are many built-in functions in R such as log or exp. Using these functions in R is not much different than using them on your calculator. The function log has two arguments: one is required and one has a default. It computes \(\log_{b}(x)\), where \(x\) is the required value and \(b\) is the base. The base defaults to exp(1) (i.e. the natural logarithm).
Below are some examples of built in functions in R. Run them from your console.
exp(2)
log(8)
log(8,base=2)
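As a small sketch of how the base argument behaves, the calls below compute the same base-2 logarithm in several ways. The shortcut functions log2 and log10 are part of base R; the comments assume the usual printed values.

```r
# The base can be supplied by name or by position
log(8, base = 2)   # 3, because 2^3 = 8
log(8, 2)          # same call, base matched by position

# Base R also provides shortcuts for the two most common bases
log2(8)            # base 2
log10(1000)        # base 10

# Change-of-base formula using natural logs
log(8) / log(2)
```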
Now let's try a more complicated equation.
2 + 5*(8^3)- 3*log10)
## Error: <text>:1:21: unexpected ')'
## 1: 2 + 5*(8^3)- 3*log10)
## ^
Uh oh, we got an Error. Nothing to worry about, errors happen all the time. The closing parenthesis at the end has no matching opening parenthesis. Put an open parenthesis ( between log and 10 to fix it and try again.
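For reference, here is one way the repaired command could look. It assumes the intended function is log (the natural logarithm) applied to 10; the name result is only used here for illustration.

```r
# Parentheses are now balanced: every "(" has a matching ")"
result <- 2 + 5 * (8^3) - 3 * log(10)
result
```

If a base-10 logarithm were intended instead, the last term would be written 3 * log10(10).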
In the console type the following code, then press Enter.
2 + 5*(8^3)- 3*log(10
Notice the console shows a + prompt. This means that you haven’t finished entering a complete command.
This is because you have not ‘closed’ a parenthesis or quotation, i.e. you don’t have the same number of left-parentheses as right-parentheses, or the same number of opening and closing quotation marks.
When this happens, and you thought you finished typing your command, click inside the console window and press Esc; this will cancel the incomplete command and return you to the > prompt.
Most of the time we will want to save our results. To do so we use the assignment operator <-. This lets us save a value into an object, which we can then use later, similar to a variable \(x\) in algebra.
height <- 62
Nothing is printed, but you should now see a height object in your Global Environment (top right panel). To print the value, wrap the assignment in parentheses or type the object's name.
(height <- 62)
## [1] 62
height
## [1] 62
Naming Conventions
Be creative, yet informative with your variable names. You will be writing code a lot this semester, and you want your code to be unique to you and not look like a carbon copy of your neighbor. While pants and nifty are valid variable names, they may not be the best to describe the results from a die roll.
Run each of the following commands in your console one at a time. Print out the value of height after each time. What happens?
(height <- height + 2)
(height <- 3 * height)
To compute a definite integral you first define a function:
myfun <- function(x){x+5}
Then pass it to the integrate function.
(myint <- integrate(myfun, lower=0, upper=3))
## 19.5 with absolute error < 2.2e-13
The result of this integration can be accessed using $value:
myint$value
## [1] 19.5
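A quick sanity check, sketched below, compares the numerical answer with the exact one from calculus. The antiderivative of \(x + 5\) is \(x^2/2 + 5x\); the helper names antiderivative, numeric_value, and exact_value are introduced here only for illustration.

```r
# Numerical integration of x + 5 from 0 to 3
myfun <- function(x) { x + 5 }
numeric_value <- integrate(myfun, lower = 0, upper = 3)$value

# Exact answer from the antiderivative x^2/2 + 5x
antiderivative <- function(x) { x^2 / 2 + 5 * x }
exact_value <- antiderivative(3) - antiderivative(0)

# all.equal compares the two values up to a small numerical tolerance
all.equal(numeric_value, exact_value)
```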
Ideally, such analysis reports are reproducible documents: If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results rather than having to reconstruct figures, paste them into a Word document, and hand-edit various detailed results.
This process is known as literate programming.
Bold: **text**
Italics: _text_
Header: # text
Subheader: ## text
test.Rmd
Inline math is wrapped in single dollar signs $, so $x^{2}$ resolves as \(x^{2}\).
Display equations are wrapped in double dollar signs $$, and for readability put a blank line before and after your equation in your Markdown document, and put the $$ each on their own line. Example:
$$
k_{n+1} = n^2 + k_n^2 - k_{n-1}
$$
resolves as
\[ k_{n+1} = n^2 + k_n^2 - k_{n-1} \]
Superscripts use ^ and subscripts use _.
Greek letters: \alpha, \beta, \gamma, \Gamma (capital alpha and beta are typed as plain A and B; standard LaTeX has no \Alpha or \Beta).
Other symbols: \sum (\(\sum\)), \sum_{i=1}^{10} t_i (\(\sum_{i=1}^{10} t_i\)), \int (\(\int\)), \int_0^\infty (\(\int_0^\infty\)).
More help writing math in \(\LaTeX\): https://en.m.wikibooks.org/wiki/LaTeX/Mathematics
RStudio has a nice visual editor that shows you what your compiled document will look like and helps you format your work more cleanly. See here for more information. It also has help for technical writing such as LaTeX.
Follow the link above and switch to visual editor mode for your new Rmarkdown test document.
Then type out the Pythagorean theorem, and knit to PDF to make sure it looks right.
A vector is an ordered collection of values stored together so that we can work with them as a unit. For us they will usually represent data collected on a characteristic of the population. In general, we want to give the vector a name so that we can call it later when needed.
primes <- c(2,3,5,7,11)
primes
## [1] 2 3 5 7 11
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
(odds <- seq(1, 10, by=2))
## [1] 1 3 5 7 9
Use the rep function to repeat a sequence of numbers in varying patterns. Read ?rep for a text explanation of the differences.
rep(c(2,3), times=c(4,3))
## [1] 2 2 2 2 3 3 3
rep(c(2,3), each=2)
## [1] 2 2 3 3
rep(c(2, 3), length.out = 3)
## [1] 2 3 2
rep(c("Bryan", "Darrin"), each = 2)
## [1] "Bryan" "Bryan" "Darrin" "Darrin"
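The each and times arguments can also be combined. As a brief sketch: according to ?rep, each is applied first and the result is then repeated times times. The name pattern is only illustrative.

```r
# Each element is doubled first (2 2 3 3), then that block is repeated twice
pattern <- rep(c(2, 3), each = 2, times = 2)
pattern
# 2 2 3 3 2 2 3 3
```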
We can index vectors to pull off values at a particular position in a vector.
Returns first and second number of the vector we called “primes” above
primes[1]
## [1] 2
primes[2]
## [1] 3
Returns first three numbers of a vector
primes[1:3]
## [1] 2 3 5
We can turn a vector into a vector of TRUE and FALSE values by writing a logical statement about it.
primes > 6
## [1] FALSE FALSE FALSE TRUE TRUE
This is super useful for things like identifying the numbers in primes that are greater than 6
primes[primes > 6]
## [1] 7 11
And count the number of primes greater than 6
sum(primes>6)
## [1] 2
It may not seem useful to you now, but these are fundamental to calculating probabilities in the next chapter.
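As a preview of why this matters for probability, taking the mean of a logical vector gives the proportion of TRUE values. A minimal sketch using the same primes vector (the name prop_over_6 is illustrative):

```r
primes <- c(2, 3, 5, 7, 11)

# sum() counts the TRUE values; mean() gives the proportion that are TRUE
sum(primes > 6)                  # 2 of the 5 values exceed 6
prop_over_6 <- mean(primes > 6)
prop_over_6                      # 2/5 = 0.4
```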
Often we are going to want to compare elements of a vector to a value, or perhaps another vector. The standard comparison operators that will return either TRUE or FALSE are ==, !=, >, <, >=, and <=. Note that equality is tested with a double equals sign ==; a single = is an assignment.
primes[primes == 4]
## numeric(0)
To test whether a value is an element of a vector, use the %in% operator.
4 %in% primes
## [1] FALSE
We can also test each element of a vector, such as odds, on whether or not it is also an element of primes.
odds %in% primes
## [1] FALSE TRUE TRUE TRUE FALSE
What’s cool about TRUE and FALSE is that TRUE resolves as 1 and FALSE resolves as 0 when doing arithmetic.
sum(odds %in% primes)
## [1] 3
This can be very useful if we want to count the number of elements in a vector that meet a certain criterion.
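The same logical vector can be used as an index to see which elements matched, not just how many. A short sketch, with shared as an illustrative name; intersect is a base R function that returns the common elements directly.

```r
primes <- c(2, 3, 5, 7, 11)
odds <- seq(1, 10, by = 2)

# Keep only the odd numbers that are also in primes
shared <- odds[odds %in% primes]
shared
# 3 5 7

# intersect() gives the same set of common values
intersect(odds, primes)
```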
Often we will want to find out if two events are true at the same time, or if at least one of them is true. Using parentheses to help keep our statements organized, we can combine multiple logical statements using either AND & or OR |.
(9 %in% odds) & (9 %in% primes)
## [1] FALSE
The and & results in a TRUE only if both values are TRUE. Here, 9 is an odd, but not a prime. So (9 %in% odds) is TRUE and (9 %in% primes) is FALSE. The combined statement “TRUE & FALSE” resolves as FALSE.
Is 9 an odd or a prime?
(9 %in% odds) | (9 %in% primes)
## [1] TRUE
The or “|” results in a TRUE if either value is true. The combined statement “TRUE OR FALSE” is TRUE.
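Both operators work element by element on whole vectors, so a full truth table can be sketched in a few lines. The vector names a and b are only illustrative.

```r
# All four combinations of two logical values
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b   # TRUE only where both are TRUE:    TRUE FALSE FALSE FALSE
a | b   # TRUE where at least one is TRUE:  TRUE TRUE TRUE FALSE
```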
All data in R has a data type, and certain functions only work on certain data types. Here are common ones you will see in this class.
int: integer
num: number
chr: character, aka string, aka text
logi: logical. Can only be TRUE or FALSE
Which data types do you think you could take the mean() of?
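To find out what type a value has, one option is the base R function class. The example values below are made up for illustration; note that class reports the full names, while the abbreviations int, num, chr, and logi are what str and the Environment pane display.

```r
class(3L)        # "integer"   (the L suffix marks an integer)
class(3.5)       # "numeric"
class("three")   # "character"
class(TRUE)      # "logical"

# str() shows the abbreviated type next to the values
str(c(1.5, 2))
```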
This is a quick reference. Chapter 2 goes into more detail.
Draw 10 samples from the numbers 1,2 or 3 with replacement.
sample(c(1,2,3), 10, replace=TRUE)
## [1] 1 1 3 2 1 1 2 2 3 2
Conduct an experiment multiple times. Only the value of the last expression inside the braces is returned for each replication. E.g., x is not retained, only the value of mean(x).
replicate(5, {
  x <- sample(c(1,2,3), 10, replace=TRUE)
  mean(x)
})
## [1] 2.2 2.0 2.4 1.6 2.2
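Because sample draws at random, your numbers will differ from the ones shown each time you run the code. One common way to make a simulation repeatable is to call set.seed first. The sketch below assumes an arbitrary seed of 123 and a hypothetical helper named run_experiment.

```r
# Wrap the experiment in a function so it can be rerun easily
run_experiment <- function() {
  replicate(5, {
    x <- sample(c(1, 2, 3), 10, replace = TRUE)
    mean(x)
  })
}

# Resetting the seed before each run reproduces the same random draws
set.seed(123)
first_run <- run_experiment()
set.seed(123)
second_run <- run_experiment()

identical(first_run, second_run)
```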
For starters we are going to stick with the simple plotting method in R, namely the plot function. Short and simple, the arguments are plot(x, y).
x <- -10:10
y <- x^2
plot(x, y)
The default is to just plot the points, but sometimes we may want to connect those points with a line (l is a lower case L). See ?plot for more plotting types.
plot(x, y, type='l')
Create a plot of \(y = \log(x+1)\) where \(x\) is a sequence of non-negative numbers from \(a\) to \(b\) and YOU get to choose \(a\) and \(b\).
x <- seq(from=0, to=10, by=.01)
y <- log(x + 1)
plot(x, y)
Frequency table
get.numbers <- sample(1:10, 1000, replace=TRUE) # generate fake data
table(get.numbers) # create the table
## get.numbers
## 1 2 3 4 5 6 7 8 9 10
## 89 104 86 100 93 103 108 102 109 106
Proportions
proportions(table(get.numbers))
## get.numbers
## 1 2 3 4 5 6 7 8 9 10
## 0.089 0.104 0.086 0.100 0.093 0.103 0.108 0.102 0.109 0.106
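To see what proportions is doing, the sketch below divides each count by the total and compares the result. The seed and the names counts and by_hand are assumptions made for illustration, so the simulated counts will not match the table above.

```r
set.seed(1)                                        # arbitrary seed for repeatability
get.numbers <- sample(1:10, 1000, replace = TRUE)  # generate fake data
counts <- table(get.numbers)

# A proportion is each count divided by the total number of observations
by_hand <- counts / sum(counts)

all.equal(as.vector(by_hand), as.vector(proportions(counts)))
sum(proportions(counts))   # proportions always add up to 1
```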
Plot the table of proportions.
plot(proportions(table(get.numbers)))
We can plot functions directly, or create histograms and density curves from simulated values.
x <- seq(0,1, by=0.01) # create values in the domain
y <- 4*x^3 # pdf
plot(x, y, type = 'l') # the lower case 'l' draws a line
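For a function to be a valid pdf, the area under its curve must equal 1. A short sketch using integrate confirms this for \(4x^3\) on the interval from 0 to 1; the names pdf and area are only illustrative.

```r
# The curve plotted above, written as a function
pdf <- function(x) { 4 * x^3 }

# Area under the curve over its domain [0, 1]
area <- integrate(pdf, lower = 0, upper = 1)$value
all.equal(area, 1)
```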
x <- rnorm(1000) # draw 1000 values from a standard normal distribution
hist(x, nclass=30) # create a frequency histogram with 30 bins
Add prob=TRUE to change the y axis to a density so it’s on the same scale as the curve.
hist(x, nclass=30, prob=TRUE)
curve(dnorm(x), add=TRUE, col="red") # note, this always stays x