Introduction To R Programming

Getting started with the R programming language and using RStudio

For someone like me, who has only had some programming experience in Python, the syntax of R feels alien at first. However, I believe adapting to the logic of a new language is just a matter of time. And indeed, after practicing for a while, the grammar of R began to flow more naturally, and I started to grasp the remarkable beauty that has captivated countless statisticians over the years.

If you don’t know what R is, it’s basically a programming language created for statisticians, by statisticians. Hence, it has become one of the most fluid and powerful tools in the field of Data Science.

Here I’d like to walk through my study notes with the most explicit step-by-step directions to introduce you to the world of R.

Why Learn R for Data Science?

Before diving in, you might want to know why you should learn R for Data Science. There are two major reasons:

Powerful Analytic Packages for Data Science

Firstly, R has a vast package ecosystem. It provides robust tools for all the core skill sets of Data Science, from data manipulation and data visualization to machine learning. The vibrant community keeps the R language’s functionality growing and improving.

High Industry Popularity and Demand

With its great analytical power, R is becoming the lingua franca for data science. It is widely used in industry, including at leading companies that hire Data Scientists, such as Google and Facebook. It is a highly sought-after skill for Data Science jobs.

Quickstart Installation Guide

To start programming with R on your computer, you need two things: R and RStudio.

Install R Language

You first have to install the R language itself on your computer (it doesn’t come preinstalled). To download R, go to CRAN (the Comprehensive R Archive Network) at https://cloud.r-project.org/. Choose your operating system and select the latest version to install.

Install RStudio

You also need a capable tool to write and run R code. RStudio is a robust and very popular IDE (integrated development environment) for R programming. It is open source and free, and available at http://www.rstudio.com/download.
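
Once both are installed, a quick sanity check can confirm that RStudio has found your R installation. The lines below are a minimal sketch to type into the RStudio console (the console is described in the sections that follow); the exact text printed depends on the version you installed, so no expected output is shown.

```r
# Show which version of R this session is running
R.version.string

# getRversion() returns the version in a form that can be compared
# against a version string
getRversion() >= "3.0.0"
```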

Overview of RStudio

Author

Cecilia Lee is a junior data scientist based in Hong Kong

Now you have everything ready. Let’s take a brief look at RStudio. Fire up RStudio to see its main interface.

 

Go to File > New File > R Script to open a new script file. You’ll see a new section appear at the top left of the interface. A typical RStudio workspace consists of four panels:

 

RStudio Interface

Here’s a brief explanation of the use of the 4 panels in the RStudio interface:

Script
This is where your main R script is located.

Console
This area shows the output of the code you run from the script. You can also write code directly in the console.

Environment
This space displays the objects you have created or loaded, including datasets, variables, vectors, functions, etc.

Output
This space displays the graphs created during exploratory data analysis. You can also browse R’s built-in documentation here when you need help.

Running R Code

Now that you know your IDE, the first thing you’ll want to do is write some code.

Using the Console Panel

You can write code directly in the console panel. Hit Enter, and the output is returned and displayed immediately. However, code entered in the console is not saved to a file, so it is hard to revisit later. This is where scripts come in. The console is still good for a quick experiment before you tidy your code up in a script.

 

Using the Script Panel

To write proper R code, start a new script by going to File > New File > R Script, or press Ctrl + Shift + N. You can then write your code in the script panel. Select the line(s) to run and press Ctrl + Enter. The output will be shown in the console panel beneath. You can also click the Run button at the top right corner of this panel. Code written in a script can be saved for later review (File > Save or Ctrl + S).

 

Basics of R Programming

Finally, with all the setup done, you can write your first R script. The following paragraphs introduce you to the basics of R programming.

A quick tip before going on: everything after the symbol # on a line is treated as a comment and is not executed.
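
As a small illustration of how comments behave (a sketch, not taken from the original notes), a comment can occupy a whole line or follow code on the same line. In both cases R ignores the text from # to the end of that line.

```r
# This whole line is a comment, so nothing is executed
1 + 2  # only the part before the # runs
#[1] 3

# 3 + 4   <- commented-out code is ignored as well
```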

Arithmetic

Let’s start with some basic arithmetic. You can do some simple calculations with the arithmetic operators:

Operator    Function
+           Addition
-           Subtraction
*           Multiplication
/           Division
^           Exponentiation
%%          Modulo
%/%         Integer Division

Addition +, subtraction -, multiplication *, division / should be intuitive.

# Addition
1 + 1
#[1] 2

# Subtraction
2 - 2
#[1] 0

# Multiplication
3 * 2
#[1] 6

# Division
4 / 2
#[1] 2

The exponentiation operator ^ raises the number to its left to the power of the number to its right: for example 3 ^ 2 is 9.

# Exponentiation
2 ^ 4
#[1] 16

The modulo operator %% returns the remainder of the division of the number to the left by the number on its right, for example 5 modulo 3 or 5 %% 3 is 2.

# Modulo
5 %% 2
#[1] 1

Lastly, the integer division operator %/% returns the quotient rounded down to a whole number, i.e. how many whole times the number on its right fits into the number on its left. For example, 9 %/% 4 is 2.

# Integer division
5 %/% 2
#[1] 2
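
The two operators are complementary: multiplying the integer quotient by the divisor and adding the remainder gives back the original number. The sketch below checks this with numbers of my own choosing, and also shows that R rounds the quotient down (towards negative infinity) for a negative number.

```r
# Quotient and remainder of 17 divided by 5
17 %/% 5
#[1] 3
17 %% 5
#[1] 2

# Quotient * divisor + remainder recovers the original number
(17 %/% 5) * 5 + 17 %% 5
#[1] 17

# With a negative number the quotient is rounded down, not towards zero
(-7) %/% 2
#[1] -4
(-7) %% 2
#[1] 1
```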

You can also add brackets () to change the order of operations. The order is the same as in mathematics (from highest to lowest precedence):

  • Brackets
  • Exponentiation
  • Multiplication and division (equal precedence, evaluated left to right)
  • Addition and subtraction (equal precedence, evaluated left to right)

# Brackets
(3 + 5) * 2
#[1] 16
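
A few more cases, sketched here as an illustration, show how precedence plays out without brackets. Note that multiplication and division share one precedence level, as do addition and subtraction, and that exponentiation binds more tightly than a leading minus sign.

```r
# Multiplication happens before addition
2 + 3 * 4
#[1] 14

# Same precedence: evaluated left to right
8 / 4 * 2
#[1] 4
10 - 4 + 3
#[1] 9

# Exponentiation binds tighter than the unary minus
-2 ^ 2
#[1] -4
(-2) ^ 2
#[1] 4

# Chained exponentiation is evaluated right to left: 2 ^ (3 ^ 2)
2 ^ 3 ^ 2
#[1] 512
```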

Variable Assignment

A basic concept in (statistical) programming is the variable.

A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable.

Create New Variables

Create a new object with the assignment operator <-. All assignment statements, which are how you create objects in R, have the same form: object_name <- value.

num_var <- 10

chr_var <- "Ten"

To access the value of the variable, simply type the name of the variable in the console.

num_var
#[1] 10

chr_var
#[1] "Ten"

You can access the value of a variable anywhere you call it in the R script, and perform further operations on it.

first_var <- 1
second_var <- 2

first_var + second_var
#[1] 3

sum_var <- first_var + second_var
sum_var
#[1] 3
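
One detail worth seeing in action: a variable stores the value computed at the moment of assignment, not the formula behind it. In the sketch below (the names price, quantity and total are made up for illustration), changing price afterwards leaves total untouched until it is assigned again.

```r
price <- 20
quantity <- 3

# total stores the result of the calculation, i.e. 60
total <- price * quantity
total
#[1] 60

# Reassigning price overwrites its old value...
price <- 25

# ...but total keeps the value it was given earlier
total
#[1] 60

# Assign again to pick up the new price
total <- price * quantity
total
#[1] 75
```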

Naming Variables

Not all names are accepted in R. Variable names must start with a letter, and can only contain letters, numbers, . and _. Also, bear in mind that R is case-sensitive, i.e. Cat is not the same as cat.

Your object names should be descriptive, so you’ll need a convention for multiple words. It is recommended to use snake_case, where you separate lowercase words with _.

i_use_snake_case
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention
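
The naming rules and case sensitivity can be checked from the console. This is a minimal sketch using base R's exists() and make.names(); the variable my_value and the candidate names are hypothetical examples of mine, not from the original notes.

```r
my_value <- 1

# Names are case-sensitive: only the exact spelling exists
exists("my_value")
#[1] TRUE
exists("My_Value")
#[1] FALSE

# make.names() shows how R would repair names that are not valid:
# a leading digit gets an "X" prefix and a space becomes a "."
make.names(c("valid_name", "2nd_try", "has space"))
#[1] "valid_name" "X2nd_try"   "has.space"
```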

Assignment Operators

If you’ve been programming in other languages before, you’ll notice that the assignment operator in R is quite strange as it uses <- instead of the commonly used equal sign = to assign objects.