class: center, middle, inverse, title-slide # Intro to R --- class: center, middle # Introduction to R Adam Rawles --- ## Intro to R Course -- 1. Introduction to R (Part 1) -- + What is R? + What is RStudio? + Packages + Variables and data types -- 2. Data Structures and Functions + Data structures + Vectors + Lists + Dataframes + Functions + What is a function? + How to use a function --- ## Intro to R Course (Part 2) 3. Data Analysis + Workflows + `%>%` and the Tidyverse + Loading and cleaning data -- 4. Data Analysis Pt. 2 + Summarising data + Plotting --- ## Additional resources + [teacheR](https://teacher.arawles.co.uk) + [These presentations](https://arawles.github.io/teacheR-extras/) --- ## Overview - What is R? -- - What is RStudio? -- - Packages -- - Variables -- - Data types --- ## Learning Objectives - Understand the basics of R and RStudio -- - Know how to assign a variable in R -- - Know the difference between the data types in R -- - Know where/how to look things up! --- ## What is R? - Background -- + Statistical programming language -- + A number of applications: -- + Data analysis + Machine learning + Data-driven apps + Reporting/Presentation + Academic research -- + Supported with many packages -- - Interface -- + R uses a command-line interface -- + But there are many GUIs available (such as RStudio) --- ## RStudio - RStudio is one of the most popular R environments -- - RStudio has 4 main panes: --- ## RStudio - Source -- + This is where your scripts (pre-written lines of codes) are displayed -- + If you click on a data structure in the Environment pane, a preview will be displayed in the source window -- - Console -- + This is where the commands are actually executed -- + When you run a script from the source, it is passed to the console line by line --- ## RStudio - Environment -- + This pane displays any variables or data structures you've created/assigned -- + This pane will also display any custom functions you've created, which we'll move onto in later modules -- - History/Files/Plots/Help -- + This pane can display a number of things -- + The most useful of which are the Files tab, which displays all the files in the folder you're working in and the Plots tab, which will show any plots that you've created -- **Note**: You can move which pane shows which windows via the Tools -> Global Options dialog --- ## Packages - R comes with a lot of functionality out of the box -- - However, sometimes you need a specific set of tools for a particular job -- - People create and distribute these tools as packages -- - Good examples of packages include [`dplyr`](https://dplyr.tidyverse.org/), which is used in data analysis (and we'll be using in later sessions), [`ggplot2`](https://ggplot2.tidyverse.org/), which is used for making plots -- - We install these packages with the `install.packages()` function, which pulls from a central repository called `CRAN` --- ## Exercise -- - Open RStudio - Create an R Script called my_script.R - Add `# My first comment` at the top of the file - Save it anywhere - Write `print("hello world")` in the script and run it --- ## Variable assignment - Assigning variables allows you store values for later use -- ```r variable_1 <- 5 variable_2 <- 10 variable_3 <- variable_2/variable_1 variable_3 ``` ``` ## [1] 2 ``` -- - For variable assignment, the `<-` and `=` operators can be used interchangeably - But it's better to use `<-` -- - **Note**: the value of a variable is not printed when it is assigned (i.e. when we assign `5` to `variable_1`, we don't get any output in the console) --- ## Variable assigment - Variables can also be modified after they've been assigned -- ```r variable_1 <- 1 variable_1 ``` ``` ## [1] 1 ``` ```r variable_1 <- "hello world" variable_1 ``` ``` ## [1] "hello world" ``` --- ## Exercise - Create two variables - Try and create a variable called `1test` - what happens? --- ## Variable names -- - Variable naming is pretty lenient, but there are some reserved words (e.g. `break`, `if`, `else`, etc.) -- - Two main rules: + Names must start with a character (including a `.`) + Nmes can only contain letters, numbers, underscores and dots -- - There are a couple of different naming conventions: -- + Camel case + myVariable -- + `.`/`_` + `my.variable`/`my_variable` -- + Choose your own, but be consistent! --- ## Data types - Data comes in lots of different forms -- - Is "True" equal to TRUE? -- - The main data types are: -- + logical; TRUE, FALSE -- + double; 12.5, 1, 999 -- + integer; 2L, 34L, 1294L -- + character; "hello world", "True" -- + dates; 2019-06-01 -- + datetimes; 2019-06-01 12:00:00 --- ## Data types - Assigning the right data type is important, as it determines how the data is stored and how data can be manipulated -- ```r variable_1 <- 5 variable_2 <- "5" variable_1 + variable_2 ``` ``` ## Error in variable_1 + variable_2: non-numeric argument to binary operator ``` --- ## Data types - R will automatically decide the data type when you assign a variable, but you can force R to store the value as a different data type using the as.xxxxx functions -- ```r variable_1 <- as.numeric("5") variable_2 <- as.integer("5") variable_1 + variable_2 ``` ``` ## [1] 10 ``` -- - But the value has to be coercible in the first place! -- ```r variable_1 <- as.numeric("hello world") ``` ``` ## Warning: NAs introduced by coercion ``` ```r variable_1 ``` ``` ## [1] NA ``` --- ## Data types - Factors - There is another data type which is fairly common in R called a factor -- - A factor is made up of one or more levels that represent some form of grouping -- - For example, if we had a dataframe containing data for different countries, we might store "Country" as a factor, with the levels UK, France, USA, etc. --- ## Exercise - Data Types * Try the `as.numeric()`, `as.integer()`, `as.character()` and `as.logical()` functions * If you have time, try using the `as.Date()` function (HARD) --- ## Recap - R is a statistical programming language* - RStudio is a development environment that makes programming in R easier* - R is supported with many more functions via packages* - Variables can be assigned using `<-`* - There is a limit to what you can name your variable - Data is stored in R in lots of different forms