Data Analysis Using R
This page briefly introduces R and guides you through basic to advanced data analysis and visualization. It covers creating, manipulating, saving, and opening data sets. It also covers descriptive statistics (e.g. mean, standard deviation, standard error, frequency and percentage distribution) and inferential statistics (e.g. chi-square test, independent two-sample t-test, paired t-test, Welch test, Wilcoxon test, one-way, two-way, and three-way ANOVA, post-hoc analysis with Tukey and Bonferroni adjustment, Kruskal-Wallis test, Pearson and Spearman correlation, and multiple regression, including detecting and resolving multicollinearity, heteroskedasticity, and endogeneity problems). Related visualization techniques are explained alongside these analyses. You are strongly advised to practice the R code in this manual as you read.
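As a first taste of the descriptive statistics listed above, here is a minimal base R sketch. The vectors `scores` and `group` are hypothetical values invented for illustration; they are not taken from the course data sets.

```r
# Hypothetical sample data, invented for illustration only
scores <- c(72, 85, 90, 66, 78, 85)
group  <- c("A", "B", "A", "B", "A", "B")

n      <- length(scores)   # sample size
mean_x <- mean(scores)     # arithmetic mean
sd_x   <- sd(scores)       # sample standard deviation (n - 1 denominator)
se_x   <- sd_x / sqrt(n)   # standard error of the mean

freq <- table(group)             # frequency distribution
pct  <- 100 * prop.table(freq)   # percentage distribution

print(c(mean = mean_x, sd = sd_x, se = se_x))
print(freq)
print(pct)
```

Base R has no built-in standard error function, so the sketch computes it from `sd()` and the sample size.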
Installation of R and RStudio
R is usually known as an open-source data analysis tool, though it can do much more than data analysis, such as website design, article writing, thesis preparation, book compilation, and web applet creation. R is a programming language; the R interpreter runs on your own computer and turns R code into useful output. Installing R is easy. RStudio is a popular development environment that makes writing and running R scripts more convenient, so you need to download and install both R and RStudio to follow this manual smoothly. You can download R and RStudio from https://posit.co/download/rstudio-desktop/. You can also use R with Jupyter Notebook in Visual Studio Code. After writing the code and markdown in Jupyter Notebook, export the notebook as HTML, open it in RStudio, and publish it to RPubs. Alternatively, you can convert a .ipynb file (Jupyter notebook) to .Rmd (R Markdown) in R using the rmarkdown::convert_ipynb(input, output) function.
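The notebook conversion could look like the sketch below. The file name `analysis.ipynb` is a hypothetical placeholder, and the sketch assumes the rmarkdown package is installed. If `output` is not given, the function writes next to the input file with an .Rmd extension.

```r
# Hypothetical file name; replace with the path to your own notebook.
input_file <- "analysis.ipynb"

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(input_file)) {
  # Returns the path of the generated .Rmd file
  rmd_file <- rmarkdown::convert_ipynb(input_file)
  message("Created: ", rmd_file)
} else {
  message("Install rmarkdown and check that ", input_file, " exists.")
}
```

The guard keeps the script from stopping with an error when the package or the file is missing.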
In JupyterLab or Jupyter Notebook, you may need an additional installation to activate Copilot (run pip install notebook-intelligence in a command prompt or terminal). Details are here. A custom shortcut can also be added by editing the .json file. Details are here. Add the following code in the User Preferences pane. Be careful not to damage the original file: paste this code before the last curly bracket, and put a comma after the ] (square bracket) of the previous line.
{
    "shortcuts": [
        {
            "command": "notebook:replace-selection",
            "selector": ".jp-Notebook",
            "keys": ["Ctrl Shift M"],
            "args": {"text": "%>% "}
        }
    ]
}
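The shortcut above types the pipe operator `%>%` into a notebook cell. As a minimal sketch of what that operator does, the code below compares a nested call with the equivalent piped form. It assumes the magrittr package is installed (dplyr also provides the same operator), and `scores` is a hypothetical vector invented for illustration.

```r
# Hypothetical data, invented for illustration only
scores <- c(72, 85, 90, 66, 78, 85)

# Nested form: read from the inside out
nested <- round(mean(scores), 1)

if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`   # make the pipe available without attaching the package
  # Piped form: read from left to right
  piped <- scores %>% mean() %>% round(1)
  stopifnot(identical(nested, piped))
}
```

The pipe passes the value on its left as the first argument of the function on its right, so both forms give the same result.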
This course has 30 hours of active face-to-face classroom activities and 50 hours of independent study, homework, assignments, and presentations.
R is widely used for data science for several reasons, such as:
- It is free software.
- It supports virtually any data analysis task through its built-in functions and contributed packages.
- Its publication-ready visualization capabilities are a major strength.
- Scripts can be saved, shared, and reused easily for later reference.
- A dedicated, motivated, and enthusiastic user community and free online learning materials give users confidence.
Course contents
- Presentation
- Data sets
Please double-check that your data sets and R script/markdown files are in the same folder/directory. Set the working directory to the folder containing the script and data sets.
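Setting the working directory and checking that a data file is visible from it can be sketched as follows. The folder `r_course` and the file `example_data.csv` are hypothetical names, and the sketch uses a temporary folder so that it runs anywhere; in practice you would point `setwd()` at your own course folder.

```r
# Hypothetical folder inside R's temporary directory; use your own course folder instead.
project_dir <- file.path(tempdir(), "r_course")
dir.create(project_dir, showWarnings = FALSE)

old_dir <- setwd(project_dir)   # setwd() returns the previous working directory
print(getwd())                  # confirm where R is now looking for files

# Create and save a small hypothetical data set, then open it again
example_data <- data.frame(id = 1:3, score = c(70, 85, 90))
write.csv(example_data, "example_data.csv", row.names = FALSE)

has_file <- file.exists("example_data.csv")   # TRUE when the file is in the working directory
reopened <- read.csv("example_data.csv")

setwd(old_dir)                  # restore the original working directory
```

Because the script and the data share one folder, the file can be read by name alone, without a full path.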
Assessment
- Class test 1: Lesson 1-6: Bring your calculator for the practical task
- Class test 2: Lesson 7-10: Bring your calculator for the practical task
- Assignment: Lesson 11-16: Homework/group project, presentation
- Final exam: Lesson 1-16: Concept, selection of test, requirement of assumptions, cleaning the outputs of analysis for report, writing interpretation of given analysis results
- Example questions: To be discussed in the makeup class.
Resources
Basics of R Programming Language for Data Wrangling and Analysis – [Class notes, will be supplied]
YouTube video links may be accessed from the course contents.
Website for learning: https://www.geeksforgeeks.org/r-tutorial/?ref=outind