Machine Learning Using R
This is an advanced course designed as an integration of Research Methodology and Applied Statistics Using R. Complete those courses on your own to get the most out of this machine learning course.
Introduction Slides
Hypothesis test
A hypothesis is a scientific guess that has not yet been tested. A hypothesis test is performed to make a decision about the nature of a variable or statistic. Before performing a test, we need to state a null hypothesis (H0).
Examples (see the R sketch after this list):
H0: Two means are equal (t-test)
H0: All means are equal (ANOVA)
H0: Two variances are equal (Levene test)
H0: Correlation coefficient = 0
H0: Regression coefficient = 0
H0: No association between the variables (chi-square test)
H0: No effects/relationships (correlation, regression, ANOVA)
H0: Data is normally distributed (Shapiro test)
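The tests named above all have standard R implementations. Below is a minimal illustrative sketch, not taken from the course material; the built-in datasets and the use of the car package are assumptions for illustration only. Each call reports a p-value for its H0.

```r
# Illustrative sketch (not course code): each call tests one of the H0s above
# on a built-in R dataset and reports a p-value.

t.test(extra ~ group, data = sleep)              # H0: two means are equal (t-test)
summary(aov(count ~ spray, data = InsectSprays)) # H0: all means are equal (ANOVA)
cor.test(mtcars$mpg, mtcars$wt)                  # H0: correlation coefficient = 0
chisq.test(table(mtcars$cyl, mtcars$am))         # H0: no association (chi-square test)
shapiro.test(mtcars$mpg)                         # H0: data are normally distributed (Shapiro test)

# H0: two variances are equal (Levene test; requires the 'car' package)
# install.packages("car")
car::leveneTest(count ~ spray, data = InsectSprays)
```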
Look at this applet https://kamrulext.shinyapps.io/p-values/
p-values indicate the strength of evidence in support of H0.
The p-value shows the level of type I error (a false positive: rejecting a true H0) in repeated trials.
FYI, a type II error is a false negative (failing to reject a false H0).
Small p-values fall in the rejection region.
The p-value is the area under the curve in the tail beyond the observed z-value.
Each p-value corresponds to a z-value.
Z-value => quantile value, variable value, observed value, x-axis value, test statistic (t, r, F, beta).
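As a quick illustration of the p-value / z-value link, the sketch below (an assumed example using the standard normal distribution, not course code) shows how pnorm() turns a z-value into a tail area.

```r
# pnorm(z) gives the area under the standard normal curve to the LEFT of z.
z <- 1.96
pnorm(z)                     # area left of z   ~ 0.975
1 - pnorm(z)                 # right-tail area  ~ 0.025
2 * pnorm(-abs(z))           # two-sided p-value ~ 0.05

# Larger |z| pushes the statistic further into the tail, so the p-value shrinks:
2 * pnorm(-abs(c(1, 2, 3)))  # ~ 0.317, 0.046, 0.003
```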
H0 is set up to be rejected, so we can never accept it; we can only reject it or fail to reject it.
So, in simple terms, a small p-value (<= 0.05) means insufficient evidence to support H0, and H0 is rejected.
A large p-value (> 0.05) means sufficient evidence to support H0, and H0 cannot be rejected.
Here is the twist: rejecting H0 implies a significant relationship or effect, because H0 claimed there was no relationship.
Further twist: as the test statistic (t, r, F, beta) increases, the p-value decreases.
A larger test statistic gives a smaller p-value => a significant result.
Test statistic = Coefficient / Standard error (see the sketch below).
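As a sketch (an assumed example with the built-in mtcars data, not course code), the output of lm() can be used to verify that the reported t value is the coefficient divided by its standard error.

```r
# Fit a simple linear regression and inspect the coefficient table.
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients
coefs   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Recompute the t value for the slope of wt by hand:
coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]  # matches the reported t value
```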
Schedule and data sets
| Lesson | Topic | Duration | Links |
|---|---|---|---|
| 1 | Installation of R and RStudio | 1 hr | Instruction, YouTube |
| 2 | Introduction to R | 3 hr 30 min | YouTube, R Codes Used in the Class |
| 3 | Tidy Thinking in R | 3 hr 30 min | R Codes Used in the Class |
| 4 | Linear Regression | 3 hr | R Codes Used in the Class |
| 5 | Bias-Variance Trade-Off | 2 hr | R Codes Used in the Class |
| 6 | Step-wise Regression | 1 hr | R Codes Used in the Class |
| 7 | Ridge Regression | 1 hr | R Codes Used in the Class |
| 8 | Lasso Regression | 0.5 hr | R Codes Used in the Class |
| 9 | Validation: Train-Test Split | 1 hr | R Codes Used in the Class |
| 10 | Cross Validation | 1 hr | R Codes Used in the Class |
| 11 | Bootstrapping | 0.5 hr | R Codes Used in the Class |
| 12 | Logistic Regression | 2 hr | R Codes Used in the Class |
| 13 | Moving Beyond Linearity (Polynomials) | 3 hr | R Codes Used in the Class |
| 14 | K-Nearest Neighbors (KNN) | 3 hr | R Codes Used in the Class |
| 15 | Linear and Quadratic Discriminant Analysis | 1 hr | R Codes Used in the Class |
| 16 | Support Vector Machines | 1 hr | R Codes Used in the Class |
| 17 | Random Forest | 1 hr | R Codes Used in the Class |
| 18 | Gradient Boosting | 1 hr | R Codes Used in the Class |
| 19 | Neural Network | 2 hr | R Codes Used in the Class |
| 20 | Principal Component Analysis | 2 hr | R Codes Used in the Class |
| 21 | Cluster Analysis | 2 hr | R Codes Used in the Class |
| 22 | Time Series Analysis | 2 hr | R Codes Used in the Class |