Unit 11: Machine Learning with kNN (9.5 hrs)
Upon completion of this module, you will be able to:
- explain the nearest neighbor algorithm for lazy classification
- apply k-NN to predict categorical and continuous variables
- tune machine learning algorithms
- normalize data for better algorithm performance
- implement k-NN in R through code and through packages
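As a preview of these objectives, here is a minimal, hypothetical k-NN sketch in R. It assumes the `class` package (whose `knn()` function is one common implementation) and the built-in `iris` data; your course materials may use different packages and datasets.

```r
# A minimal k-NN classification sketch using class::knn() on iris.
# Assumes install.packages("class") has been run.
library(class)

set.seed(42)
idx   <- sample(nrow(iris), 100)      # 100 rows for training
train <- iris[idx, 1:4]               # numeric features only
test  <- iris[-idx, 1:4]

# Predict the species of the held-out rows from their 5 nearest neighbors
pred <- knn(train, test, cl = iris$Species[idx], k = 5)

# Proportion of test rows classified correctly
mean(pred == iris$Species[-idx])
```

Note that `knn()` is a lazy learner: there is no separate training step, and the "model" is just the stored training data.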
Understanding Nearest Neighbor Classification
Required Work
Additional Resources
Slide Deck & Data Sets
Normalizing Features & Distance Measures
Required Work
Additional Resources
Guest Lecture: k Nearest Neighbor Classification Algorithm
Required Work
Additional Resources
Worked Example II: Using kNN from the caret Package
Work through the example presented in this tutorial using the Wine dataset. Be sure to install the caret package in your R environment before you work through the code. Here's some code you can use to download the data into R directly from the URL:
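A sketch of that download step is below. The URL assumes the UCI Machine Learning Repository copy of the Wine dataset; if the tutorial points to a different URL, substitute it.

```r
# Assumed location: the UCI Machine Learning Repository copy of the Wine data.
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"

# The file has no header row; the first column is the class label (cultivar).
wine <- read.csv(url, header = FALSE)
colnames(wine)[1] <- "Cultivar"
wine$Cultivar <- factor(wine$Cultivar)

str(wine)  # inspect the structure before passing it to caret
```

Converting the label column to a factor up front tells caret to treat the task as classification rather than regression.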
Evaluating a Classification Algorithm
Classification algorithm performance (i.e., how well the algorithm is likely to predict) can be evaluated in several ways. One way is to calculate its accuracy. To do this, we apply the algorithm to testing (or validation) data for which the true class labels are known, compare its predictions with the actual labels, and calculate how often it makes the right prediction: that proportion is the accuracy. We can also count how often it makes a wrong negative prediction, a wrong positive prediction, and so on; tabulating these counts produces a confusion matrix. Here is a good tutorial on confusion matrices from DataCamp that helps you understand how to do this in R.
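The accuracy and confusion-matrix calculations described above can be sketched in base R. The `predicted` and `actual` vectors below are made-up placeholder labels standing in for your model's output and the true test labels:

```r
# Placeholder labels: actual test-set classes vs. the model's predictions
actual    <- factor(c("yes", "yes", "no", "no", "no", "yes"))
predicted <- factor(c("yes", "no",  "no", "no", "yes", "yes"))

# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy: proportion of predictions that match the actual label
accuracy <- mean(predicted == actual)
print(accuracy)  # 4 of 6 correct, so 0.6667 (approx.)
```

The off-diagonal cells of `cm` are the wrong negative and wrong positive counts (false negatives and false positives), while the diagonal holds the correct predictions.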
Data Analytics and Data Science as a Profession