Unit 11: Machine Learning with kNN (9.5 hrs)
Upon completion of this module, you will be able to:
- explain the nearest neighbor algorithm for lazy classification
- apply k-NN to predict categorical and continuous variables
- tune machine learning algorithms
- normalize data for better algorithm performance
- implement k-NN in R through code and through packages
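As a preview of these objectives, here is a minimal, hypothetical k-NN sketch in R. It assumes the `class` package (whose `knn()` function is one common implementation) and the built-in `iris` data; your course materials may use different packages and datasets.

```r
# A minimal k-NN classification sketch using class::knn() on iris.
# Assumes install.packages("class") has been run.
library(class)

set.seed(42)
idx   <- sample(nrow(iris), 100)      # 100 rows for training
train <- iris[idx, 1:4]               # numeric features only
test  <- iris[-idx, 1:4]

# Predict the species of the held-out rows from their 5 nearest neighbors
pred <- knn(train, test, cl = iris$Species[idx], k = 5)

# Proportion of test rows classified correctly
mean(pred == iris$Species[-idx])
```

Note that `knn()` is a lazy learner: there is no separate training step, and the "model" is just the stored training data.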
Understanding Nearest Neighbor Classification
Required Work
Additional Resources
Slide Deck & Data Sets
Normalizing Features & Distance Measures
Required Work
Additional Resources
Guest Lecture: k Nearest Neighbor Classification Algorithm
Required Work
Additional Resources
Worked Example II: Using kNN from the caret Package
Work through the example presented in this tutorial using the Wine dataset. Be sure to install the caret package in your R environment before you work through the code. Here's some code you can use to download the data into R directly from the URL:
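A sketch of that download step is below. The URL assumes the UCI Machine Learning Repository copy of the Wine dataset; if the tutorial points to a different URL, substitute it.

```r
# Assumed location: the UCI Machine Learning Repository copy of the Wine data.
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"

# The file has no header row; the first column is the class label (cultivar).
wine <- read.csv(url, header = FALSE)
colnames(wine)[1] <- "Cultivar"
wine$Cultivar <- factor(wine$Cultivar)

str(wine)  # inspect the structure before passing it to caret
```

Converting the label column to a factor up front tells caret to treat the task as classification rather than regression.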
Evaluating a Classification Algorithm
Classification algorithm performance (i.e., how well the algorithm is likely to predict) can be evaluated in several ways. One way is to calculate its accuracy. To do this, we apply the algorithm to testing (or validation) data for which the true class labels are known, compare its predictions with the actual labels, and calculate how often it makes the right prediction: that proportion is the accuracy. We can also count how often it makes a wrong negative prediction, a wrong positive prediction, and so on; tabulating these counts produces a confusion matrix. Here is a good tutorial on confusion matrices from DataCamp that helps you understand how to do this in R.
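The accuracy and confusion-matrix calculations described above can be sketched in base R. The `predicted` and `actual` vectors below are made-up placeholder labels standing in for your model's output and the true test labels:

```r
# Placeholder labels: actual test-set classes vs. the model's predictions
actual    <- factor(c("yes", "yes", "no", "no", "no", "yes"))
predicted <- factor(c("yes", "no",  "no", "no", "yes", "yes"))

# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy: proportion of predictions that match the actual label
accuracy <- mean(predicted == actual)
print(accuracy)  # 4 of 6 correct, so 0.6667 (approx.)
```

The off-diagonal cells of `cm` are the wrong negative and wrong positive counts (false negatives and false positives), while the diagonal holds the correct predictions.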
Data Analytics and Data Science as a Profession