Machine learning offers numerous algorithms for both classification and regression. One of the most straightforward is K-Nearest Neighbors (KNN), a non-parametric, supervised learning technique that categorizes new data points based on their similarity to existing data. The method works by locating the 'k' nearest points (neighbors) to a given input and then predicting its category or value from the majority category or the mean of those neighbors.
In this article, we will explore the principles of the KNN algorithm and demonstrate how to apply it in Python. So, let's dive in!
KNN is a simple, instance-based learning algorithm used for both classification and regression tasks. It works by identifying the nearest data points (neighbors) in the dataset and then predicting an outcome from the majority category (for classification) or the mean value (for regression).
For instance, to classify a new fruit, KNN looks at the K nearest fruits in the dataset. If the majority of them are oranges, the new fruit is classified as an orange.
How Does KNN Work?
KNN follows a simple, step-by-step procedure:
1. Select K (the number of neighbors).
2. Measure the distance between the new data point and every existing data point in the dataset.
3. Select the K closest points.
4. Predict the majority category (for classification) or the mean value (for regression).
Example:
If we set K=3 and the three closest points comprise 2 apples and 1 orange, the new data point is classified as an apple. The from-scratch sketch below walks through these steps.
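To make the procedure concrete, here is a minimal from-scratch sketch of the algorithm; the toy fruit data (a weight and a color score) is invented purely for illustration:

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 1-2: measure Euclidean distance from the query to every training point
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: select the k closest points
    nearest = sorted(distances)[:k]
    # Step 4: predict the majority label among those neighbors
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Toy data: fruits described by (weight in grams, color score)
fruits = [(150, 0.20), (160, 0.25), (140, 0.30), (170, 0.90), (180, 0.85)]
labels = ["apple", "apple", "apple", "orange", "orange"]

print(knn_predict(fruits, labels, query=(155, 0.30), k=3))  # -> apple
```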
Selecting the Appropriate Value of K
The selection of K significantly impacts the effectiveness of KNN.
Small K (e.g., K=1, K=3):
A small value of K makes the model very sensitive to noisy data, which can lead to overfitting. It captures local patterns well but may misclassify points because of outliers.
Large K (e.g., K=10, K=20):
A larger value of K smooths out results and reduces the effect of noise. However, it can also cause underfitting and make the model overlook important local details.
Best Practices:
A common rule of thumb is to start with K = √(total number of data points); a quick illustration follows below. It is also advisable to experiment with several K values and validate model performance using cross-validation.
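As a small sketch of this rule of thumb (the training-set size here assumes the 80% Iris split used later in this article):

```python
import math

n_train = 120  # assumed: 80% of the 150 Iris samples
k = round(math.sqrt(n_train))  # sqrt(120) ≈ 10.95, so K = 11
# An odd K also helps avoid tied votes in binary classification
print(k)
```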
KNN Implementation in Python
In this section, we will use the Iris dataset to demonstrate KNN for classification.
Step 1: Import Libraries and Load Dataset
Example:
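A minimal sketch of this step, assuming scikit-learn's built-in Iris loader (the variable names and the printed message are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Split into 80% training and 20% testing; fix random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Dataset loaded and split successfully!")
```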
Explanation:
The code above loads the Iris dataset, splits it into training (80%) and testing (20%) sets, and prints a confirmation message.
Step 2: Training the KNN Model
Example:
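A minimal sketch, assuming the `X_train` and `y_train` variables from Step 1:

```python
from sklearn.neighbors import KNeighborsClassifier

# Create a KNN classifier that votes among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)

# Train (fit) the model on the training data
knn.fit(X_train, y_train)

print("KNN model trained successfully!")
```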
Explanation:
The code above creates a K-Nearest Neighbors (KNN) classifier, sets K to 3, trains it on the training dataset, and prints a confirmation message.
Step 3: Generating Predictions
Example:
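A minimal sketch, continuing from the model trained in Step 2:

```python
# Predict class labels for the unseen test set
y_pred = knn.predict(X_test)

print("Predicted labels:", y_pred)
```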
Explanation:
The code above uses the trained KNN model to predict labels for the test dataset and then prints the predicted class labels.
Step 4: Evaluating the Model
Example:
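A minimal sketch, comparing the Step 3 predictions with the true test labels:

```python
from sklearn.metrics import accuracy_score

# Fraction of test points whose predicted label matches the true label
accuracy = accuracy_score(y_test, y_pred)

print(f"Model accuracy: {accuracy:.2f}")
```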
Explanation:
The code above computes the accuracy of the KNN model by comparing the predicted labels (y_pred) with the actual labels (y_test) and then prints the accuracy score.
Step 5: Identifying the Optimal K Value
Example:
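A minimal sketch, assuming matplotlib is available for plotting:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

k_values = range(1, 20)  # K from 1 to 19
accuracies = []

# Train and score a separate model for each candidate K
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

# Plot accuracy against K to visualize the best choice
plt.plot(k_values, accuracies, marker="o")
plt.xlabel("K (number of neighbors)")
plt.ylabel("Accuracy")
plt.title("KNN accuracy for different values of K")
plt.show()
```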
Explanation:
The code above evaluates the KNN model's accuracy for values of K from 1 to 19, stores the accuracy scores, and plots them so you can visualize the relationship between K and model accuracy.
Drawbacks of the KNN Algorithm
Some limitations of the KNN algorithm are listed below:
No real training phase: KNN is a "lazy learner" that builds no internal model. It simply stores the training data and compares new points against it at prediction time.
Struggles with rare events: KNN has difficulty recognizing unusual cases, such as new diseases, because it has no prior knowledge of rare data distributions.
Computationally intensive: KNN must keep the entire dataset in memory and compute distances to every stored point for each prediction, which costs both time and memory.
Sensitive to feature scales: features with larger magnitudes dominate the Euclidean distance calculation, skewing predictions unless the data is scaled.
Poor fit for high-dimensional data: in high-dimensional datasets, distances become less informative (the "curse of dimensionality"), so KNN's effectiveness drops as the number of features grows.
Best Practices for Using the KNN Algorithm
Several best practices for improving the effectiveness of the K-Nearest Neighbors (KNN) algorithm are outlined below:
Select the right K value: choosing a suitable K is vital because a small K (e.g., 1 or 3) can lead to overfitting, while a large K (e.g., 10 or 20) can lead to underfitting. A common starting point is K = √(total number of data points).
Standardize features: because KNN relies on distance calculations, scaling or standardizing the dataset ensures that features with large magnitudes do not dominate the results.
Use an appropriate distance metric: Euclidean distance is the most common choice, but alternatives such as Manhattan or Minkowski distance may work better depending on the dataset's characteristics.
Address class imbalance: if your dataset has imbalanced classes, consider weighted KNN, where nearer neighbors carry more weight in the vote.
Improve computational efficiency: KNN can be slow on large datasets; KD-trees or Ball trees make nearest-neighbor searches significantly faster.
Use cross-validation: relying on cross-validation rather than a single train-test split helps identify the best K and reduces the risk of overfitting; see the sketch after this list.
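Several of these practices can be combined in one place. Here is a sketch using scikit-learn's Pipeline and GridSearchCV on the Iris data (the parameter grid is illustrative, not exhaustive):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features first: KNN is distance-based, so scaling matters
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Cross-validated search over K, vote weighting, and distance metric
param_grid = {
    "knn__n_neighbors": list(range(1, 20)),
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```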
Conclusion
KNN (K-Nearest Neighbors) is a simple yet powerful algorithm widely used for classification and regression tasks. It is easy to understand, non-parametric by design, and adaptable to many types of datasets, which makes it a valuable tool in machine learning. Nevertheless, KNN has its drawbacks, including high computational cost and sensitivity to feature scaling and dimensionality. By normalizing your data, carefully choosing the best K value, selecting an appropriate distance metric, and applying dimensionality reduction techniques, you can improve both the efficiency and the accuracy of your model. While KNN may not be ideal for massive datasets, it remains an excellent choice for smaller datasets where interpretability and simplicity are priorities.