Finding K-Nearest Neighbors and Its Implementation

Machine Learning offers numerous algorithms for both classification and regression. One of the simplest is K-Nearest Neighbors (KNN), a non-parametric, supervised learning technique that categorizes new data points based on their similarity to existing data. The method works by locating the ‘k’ nearest points (neighbors) to a given input and then predicting its category or value from the majority category or the mean of those neighbors.

In this article, we will explore the principles of the KNN algorithm and demonstrate its implementation in Python. So, let’s dive in!


What is K-Nearest Neighbors (KNN)?

KNN is a simple instance-based learning algorithm used for both classification and regression tasks. It works by identifying the nearest data points (neighbors) in the dataset and then predicting the outcome from the dominant category (for classification) or the mean value (for regression).

For instance, to decide whether a new fruit is an orange, KNN searches for the K nearest fruits in the dataset. If the majority of these are oranges, the new fruit is classified as an orange.

How Does KNN Function?

KNN follows a simple, step-by-step procedure:

  1. Select K (the number of neighbors).
  2. Measure the distance between the new data point and all existing data points in the dataset.
  3. Select the K closest points.
  4. Finally, make a prediction based on the majority category or the mean value.

Example:

If we set K=3, and the three closest points comprise 2 apples and 1 orange, the new data point will be classified as an apple.
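The four steps above can be sketched from scratch on a hypothetical toy dataset (the points, labels, and query below are illustrative, mirroring the apples-and-oranges example):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: measure the Euclidean distance from the query to every point
    distances = [
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Step 3: select the K closest points
    k_nearest = sorted(distances)[:k]
    # Step 4: predict by majority vote
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Toy data: the 3 nearest points to the query are 2 apples and 1 orange
points = [(1.0, 1.0), (1.1, 0.9), (1.3, 1.2), (5.0, 5.0)]
labels = ["apple", "apple", "orange", "orange"]
print(knn_predict(points, labels, query=(1.0, 1.1), k=3))  # apple
```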

Selecting the Appropriate Value of K

The selection of K significantly impacts the effectiveness of KNN.

  • Small K (e.g., K=1, K=3):

A small value of K makes the model very sensitive to noisy data, which can result in overfitting. It captures local patterns well, but it may also misclassify points because of outliers.

  • Large K (e.g., K=10, K=20)

A larger value of K tends to smooth out results and minimize noise. However, this may also cause underfitting and lead the model to overlook critical details.

  • Optimal Practices:

A common rule of thumb is to set K = √(total number of data points). Additionally, it is advisable to experiment with different K values and validate model performance through cross-validation.
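The √n rule of thumb can be sketched in a few lines (the dataset size of 150 is just the Iris example; rounding to an odd K is a common refinement to avoid voting ties):

```python
import math

n_samples = 150  # e.g., the Iris dataset
k = round(math.sqrt(n_samples))  # rule of thumb: K ≈ √n
if k % 2 == 0:
    k += 1  # an odd K avoids ties in binary classification
print(k)
```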

KNN Implementation in Python

In this section, we will use the Iris dataset to demonstrate KNN for classification.

Step 1: Import Libraries and Load Dataset

Example:

Python
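A minimal sketch of this step using scikit-learn, matching the 80/20 split described below (the `random_state=42` seed is an illustrative choice):

```python
# Import the required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into 80% training and 20% testing portions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Dataset loaded and split successfully!")
```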


Output:

Importing Libraries and Loading Dataset Output

Explanation:

The code above loads the Iris dataset, splits it into training (80%) and testing (20%) portions, and prints a success message.

Step 2: Training the KNN Model

Example:

Python
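A minimal sketch of this step (the Step 1 loading and splitting lines are repeated so the snippet runs on its own; the random seed is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load and split the data as in Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create the KNN classifier with K=3 and train it on the training data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print("KNN model trained successfully!")
```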


Output:

Training the KNN Model Output

Explanation:

The code above creates a K-Nearest Neighbors (KNN) classifier with K set to 3, trains it on the training dataset, and prints a success message.

Step 3: Generating Predictions

Example:

Python
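A minimal sketch of this step (the earlier loading, splitting, and training lines are repeated so the snippet runs on its own):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load, split, and train as in Steps 1 and 2
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict class labels for the held-out test set
y_pred = knn.predict(X_test)
print("Predicted labels:", y_pred)
```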


Output:

Making Predictions Output

Explanation:

The code above uses the trained KNN model to predict labels for the test dataset and then prints the predicted class labels.

Step 4: Evaluating the Model

Example:

Python
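A minimal sketch of this step (the full pipeline from the previous steps is repeated so the snippet runs on its own):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load, split, train, and predict as in Steps 1-3
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Compare the predicted labels against the true test labels
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")
```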


Output:

Evaluating the Model Output

Explanation:

The code above measures the accuracy of the KNN model. It compares the predicted labels (y_pred) with the actual labels (y_test) and then prints the accuracy score.

Step 5: Identifying the Optimal K Value

Example:

Python
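A minimal sketch of this step: the model is retrained for each K from 1 to 19 and the test accuracies are plotted (the setup lines are repeated so the snippet runs on its own):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load and split the data as in Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Evaluate accuracy for K = 1 through 19
k_values = range(1, 20)
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies.append(knn.score(X_test, y_test))

# Plot accuracy against K to visualize the best choice
plt.plot(k_values, accuracies, marker="o")
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Test Accuracy")
plt.title("KNN Accuracy for Different K Values")
plt.show()
```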


Output:

Finding the Best K Value Output

Explanation:

The code above evaluates the KNN model’s accuracy for values of K from 1 to 19. It stores the accuracy scores and plots a graph that visualizes the relationship between K and model accuracy.

Drawbacks of KNN Algorithm

Some limitations of the KNN algorithm are mentioned below:

  1. Simple but no real learning phase: KNN does not build an internal model during training. Instead, it classifies each query directly from the stored data points.
  2. Struggles to predict rare events: KNN has difficulty detecting unusual occurrences, such as new diseases, because it has no prior knowledge of rare data distributions.
  3. Computationally intensive: KNN must keep the complete dataset in memory and compute distances to every point for each prediction, which takes considerable time and memory.
  4. Sensitive to feature scales: Features with larger magnitudes dominate the Euclidean distance metric, overshadowing features with smaller values and skewing predictions.
  5. Not suitable for high-dimensional data: In high-dimensional datasets, KNN becomes ineffective due to the “curse of dimensionality”, and its performance degrades as the feature count increases.

Optimal Practices for Utilizing the KNN Algorithm

Several best practices to improve the efficiency of the K-nearest neighbors (KNN) algorithm are outlined below:

  1. Select the Ideal K Value: Choosing a suitable value of K is vital because a small K (for instance, 1 or 3) can result in overfitting, whereas a larger K (for example, 10 or 20) can lead to underfitting. One way to start is by setting K to the square root of the total number of data points.
  2. Standardize Features: As KNN relies on distance calculations, scaling or standardizing the dataset ensures that features with large magnitudes do not dominate the outcome.
  3. Use an Effective Distance Metric: While Euclidean distance is the most common choice, alternative metrics like Manhattan or Minkowski distance may yield better results depending on the dataset’s characteristics.
  4. Address Class Imbalance: If your dataset has imbalanced classes, consider weighted KNN, where nearer neighbors exert more influence on the classification.
  5. Improve Computational Efficiency: KNN can be slow on large datasets. Using KD-trees or Ball trees for faster nearest-neighbor searches can significantly improve performance.
  6. Use Cross-Validation: Cross-validation, rather than a single train-test split, helps identify the optimal K value and mitigates the risk of overfitting.
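Several of these practices can be combined in one short scikit-learn sketch: standardized features, distance-weighted voting, and 5-fold cross-validation (the choice of K=5 here is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize features so no single feature dominates the distance,
# and weight neighbors by inverse distance to soften class imbalance
model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights="distance"),
)

# 5-fold cross-validation instead of a single train-test split
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```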

Conclusion

KNN (K-Nearest Neighbors) is a straightforward yet potent algorithm, widely employed in classification and regression endeavors. It is simple to grasp, non-parametric by design, and adaptable to various dataset types, making it a valuable asset in machine learning. Nevertheless, KNN comes with its drawbacks. Some of the challenges include elevated computational demands and sensitivity to feature scaling and dimensionality. By normalizing data, meticulously determining the best K value, opting for an apt distance metric, and employing dimensionality reduction strategies, you can enhance both the efficiency and precision of your model. While KNN may not be ideal for massive datasets, it remains an excellent selection for smaller datasets where interpretability and simplicity are prioritized.


The article Finding K-Nearest Neighbors and Its Implementation was initially published on Intellipaat Blog.

