Academia.edu

Nearest Neighbor Method

317 papers
0 followers
About this topic
The Nearest Neighbor Method is a classification and regression technique in machine learning that predicts the output for a given input based on the closest training examples in the feature space. It operates on the principle that similar instances are likely to have similar outcomes.
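
As a rough illustration of the principle described above, the following minimal sketch classifies a query point by majority vote among its k closest training points. The function name `knn_predict`, the toy data, and the choice of Euclidean distance are assumptions made for this example only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over the labels of the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

For regression, the same neighbor search could be followed by averaging the neighbors' numeric targets instead of voting.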

Key research themes

1. How do different distance measures impact the accuracy and performance of k-Nearest Neighbor classification?

This research area investigates the role of various distance metrics in determining neighborhoods in k-NN algorithms and their effect on classification accuracy, sensitivity, specificity, and computational efficiency. It is critical because the choice of distance metric directly shapes the notion of similarity between data points, influencing the classifier's effectiveness across diverse data types such as network intrusion detection, medical data, foreign exchange forecasting, and student data classification.

Key finding: Comparative evaluation of Euclidean, Manhattan, and Chebyshev distance metrics on the KDD intrusion detection dataset revealed that Manhattan distance consistently outperformed the others by achieving higher accuracy,...
Key finding: Through extensive experimentation with eighteen diverse distance measures across multiple real-world datasets, the study demonstrated that k-NN classifier performance is significantly sensitive to distance metric choice, with...
Key finding: Empirical testing on student graduation data using Euclidean and Manhattan distances showed both metrics perform effectively in classifying students as timely or untimely graduates, with the best accuracy of 85.28% attained...
Key finding: The study emphasizes the inadequacy of standard distance metrics like Euclidean in forecasting highly correlated, nonlinear foreign exchange data, demonstrating that Mahalanobis distance’s ability to account for feature...
Key finding: This paper outlines the foundational role of distance calculations, notably Euclidean distance, in k-NN classification and demonstrates through conceptual examples how the choice of distance metric and the parameter k...
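
To make the role of the metric concrete, here is a small sketch, not taken from any of the studies above, showing that the same query can have a different nearest neighbor under Euclidean, Manhattan, and Chebyshev distance. The helper `nearest` and the two candidate points are hypothetical and chosen only to expose the disagreement.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def nearest(points, query, metric):
    """Return the point in `points` closest to `query` under `metric`."""
    return min(points, key=lambda p: metric(p, query))

query = (0.0, 0.0)
candidates = [(3.0, 3.0), (0.0, 4.5)]  # illustrative values only

for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, nearest(candidates, query, metric))
```

With these values, Euclidean and Chebyshev distance pick the diagonal point, while Manhattan distance picks the point on the axis, so a 1-NN classifier would disagree with itself across metrics whenever the two candidates carry different labels.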

2. Can adaptive and local parameter selection improve nearest neighbor classifier accuracy compared to fixed global parameters?

This theme addresses the optimization of the key k-NN hyperparameter k, moving beyond the classic fixed-k approach toward locally adaptive or dynamic selection methods. The goal is to tailor the neighborhood size for each test instance based on data distribution characteristics or clustering information, thereby enhancing classification precision and reducing misclassification caused by uniform parameter settings.

Key finding: The proposed dynamic local-k selection method, which employs clustering to determine the optimal k for each test instance, demonstrated improved classification accuracy over traditional fixed-k k-NN implementations across...
Key finding: By restricting analogy-based effort estimation to clusters with low variance in effort data, this method dynamically selects nearest neighbors instead of relying on fixed-sized neighborhoods, significantly reducing estimation...
Key finding: Integrating local mean-based classification with distance-weighted voting to determine class assignment for neighbors resulted in a consistent average accuracy improvement of 2.45% across multiple benchmark datasets, and up...
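
The distance-weighted voting mentioned in the last finding can be sketched in its simplest generic form, where each neighbor's vote is weighted by the inverse of its distance. This is an assumed textbook weighting, not the local mean-based scheme of the cited work, and `weighted_knn_predict`, `eps`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting among its k nearest neighbors."""
    # Pair each training point's distance to the query with its label, keep the k closest.
    neighbors = sorted(
        ((math.dist(p, query), label) for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for distance, label in neighbors:
        # Closer neighbors contribute more; eps avoids division by zero on exact matches.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical data: one very close "a" point, two more distant "b" points.
points = [(0.0, 0.0), (2.0, 0.0), (2.1, 0.0)]
labels = ["a", "b", "b"]

print(weighted_knn_predict(points, labels, (0.1, 0.0), k=3))
```

With k = 3, an unweighted majority vote would choose "b" (two votes to one), whereas the weighted vote favors the single much closer "a" point, which is the kind of sensitivity to a fixed k that weighting schemes aim to reduce.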

3. How can approximate nearest neighbor search methods alleviate the curse of dimensionality and improve search efficiency in metric and non-metric spaces?

This area focuses on algorithmic and data structural innovations that enable efficient approximate nearest neighbor (ANN) search in high-dimensional and non-metric spaces, circumventing the computational impracticalities of exact search due to the curse of dimensionality. Research evaluates tradeoffs between speed and accuracy, comparing traditional metric-based trees and graph-based small world methods, with applications in similarity search across varied domains.

Key finding: Through empirical evaluation on metric and non-metric datasets, this study showed that small world graph based approaches provide superior efficiency-effectiveness tradeoffs compared to classical data structures like VP-tree...
Key finding: Parallelizing the False Nearest Neighbors algorithm across distributed memory architectures achieved speedups between 17x and 37x over the best sequential TISEAN implementation, enabling rapid identification of appropriate...
Key finding: Introducing sparse occupancy tree structures as non-parametric approximators for the complex long wave radiation parameterization in climate models provides a computationally efficient emulation for a 220-dimensional input...
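
The general idea behind graph-based approximate search can be sketched as a greedy walk over a neighborhood graph: start somewhere, keep moving to whichever linked point is closer to the query, and stop when no link improves. The sketch below is a deliberately simplified assumption, not the small world method evaluated in the study above; `build_knn_graph` and `greedy_search` are hypothetical names, and the graph is built by brute force purely for illustration.

```python
import math

def build_knn_graph(points, k):
    """Link every point to its k nearest other points (brute force, for illustration only)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph

def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always moving to the linked point closest to the query.

    The walk stops at a local minimum, so the answer is approximate in general.
    Returns (index of the point found, number of distance evaluations).
    """
    current = entry
    current_dist = math.dist(points[current], query)
    evaluations = 1
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            d = math.dist(points[j], query)
            evaluations += 1
            if d < best_dist:
                best, best_dist = j, d
        if best == current:
            return current, evaluations
        current, current_dist = best, best_dist

# Hypothetical data: 20 points on a line, each linked to its 2 nearest neighbors.
points = [(float(i), 0.0) for i in range(20)]
graph = build_knn_graph(points, k=2)
query = (13.2, 0.0)

found, cost = greedy_search(points, graph, query)
exact = min(range(len(points)), key=lambda i: math.dist(points[i], query))
print(found, exact, cost)
```

On this toy line the greedy walk happens to reach the exact nearest neighbor; on real high-dimensional data it can stall in a local minimum, which is why practical graph indexes add long-range links and multiple entry points and accept a speed-accuracy tradeoff.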

All papers in Nearest Neighbor Method

A family of Kalman-type filters that estimate the user's position indoors, using range measurements and floor plan data, is presented. The floor plan information is formulated as a set of linear constraints and is used to truncate the...
This article considers the problem of educational placement. Several discriminant techniques are applied to a data set from a survey project of science ability. A profile vector for each student consists of five science-educational...
Particle filtering is being investigated extensively because it can track targets under nonlinear and non-Gaussian models. It tracks a trajectory with a known model at a given time. This means that the particle filter...
This paper develops a ZigBee indoor positioning scheme based on the location fingerprinting approach. The proposed scheme includes four workflows: (1) creating the location fingerprint table, (2) training the locating model using neural...
This paper compares Models-3/Community Multiscale Air Quality (CMAQ) outputs at multiple resolutions by interpolating from coarse resolution to fine resolution and analyzing the interpolation difference. Spatial variograms provide a...
We present a new method for creating a comparable document collection from two document collections in different languages. The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the...
KNN (K-nearest neighbor) is an important tool in machine learning, used in classification and prediction problems. In recent years, several modified versions of the KNN search algorithm have been developed and employed to improve the...
Forest structure is a fundamental component of the forest ecosystem and significantly impacts carbon sequestration. Previous studies mainly focused on optimizing forest non-spatial attributes for restoring carbon, but the significance of...
The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the in...
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We...
Changes in the configurational entropies of molecules make important contributions to free energies of reaction for processes such as protein-folding, noncovalent association, and conformational change. However, obtaining entropy from...
Changes in the configurational entropies of molecules make important contributions to the free energies of reaction for processes such as protein‐folding, noncovalent association, and conformational change. However, obtaining entropy from...
Several effective methods have been developed recently for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models either by applying an algorithm...
The assessment of an appropriate function describing the relationship between hydrological variables is a frequent problem. The usual way of estimating an overall function is a difficult task if the relationship between the variables is...
Motivation: Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which...
In oats, the year factor has a large influence on the phenotype expression. As a consequence, one year estimates of genetic distance between cultivars often have very little precision. The objectives of this work were to estimate: the...
Eighteen oat genotypes were tested for genetic dissimilarity, with and without control of diseases of the aerial plant parts. The variables evaluated were yield of de-awned grain, thousand-grain weight, hectoliter weight, height...
Frequent failures are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and complexity. In order to develop effective fault-tolerant...
Two new feature selection methods are introduced, the first based on a separability criterion, the second on a consistency index that includes interactions between the selected subsets of features. Comparison of accuracy was made against...
The experimental data retrieved from three-dimensional particle tracking velocimetry (3D PTV) are crucial for indoor environment engineering when designing ventilation strategies or monitoring airborne pollutant dispersion in inhabited...
In this study, the nonparametric k-nearest neighbour method was used to describe diameter distributions of birch stands in Northwest Spain. It was applied using the following essential steps: (i) estimation of the distance between target...
In the present research we study the codebook generation problem of vector quantization, using two different techniques of Genetic Algorithm (GA). We used the Simple GA (SGA) method and Ordain GA (OGA) method in vector quantization. SGA...
Many indices for evaluation of features have been considered. Applied to single features they allow for filtering irrelevant attributes. Algorithms for selection of subsets of features also remove redundant features. Hashing techniques...
The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of the kNN decision rule depends heavily on the value of the neighborhood parameter k. The method categorizes a test...
The similarity based decision rule computes the similarity between a new test document and the existing documents of the training set that belong to various categories. The new document is assigned to the category in which it has...
This work proposes and evaluates a Nearest-Neighbor Method to substitute missing values in datasets formed by continuous attributes. In the substitution process, each instance containing missing values is compared with complete instances,...
Combining outputs from different classifiers to achieve high classification accuracy is one of the most active research areas in ensemble methods. Although many state-of-the-art approaches have been introduced, no method is outstanding...
In this paper a specialized method for generating Markovian random fields, with or without conditioning, is presented. Here, the prior fields are assumed to be stationary second-order Gauss-Markov random fields in N-dimensional (N-D)...
Over the past two decades, the quantity of text documents stored digitally has risen. The ability to organize and categorize those documents automatically is known as text categorization, which...
We describe a new decision list induction algorithm called the Greedy Prepend Algorithm (GPA). GPA improves on other decision list algorithms by introducing a new objective function for rule selection and a set of novel search algorithms...
Airborne LiDAR techniques can provide accurate measurements of tree height, from which estimates of stem volume and forest woody biomass can be obtained. These techniques, however, are still expensive to apply repeatedly over large areas....
Up-to-date information of forest resources is required at a variety of scales in order to support forest management practices ranging from strategic to operational levels. The rate of change in Scottish forests is significant and may...
PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex...
Abstract: More than 200 advanced lines of oat (Avena sativa L.) and wheat (Triticum aestivum L.), selected in 1983, were evaluated in two experiments conducted in Guaíba, RS, during 1984. The objective was to test...
Forest policy makers increasingly desire the use of quantitative descriptions to define desirable forest characteristics as a target for forest management. A framework for quantitative, multivariate target definition and assessment is...
When employing nearest neighbor classifiers, scaling of input variables is often useful. In this paper we propose a small modification to the usual data preprocessing: variables should be scaled using pooled variances instead of...
Human-agent negotiation is becoming increasingly vital for reaching a socially beneficial agreement when stakeholders need to make a joint decision. Developing agents who understand not only human preferences but also...
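
One of the abstracts above proposes scaling input variables by pooled variances before nearest neighbor classification. As a hedged sketch of what a pooled (within-class) variance could look like, the code below computes, per feature, the sum of squared deviations from each class mean divided by N minus the number of classes. This standard definition is an assumption here; the abstract is truncated and does not spell out the exact formula, and `pooled_variances` and `scale_by_pooled_std` are hypothetical names.

```python
import math
from collections import defaultdict

def pooled_variances(points, labels):
    """Per-feature pooled within-class variance (assumed definition: SS_within / (N - C))."""
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    dof = len(points) - len(groups)
    if dof <= 0:
        raise ValueError("need more samples than classes")
    n_features = len(points[0])
    sums = [0.0] * n_features
    for members in groups.values():
        for j in range(n_features):
            mean_j = sum(m[j] for m in members) / len(members)
            # Squared deviations are taken from the class mean, not the global mean.
            sums[j] += sum((m[j] - mean_j) ** 2 for m in members)
    return [s / dof for s in sums]

def scale_by_pooled_std(points, labels):
    """Divide each feature by its pooled standard deviation (zero-variance features are left as-is)."""
    variances = pooled_variances(points, labels)
    return [tuple(x / math.sqrt(v) if v > 0 else x for x, v in zip(p, variances))
            for p in points]

# Hypothetical data: classes differ strongly in feature 0, and feature 1 is noisy within each class.
points = [(0.0, 0.0), (2.0, 10.0), (10.0, 0.0), (12.0, 10.0)]
labels = ["a", "a", "b", "b"]

print(pooled_variances(points, labels))
print(scale_by_pooled_std(points, labels))
```

In this toy data the between-class separation in the first feature does not inflate its pooled variance, so after scaling that feature keeps its discriminating spread while the noisy second feature is shrunk.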