RANK-BASED SIMILARITY SEARCH REDUCING THE DIMENSIONAL DEPENDENCE

ABSTRACT:

This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.
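The idea of selecting objects purely by rank can be illustrated with a minimal sketch. This is not the RCT itself, only a brute-force selection that uses nothing but the ordering of similarity values; the function names and the sample points are assumptions made for the example. Because only comparisons matter, replacing the Euclidean distance with its square (which is not a metric) leaves the result unchanged.

```python
import heapq

def knn_by_rank(query, items, k, dist):
    """Return the k items ranked closest to the query.

    Only the ordering of dist values is used; no bound such as the
    triangle inequality is applied.
    """
    return heapq.nsmallest(k, items, key=lambda x: dist(query, x))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def squared_euclidean(a, b):
    # A monotone transform of the Euclidean distance. It violates the
    # triangle inequality, yet it induces exactly the same ranking.
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical sample data for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (0.5, 0.0)]
query = (0.1, 0.1)

by_metric = knn_by_rank(query, points, 3, euclidean)
by_non_metric = knn_by_rank(query, points, 3, squared_euclidean)
print(by_metric == by_non_metric)
```

Both calls return the same three neighbors, which is the property that lets a rank-based test work without metric assumptions.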

INTRODUCTION

Of the fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, perhaps the most widely encountered is similarity search. Similarity search is the foundation of k-nearest-neighbor (k-NN) classification, which often produces competitively low error rates in practice, particularly when the number of classes is large. The error rate of nearest-neighbor classification has been shown to be ‘asymptotically optimal’ as the training set size increases. For clustering, many of the most effective and popular strategies require the determination of neighbor sets for a substantial proportion of the data set objects; examples include hierarchical (agglomerative) methods. Content-based filtering methods for recommender systems and anomaly detection methods also commonly make use of k-NN techniques, either through the direct use of k-NN search or by means of k-NN cluster analysis.
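As a rough illustration of how k-NN classification rests on similarity search, the following sketch finds the k nearest training objects by brute force and takes a majority vote. The function name, the toy training set, and the choice of Euclidean distance are assumptions for the example; a practical system would replace the linear scan with a search index.

```python
import heapq
import math
from collections import Counter

def knn_classify(query, training, k):
    """Classify query by majority vote among its k nearest training objects.

    training is a list of (feature_vector, label) pairs.
    """
    neighbors = heapq.nsmallest(
        k, training, key=lambda pair: math.dist(query, pair[0])
    )
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated classes.
training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]

label_near_origin = knn_classify((0.2, 0.2), training, k=3)
label_far = knn_classify((5.5, 5.5), training, k=3)
print(label_near_origin, label_far)
```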

A very popular density-based measure, the Local Outlier Factor (LOF), relies heavily on k-NN set computation to determine the relative density of the data in the vicinity of the test point [8]. For data mining applications based on similarity search, data objects are typically modeled as feature vectors of attributes for which some measure of similarity is defined. Motivated at least in part by the impact of similarity search on problems in data mining, machine learning, pattern recognition, and statistics, the design and analysis of scalable and effective similarity search structures has been the subject of intensive research for many decades. Until relatively recently, most data structures for similarity search targeted low-dimensional real vector space representations and the Euclidean or other Lp distance metrics.
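To show how heavily LOF depends on k-NN sets, here is a brute-force sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is a simplified illustration under stated assumptions: exactly k neighbors are kept per point with ties broken arbitrarily, the data contains no duplicate points, and the sample data is invented for the example.

```python
import heapq
import math

def lof_scores(points, k):
    """Brute-force Local Outlier Factor for every point."""
    n = len(points)
    # k nearest neighbors of each point as (distance, index), excluding itself.
    neighbors = [
        heapq.nsmallest(
            k, ((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        )
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [nbrs[-1][0] for nbrs in neighbors]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    densities = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(densities[j] for _, j in neighbors[i]) / (k * densities[i])
        for i in range(n)
    ]

# Hypothetical data: a 3x3 grid cluster plus one distant point.
cluster = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = cluster + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
print(scores.index(max(scores)))
```

Every score requires a k-NN set for the point and for each of its neighbors, which is why the cost of LOF is dominated by similarity search.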

However, many public and commercial data sets available today are more naturally represented as vectors spanning many hundreds or thousands of feature attributes that can be real or integer-valued, ordinal or categorical, or even a mixture of these types. This has spurred the development of search structures for more general metric spaces, such as the MultiVantage-Point Tree, the Geometric Near-neighbor Access Tree (GNAT), the Spatial Approximation Tree (SAT), the M-tree, and (more recently) the Cover Tree (CT). Despite their various advantages, spatial and metric search structures are both limited by an effect often referred to as the curse of dimensionality.

One way in which the curse may manifest itself is in a tendency of distances to concentrate strongly around their mean values as the dimension increases. Consequently, most pairwise distances become difficult to distinguish, and the triangle inequality can no longer be effectively used to eliminate candidates from consideration along search paths. Evidence suggests that when the representational dimension of feature vectors is high (roughly 20 or more), traditional similarity search accesses an unacceptably high proportion of the data elements, unless the underlying data distribution has special properties. Even though the local neighborhood information employed by data mining applications is useful and meaningful, high data dimensionality tends to make this local information very expensive to obtain.
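The concentration effect can be observed with a small experiment. The sketch below assumes synthetic data drawn uniformly at random from the unit hypercube (so the intrinsic and representational dimensions coincide, which need not hold for real data) and reports the standard deviation of query-to-point distances relative to their mean. The function name and parameters are choices made for this illustration.

```python
import random
import statistics

def relative_spread(dim, n_points=200, seed=0):
    """Std/mean of Euclidean distances from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(sum((a - b) ** 2 for a, b in zip(query, p)) ** 0.5)
    return statistics.pstdev(dists) / statistics.mean(dists)

spreads = {dim: relative_spread(dim) for dim in (2, 20, 200, 1000)}
for dim, spread in spreads.items():
    print(dim, round(spread, 3))
```

The ratio shrinks as the dimension grows, meaning the distances cluster ever more tightly around their mean and become harder to tell apart.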

The performance of similarity search indices depends crucially on the way in which they use similarity information for the identification and selection of objects relevant to the query. Virtually all existing indices make use of numerical constraints for pruning and selection. Such constraints include the triangle inequality (a linear constraint on three distance values), other bounding surfaces defined in terms of distance (such as hypercubes or hyperspheres), range queries involving approximation factors as in Locality-Sensitive Hashing (LSH) or absolute quantities as additive distance terms. One serious drawback of such operations based on numerical constraints such as the triangle inequality or distance ranges is that the number of objects actually examined can be highly variable, so much so that the overall execution time cannot be easily predicted.
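The variability described above can be seen in a minimal sketch of pivot-based pruning with the triangle inequality. This is a generic textbook technique, not the method of any particular index named here; the function name, the single pivot, and the grid data are assumptions for the example. An object is skipped only when the bound |d(q, p) - d(p, x)| already exceeds the query radius.

```python
import math

def range_search_with_pivot(query, items, pivot, pivot_dists, radius):
    """Return (matches, examined) for a range query using one pivot.

    pivot_dists[i] must hold the precomputed distance from pivot to items[i].
    """
    d_qp = math.dist(query, pivot)
    matches, examined = [], 0
    for item, d_px in zip(items, pivot_dists):
        # Triangle inequality: dist(query, item) >= |d_qp - d_px|.
        if abs(d_qp - d_px) > radius:
            continue  # pruned without computing dist(query, item)
        examined += 1
        if math.dist(query, item) <= radius:
            matches.append(item)
    return matches, examined

# Hypothetical data: a 5x5 integer grid with the pivot at the origin.
items = [(float(i), float(j)) for i in range(5) for j in range(5)]
pivot = (0.0, 0.0)
pivot_dists = [math.dist(pivot, x) for x in items]

matches_a, examined_a = range_search_with_pivot((0.1, 0.1), items, pivot, pivot_dists, 1.0)
matches_b, examined_b = range_search_with_pivot((2.1, 2.1), items, pivot, pivot_dists, 1.0)
print(len(matches_a), examined_a, len(matches_b), examined_b)
```

Both queries return three objects, yet the second examines several times as many candidates as the first, because the pruning power of the bound depends on where the query falls relative to the pivot.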

To improve the scalability of applications that depend on similarity search, researchers and practitioners have investigated practical methods for speeding up the computation of neighborhood information at the expense of accuracy. For data mining applications, the approaches considered have included feature sampling for local outlier detection, data sampling for clustering, and approximate similarity search for k-NN classification. Examples of fast approximate similarity search indices include the BD-Tree, a widely recognized benchmark for approximate k-NN search; it makes use of splitting rules and early termination to improve upon the performance of the basic KD-Tree. One of the most popular methods for indexing, Locality-Sensitive Hashing, can also achieve good practical search performance for range queries by managing parameters that influence a tradeoff between accuracy and time.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

Processor – Pentium IV
Speed – 1 GHz
RAM – 256 MB (min)
Hard Disk – 20 GB
Floppy Drive – 1.44 MB
Key Board – Standard Windows Keyboard
Mouse – Two or Three Button Mouse
Monitor – SVGA

SOFTWARE REQUIREMENTS:

JAVA

Operating System : Windows XP or Win7
Front End : JAVA JDK 1.7
Back End : MYSQL Server
Server : Apache Tomcat Server
Script : JSP Script
Document : MS-Office 2007

.NET

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio .NET 2008
Script : C# Script
Back End : MS-SQL Server 2005
Document : MS-Office 2007
