Home »

# Dbscan

## The meaning of «dbscan»

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.[1] It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature.[2]

In 2014, the algorithm was awarded the test of time award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD.[3] As of July 2020[update], the follow-up paper "DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN"[4] appears in the list of the 8 most downloaded articles of the prestigious ACM Transactions on Database Systems (TODS) journal.[5]

In 1972, Robert F. Ling published a closely related algorithm in "The Theory and Construction of k-Clusters"[6] in The Computer Journal with an estimated runtime complexity of O(n³).[6] DBSCAN has a worst-case of O(n²), and the database-oriented range-query formulation of DBSCAN allows for index acceleration. The algorithms slightly differ in their handling of border points.

Consider a set of points in some space to be clustered. Let ε be a parameter specifying the radius of a neighborhood with respect to some point. For the purpose of DBSCAN clustering, the points are classified as core points, (density-) reachable points and outliers, as follows:

Now if p is a core point, then it forms a cluster together with all points (core or non-core) that are reachable from it. Each cluster contains at least one core point; non-core points can be part of a cluster, but they form its "edge", since they cannot be used to reach more points.

Reachability is not a symmetric relation: by definition, only core points can reach non-core points. The opposite is not true, so a non-core point may be reachable, but nothing can be reached from it. Therefore, a further notion of connectedness is needed to formally define the extent of the clusters found by DBSCAN. Two points p and q are density-connected if there is a point o such that both p and q are reachable from o. Density-connectedness is symmetric.

DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a dense region[a] (minPts). It starts with an arbitrary starting point that has not been visited. This point's ε-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in a sufficiently sized ε-environment of a different point and hence be made part of a cluster.

### Related Searches

 Descanso Dominical Descant (magazine) Descanso, California Descanso Gardens DBS Bank Descant Discant Descanso (spider) Descanso, Santa Catarina

### Choice of words

 d-bscan_ _db-scan_ _dbs-can_ _dbsc-an_ _dbsca-n_ _dbscan-_ _ dbscan:_ _ _ _dbscan_ _ _ _dbscan_ - _ _ _dbscan-_ _ _ _dbscan _ _ _ _ _dbscan _ - _ _ _ _