Cluster Analysis
June 5, 2023
Clustering is the process of grouping a set of abstract objects into classes of similar objects. A cluster of data objects can be treated as one group. In cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups.
“Cluster analysis is a multivariate data mining technique whose goal is to group objects (e.g., products, respondents, or other entities) based on a set of user-selected characteristics or attributes. It is the basic and most important step of data mining and a common technique for statistical data analysis, and it is used in many fields such as data compression, machine learning, pattern recognition, and information retrieval.”

Types of Cluster Analysis
Unless there is a mathematical reason to prefer one clustering method over another, the algorithm needs to be chosen experimentally. Note that an algorithm that works well on one data set may not work well on another. There are a number of different methods for performing cluster analysis. Some of them are:
Hierarchical Cluster Analysis :
In this method, each object starts as its own cluster, and the two most similar (closest) clusters are then merged into a single cluster. This process is repeated until all objects are in one cluster. This is known as the agglomerative method: it starts with single objects and progressively groups them into larger clusters.
The divisive method is another kind of hierarchical method, in which clustering starts with the complete data set as one cluster and then repeatedly divides it into smaller partitions.
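As a rough illustration of the agglomerative procedure described above, here is a minimal pure-Python sketch. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); the function name `agglomerative`, the stopping parameter `n_clusters`, and the sample points are illustrative choices, not something prescribed by the method itself.

```python
from math import dist


def agglomerative(points, n_clusters=1):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Assumptions for this sketch: Euclidean distance, single linkage.
    Returns a list of clusters, each a list of indices into `points`.
    """
    # Start with every object in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are most similar (closest).
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample data: two tight groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
result = agglomerative(points, n_clusters=3)
print(sorted(sorted(c) for c in result))
```

Calling `agglomerative(points)` with the default `n_clusters=1` runs the merging all the way to a single cluster, as in the description above. This brute-force version recomputes every pairwise distance on each pass, so it is only suitable for small examples.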
Centroid-based Clustering :
In this type of clustering, each cluster is represented by a central entity (a centroid), which may or may not be a member of the given data set. The K-Means method is the typical example: k is the number of cluster centers, and each object is assigned to the nearest cluster center.
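The assign-to-nearest-center idea can be sketched in a few lines of pure Python. This is a minimal version of the standard K-Means iteration, under two assumptions made only for this example: distances are Euclidean, and the initial centers are simply the first k points (real implementations usually pick them randomly or with a smarter seeding scheme). The function name `k_means` and the sample data are hypothetical.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Minimal K-Means sketch: returns (centers, labels)."""
    # Assumption: use the first k points as initial centers for reproducibility.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest cluster center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its assigned objects.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep the old center if no object was assigned to it.
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # Centers stopped moving, so the assignment is stable.
        centers = new_centers
    return centers, labels


# Hypothetical sample data: two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```

Note that the resulting centers are means of the assigned objects, so they generally do not coincide with any object in the data set.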

Distribution-based Clustering :
This is a type of clustering closely related to statistics and based on distribution models. Objects that most likely belong to the same distribution are put into a single cluster. This type of clustering can capture some complex properties of objects, such as correlation and dependence between attributes.
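One common distribution model is a mixture of Gaussians fitted with the expectation-maximization (EM) algorithm. The sketch below is a deliberately simplified one-dimensional version, so it does not show correlation between attributes; it only illustrates how objects are assigned to the distribution that most likely generated them. The function names, the starting means, the fixed iteration count, and the sample data are all assumptions made for this example.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def fit_gmm_1d(data, initial_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; returns (means, variances, weights, labels)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k      # Assumption: start with unit variances.
    weights = [1.0 / k] * k    # Assumption: start with equal mixing weights.
    resp = []
    for _ in range(n_iter):
        # E-step: how responsible is each component for each object?
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted objects.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # Floor to avoid a collapsing variance.
            weights[c] = n_c / len(data)
    # Each object joins the cluster of its most probable distribution.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, variances, weights, labels


# Hypothetical sample data drawn around 1.0 and around 5.0.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.7, 5.1, 4.9]
means, variances, weights, labels = fit_gmm_1d(data, initial_means=[0.0, 6.0])
print(labels)
print([round(m, 3) for m in means])
```

Unlike K-Means, the intermediate assignments here are soft: each object has a probability of belonging to each distribution, and a hard label is taken only at the end.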

Density-based Clustering :
In this type of clustering, clusters are defined as areas of higher density than the remainder of the data set. The sparse areas in between are what separate the clusters. Objects in these sparse areas are usually treated as noise or border points. The most popular method in this type of clustering is DBSCAN.
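To make the density idea concrete, here is a compact pure-Python sketch in the spirit of DBSCAN. It assumes Euclidean distance and a brute-force neighbour search; the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours, counting the point itself, for a dense region), the `NOISE` marker, and the sample points are choices made for this illustration.

```python
from math import dist

NOISE = -1  # Label used in this sketch for objects in sparse areas.


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)  # None means "not visited yet".

    def neighbours(i):
        # Brute-force search; includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # Sparse area; may later become a border point.
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # Border point: joins but is not expanded.
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep growing.
    return labels


# Hypothetical sample data: two dense squares and one isolated point.
points = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.3, min_pts=3)
print(labels)
```

The number of clusters is not given in advance here; it falls out of the density parameters, which is one practical difference from centroid-based methods.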

Requirements of Clustering Data Mining Techniques
• Scalability: Many clustering techniques work well on small data sets with fewer than 200 data objects; however, a huge database might include millions of objects. Clustering on only a subset of a big data set might produce skewed results, so highly scalable clustering methods are required.
• Usability and interpretability: Users expect clustering results to be interpretable, comprehensible, and usable. As a result, clustering may need to be tied to specific semantic interpretations and applications. It’s crucial to investigate how the application goal influences the selection of clustering techniques.
• High dimensionality: A database or a data warehouse can have many dimensions or attributes. Many clustering algorithms excel at dealing with low-dimensional data (two or three dimensions), and human eyes are capable of assessing clustering quality in up to three dimensions. Clustering data in higher-dimensional spaces is more challenging.
• Constraint-based clustering: Real-world applications may need to perform clustering under a variety of constraints. Assume you’re in charge of selecting locations for a certain number of new automatic cash dispensing machines (ATMs) in a city. You may decide this by clustering households while taking into account constraints such as the city’s waterways, highway networks, and customer needs per area.
Interview Questions :
1. What is cluster analysis?
2. How many types of cluster analysis are there? What are they?
3. What is the difference between centroid-based and density-based clustering?
4. What are the requirements for clustering data?
5. What is distribution-based cluster analysis?