Filter Methods
Aug 02, 2023
Filter methods are feature selection techniques used to select the most relevant and informative features from a dataset before training a machine learning model. These methods score each feature independently of any learning algorithm, typically by measuring its statistical relationship with the target variable, and rank the features by those scores. The selected features are then used for model training, leading to reduced dimensionality and potentially improved model performance.
1. Chi-squared (chi2): Chi-squared is a statistical test used to assess the independence between two categorical variables. In feature selection, chi-squared is applied to determine the relevance of each feature in a classification task.
The formula for the chi-squared test statistic between a categorical feature X and a categorical target variable Y is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where,
● O is the observed frequency (count) of each category in the contingency table.
● E is the expected frequency under the assumption of independence between X and Y.
A higher chi-squared value indicates a stronger dependency between the feature and the target, suggesting its relevance for classification.
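As a quick illustration, here is a minimal sketch of chi-squared feature scoring using scikit-learn's chi2 scorer together with SelectKBest (assuming scikit-learn is installed; the Iris dataset and k=2 are arbitrary choices for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load a small classification dataset. Note that chi2 requires
# non-negative feature values, which Iris satisfies.
X, y = load_iris(return_X_y=True)

# Score each feature against the target and keep the top 2.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print("Chi-squared scores:", selector.scores_)
print("Selected shape:", X_selected.shape)  # (150, 2)
```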
2. Pearson's Correlation Coefficient (PCC): Pearson's correlation coefficient measures the linear relationship between two continuous variables. In feature selection, it quantifies the correlation between each feature and the target variable.
The formula for Pearson's correlation coefficient between two continuous variables X and Y is:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Where,
● Cov(X, Y) is the covariance between X and Y.
● σ_X is the standard deviation of X.
● σ_Y is the standard deviation of Y.
A higher absolute value of PCC indicates a stronger linear correlation between the feature and the target, implying its relevance for regression or classification tasks.
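Below is a small illustrative sketch that ranks features by the absolute value of their Pearson correlation with the target, using scipy.stats.pearsonr (the synthetic data and noise level are invented purely for demonstration):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic data: feature 0 is correlated with the target,
# feature 1 is pure noise.
y = rng.normal(size=200)
X = np.column_stack([y + 0.5 * rng.normal(size=200),
                     rng.normal(size=200)])

# Score each feature by |PCC| with the target.
scores = [abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])]
print(scores)  # feature 0 should score much higher than feature 1
```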
3. Fisher Score: The Fisher Score (also known as Fisher Discriminant Ratio) is a measure of feature relevance in classification tasks, particularly useful when dealing with binary class labels.
The formula for the Fisher Score of a feature X with respect to a binary target variable Y is:

$$F(X) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$$
Where,
● m_1 and m_2 are the means of feature X for class 1 and class 2, respectively.
● s_1 and s_2 are the standard deviations of feature X for class 1 and class 2, respectively.
A higher Fisher Score indicates a feature that has good class separability, making it relevant for discriminating between the classes.
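The following is a minimal NumPy sketch of the Fisher Score formula above, applied to two hypothetical features on synthetic binary-class data (the feature names and data are invented for illustration):

```python
import numpy as np

def fisher_score(x, y):
    """Fisher Score of one feature x for a binary target y,
    using the formula above: (m1 - m2)^2 / (s1^2 + s2^2)."""
    x1, x2 = x[y == 0], x[y == 1]
    m1, m2 = x1.mean(), x2.mean()
    s1, s2 = x1.std(), x2.std()
    return (m1 - m2) ** 2 / (s1 ** 2 + s2 ** 2)

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
# Feature A: well-separated class means; feature B: identical distributions.
feat_a = np.where(y == 0, 0.0, 3.0) + rng.normal(size=200)
feat_b = rng.normal(size=200)

print(fisher_score(feat_a, y))  # large: good class separability
print(fisher_score(feat_b, y))  # near zero: uninformative
```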
4. Variance Threshold: The variance threshold is another commonly used filter method for feature selection. It aims to remove features with low variance, as they are considered less informative for predicting the target variable. The intuition behind this approach is that features with low variance vary little across the dataset and thus may contribute little to the model's ability to discriminate between different classes or make accurate predictions.
Here's how the variance threshold method works:
Step 1: Calculate the variance of each feature. For each feature in the dataset, compute its variance, which measures how much the feature's values deviate from their mean.
Step 2: Set the threshold. Choose a threshold value, denoted T, that defines the minimum acceptable variance. Features with variance below this threshold are considered low-variance features and are candidates for removal.
Step 3: Select features. Keep only those features whose variance is greater than or equal to the specified threshold T.
By applying the variance threshold filter method, features that have a variance below the threshold are removed from the dataset, leading to a reduced set of features. This process can help in reducing the dimensionality of the dataset and potentially improve the efficiency and interpretability of the subsequent machine learning model.
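Here is a minimal sketch using scikit-learn's VarianceThreshold transformer (assuming scikit-learn is available; the toy matrix and threshold T = 0.01 are arbitrary):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: column 0 is constant (zero variance).
X = np.array([[0.0, 2.0, 1.0],
              [0.0, 1.0, 3.0],
              [0.0, 3.0, 5.0]])

# Drop features whose variance is below T = 0.01. Note this method
# needs no target variable, so fit() takes only X.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(selector.variances_)  # per-feature variances
print(X_reduced.shape)      # (3, 2): the constant column is removed
```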
5. Information Gain (Info Gain): Information Gain is a metric used in decision tree-based algorithms for feature selection. It quantifies the amount of information a feature provides about the target variable in a classification task.
The formula for the Information Gain of a feature X with respect to a target variable Y is:

$$IG(Y, X) = H(Y) - H(Y \mid X)$$
Where,
● H(Y) is the entropy of the target variable Y.
● H(Y|X) is the conditional entropy of Y given feature X.
A higher Information Gain implies that the feature contains more valuable information for classification.
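In practice, per-feature Information Gain can be estimated with scikit-learn's mutual_info_classif, since mutual information I(X; Y) = H(Y) − H(Y|X) is the same quantity as the formula above (a sketch assuming scikit-learn; the Iris dataset is just for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Estimate I(X; Y) = H(Y) - H(Y|X) for each feature, i.e. how much
# information each feature provides about the class label.
scores = mutual_info_classif(X, y, random_state=0)
print(scores)  # higher score = more informative feature
```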
Interview Questions:
1. What are filter methods?
2. What is Fisher Score?
3. What is Information Gain?
4. What is Pearson's Correlation Coefficient (PCC)?
5. What is Chi-squared (chi2)?
6. How does the variance threshold method work?