EnrichRBP.featureSelection¶
The EnrichRBP integrates feature selection methods based on four different evaluation approaches (information theoretical based, similarity based, sparse learning based and statistical based).
Information theoretical based¶
- EnrichRBP.featureSelection.cife(features, labels, num_features=10)¶
This function uses Conditional Infomax Feature Extraction [CIFE] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Dahua Lin and Xiaoou Tang. 2006. Conditional infomax learning: An integrated framework for feature extraction and fusion. In ECCV. 68–82.
- EnrichRBP.featureSelection.cmim(features, labels, num_features=10)¶
This function uses Conditional Mutual Information Maximization [CMIM] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
François Fleuret. 2004. Fast binary feature selection with conditional mutual information. JMLR 5 (2004), 1531–1555.
- EnrichRBP.featureSelection.disr(features, labels, num_features=10)¶
This function uses Double Input Symmetrical Relevance [DISR] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Patrick Emmanuel Meyer, Colas Schretter, and Gianluca Bontempi. 2008. Information-theoretic feature selection in microarray data using variable complementarity. IEEE J. Select. Top. Sign. Process. 2, 3 (2008), 261–274.
- EnrichRBP.featureSelection.icap(features, labels, num_features=10)¶
This function uses mutual information [ICAP] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Ali El Akadi, Abdeljalil El Ouardighi, and Driss Aboutajdine. 2008. A powerful feature selection approach based on mutual information. Int. J. Comput. Sci. Netw. Secur. 8, 4 (2008), 116.
- EnrichRBP.featureSelection.jmi(features, labels, num_features=10)¶
This function uses Joint Mutual Information [JMI] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Patrick Emmanuel Meyer, Colas Schretter, and Gianluca Bontempi. 2008. Information-theoretic feature selection in microarray data using variable complementarity. IEEE J. Select. Top. Sign. Process. 2, 3 (2008), 261–274.
- EnrichRBP.featureSelection.mifs(features, labels, num_features=10)¶
This function uses Mutual Information Feature Selection [MIFS] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Roberto Battiti. 1994. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Network. 5, 4 (1994), 537–550.
- EnrichRBP.featureSelection.mim(features, labels, num_features=10)¶
This function uses Mutual Information Maximization [MIM] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
David D. Lewis. 1992. Feature selection and feature extraction for text categorization. In Proceedings of the Workshop on Speech and Natural Language. 212–217.
- EnrichRBP.featureSelection.mrmr(features, labels, num_features=10)¶
This function uses Minimum Redundancy Maximum Relevance [MRMR] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature selection based on mutual information criteria of maxdependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 8 (2005), 1226–1238.
Similarity based¶
- EnrichRBP.featureSelection.fisherScore(features, labels, num_features=10)¶
This function uses Fisher Score [fisherscore] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Richard O. Duda, Peter E. Hart, and David G. Stork. 2012. Pattern Classification. John Wiley & Sons.
- EnrichRBP.featureSelection.relief_f(features, labels, num_features=10)¶
This function uses ReliefF [reliefF] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Marko Robnik-Šikonja and Igor Kononenko. 2003. Theoretical and empirical analysis of relieff and rrelieff. Mach. Learn. 53, 1-2 (2003), 23–69.
- EnrichRBP.featureSelection.traceRatio(features, labels, num_features=10)¶
This function uses Trace Ratio Criterion [traceratio] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, and Shuicheng Yan. 2008. Trace ratio criterion for feature selection. In AAAI. 671–676.
Sparse learning based¶
- EnrichRBP.featureSelection.llL21(features, labels, num_features=10)¶
This function uses l2,1-norm regularization-based feature selection method [lll21] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Jiliang Tang, Salem Alelyani, and Huan Liu. 2014. Feature selection for classification: A review. Data Classification: Algorithms and Applications (2014), 37.
- EnrichRBP.featureSelection.lsL21(features, labels, num_features=10)¶
This function uses l2,1-norm regularization-based feature selection method [lsl21] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Jiliang Tang, Salem Alelyani, and Huan Liu. 2014. Feature selection for classification: A review. Data Classification: Algorithms and Applications (2014), 37.
Statistical based¶
- EnrichRBP.featureSelection.cfs(features, labels, num_features=10)¶
This function uses correlation-based filter approach [CFS] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Mark A. Hall and Lloyd A. Smith. 1999. Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper. In FLAIRS. 235–239.
- EnrichRBP.featureSelection.chiSquare(features, labels, num_features=10)¶
This function uses Chi-Square Score [chisquare] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Huan Liu and Rudy Setiono. 1995. Chi2: Feature selection and discretization of numeric attributes. In ICTAI. 388–391.
- EnrichRBP.featureSelection.fScore(features, labels, num_features=10)¶
This function uses F-score [fscore] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Wright, Sewall. “The Interpretation of Population Structure by F-Statistics with Special Regard to Systems of Mating.” Evolution, vol. 19, no. 3, 1965, pp. 395–420.
- EnrichRBP.featureSelection.giniIndex(features, labels, num_features=10)¶
This function uses Gini Index [giniindex] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
Gini. 1912. Variability and mutability, contribution to the study of statistical distribution and relaitons. Studi Economico-Giuricici Della R (1912).
- EnrichRBP.featureSelection.tScore(features, labels, num_features=10)¶
This function uses T-score [tscore] for feature selection and returns the corresponding best feature matrix.
- Parameters:
- features:numpy array, shape (n_samples, n_features)
Sample feature matrix to be processed.
- labels:numpy array, shape (n_samples, )
Class labels according to the input samples.
- num_features:int, default=10
Number of selected features.
John C. Davis and Robert J. Sampson. 1986. Statistics and Data Analysis in Geology. Vol. 646. Wiley. New York.