Evaluating classifiers examples
==================================================================

EnrichRBP integrates several machine learning classifiers from sklearn and implements several classical deep learning models so that users can run performance tests. Two easy-to-use functions are provided for this purpose: ``evaluateMLclassifers`` for machine learning classifiers and ``evaluateDLclassifers`` for deep learning models.

Importing related modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: py

    from EnrichRBP.evaluateClassifiers import evaluateDLclassifers, evaluateMLclassifers
    from EnrichRBP.Features import generateDynamicLMFeatures, generateStructureFeatures, generateBPFeatures
    from EnrichRBP.filesOperation import read_fasta_file, read_label
    # cife (the feature-selection function used below) must also be imported before running the example.

Evaluating various machine learning classifiers using biological features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``EnrichRBP`` includes a total of 11 machine learning classifiers. When the function finishes running, it generates a file named ``ML_evalution_metrics.csv``, which contains the performance metrics of each classifier on the dataset.

.. code-block:: py

    fasta_path = '/home/wangyansong/wangyubo/EnrichRBP/src/RNA_datasets/circRNAdataset/AGO1/seq'
    label_path = '/home/wangyansong/wangyubo/EnrichRBP/src/RNA_datasets/circRNAdataset/AGO1/label'

    # Read the sequences and labels from the given paths.
    sequences = read_fasta_file(fasta_path)
    label = read_label(label_path)

    # Generate PGKM features as an example.
    biological_features = generateBPFeatures(sequences, PGKM=True)

    # Perform feature selection to refine the biological features.
    refined_biological_features = cife(biological_features, label, num_features=10)

    # Run 5-fold cross-validation of the machine learning classifiers on the
    # biological features and store the result file in the current folder.
    evaluateMLclassifers(refined_biological_features, folds=5, labels=label, file_path='./', shuffle=True)

output:

::

    Starting runnning machine learning classifiers using 5-fold cross-validation, please be patient...
    running LogisticRegression...
    finish
    running KNeighborsClassifier...
    finish
    running DecisionTreeClassifier...
    finish
    running GaussianNB...
    finish
    running BaggingClassifier...
    finish
    running RandomForestClassifier...
    finish
    running AdaBoostClassifier...
    finish
    running GradientBoostingClassifier...
    finish
    running SVM...
    finish
    running LinearDiscriminantAnalysis...
    finish
    running ExtraTreesClassifier...
    finish
    All classifiers have finished running, the result file are locate in ./

Evaluating various deep learning models using dynamic semantic information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``EnrichRBP`` implements four classes of classical deep learning models: ``CNN``, ``RNN``, ``ResNet-1D``, and ``MLP``. When the function finishes running, it generates a file named ``DL_evalution_metrics.csv``, which contains the performance metrics of each model on the dataset. The deep learning models are evaluated on the same dataset as in the previous example.

.. code-block:: py

    # Generate 4-mer dynamic semantic information for evaluating the models.
    dynamic_semantic_information = generateDynamicLMFeatures(sequences, kmer=4, model='/home/wangyansong/wangyubo/EnrichRBP/src/dynamicRNALM/circleRNA/pytorch_model_4mer')

    # Run 5-fold cross-validation of the deep learning models on the dynamic
    # semantic information and store the result file in the current folder.
    evaluateDLclassifers(dynamic_semantic_information, folds=5, labels=label, file_path='./', shuffle=True)

output:

::

    Starting runnning deep learning models using 5-fold cross-validation, please be patient...
    running CNN...
    (some log information)
    finish
    running RNN...
    (some log information)
    finish
    running ResNet-1D...
    (some log information)
    finish
    running MLP
    (some log information)
    finish
    All models have finished running, the result file are locate in ./

.. note::

    The performance reported by the package is for reference only. Hyperparameters should be tuned for each specific dataset to obtain the best results.
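
For readers unfamiliar with k-fold cross-validation, the following sketch illustrates what arguments such as ``folds=5`` and ``shuffle=True`` typically mean: the sample indices are optionally shuffled, split into five nearly equal folds, and each fold serves once as the test set while the rest form the training set. This is a generic, standard-library illustration and not the internal implementation of ``EnrichRBP``; the function name ``kfold_indices`` and the ``seed`` parameter are hypothetical and exist only for this example.

.. code-block:: py

    import random

    def kfold_indices(n_samples, folds=5, shuffle=True, seed=0):
        """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

        Hypothetical helper for illustration only; not part of EnrichRBP.
        """
        indices = list(range(n_samples))
        if shuffle:
            # Shuffle with a fixed seed so the split is reproducible.
            random.Random(seed).shuffle(indices)
        # Spread the remainder over the first folds so sizes differ by at most one.
        base, extra = divmod(n_samples, folds)
        start = 0
        for i in range(folds):
            size = base + (1 if i < extra else 0)
            test_idx = indices[start:start + size]
            train_idx = indices[:start] + indices[start + size:]
            yield train_idx, test_idx
            start += size

    # 23 is an arbitrary sample count chosen to show uneven fold sizes.
    splits = list(kfold_indices(n_samples=23, folds=5))
    for fold, (train_idx, test_idx) in enumerate(splits, start=1):
        print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")

Every sample appears in exactly one test fold, so metrics averaged over the folds reflect performance on the whole dataset rather than on a single held-out split.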