Welcome to EnrichRBP’s documentation!

Date: November 28, 2024. Version: 0.1.0

Paper: EnrichRBP: an automated and interpretable computational platform for predicting and analyzing RNA-binding protein events

Citing Us:

If you find EnrichRBP helpful in your work or research, we would greatly appreciate citations to the following paper:

Not yet published.

EnrichRBP is a powerful web service that enables researchers to develop original deep learning and machine learning architectures to explore the complex dynamics of RNA-binding proteins.

EnrichRBP supports:

  • 70 innovative deep learning algorithms, covering feature representation, selection, model training, comparison, optimization, and evaluation, all integrated within an automated pipeline.

  • comprehensive visualizations that enhance model interpretability and facilitate the discovery of functionally significant sequence regions crucial for RBP interactions.

  • fast predictions, taking from seconds to hours depending on the task, for both pre-trained and custom models, which makes it practical for real-world applications.

API Demo

from EnrichRBP.filesOperation import read_fasta_file, read_label
from EnrichRBP.Features import generateDynamicLMFeatures, generateStaticLMFeatures, generateStructureFeatures, generateBPFeatures
from EnrichRBP.evaluateClassifiers import evaluateDLclassifers
from EnrichRBP.metricsPlot import violinplot, shap_interaction_scatter
from EnrichRBP.featureSelection import cife
from sklearn.svm import SVC

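# example paths from the authors' environment; replace them with the locations of your own sequence and label files
fasta_path = '/home/wangyansong/wangyubo/EnrichRBP/src/RNA_datasets/circRNAdataset/AGO1/seq'
label_path = '/home/wangyansong/wangyubo/EnrichRBP/src/RNA_datasets/circRNAdataset/AGO1/label'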

sequences = read_fasta_file(fasta_path)  # read sequences and labels from given path
label = read_label(label_path)

biological_features = generateBPFeatures(sequences, PGKM=True)  # generate biological features
bert_features = generateDynamicLMFeatures(sequences, kmer=4, model='/home/wangyansong/wangyubo/EnrichRBP/src/dynamicRNALM/circleRNA/pytorch_model_4mer')  # generate dynamic semantic information
static_features = generateStaticLMFeatures(sequences, kmer=3, model='/home/wangyansong/wangyubo/EnrichRBP/src/staticRNALM/circleRNA/circRNA_3mer_fasttext')  # generate static semantic information
structure_features = generateStructureFeatures(fasta_path, script_path='/home/wangyansong/wangyubo/EnrichRBP/src/EnrichRBP/RNAplfold', basic_path='/home/wangyansong/wangyubo/EnrichRBP/src/circRNAdatasetAGO1', W=101, L=70, u=1)  # generate secondary structure information


refined_biological_features = cife(biological_features, label, num_features=10)  # refine the biological features using the CIFE feature selection method


evaluateDLclassifers(bert_features, folds=10, labels=label, file_path='./', shuffle=True)  # evaluate CNN, RNN, ResNet-1D and MLP using dynamic semantic information

clf = SVC(probability=True)
shap_interaction_scatter(refined_biological_features, label, clf=clf, sample_size=(0, 100), feature_size=(0, 10), image_path='./')  # plot the interactions between biological features for the SVM classifier
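
The demo above passes kmer=4 and kmer=3 to the language-model feature generators without showing what a k-mer is. As a rough illustration only, the sketch below splits an RNA sequence into overlapping k-mers with a stride of 1. The helper name kmer_tokens and the stride-1 choice are assumptions made for this example; they are not part of the EnrichRBP API, and the library may tokenize sequences differently.

```python
def kmer_tokens(sequence, k):
    """Split a sequence into overlapping k-mers.

    Hypothetical helper for illustration; a stride of 1 is assumed.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


example = "AUGGCUA"  # made-up sequence, not taken from any dataset
print(kmer_tokens(example, 3))
print(kmer_tokens(example, 4))
```

Under this assumption a sequence of length n yields n - k + 1 tokens, so a larger k gives fewer but more specific tokens.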

API