Predict TF binding regions
This notebook shows how to use the `predict_tf_binding_regions` function from the ChromBERT-tools Python API to predict TF binding regions.
For the bash command-line usage, see `examples/cli/predict_tf_binding_regions.ipynb <../cli/predict_tf_binding_regions.ipynb>`__.
For more details, please refer to the `predict_tf_binding_regions <https://chrombert-tools.readthedocs.io/en/latest/commands/predict_tf_binding_regions.html>`__ command documentation.
```python
from chrombert_tools import predict_tf_binding_regions
```
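```python
# Returns
#   results_pro_df: pandas DataFrame of imputed binding probabilities, with one row
#   per (input region, ChromBERT bin) pair and one column per matched cistrome.
results_pro_df = predict_tf_binding_regions(
    cistrome="BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714",  # semicolon-separated TF:cell_type (or TF:GSM accession) pairs
    region="../data/CTCF_ENCFF664UGR_sample100.bed",  # input regions (BED file)
    odir="./output_tf_binding_regions",  # output directory
    genome="hg38",  # reference genome: hg38 (human) here; a mouse genome is also an option
    resolution="1kb",  # only "1kb" is currently supported
)
```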
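The `cistrome` argument is a single string of semicolon-separated `TF:cell_type` pairs, where a GSM accession can stand in for the cell type (as with `SPI1:GSM2702714`). The log that follows reports that names are lowercased before matching and that `myc:h1` was not found (the log attributes this to H1 having no wild-type DNase data in ChromBERT). The sketch below mimics that parse-and-match step for illustration only: `parse_cistromes` and `match_cistromes` are hypothetical helpers, not part of ChromBERT-tools, and `available` is a toy stand-in for ChromBERT's real cistrome metadata.

```python
def parse_cistromes(cistrome: str) -> list[str]:
    """Split a 'TF:cell;TF:cell' string into lowercase 'tf:cell' keys (hypothetical helper)."""
    keys = []
    for item in cistrome.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled semicolons
        tf, sep, cell = item.partition(":")
        if not sep or not tf or not cell:
            raise ValueError(f"expected TF:cell_type, got {item!r}")
        keys.append(f"{tf.strip()}:{cell.strip()}".lower())
    return keys


def match_cistromes(requested: list[str], available: set[str]) -> tuple[list[str], list[str]]:
    """Partition requested keys into (matched, not_found) against the available names."""
    matched = [k for k in requested if k in available]
    not_found = [k for k in requested if k not in available]
    return matched, not_found


# Toy stand-in for ChromBERT's cistrome metadata (assumption, not the real table).
available = {"bcl11a:gm12878", "brd4:mcf7", "ctcf:hepg2", "myc:h9", "spi1:gsm2702714"}
requested = parse_cistromes("BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714")
matched, not_found = match_cistromes(requested, available)
print(f"requested: {len(requested)}, matched: {len(matched)}, not found: {not_found}")
# → requested: 6, matched: 5, not found: ['myc:h1']
```

Lowercasing both sides is what makes `MYC:h9` and `CTCF:HepG2` match regardless of how the cell type is capitalized in the request.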
```text
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
celltype: h1 has no corresponding wild type dnase data in ChromBERT.
Note: All cistromes names were converted to lowercase for matching.
Cistromes count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found cistromes: ['myc:h1']
ChromBERT cistromes metas: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_meta.tsv
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_prompt_cistrome.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 112/112 parameters
Imputing cistromes: 100%|██████████| 25/25 [00:05<00:00, 4.77it/s]
Finished imputing cistromes on specific regions.
Focus region summary - total: 100, overlapping with ChromBERT: 100, non-overlapping: 0
Overlapping regions BED file: ./output_tf_binding_regions/overlap_region.bed
Non-overlapping regions BED file: ./output_tf_binding_regions/no_overlap_region.bed
Results saved to: ./output_tf_binding_regions/results_prob_df.csv
Results track files saved to: ./output_tf_binding_regions/*.bw
```
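The first log line states how input regions are mapped onto ChromBERT's 1 kb bins: an overlap is kept when it covers at least 50% of either the ChromBERT bin or the input region. Below is a minimal re-implementation of that stated rule for illustration, not the library's own code. It assumes half-open `[start, end)` coordinates and bins aligned to multiples of 1,000 bp; `overlapping_bins` is a hypothetical helper.

```python
BIN_SIZE = 1_000  # 1 kb resolution, as used in this notebook


def overlapping_bins(start: int, end: int, bin_size: int = BIN_SIZE,
                     min_frac: float = 0.5) -> list[tuple[int, int]]:
    """Return (bin_start, bin_end) pairs kept under the >=50% coverage rule.

    Assumes half-open [start, end) intervals and bins aligned to multiples of bin_size.
    """
    region_len = end - start
    kept = []
    first = (start // bin_size) * bin_size  # first bin touching the region
    for bin_start in range(first, end, bin_size):
        bin_end = bin_start + bin_size
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap <= 0:
            continue
        # Keep if the overlap covers enough of the bin OR enough of the input region.
        if overlap / bin_size >= min_frac or overlap / region_len >= min_frac:
            kept.append((bin_start, bin_end))
    return kept


# Row 98 of the results table: chr9:128117589-128118035.
print(overlapping_bins(128117589, 128118035))
# → [(128117000, 128118000)]
```

The 35 bp spill into the next bin (128118000-128119000) covers less than 50% of both the bin and the region, so it is dropped. Applied to rows 0, 2, 3 and 99 of the table, the same rule selects the `chrombert_start` values shown there.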
```python
results_pro_df
```
| | input_chrom | input_start | input_end | chrombert_build_region_index | chrombert_start | chrombert_end | bcl11a:gm12878 | brd4:mcf7 | ctcf:hepg2 | myc:h9 | spi1:gsm2702714 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr1 | 37989946 | 37990368 | 32658 | 37990000 | 37991000 | 0.781250 | 0.660156 | 0.984375 | 0.972656 | 0.632812 |
| 1 | chr11 | 2400199 | 2400617 | 289179 | 2400000 | 2401000 | 0.664062 | 0.570312 | 0.972656 | 0.882812 | 0.917969 |
| 2 | chr12 | 6778809 | 6779319 | 391108 | 6779000 | 6780000 | 0.527344 | 0.412109 | 0.980469 | 0.871094 | 0.503906 |
| 3 | chr12 | 52980788 | 52981316 | 424926 | 52981000 | 52982000 | 0.174805 | 0.601562 | 0.976562 | 0.812500 | 0.345703 |
| 4 | chr12 | 53676021 | 53676448 | 425578 | 53676000 | 53677000 | 0.494141 | 0.699219 | 0.968750 | 0.945312 | 0.570312 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | chr6 | 53171843 | 53172315 | 1660979 | 53172000 | 53173000 | 0.408203 | 0.474609 | 0.988281 | 0.566406 | 0.617188 |
| 96 | chr6 | 131628105 | 131628616 | 1713078 | 131628000 | 131629000 | 0.632812 | 0.667969 | 0.988281 | 0.894531 | 0.773438 |
| 97 | chr6 | 158704189 | 158704642 | 1735665 | 158704000 | 158705000 | 0.558594 | 0.251953 | 0.972656 | 0.613281 | 0.554688 |
| 98 | chr9 | 128117589 | 128118035 | 2049996 | 128117000 | 128118000 | 0.597656 | 0.468750 | 0.972656 | 0.812500 | 0.468750 |
| 99 | chr9 | 136122853 | 136123320 | 2057396 | 136123000 | 136124000 | 0.167969 | 0.310547 | 0.968750 | 0.394531 | 0.400391 |
100 rows × 11 columns
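Each cistrome column holds an imputed binding probability for the ChromBERT bin matched to an input region. One way to turn these into binary calls is to threshold them and write the passing regions out as BED lines. The sketch below uses only the standard library so it can read the saved `results_prob_df.csv` directly; it assumes the CSV uses the same column names as the table above, and the 0.5 cutoff is an illustrative choice, not a value prescribed by ChromBERT-tools. `call_binding` and `to_bed` are hypothetical helpers.

```python
import csv
import io
from typing import TextIO


def call_binding(handle: TextIO, cistrome: str,
                 cutoff: float = 0.5) -> list[tuple[str, int, int, float]]:
    """Return (chrom, start, end, prob) for input regions whose probability
    for `cistrome` is >= cutoff. The 0.5 default is an illustrative assumption."""
    calls = []
    for row in csv.DictReader(handle):
        prob = float(row[cistrome])
        if prob >= cutoff:
            calls.append((row["input_chrom"], int(row["input_start"]),
                          int(row["input_end"]), prob))
    return calls


def to_bed(calls: list[tuple[str, int, int, float]]) -> str:
    """Format calls as tab-separated BED lines; the 4th column holds the probability."""
    return "".join(f"{c}\t{s}\t{e}\t{p:.6f}\n" for c, s, e, p in calls)


# A few rows copied from the table above (myc:h9 column only).
sample = io.StringIO(
    "input_chrom,input_start,input_end,myc:h9\n"
    "chr1,37989946,37990368,0.972656\n"
    "chr6,53171843,53172315,0.566406\n"
    "chr9,136122853,136123320,0.394531\n"
)
calls = call_binding(sample, "myc:h9")
print(to_bed(calls), end="")
```

On the real output, open the file with `open("./output_tf_binding_regions/results_prob_df.csv", newline="")` and pass the handle to `call_binding`. If you stay in pandas, the equivalent filter on the returned DataFrame is `results_pro_df[results_pro_df["myc:h9"] >= 0.5]`.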