Interpret region-region interactions
Note: The remaining examples show command-line usage only (bash).
interpret_region_region_interactions subcommand: uses the pre-trained ChromBERT model, or a fine-tuned ChromBERT model, to infer region-region interactions for user-specified enhancer regions.
For the Python API, see `examples/api/interpret_region_region_interactions.ipynb <../api/interpret_region_region_interactions.ipynb>`__.
If you need to run chrombert-tools inside an Apptainer container, see the `apptainer_use.ipynb <apptainer_use.ipynb>`__ tutorial for detailed instructions on using apptainer exec with chrombert-tools.
For more details, see the `interpret_region_region_interactions <https://chrombert-tools.readthedocs.io/en/latest/commands/interpret_region_region_interactions.html>`__ command documentation.
Infer region-region interactions (enhancer-promoter loops; pre-trained ChromBERT only)
[ ]:
%%bash
# --region: BED file with the enhancer regions of interest
# --odir: output directory
# --genome: genome assembly (hg38 here)
# --resolution: bin resolution (1kb here)
# --gene: restrict the analysis to this gene's promoter; if omitted, all genes are considered
chrombert-tools interpret_region_region_interactions \
--region '../data/hESC_GSM2386582_ATAC.bed' \
--odir "./output_infer_ep" \
--genome "hg38" \
--resolution "1kb" \
--gene "RNVU1-15"
Region summary - total: 5262, overlapping with ChromBERT: 5490 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 33
Gene filter: kept 1/55240 TSS rows (gene_name in [1 names], gene_id in [0 ids])
Finished!
Enhancer-promoter style pairs saved to: ./output_infer_ep/tss_region_pairs_cos.tsv
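The region summary above mentions that an input region is matched to a ChromBERT bin when the overlap covers at least 50% of either the bin or the input region. The sketch below illustrates that rule on half-open intervals; `overlap_fraction` and `keeps_overlap` are hypothetical helper names used only for illustration and are not part of chrombert-tools, whose internal implementation may differ.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Return (fraction of A covered, fraction of B covered) for half-open intervals."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap / (a_end - a_start), overlap / (b_end - b_start)


def keeps_overlap(region, chrombert_bin, min_cov=0.5):
    """Keep the pair if the overlap covers >= min_cov of the region OR of the bin."""
    frac_region, frac_bin = overlap_fraction(*region, *chrombert_bin)
    return frac_region >= min_cov or frac_bin >= min_cov


# Two adjacent 1 kb bins (coordinates chosen for illustration).
bins = [(144545000, 144546000), (144546000, 144547000)]

# A 600 bp peak that mostly lies in the second bin: only that bin is kept.
small_peak = (144545800, 144546400)
kept_small = [b for b in bins if keeps_overlap(small_peak, b)]

# A 1500 bp peak covering half of the first bin and all of the second:
# both bins are kept, so one input region yields two overlaps.
wide_peak = (144545500, 144547000)
kept_wide = [b for b in bins if keeps_overlap(wide_peak, b)]

print(len(kept_small), len(kept_wide))
```

Under this rule a single wide peak can be counted against several bins, which is consistent with the overlapping count in the log exceeding the total number of input regions.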
[5]:
# infer enhancer-promoter loop
# cos_sim: cosine similarity between the enhancer region embedding and the gene promoter (TSS) region embedding; higher values indicate a more likely enhancer–promoter loop.
import pandas as pd
tss_region_pairs_cos = pd.read_csv("output_infer_ep/tss_region_pairs_cos.tsv",sep='\t')
tss_region_pairs_cos
[5]:
| | chrom | gene_id | gene_name | tss | tss_build_region_index | distal_region_start | distal_region_end | distal_region_build_region_index | dist | dist_bin | cos_sim |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144546000 | 144547000 | 99004 | 133424 | 79 | 0.966797 |
| 1 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144551000 | 144552000 | 99008 | 138424 | 83 | 0.910645 |
| 2 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144560000 | 144561000 | 99015 | 147424 | 90 | 0.794922 |
| 3 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144461000 | 144462000 | 98949 | 48424 | 24 | 0.699219 |
| 4 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144419000 | 144420000 | 98930 | 6424 | 5 | 0.688477 |
| 5 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144524000 | 144525000 | 98985 | 111424 | 60 | 0.651367 |
| 6 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144567000 | 144568000 | 99021 | 154424 | 96 | 0.431885 |
| 7 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144490000 | 144491000 | 98966 | 77424 | 41 | 0.307373 |
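To make the cos_sim column concrete, here is a minimal sketch of cosine similarity between two embedding vectors. The vectors are made-up toy values, not ChromBERT embeddings, and `cosine_similarity` is a hypothetical helper written for this illustration rather than the function chrombert-tools uses internally.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-dimensional "embeddings" (assumed values for illustration only).
tss_embedding = [0.2, 0.9, -0.1, 0.4]
similar_region = [0.25, 0.8, -0.05, 0.5]   # points in nearly the same direction
dissimilar_region = [-0.7, 0.1, 0.6, -0.2]  # points in a different direction

sim_similar = cosine_similarity(tss_embedding, similar_region)
sim_dissimilar = cosine_similarity(tss_embedding, dissimilar_region)

print(f"similar: {sim_similar:.3f}, dissimilar: {sim_dissimilar:.3f}")
```

The score depends only on the direction of the two vectors, not their magnitude, and ranges from -1 to 1; in the table above, pairs are listed from highest to lowest cos_sim.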
Infer enhancer-promoter interactions (cell-type-specific fine-tuned model)
[ ]:
# Download example data
# Myoblast data: ATAC-seq peak file
import subprocess
import os
if not os.path.exists('../data/myoblast_ENCFF647RNC_peak.bed'):
    # Download the gzipped peak file, then decompress it in place
    cmd = 'wget https://www.encodeproject.org/files/ENCFF647RNC/@@download/ENCFF647RNC.bed.gz -O ../data/myoblast_ENCFF647RNC_peak.bed.gz'
    subprocess.run(cmd, shell=True, check=True)
    cmd = "gzip -d ../data/myoblast_ENCFF647RNC_peak.bed.gz"
    subprocess.run(cmd, shell=True, check=True)
[ ]:
# # Download example data
# # Myoblast data: ATAC-seq bigWig and peak files
# import subprocess
# import os
# if not os.path.exists('../data/myoblast_ENCFF647RNC_peak.bed'):
#     cmd = 'wget https://www.encodeproject.org/files/ENCFF647RNC/@@download/ENCFF647RNC.bed.gz -O ../data/myoblast_ENCFF647RNC_peak.bed.gz'
#     subprocess.run(cmd, shell=True)
#     cmd = "gzip -d ../data/myoblast_ENCFF647RNC_peak.bed.gz"
#     subprocess.run(cmd, shell=True)
# if not os.path.exists('../data/myoblast_ENCFF149ERN_signal.bigwig'):
#     cmd = 'wget https://www.encodeproject.org/files/ENCFF149ERN/@@download/ENCFF149ERN.bigWig -O ../data/myoblast_ENCFF149ERN_signal.bigwig'
#     subprocess.run(cmd, shell=True)
## Fine-tune a cell-type-specific model
# '''
# --odir: output directory
# --acc_signal1: cell-type-specific accessibility signal
# --acc_peak1: cell-type-specific peak
# --genome: genome
# --resolution: resolution
# '''
# !chrombert-tools region_activity_regression \
# --odir "./output_cell_specific_emb_train" \
# --acc_signal1 "../data/myoblast_ENCFF149ERN_signal.bigwig" \
# --acc_peak1 "../data/myoblast_ENCFF647RNC_peak.bed" \
# --genome "hg38" \
# --resolution "1kb"
[7]:
import glob
# Use checkpoints from embed_region.ipynb if available; otherwise, run the commented-out fine-tuning code above first.
ft_ckpt_pattern = "./output_cell_specific_emb_train/train/**/*.ckpt"
ft_ckpt = glob.glob(ft_ckpt_pattern, recursive=True)[0]  # take the first matching checkpoint
ft_ckpt
[7]:
'./output_cell_specific_emb_train/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt'
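Indexing the glob result with `[0]` raises a bare IndexError when no checkpoint exists, and picks an arbitrary file when several training runs are present. The sketch below is one possible, more defensive alternative; `pick_checkpoint` is a hypothetical helper (not part of chrombert-tools), and choosing the most recently modified file is an assumption about which checkpoint you want.

```python
import glob
import os
import tempfile


def pick_checkpoint(pattern):
    """Return the most recently modified file matching pattern; fail clearly if none."""
    candidates = glob.glob(pattern, recursive=True)
    if not candidates:
        raise FileNotFoundError(
            f"No checkpoint matches {pattern!r}; run the fine-tuning step first."
        )
    return max(candidates, key=os.path.getmtime)


# Self-contained demonstration on a temporary directory with two fake checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    older = os.path.join(tmp, "try_00", "checkpoints", "epoch=0.ckpt")
    newer = os.path.join(tmp, "try_01", "checkpoints", "epoch=1.ckpt")
    for path, mtime in ((older, 1000), (newer, 2000)):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # set explicit modification times

    picked = pick_checkpoint(os.path.join(tmp, "**", "*.ckpt"))
    picked_name = os.path.basename(picked)

print(picked_name)
```

In the notebook this would be called as `ft_ckpt = pick_checkpoint("./output_cell_specific_emb_train/train/**/*.ckpt")`.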
[11]:
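# --region: BED file with the enhancer regions of interest
# --odir: output directory
# --genome: genome assembly (hg38 here)
# --resolution: bin resolution (1kb here)
# --ft-ckpt: fine-tuned model checkpoint
# --batch-size: batch size
# --gene: restrict the analysis to this gene's promoter; if omitted, all genes are considered
# Each `!` command runs in its own subshell, so `!export` would not carry over to the next command; %env sets the variable for the kernel and the commands it launches.
%env CUDA_VISIBLE_DEVICES=1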
!chrombert-tools interpret_region_region_interactions \
--region '../data/myoblast_ENCFF647RNC_peak.bed' \
--odir "./output_infer_ep_myoblast_specific" \
--genome "hg38" \
--resolution "1kb" \
--ft-ckpt {ft_ckpt} \
--batch-size 64 \
--gene "RNVU1-15"
Region summary - total: 373422, overlapping with ChromBERT: 368260 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 7920
Gene filter: kept 1/55240 TSS rows (gene_name in [1 names], gene_id in [0 ids])
Gene filter: kept 33479/368260 region1 (BED) rows on 1 chromosome(s) matching the selected gene(s)
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from ./output_cell_specific_emb_train/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 111/111 parameters
100%|█████████████████████████████████████████| 461/461 [07:41<00:00, 1.00s/it]
Finished!
Enhancer-promoter style pairs saved to: ./output_infer_ep_myoblast_specific/tss_region_pairs_cos.tsv
[12]:
# infer enhancer-promoter loop
# cos_sim: cosine similarity between the enhancer region embedding and the gene promoter (TSS) region embedding; higher values indicate a more likely enhancer–promoter loop.
tss_region_pairs_cos_myoblast = pd.read_csv("output_infer_ep_myoblast_specific/tss_region_pairs_cos.tsv",sep='\t')
tss_region_pairs_cos_myoblast
[12]:
| | chrom | gene_id | gene_name | tss | tss_build_region_index | distal_region_start | distal_region_end | distal_region_build_region_index | dist | dist_bin | cos_sim |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144546000 | 144547000 | 99004 | 133424 | 79 | 0.969920 |
| 1 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144546000 | 144547000 | 99004 | 133424 | 79 | 0.969920 |
| 2 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144552000 | 144553000 | 99009 | 139424 | 84 | 0.936230 |
| 3 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144551000 | 144552000 | 99008 | 138424 | 83 | 0.921858 |
| 4 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144560000 | 144561000 | 99015 | 147424 | 90 | 0.744101 |
| 5 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144413000 | 144414000 | 98926 | 424 | 1 | 0.613024 |
| 6 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144461000 | 144462000 | 98949 | 48424 | 24 | 0.592507 |
| 7 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144461000 | 144462000 | 98949 | 48424 | 24 | 0.592507 |
| 8 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144524000 | 144525000 | 98985 | 111424 | 60 | 0.569692 |
| 9 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144419000 | 144420000 | 98930 | 6424 | 5 | 0.567447 |
| 10 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144419000 | 144420000 | 98930 | 6424 | 5 | 0.567447 |
| 11 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144547000 | 144548000 | 99005 | 134424 | 80 | 0.550278 |
| 12 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144523000 | 144524000 | 98984 | 110424 | 59 | 0.495028 |
| 13 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144550000 | 144551000 | 99007 | 137424 | 82 | 0.486923 |
| 14 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144545000 | 144546000 | 99003 | 132424 | 78 | 0.414352 |
| 15 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144519000 | 144520000 | 98980 | 106424 | 55 | 0.222717 |
| 16 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144522000 | 144523000 | 98983 | 109424 | 58 | 0.199014 |
| 17 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144518000 | 144519000 | 98979 | 105424 | 54 | 0.195656 |
| 18 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144518000 | 144519000 | 98979 | 105424 | 54 | 0.195656 |
| 19 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144518000 | 144519000 | 98979 | 105424 | 54 | 0.195656 |
| 20 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144518000 | 144519000 | 98979 | 105424 | 54 | 0.195656 |
| 21 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144489000 | 144490000 | 98965 | 76424 | 40 | 0.194427 |
| 22 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144490000 | 144491000 | 98966 | 77424 | 41 | 0.192645 |
| 23 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144494000 | 144495000 | 98968 | 81424 | 43 | 0.182398 |
| 24 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144521000 | 144522000 | 98982 | 108424 | 57 | 0.179958 |
| 25 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144520000 | 144521000 | 98981 | 107424 | 56 | 0.174450 |
| 26 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144517000 | 144518000 | 98978 | 104424 | 53 | 0.173654 |
| 27 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144470000 | 144471000 | 98953 | 57424 | 28 | 0.126555 |
| 28 | chr1 | ENSG00000207205 | RNVU1-15 | 144412576 | 98925 | 144502000 | 144503000 | 98971 | 89424 | 46 | 0.114761 |
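The fine-tuned output contains repeated rows for the same distal bin (for example, index 99004 appears twice), probably because several input peaks map to the same 1 kb bin; that explanation is an assumption. The sketch below shows one way to collapse such duplicates with pandas and line the scores up against the pre-trained run. It uses a handful of values copied from the two tables above instead of reading the TSV files, and the column suffixes are arbitrary names chosen here. Keep in mind that the two runs used different input peak sets (hESC versus myoblast), so bins missing from one side are expected.

```python
import pandas as pd

# A few rows copied from the pre-trained output table.
pretrained = pd.DataFrame({
    "distal_region_build_region_index": [99004, 99008, 99015, 98949],
    "cos_sim": [0.966797, 0.910645, 0.794922, 0.699219],
})

# A few rows copied from the fine-tuned (myoblast) output table, duplicates included.
finetuned = pd.DataFrame({
    "distal_region_build_region_index": [99004, 99004, 99009, 99008, 99015, 98949, 98949],
    "cos_sim": [0.969920, 0.969920, 0.936230, 0.921858, 0.744101, 0.592507, 0.592507],
})

# Keep one row per distal bin.
finetuned_unique = finetuned.drop_duplicates(subset="distal_region_build_region_index")

# Outer merge keeps bins that appear in only one of the two runs (NaN on the other side).
merged = pretrained.merge(
    finetuned_unique,
    on="distal_region_build_region_index",
    how="outer",
    suffixes=("_pretrained", "_myoblast"),
)
merged["delta"] = merged["cos_sim_myoblast"] - merged["cos_sim_pretrained"]

print(merged.sort_values("cos_sim_myoblast", ascending=False))
```

With the real files, the same steps apply to `tss_region_pairs_cos` and `tss_region_pairs_cos_myoblast` as loaded earlier in this notebook.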