Using ChromBERT-tools with an Apptainer Container
This notebook demonstrates how to run ChromBERT-tools commands inside an Apptainer container.
Key Apptainer Parameters
- --nv: enable NVIDIA GPU support (required for GPU acceleration)
- --bind: mount local directories into the container (format: --bind /local/path:/container/path)
- --pwd: set the working directory inside the container
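The three parameters above can also be assembled programmatically, for example when scripting many runs. The sketch below builds an apptainer exec argument list in Python. build_apptainer_cmd is a hypothetical helper written for illustration (it is not part of ChromBERT-tools or Apptainer), and the paths are placeholders.

```python
import shlex


def build_apptainer_cmd(sif_file, tool_args, workdir=None, binds=None, gpu=True):
    """Assemble an `apptainer exec` command (hypothetical helper, not part of ChromBERT-tools)."""
    cmd = ["apptainer", "exec"]
    if gpu:
        cmd.append("--nv")  # expose NVIDIA GPUs inside the container
    for host_path, container_path in (binds or []):
        cmd += ["--bind", f"{host_path}:{container_path}"]  # mount a host directory
    if workdir is not None:
        cmd += ["--pwd", workdir]  # working directory inside the container
    cmd.append(sif_file)
    cmd += ["chrombert-tools", *tool_args]
    return cmd


cmd = build_apptainer_cmd(
    "chrombert-tools.sif",
    ["embed_regulator", "--region", "regions.bed", "--regulator", "EZH2;CTCF"],
    workdir="/data/project",
    binds=[("/data", "/data")],
)
# shlex.join quotes arguments such as "EZH2;CTCF" that the shell would otherwise split
print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run(cmd, check=True). Environment variables set in the notebook, such as CUDA_VISIBLE_DEVICES, are inherited by the child process, and Apptainer passes the host environment into the container by default unless --cleanenv is given.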
Notes
- All chrombert-tools commands work the same way inside the container. Running ChromBERT-tools in the container produces the same outputs (format and directory structure) as running it on the host after a normal installation.
- For detailed command usage and output analysis, refer to the other tutorial notebooks (e.g., embed_regulator.ipynb, predict_cell_type_master_regulators.ipynb).
- You don't need to launch Jupyter from the container image: the notebook itself runs on the host, and each chrombert-tools command is executed in the container through apptainer exec.
[1]:
import os
workdir="/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli" # your workdir
os.chdir(workdir)
os.environ["CUDA_VISIBLE_DEVICES"] = "2" # gpu device
[2]:
sif_file = "/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/Singularity/chrombert-tools.sif" # your image file
! apptainer exec --nv {sif_file} chrombert-tools -h
INFO: fuse2fs not found, will not be able to mount EXT3 filesystems
INFO: gocryptfs not found, will not be able to use gocryptfs
Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...
Type -h or --help after any subcommand for more information.
Options:
-v, --verbose Verbose logging
-d, --debug Post mortem debugging
-V, --version Show the version and exit.
-h, --help Show this message and exit.
Commands:
embed_region Generate region...
embed_regulator Extract regulator...
gene_activity_regression Predict gene...
interpret_region_region_interactions
Region embedding...
interpret_regulator_effects_between_region_groups
Identify regulators...
interpret_regulator_regulator_interactions
Interpret...
predict_cell_type_master_regulators
Find candidate key...
predict_regulator_context_cofactors
Identify...
predict_tf_binding_regions Predict TF binding...
predict_transition_driver_regulators
Find driver factors...
region_activity_regression Predict region...
region_function_classification Classify genomic...
[3]:
# Define example data file
region_file = '../data/CTCF_ENCFF664UGR_sample100.bed'
Basic Usage: Check Available Commands
[4]:
! apptainer exec {sif_file} chrombert-tools embed_regulator -h
INFO: fuse2fs not found, will not be able to mount EXT3 filesystems
INFO: gocryptfs not found, will not be able to use gocryptfs
Usage: chrombert-tools embed_regulator [OPTIONS]
Extract regulator embeddings on specified regions. Supports both general and
cell-specific modes.
Options:
--region FILE Region file. [required]
--regulator TEXT Regulators of interest, e.g. EZH2 or
EZH2;BRD4. Use ';' to separate multiple
regulators. [required]
--cell-type-bw FILE Cell type accessibility BigWig file. Used
for cell-specific mode.
--cell-type-peak FILE Cell type accessibility Peak BED file. Used
for cell-specific mode.
--ft-ckpt FILE Fine-tuned checkpoint. If provided, use
cell-specific model and skip fine-tuning.
--odir DIRECTORY Output directory. [default: ./output]
--oname TEXT Output name of the regulator embeddings.
[default: regulator_emb]
--genome [hg38|mm10] Genome. [default: hg38]
--resolution [1kb|200bp|2kb|4kb]
Resolution. [default: 1kb]
--mode [fast|full] Used when training cell-specific model.
[default: fast]
--batch-size INTEGER Batch size. [default: 4]
--num-workers INTEGER Dataloader workers. [default: 8]
--chrombert-cache-dir DIRECTORY
ChromBERT cache dir (contains config/
checkpoint/ etc). [default:
~/.cache/chrombert/data]
-h, --help Show this message and exit.
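The help text says multiple regulators are separated by ';', and the run logs in the examples note that names are converted to lowercase before matching. A minimal sketch of that pre-check in Python, useful for catching misspelled names before launching a container job. It assumes the regulator list (hg38_6k_regulators_list.txt in the ChromBERT cache, as reported in the logs) is a plain-text file with one name per line; that file format and all helper names here are assumptions.

```python
from pathlib import Path


def parse_regulators(spec):
    """Split a --regulator value on ';', trim whitespace, and lowercase each name."""
    return [name.strip().lower() for name in spec.split(";") if name.strip()]


def load_known_regulators(path):
    """Read a regulator list file; assumes one regulator name per line."""
    lines = Path(path).expanduser().read_text().splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def match_regulators(spec, known):
    """Split requested regulators into (matched, not_found) against known names."""
    known_lower = {name.lower() for name in known}
    requested = parse_regulators(spec)
    matched = [r for r in requested if r in known_lower]
    not_found = [r for r in requested if r not in known_lower]
    return matched, not_found


# Usage (adjust the path to your ChromBERT cache directory):
# known = load_known_regulators("~/.cache/chrombert/data/config/hg38_6k_regulators_list.txt")
# matched, not_found = match_regulators("EZH2;BRD4;CTCF;FOXA3;myod1;myF5", known)
```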
Example 1: Extract Regulator Embeddings
This example runs embed_regulator with all the Apptainer parameters it needs (--nv, --bind, --pwd).
[5]:
# Run embed_regulator command inside apptainer container
# --nv: Enable NVIDIA GPU
# --bind: Mount local directory to container
# --pwd: Set working directory inside container
! apptainer exec --nv \
--bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
--pwd {workdir} \
{sif_file} \
chrombert-tools embed_regulator \
--region {region_file} \
--regulator "EZH2;BRD4;CTCF;FOXA3;myod1;myF5" \
--odir "./output_emb_regulator_appatiner" \
--genome "hg38" \
--resolution "1kb"
INFO: fuse2fs not found, will not be able to mount EXT3 filesystems
INFO: gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found regulator: ['foxa3']
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Computing regulator embeddings: 100%|███████████| 25/25 [00:03<00:00, 7.26it/s]
Finished!
Focus region summary - total: 100, overlapping with ChromBERT: 100, non-overlapping: 0
Overlapping regions BED file: ./output_emb_regulator_appatiner/overlap_region.bed
Non-overlapping regions BED file: ./output_emb_regulator_appatiner/no_overlap_region.bed
Mean regulator embeddings saved to: ./output_emb_regulator_appatiner/mean_regulator_emb.pkl
Region-aware regulator embeddings saved to: ./output_emb_regulator_appatiner/region_aware_regulator_emb.hdf5
Embedding type: general
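The region summary in the log states the overlap rule: an input region is kept for a ChromBERT bin when the overlap covers at least 50% of either the bin or the input region, so one region can map to several bins. A minimal sketch of that rule in Python, using BED-style half-open coordinates. It additionally assumes fixed-size bins tiling each chromosome from position 0 (e.g. 1 kb at --resolution 1kb); that binning scheme and the helper names are illustrative assumptions, not the tool's code.

```python
def overlap_len(a_start, a_end, b_start, b_end):
    """Length of the overlap between half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def keep_overlap(region, chrombert_bin, min_frac=0.5):
    """Keep a (region, bin) pair if the overlap covers >= min_frac of the bin or of the region.

    region and chrombert_bin are (chrom, start, end) tuples.
    """
    r_chrom, r_start, r_end = region
    b_chrom, b_start, b_end = chrombert_bin
    if r_chrom != b_chrom:
        return False
    ov = overlap_len(r_start, r_end, b_start, b_end)
    if ov == 0:
        return False
    return ov >= min_frac * (b_end - b_start) or ov >= min_frac * (r_end - r_start)


def kept_bins(region, bin_size=1000):
    """List the fixed-size bins touching a region that pass the overlap rule (assumed binning)."""
    chrom, start, end = region
    first, last = start // bin_size, (end - 1) // bin_size
    candidates = [(chrom, i * bin_size, (i + 1) * bin_size) for i in range(first, last + 1)]
    return [b for b in candidates if keep_overlap(region, b)]
```

For a 1,500 bp region chr1:1800-3300, only the bin chr1:2000-3000 passes: the flanking bins are covered by 200 and 300 bp, which is less than half of either the 1 kb bin or the region.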
Example 2: Infer Regulator-Regulator Interaction Networks
This example runs interpret_regulator_regulator_interactions with the same Apptainer parameters.
[6]:
! apptainer exec --nv \
--bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
--pwd {workdir} \
{sif_file} \
chrombert-tools interpret_regulator_regulator_interactions \
--region "../data/CTCF_ENCFF664UGR_sample100.bed" \
--regulator "ctcf" \
--odir "./output_trn_apptainer_1kb" \
--genome "hg38" \
--resolution "1kb"
INFO: fuse2fs not found, will not be able to mount EXT3 filesystems
INFO: gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 1, matched in ChromBERT: 1, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
100%|█████████████████████████████████████████████| 2/2 [00:03<00:00, 1.85s/it]
Total graph nodes: 951
Total graph edges (threshold=0.636): 11503
Regulator subnetwork saved to: ./output_trn_apptainer_1kb/subnetwork_ctcf_k1_q0.980_thr0.636.pdf
Finished!
Saved outputs to: ./output_trn_apptainer_1kb
Regulator cosine similarity saved to: ./output_trn_apptainer_1kb/regulator_cosine_similarity.tsv
Total graph edges saved to: ./output_trn_apptainer_1kb/total_graph_edge_threshold0.636_quantile0.980.tsv
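The output file names (threshold0.636_quantile0.980) suggest that the edge threshold is the 0.98 quantile of regulator cosine similarities. A minimal NumPy sketch of that idea, assuming the quantile is taken over all distinct regulator pairs of a regulator-by-dimension embedding matrix; which embeddings the tool uses, which pairs enter the quantile, and how ties are handled are not shown in this notebook and may differ.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of emb (n_regulators x dim)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def edges_above_quantile(names, emb, quantile=0.98):
    """Keep regulator pairs whose similarity is >= the given quantile over all distinct pairs."""
    sim = cosine_similarity_matrix(emb)
    iu = np.triu_indices(len(names), k=1)  # each unordered pair once, no self-pairs
    pair_sims = sim[iu]
    threshold = float(np.quantile(pair_sims, quantile))
    edges = [(names[i], names[j], float(s))
             for i, j, s in zip(iu[0], iu[1], pair_sims) if s >= threshold]
    return threshold, edges
```

Choosing the threshold as a quantile rather than a fixed value keeps the graph density roughly constant (here the top 2% of pairs) regardless of how similar the embeddings are overall.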
Example 3: Predict TF Binding Regions
This example runs predict_tf_binding_regions with the same Apptainer parameters.
[7]:
! apptainer exec --nv \
--bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
--pwd {workdir} \
{sif_file} \
chrombert-tools predict_tf_binding_regions \
--cistrome "BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714" \
--region "../data/CTCF_ENCFF664UGR_sample100.bed" \
--odir "./output_predict_tf_binding_regions_apptainer" \
--genome "hg38" \
--resolution "1kb"
INFO: fuse2fs not found, will not be able to mount EXT3 filesystems
INFO: gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
celltype: h1 has no corresponding wild type dnase data in ChromBERT.
Note: All cistromes names were converted to lowercase for matching.
Cistromes count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found cistromes: ['myc:h1']
ChromBERT cistromes metas: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_meta.tsv
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_prompt_cistrome.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 112/112 parameters
Imputing cistromes: 100%|███████████████████████| 25/25 [00:03<00:00, 7.09it/s]
Finished imputing cistromes on specific regions.
Focus region summary - total: 100, overlapping with ChromBERT: 100, non-overlapping: 0
Overlapping regions BED file: ./output_predict_tf_binding_regions_apptainer/overlap_region.bed
Non-overlapping regions BED file: ./output_predict_tf_binding_regions_apptainer/no_overlap_region.bed
Results saved to: ./output_predict_tf_binding_regions_apptainer/results_prob_df.csv
Results track files saved to: ./output_predict_tf_binding_regions_apptainer/*.bw
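In the command above, each --cistrome entry pairs a regulator with a cell type or sample ID (e.g. SPI1:GSM2702714) using ':', entries are separated by ';', and the log reports lowercased names such as myc:h1. A minimal sketch that parses the string the same way for a quick pre-check; these parsing rules are inferred from this example rather than from the tool's source, and parse_cistromes is a hypothetical helper.

```python
def parse_cistromes(spec):
    """Split a --cistrome value like 'CTCF:HepG2;MYC:H1' into lowercase (regulator, context) pairs."""
    pairs = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        regulator, sep, context = item.partition(":")
        if not sep or not regulator.strip() or not context.strip():
            raise ValueError(f"expected REGULATOR:CELL_TYPE, got {item!r}")
        pairs.append((regulator.strip().lower(), context.strip().lower()))
    return pairs


pairs = parse_cistromes("BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714")
# Rebuild the 'regulator:context' form used in the log messages
print([f"{reg}:{ctx}" for reg, ctx in pairs])
```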
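Example 4: Infer Cell-Type-Specific Key Regulators
This example runs predict_cell_type_master_regulators with the same Apptainer parameters. It first fine-tunes a cell-specific model on myoblast ATAC-seq data, so it takes considerably longer than the previous examples.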
[ ]:
# Download example data (only if not already present)
# Myoblast ATAC-seq data from ENCODE: peak file (BED) and signal track (bigWig)
import os
import subprocess

peak_file = '../data/myoblast_ENCFF647RNC_peak.bed'
bw_file = '../data/myoblast_ENCFF149ERN_signal.bigwig'

if not os.path.exists(peak_file):
    # Download the gzipped peak file, then decompress it to peak_file
    subprocess.run(['wget', 'https://www.encodeproject.org/files/ENCFF647RNC/@@download/ENCFF647RNC.bed.gz',
                    '-O', peak_file + '.gz'], check=True)
    subprocess.run(['gzip', '-d', peak_file + '.gz'], check=True)

if not os.path.exists(bw_file):
    subprocess.run(['wget', 'https://www.encodeproject.org/files/ENCFF149ERN/@@download/ENCFF149ERN.bigWig',
                    '-O', bw_file], check=True)

# Directory for the stderr log written by the next cell
os.makedirs('./tmp', exist_ok=True)
[8]:
! apptainer exec --nv \
--bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
--pwd {workdir} \
{sif_file} \
chrombert-tools predict_cell_type_master_regulators \
--cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
--cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
--odir "./output_predict_cell_type_master_regulators_apptainer" \
--genome "hg38" \
--resolution "1kb" 2> "./tmp/infer_cell_key_regulator.sif.stderr.log" # redirect stderr to log file
Step 1/3: Building or loading a cell-specific model...
Preparing dataset ...
Region summary - total: 373422, overlapping with ChromBERT: 368260 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 7920
Total regions: 324690
Fast mode: downsampling to 20k regions
Fine-tuning cell-specific model...
[Attempt 0/2] seed=55
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Epoch 0: 20%|████▍ | 800/4000 [02:18<09:15, 5.77it/s, v_num=0]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.20it/s]
Epoch 0: 40%|▍| 1600/4000 [05:01<07:32, 5.30it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.23it/s]
Epoch 0: 60%|▌| 2400/4000 [07:44<05:09, 5.17it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.12it/s]
Epoch 0: 80%|▊| 3200/4000 [10:27<02:36, 5.10it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.22it/s]
Epoch 0: 100%|█| 4000/4000 [13:10<00:00, 5.06it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.24it/s]
Epoch 1: 20%|▏| 800/4000 [02:16<09:07, 5.84it/s, v_num=0, default_validation/r
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.27it/s]
Epoch 1: 40%|▍| 1600/4000 [05:00<07:30, 5.33it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.04it/s]
Epoch 1: 60%|▌| 2400/4000 [07:44<05:09, 5.17it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:25<00:00, 9.90it/s]
Epoch 1: 80%|▊| 3200/4000 [10:28<02:37, 5.09it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.21it/s]
Epoch 1: 100%|█| 4000/4000 [13:11<00:00, 5.05it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.16it/s]
Epoch 2: 20%|▏| 800/4000 [02:17<09:11, 5.80it/s, v_num=0, default_validation/r
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.04it/s]
Epoch 2: 40%|▍| 1600/4000 [05:01<07:31, 5.31it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:25<00:00, 9.84it/s]
Epoch 2: 60%|▌| 2400/4000 [07:45<05:10, 5.16it/s, v_num=0, default_validation/
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.23it/s]
Epoch 2: 60%|▌| 2400/4000 [08:10<05:26, 4.90it/s, v_num=0, default_validation/
Evaluating the finetuned model performance
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 111/111 parameters
ft_ckpt: /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt, test_metrics: {'pearsonr': 0.8895852565765381, 'spearmanr': 0.7906825542449951, 'mse': 0.03534284979104996, 'mae': 0.10887021571397781, 'r2': 0.7109256768676733}
Attempt metrics: pearsonr=0.8895852565765381
Accepted run (pearsonr=0.8896 >= 0.4).
Finished stage 2: obtained a fine-tuned ChromBERT
Best pearsonr=0.8895852565765381, metrics={'pearsonr': 0.8895852565765381, 'spearmanr': 0.7906825542449951, 'mse': 0.03534284979104996, 'mae': 0.10887021571397781, 'r2': 0.7109256768676733, 'ft_ckpt': '/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt'}
Step 2/3: Preparing region groups for interpretation...
Step 3/3: Ranking candidate key regulators...
Top 25 candidate regulators:
factors similarity rank embedding_shift
0 yap1 0.118452 1 0.881548
1 tead1 0.146159 2 0.853841
2 myf5 0.171538 3 0.828462
3 esco2 0.173018 4 0.826982
4 pgbd3 0.212452 5 0.787548
5 cbx6 0.235816 6 0.764184
6 foxo1 0.238043 7 0.761957
7 ring1 0.247548 8 0.752452
8 tead4 0.252544 9 0.747456
9 chd4 0.255349 10 0.744651
10 nr3c1 0.259137 11 0.740863
11 wwtr1 0.260190 12 0.739810
12 prmt5 0.263578 13 0.736422
13 rb1 0.265061 14 0.734939
14 cbx8 0.281104 15 0.718896
15 tcf21 0.289050 16 0.710950
16 rnf2 0.295508 17 0.704492
17 myod1 0.295947 18 0.704053
18 hira 0.302095 19 0.697905
19 pax3-foxo1a 0.313865 20 0.686135
20 ep300 0.314469 21 0.685531
21 h3.3 0.316828 22 0.683172
22 kdm6b 0.325724 23 0.674276
23 myog 0.327204 24 0.672796
24 polr2m 0.331407 25 0.668593
Analysis finished.
Checkpoint used for interpretation: /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt
Ranked regulators saved to: ./output_predict_cell_type_master_regulators_apptainer/results/factor_importance_rank.csv
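In the printed table, embedding_shift equals 1 - similarity in every row (e.g. 1 - 0.118452 = 0.881548 for yap1), and rank increases as the shift decreases. A minimal NumPy sketch of a ranking with that property, assuming per-regulator embeddings from two region groups are compared by cosine similarity. Which region groups and embeddings the tool actually compares is not shown in this notebook, and rank_by_embedding_shift is a hypothetical helper.

```python
import numpy as np


def rank_by_embedding_shift(names, emb_a, emb_b):
    """Rank regulators by 1 - cosine similarity of their embeddings in two region groups.

    emb_a, emb_b: arrays of shape (n_regulators, dim) whose rows are aligned with names.
    Returns (factor, similarity, rank, embedding_shift) tuples, largest shift first.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = np.sum(a * b, axis=1)  # row-wise cosine similarity
    shift = 1.0 - similarity  # larger shift = larger change between the two groups
    order = np.argsort(-shift, kind="stable")  # descending shift
    return [(names[i], float(similarity[i]), rank, float(shift[i]))
            for rank, i in enumerate(order, start=1)]
```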
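Analyzing Output Files
The output files generated inside the Apptainer container have the same format and directory structure as those produced by running ChromBERT-tools directly on the host. For how to load and analyze them, refer to the other tutorial notebooks (e.g., embed_regulator.ipynb, predict_cell_type_master_regulators.ipynb).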
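As a starting point, a short pandas sketch that loads the ranked regulator table from Example 4 and looks up where selected regulators fall. The column names (factors, similarity, rank, embedding_shift) are taken from the printed table and are assumed to match the CSV header; load_ranking and ranks_of are hypothetical helpers.

```python
import pandas as pd


def load_ranking(path):
    """Load factor_importance_rank.csv (assumed columns: factors, similarity, rank, embedding_shift)."""
    df = pd.read_csv(path)
    return df.sort_values("rank").reset_index(drop=True)


def ranks_of(df, regulators):
    """Return the rank of each requested regulator (matched case-insensitively; None if absent)."""
    lookup = dict(zip(df["factors"].str.lower(), df["rank"]))
    return {r: lookup.get(r.lower()) for r in regulators}


# Usage (path taken from the Example 4 log):
# ranking = load_ranking("./output_predict_cell_type_master_regulators_apptainer/results/factor_importance_rank.csv")
# print(ranking.head(10))
# print(ranks_of(ranking, ["MYOD1", "MYF5", "MYOG"]))
```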