Using ChromBERT-tools with an Apptainer Container

This notebook demonstrates how to run ChromBERT-tools commands inside an Apptainer container.

Key Apptainer Parameters

  • --nv: Enable NVIDIA GPU support (required for GPU acceleration)

  • --bind: Mount local directories into the container (format: --bind /local/path:/container/path)

  • --pwd: Set working directory inside the container
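
As a minimal sketch of how these three options combine, the helper below assembles an apptainer exec command line as an argument list. The function name build_apptainer_command, the image name, and the /data paths are hypothetical placeholders, not part of ChromBERT-tools; only the flags --nv, --bind, and --pwd come from the list above.

```python
import shlex


def build_apptainer_command(sif_file, tool_args, bind=None, workdir=None, use_gpu=True):
    """Assemble an `apptainer exec` argument list (hypothetical helper)."""
    cmd = ["apptainer", "exec"]
    if use_gpu:
        cmd.append("--nv")  # enable NVIDIA GPU support
    for host_path, container_path in (bind or {}).items():
        # --bind expects /local/path:/container/path
        cmd += ["--bind", f"{host_path}:{container_path}"]
    if workdir:
        cmd += ["--pwd", workdir]  # working directory inside the container
    cmd.append(sif_file)
    cmd += list(tool_args)
    return cmd


# Placeholder image and paths (assumptions for illustration only)
cmd = build_apptainer_command(
    "chrombert-tools.sif",
    ["chrombert-tools", "embed_regulator", "-h"],
    bind={"/data": "/data"},
    workdir="/data/project",
)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it. Binding a host directory to the same path inside the container, as the examples in this notebook do, keeps absolute paths valid on both sides.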

Notes

  • All chrombert-tools commands work the same way inside the container.

  • Running chrombert-tools in the container produces the same outputs (format and directory structure) as running it on the host after a normal installation.

  • For detailed command usage and output analysis, refer to other tutorial notebooks (e.g., embed_regulator.ipynb, predict_cell_type_master_regulators.ipynb).

  • You don’t need to launch Jupyter from the container image.

[1]:
import os
workdir="/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli" # your workdir
os.chdir(workdir)
os.environ["CUDA_VISIBLE_DEVICES"] = "2" # gpu device
[2]:
sif_file = "/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/Singularity/chrombert-tools.sif" # your image file


! apptainer exec --nv {sif_file} chrombert-tools -h
INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
INFO:    gocryptfs not found, will not be able to use gocryptfs
Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_region                    Generate region...
  embed_regulator                 Extract regulator...
  gene_activity_regression        Predict gene...
  interpret_region_region_interactions
                                  Region embedding...
  interpret_regulator_effects_between_region_groups
                                  Identify regulators...
  interpret_regulator_regulator_interactions
                                  Interpret...
  predict_cell_type_master_regulators
                                  Find candidate key...
  predict_regulator_context_cofactors
                                  Identify...
  predict_tf_binding_regions      Predict TF binding...
  predict_transition_driver_regulators
                                  Find driver factors...
  region_activity_regression      Predict region...
  region_function_classification  Classify genomic...
[3]:
# Define example data file
region_file = '../data/CTCF_ENCFF664UGR_sample100.bed'

Basic Usage: Check Available Commands

[4]:
! apptainer exec {sif_file} chrombert-tools embed_regulator -h
INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
INFO:    gocryptfs not found, will not be able to use gocryptfs
Usage: chrombert-tools embed_regulator [OPTIONS]

  Extract regulator embeddings on specified regions. Supports both general and
  cell-specific modes.

Options:
  --region FILE                   Region file.  [required]
  --regulator TEXT                Regulators of interest, e.g. EZH2 or
                                  EZH2;BRD4. Use ';' to separate multiple
                                  regulators.  [required]
  --cell-type-bw FILE             Cell type accessibility BigWig file. Used
                                  for cell-specific mode.
  --cell-type-peak FILE           Cell type accessibility Peak BED file. Used
                                  for cell-specific mode.
  --ft-ckpt FILE                  Fine-tuned checkpoint. If provided, use
                                  cell-specific model and skip fine-tuning.
  --odir DIRECTORY                Output directory.  [default: ./output]
  --oname TEXT                    Output name of the regulator embeddings.
                                  [default: regulator_emb]
  --genome [hg38|mm10]            Genome.  [default: hg38]
  --resolution [1kb|200bp|2kb|4kb]
                                  Resolution.  [default: 1kb]
  --mode [fast|full]              Used when training cell-specific model.
                                  [default: fast]
  --batch-size INTEGER            Batch size.  [default: 4]
  --num-workers INTEGER           Dataloader workers.  [default: 8]
  --chrombert-cache-dir DIRECTORY
                                  ChromBERT cache dir (contains config/
                                  checkpoint/ etc).  [default:
                                  ~/.cache/chrombert/data]
  -h, --help                      Show this message and exit.

Example 1: Extract Regulator Embeddings

This example demonstrates running embed_regulator with all necessary Apptainer parameters.

[5]:
# Run embed_regulator command inside apptainer container
# --nv: Enable NVIDIA GPU
# --bind: Mount local directory to container
# --pwd: Set working directory inside container
! apptainer exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools embed_regulator \
    --region {region_file} \
    --regulator "EZH2;BRD4;CTCF;FOXA3;myod1;myF5" \
    --odir "./output_emb_regulator_appatiner" \
    --genome "hg38" \
    --resolution "1kb"
INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
INFO:    gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found regulator: ['foxa3']
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Computing regulator embeddings: 100%|███████████| 25/25 [00:03<00:00,  7.26it/s]

Finished!
Focus region summary - total: 100, overlapping with ChromBERT: 100, non-overlapping: 0
Overlapping regions BED file: ./output_emb_regulator_appatiner/overlap_region.bed
Non-overlapping regions BED file: ./output_emb_regulator_appatiner/no_overlap_region.bed
Mean regulator embeddings saved to: ./output_emb_regulator_appatiner/mean_regulator_emb.pkl
Region-aware regulator embeddings saved to: ./output_emb_regulator_appatiner/region_aware_regulator_emb.hdf5
Embedding type: general
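
The log above reports that regulator names are lowercased before matching and that one of the six requested names (foxa3) was not found. The sketch below mimics that check so a --regulator string can be inspected before launching the container. It is not the tool's own implementation. The function names are hypothetical, and the known list is a toy stand-in built from the five names the log reports as matched.

```python
def split_regulators(regulator_arg):
    """Split a ';'-separated --regulator string and lowercase each name."""
    return [name.strip().lower() for name in regulator_arg.split(";") if name.strip()]


def match_regulators(regulator_arg, known_regulators):
    """Return (matched, missing) names relative to a list of known regulators."""
    known = {name.lower() for name in known_regulators}
    requested = split_regulators(regulator_arg)
    matched = [name for name in requested if name in known]
    missing = [name for name in requested if name not in known]
    return matched, missing


# Toy stand-in for the ChromBERT regulator list (assumption for illustration)
known = ["ezh2", "brd4", "ctcf", "myod1", "myf5"]
matched, missing = match_regulators("EZH2;BRD4;CTCF;FOXA3;myod1;myF5", known)
print(len(matched), missing)  # 5 ['foxa3']
```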

Example 2: Infer Regulator-Regulator Interaction Networks

This example demonstrates running interpret_regulator_regulator_interactions with all necessary Apptainer parameters.

[6]:
! apptainer exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools interpret_regulator_regulator_interactions \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --regulator "ctcf" \
    --odir "./output_trn_apptainer_1kb" \
    --genome "hg38" \
    --resolution "1kb"
INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
INFO:    gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 1, matched in ChromBERT: 1, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
100%|█████████████████████████████████████████████| 2/2 [00:03<00:00,  1.85s/it]
Total graph nodes: 951
Total graph edges (threshold=0.636): 11503
Regulator subnetwork saved to: ./output_trn_apptainer_1kb/subnetwork_ctcf_k1_q0.980_thr0.636.pdf
Finished!
Saved outputs to: ./output_trn_apptainer_1kb
Regulator cosine similarity saved to: ./output_trn_apptainer_1kb/regulator_cosine_similarity.tsv
Total graph edges saved to: ./output_trn_apptainer_1kb/total_graph_edge_threshold0.636_quantile0.980.tsv

Example 3: Predict TF Binding Regions

This example demonstrates running predict_tf_binding_regions with all necessary Apptainer parameters.

[7]:
! apptainer exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools predict_tf_binding_regions \
    --cistrome "BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714" \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --odir "./output_predict_tf_binding_regions_apptainer" \
    --genome "hg38" \
    --resolution "1kb"
INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
INFO:    gocryptfs not found, will not be able to use gocryptfs
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 0
celltype: h1 has no corresponding wild type dnase data in ChromBERT.
Note: All cistromes names were converted to lowercase for matching.
Cistromes count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found cistromes: ['myc:h1']
ChromBERT cistromes metas: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_meta.tsv
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_prompt_cistrome.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 112/112 parameters
Imputing cistromes: 100%|███████████████████████| 25/25 [00:03<00:00,  7.09it/s]

Finished imputing cistromes on specific regions.
Focus region summary - total: 100, overlapping with ChromBERT: 100, non-overlapping: 0
Overlapping regions BED file: ./output_predict_tf_binding_regions_apptainer/overlap_region.bed
Non-overlapping regions BED file: ./output_predict_tf_binding_regions_apptainer/no_overlap_region.bed
Results saved to: ./output_predict_tf_binding_regions_apptainer/results_prob_df.csv
Results track files saved to: ./output_predict_tf_binding_regions_apptainer/*.bw
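
The --cistrome argument used above is a ';'-separated list of factor:context pairs, where the context is a cell type or a GSM accession, and the log notes that names are lowercased for matching. As a rough sketch only (parse_cistromes is a hypothetical helper, not a ChromBERT-tools function), such a string can be split and validated like this:

```python
def parse_cistromes(cistrome_arg):
    """Split 'FACTOR:CONTEXT;...' into lowercase (factor, context) pairs."""
    pairs = []
    for item in cistrome_arg.split(";"):
        item = item.strip().lower()
        if not item:
            continue  # tolerate a trailing ';'
        factor, sep, context = item.partition(":")
        if not sep or not factor or not context:
            raise ValueError(f"expected 'factor:context', got {item!r}")
        pairs.append((factor, context))
    return pairs


pairs = parse_cistromes(
    "BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714"
)
for factor, context in pairs:
    print(factor, context)
```

Checking the string locally catches malformed entries before the model is loaded. Whether a given pair exists in ChromBERT is still decided by the tool, as the 'myc:h1' message in the log shows.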

Example 4: Infer Cell-Type-Specific Key Regulators

This example demonstrates running predict_cell_type_master_regulators with all necessary Apptainer parameters.
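
The command in this example redirects stderr to a log file under ./tmp. A shell redirection fails if the target directory does not exist, so it may help to create the directory first. The snippet below is a small sketch that assumes the notebook's working directory is the one set in the first cell.

```python
import os

# Log path used by the stderr redirection in this example
log_path = "./tmp/infer_cell_key_regulator.sif.stderr.log"

# Create ./tmp if it is missing; exist_ok avoids an error on reruns
os.makedirs(os.path.dirname(log_path), exist_ok=True)
```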

[ ]:
# Download example data
# Myoblast data: ATAC-seq bigWig and peak files
import os
import subprocess

if not os.path.exists('../data/myoblast_ENCFF647RNC_peak.bed'):
    # Fetch the gzipped peak file from ENCODE, then decompress it in place
    cmd = 'wget https://www.encodeproject.org/files/ENCFF647RNC/@@download/ENCFF647RNC.bed.gz -O ../data/myoblast_ENCFF647RNC_peak.bed.gz'
    subprocess.run(cmd, shell=True, check=True)  # check=True raises if the download fails
    cmd = 'gzip -d ../data/myoblast_ENCFF647RNC_peak.bed.gz'
    subprocess.run(cmd, shell=True, check=True)

if not os.path.exists('../data/myoblast_ENCFF149ERN_signal.bigwig'):
    # Fetch the ATAC-seq signal track (bigWig) from ENCODE
    cmd = 'wget https://www.encodeproject.org/files/ENCFF149ERN/@@download/ENCFF149ERN.bigWig -O ../data/myoblast_ENCFF149ERN_signal.bigwig'
    subprocess.run(cmd, shell=True, check=True)
[8]:
! apptainer exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools predict_cell_type_master_regulators \
    --cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
    --cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
    --odir "./output_predict_cell_type_master_regulators_apptainer" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/infer_cell_key_regulator.sif.stderr.log" # redirect stderr to log file
Step 1/3: Building or loading a cell-specific model...
Preparing dataset ...
Region summary - total: 373422, overlapping with ChromBERT: 368260 (one region may overlap multiple ChromBERT regions, we keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region), non-overlapping: 7920
Total regions: 324690
Fast mode: downsampling to 20k regions
Fine-tuning cell-specific model...

[Attempt 0/2] seed=55
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Epoch 0:  20%|████▍                 | 800/4000 [02:18<09:15,  5.77it/s, v_num=0]
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.20it/s]
Epoch 0:  40%|▍| 1600/4000 [05:01<07:32,  5.30it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.23it/s]
Epoch 0:  60%|▌| 2400/4000 [07:44<05:09,  5.17it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.12it/s]
Epoch 0:  80%|▊| 3200/4000 [10:27<02:36,  5.10it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.22it/s]
Epoch 0: 100%|█| 4000/4000 [13:10<00:00,  5.06it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.24it/s]
Epoch 1:  20%|▏| 800/4000 [02:16<09:07,  5.84it/s, v_num=0, default_validation/r
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.27it/s]
Epoch 1:  40%|▍| 1600/4000 [05:00<07:30,  5.33it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.04it/s]
Epoch 1:  60%|▌| 2400/4000 [07:44<05:09,  5.17it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:25<00:00,  9.90it/s]
Epoch 1:  80%|▊| 3200/4000 [10:28<02:37,  5.09it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.21it/s]
Epoch 1: 100%|█| 4000/4000 [13:11<00:00,  5.05it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.16it/s]
Epoch 2:  20%|▏| 800/4000 [02:17<09:11,  5.80it/s, v_num=0, default_validation/r
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.04it/s]
Epoch 2:  40%|▍| 1600/4000 [05:01<07:31,  5.31it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:25<00:00,  9.84it/s]
Epoch 2:  60%|▌| 2400/4000 [07:45<05:10,  5.16it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.23it/s]
Epoch 2:  60%|▌| 2400/4000 [08:10<05:26,  4.90it/s, v_num=0, default_validation/
Evaluating the finetuned model performance
Load pretrained ckpt /mnt/Storage/home/chenqianqian/.cache/chrombert/data/checkpoint/hg38_6k_1kb_pretrain.ckpt successfully!
Loading checkpoint from /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt
Loading from pl module, remove prefix 'model.'
Loading from pl module, replace 'pretrain_model' with 'pretrain_model.chrombert'
Loaded 111/111 parameters
ft_ckpt: /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt, test_metrics: {'pearsonr': 0.8895852565765381, 'spearmanr': 0.7906825542449951, 'mse': 0.03534284979104996, 'mae': 0.10887021571397781, 'r2': 0.7109256768676733}
Attempt metrics: pearsonr=0.8895852565765381
Accepted run (pearsonr=0.8896 >= 0.4).

Finished stage 2: obtained a fine-tuned ChromBERT
Best pearsonr=0.8895852565765381, metrics={'pearsonr': 0.8895852565765381, 'spearmanr': 0.7906825542449951, 'mse': 0.03534284979104996, 'mae': 0.10887021571397781, 'r2': 0.7109256768676733, 'ft_ckpt': '/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt'}
Step 2/3: Preparing region groups for interpretation...
Step 3/3: Ranking candidate key regulators...
Top 25 candidate regulators:
        factors  similarity  rank  embedding_shift
0          yap1    0.118452     1         0.881548
1         tead1    0.146159     2         0.853841
2          myf5    0.171538     3         0.828462
3         esco2    0.173018     4         0.826982
4         pgbd3    0.212452     5         0.787548
5          cbx6    0.235816     6         0.764184
6         foxo1    0.238043     7         0.761957
7         ring1    0.247548     8         0.752452
8         tead4    0.252544     9         0.747456
9          chd4    0.255349    10         0.744651
10        nr3c1    0.259137    11         0.740863
11        wwtr1    0.260190    12         0.739810
12        prmt5    0.263578    13         0.736422
13          rb1    0.265061    14         0.734939
14         cbx8    0.281104    15         0.718896
15        tcf21    0.289050    16         0.710950
16         rnf2    0.295508    17         0.704492
17        myod1    0.295947    18         0.704053
18         hira    0.302095    19         0.697905
19  pax3-foxo1a    0.313865    20         0.686135
20        ep300    0.314469    21         0.685531
21         h3.3    0.316828    22         0.683172
22        kdm6b    0.325724    23         0.674276
23         myog    0.327204    24         0.672796
24       polr2m    0.331407    25         0.668593
Analysis finished.
Checkpoint used for interpretation: /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_predict_cell_type_master_regulators_apptainer/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=1-step=126.ckpt
Ranked regulators saved to: ./output_predict_cell_type_master_regulators_apptainer/results/factor_importance_rank.csv
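
In the table printed above, embedding_shift appears to equal 1 - similarity in every row (for example, 0.118452 + 0.881548 = 1), and rank orders regulators from the largest shift down. The sketch below checks that relation on the first three printed rows. The inline CSV is an assumed layout for illustration; the actual factor_importance_rank.csv may include an index column or differ in other details.

```python
import csv
import io

# First three rows copied from the printed table; the CSV layout is an assumption
sample = """factors,similarity,rank,embedding_shift
yap1,0.118452,1,0.881548
tead1,0.146159,2,0.853841
myf5,0.171538,3,0.828462
"""

rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    similarity = float(row["similarity"])
    shift = float(row["embedding_shift"])
    # In the printed output, the shift is the complement of the similarity
    assert abs((1.0 - similarity) - shift) < 1e-6

# Lower similarity means a larger shift and therefore a better (smaller) rank
top = sorted(rows, key=lambda r: int(r["rank"]))
print([r["factors"] for r in top])  # ['yap1', 'tead1', 'myf5']
```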

Analyzing Output Files

The output files generated with the Apptainer container are identical to those from direct command-line execution. For details on analyzing them, refer to the other tutorial notebooks (e.g., embed_regulator.ipynb, predict_cell_type_master_regulators.ipynb).