Machine learning identifies activation of RUNX/AP-1 as drivers of mesenchymal and fibrotic regulatory programs in gastric cancer

Milad Razavi-Mohseni1, Weitai Huang2, Yu A Guo2, Dustin Shigaki1, Shamaine Wei Ting Ho2, Patrick Tan2, Anders J Skanderup2, Michael A Beer1,†

1Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University

2Genome Institute of Singapore

Correspondence related to this website should be addressed to: Michael A. Beer (mbeer AT jhu DOT edu)

Gastric cancer (GC) is the fifth most common cancer worldwide and is a heterogeneous disease. Among GC subtypes, the mesenchymal phenotype (Mes-like) is more invasive than the epithelial phenotype (Epi-like). While gene expression of the epithelial-to-mesenchymal transition (EMT) has been studied, the regulatory landscape shaping this process is not fully understood. Here we use ATAC-seq and RNA-seq from a compendium of gastric cancer cell lines and primary tumors to detect drivers of regulatory state changes and their transcriptional responses. Using the ATAC-seq, we developed a machine learning approach to determine the transcription factors (TFs) regulating the subtypes of GC. We identified TFs driving the mesenchymal (RUNX2, ZEB1, SNAI2, AP-1 dimer) as well as the epithelial states (GATA4, GATA6, KLF5, HNF4A, FOXA2, GRHL2) in gastric cancer. We identified DNA copy number alterations associated with dysregulation of these TFs, specifically deletion of GATA4 and amplification of MAPK9. Comparisons with bulk and single-cell RNA-seq datasets identified activation toward fibroblast-like epigenomic and expression signatures in Mes-like GC. The activation of this mesenchymal fibrotic program is associated with differentially accessible DNA cis-regulatory elements flanking upregulated mesenchymal genes. These findings establish a map of TF activity in GC and highlight the role of copy number driven alterations in shaping epigenomic regulatory programs as potential drivers of gastric cancer heterogeneity and progression.

Citation

If you use this data, please cite as:

Razavi-Mohseni M, Huang W, Guo YA, Shigaki D, Ho SWT, Tan P, Skanderup AJ, Beer MA. Machine learning identifies activation of RUNX/AP-1 as drivers of mesenchymal and fibrotic regulatory programs in gastric cancer. Genome Research 2024.

If you use gkm-SVM models, please also cite:
Ghandi, M., Lee, D., Mohammad-Noori, M. & Beer, M. A. (2014) Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features. PLoS Comput Biol 10: e1003711.
Lee, D. et al. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955-961 (2015). doi:10.1038/ng.3331

Software

All software used in this manuscript has been previously published, as cited.
The script used to generate training sets from differential peaks between two ATAC-seq experiments can be found here:
score_comp_region.py
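
For readers who want a feel for what building a training set from differential peaks involves, the following is a minimal sketch and not the contents of score_comp_region.py. It assumes each peak already carries one signal value per experiment, ranks peaks by log2 fold change, and takes the extremes as the two classes. The function name, the tuple layout, the pseudocount, and the toy peaks are all hypothetical.

```python
import math


def select_differential_peaks(peaks, n_top=2, pseudocount=1.0):
    """Rank peaks by log2 fold change of signal between experiments A and B.

    peaks: list of (chrom, start, end, signal_a, signal_b) tuples
           (a hypothetical layout chosen for this sketch).
    Returns (a_specific, b_specific): the n_top peaks most enriched in A
    and the n_top peaks most enriched in B.
    """
    scored = []
    for chrom, start, end, sig_a, sig_b in peaks:
        # The pseudocount avoids division by zero for peaks with no signal.
        lfc = math.log2((sig_a + pseudocount) / (sig_b + pseudocount))
        scored.append((lfc, chrom, start, end))
    # Highest fold change (A-specific) first, lowest (B-specific) last.
    scored.sort(key=lambda r: r[0], reverse=True)
    a_specific = [r[1:] for r in scored[:n_top]]
    b_specific = [r[1:] for r in scored[-n_top:]]
    return a_specific, b_specific


# Toy peaks with made-up signal values, for illustration only.
toy_peaks = [
    ("chr1", 100, 400, 50.0, 5.0),
    ("chr1", 900, 1200, 4.0, 40.0),
    ("chr2", 300, 600, 20.0, 18.0),
    ("chr2", 800, 1100, 2.0, 30.0),
    ("chr3", 150, 450, 60.0, 10.0),
]
positives, negatives = select_differential_peaks(toy_peaks, n_top=2)
```

In this sketch the two returned lists would serve as the positive and negative sets for a classifier. Choose n_top so that the two lists do not overlap, i.e. at most half the number of peaks.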

Models

gkm-SVM models trained on GC cell-line ATAC-seq data can be found here:
models

ATAC-seq bigwigs

Bigwigs for all ATAC-seq samples (GSE264550), plus average bigwigs for (Mes1,Intermediate,Epi), can be found here:
bigwigs

Average expression table

Average gene expression for (Mes,Intermediate,Epi,TCGA_Tumor,TCGA_Normal) can be found here:
avg_STAD_GCCL_expr.out
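
As a rough illustration of how such a table might be used, the sketch below parses a tab-delimited text with a header row and compares two group averages for one gene. The layout of avg_STAD_GCCL_expr.out is not described here, so the delimiter, the header names, the linear (non-log) scale of the values, the pseudocount, and the toy genes and numbers are all assumptions; check them against the actual file before adapting this.

```python
import csv
import io
import math


def load_expression(text):
    """Parse a tab-delimited table with a header row into {gene: {column: value}}.

    The first column is treated as the gene identifier (an assumption).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    gene_col = reader.fieldnames[0]
    table = {}
    for row in reader:
        table[row[gene_col]] = {
            col: float(val) for col, val in row.items() if col != gene_col
        }
    return table


def log2_ratio(table, gene, col_a, col_b, pseudocount=1.0):
    """log2 ratio of two group averages, assuming values are on a linear scale."""
    vals = table[gene]
    return math.log2((vals[col_a] + pseudocount) / (vals[col_b] + pseudocount))


# Toy table with made-up genes and values; not taken from the real file.
toy_text = (
    "gene\tMes\tIntermediate\tEpi\tTCGA_Tumor\tTCGA_Normal\n"
    "GENE_A\t31.0\t15.0\t3.0\t10.0\t8.0\n"
    "GENE_B\t1.0\t7.0\t15.0\t9.0\t12.0\n"
)
expr = load_expression(toy_text)
ratio_a = log2_ratio(expr, "GENE_A", "Mes", "Epi")  # log2(32 / 4)
ratio_b = log2_ratio(expr, "GENE_B", "Mes", "Epi")  # log2(2 / 16)
```

To apply this to the downloaded file, read its contents into a string and pass that to load_expression, after adjusting the delimiter and column names to match.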