Research


Spatial Transcriptomics

I develop computational methods to understand the principles that govern the spatial organization of cells in their native tissue environment.

seqFISH is a highly multiplexed RNA FISH (fluorescence in situ hybridization) technique that uses a barcoding scheme to repeatedly hybridize transcripts with fluorescent oligo probes in fixed cells.

It can localize and identify transcripts for hundreds of genes (seqFISH 1) and thousands of genes (seqFISH+ 2).

With data generated by seqFISH, I build probabilistic methods and computational tools that map spatial gene expression domains by jointly using each cell's gene expression and spatial location 3. These methods revealed a diversity of gene expression domains in the visual cortex 3.
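To give a feel for what jointly using expression and location can mean, here is a minimal sketch of one common probabilistic formulation: a Markov random field with a Potts smoothness prior, solved by iterated conditional modes. It is an illustration under simplifying assumptions, not the published implementation: each cell carries a single expression value, the domain means are fixed in advance, and the names `icm_domains`, `neighbors`, `means`, and `beta` are hypothetical.

```python
def icm_domains(expr, neighbors, means, beta=1.0, n_iter=10):
    """Assign each cell to a spatial domain (illustrative sketch only).

    expr      : one expression value per cell
    neighbors : neighbors[i] lists the indices of cells adjacent to cell i
    means     : assumed, fixed mean expression of each domain
    beta      : strength of the spatial smoothness (Potts) penalty
    """
    n_domains = len(means)

    def energy(i, k, labels):
        # Unit-variance Gaussian negative log-likelihood for domain k ...
        data_term = 0.5 * (expr[i] - means[k]) ** 2
        # ... plus a penalty for every neighbor assigned to another domain.
        smooth_term = beta * sum(1 for j in neighbors[i] if labels[j] != k)
        return data_term + smooth_term

    # Start from the expression-only assignment (nearest domain mean).
    labels = [min(range(n_domains), key=lambda k: (x - means[k]) ** 2)
              for x in expr]

    for _ in range(n_iter):
        changed = False
        for i in range(len(expr)):
            best = min(range(n_domains), key=lambda k: energy(i, k, labels))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged: no cell changed its label
            break
    return labels


# Toy example: eight cells on a line, two domains with means 0 and 1.
expr = [0.0, 0.1, 0.6, 0.0, 1.0, 0.9, 0.4, 1.0]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(expr)]
             for i in range(len(expr))]

expression_only = icm_domains(expr, neighbors, means=[0.0, 1.0], beta=0.0)
with_spatial = icm_domains(expr, neighbors, means=[0.0, 1.0], beta=0.5)
```

With `beta=0.0` each cell is labeled by expression alone, so the two ambiguous cells (values 0.6 and 0.4) break up the domains. With `beta=0.5` their neighbors pull them into two contiguous domains.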

I showed that both cell type and spatial domain (often missed by single-cell RNA-seq) may independently contribute to the gene expression programs.

We recently published Giotto 4, a comprehensive package for spatial transcriptomic analysis. Check it out.

1. Shah et al. Neuron. 92(2):342-357. 2016
2. Eng, C.-L., Lawson, M., Zhu, Q., et al. Nature. 568(7751):235-239. 2019
3. Zhu et al. Nature Biotechnology. 36:1183-1190. 2018
4. Dries, R.1, Zhu, Q.1, et al. Accepted in Genome Biology. doi:10.1101/701680v3. 2021




Transcriptional Regulation & Regulatory Genomics

Mapping the transcriptional regulatory circuit is central to understanding disease mechanisms.

Traditionally, ChIP-seq has been the go-to technology for mapping protein-DNA interactions.

However, the harsh crosslinking that ChIP-seq requires can interfere with binding of the antibody of interest. We have successfully employed CUT&RUN 1 2 in several papers to elucidate the binding of BCL11A, GATA1, and NFY during globin gene switching.

Motif footprinting is a central element of CUT&RUN analysis. It identifies, at nucleotide resolution, where bound protein protects DNA from digestion by the pA-MNase enzyme; applied genome-wide, motif footprinting quantitatively assigns a binding score to each motif site.

Our study also produced a useful software package, CUT&RUNTools 3.
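As a rough illustration of what a footprint-based binding score can look like, the sketch below compares enzyme cut frequency inside a motif with cut frequency in its flanks. This is a simplified assumption for illustration, not the scoring used by any particular tool; `footprint_score`, `flank`, and `pseudocount` are hypothetical names.

```python
import math


def footprint_score(cuts, motif_start, motif_end, flank=10, pseudocount=1.0):
    """Log2 ratio of flank cut frequency to motif-core cut frequency.

    cuts        : per-base cut counts over a window around the motif
    motif_start : index of the first motif base within `cuts`
    motif_end   : index one past the last motif base within `cuts`
    A positive score means the motif is cut less often than its flanks,
    which is consistent with protection by a bound factor.
    """
    core = cuts[motif_start:motif_end]
    left = cuts[max(0, motif_start - flank):motif_start]
    right = cuts[motif_end:motif_end + flank]
    flanks = left + right
    if not core or not flanks:
        raise ValueError("window must contain both motif and flanking bases")

    core_mean = sum(core) / len(core)
    flank_mean = sum(flanks) / len(flanks)
    # The pseudocount avoids division by zero at fully protected sites.
    return math.log2((flank_mean + pseudocount) / (core_mean + pseudocount))


# Toy example: a 6-bp motif with 10-bp flanks on each side.
protected = [8] * 10 + [0] * 6 + [8] * 10    # no cuts inside the motif
unprotected = [8] * 26                       # uniform cutting

protected_score = footprint_score(protected, 10, 16)
unprotected_score = footprint_score(unprotected, 10, 16)
```

In the toy data the protected site scores well above zero and the uniformly cut site scores exactly zero, so ranking motif sites by this score would place likely bound sites first.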

In addition to identifying the genomic binding locations of transcription factors (TFs) via CUT&RUN, I also use a systems-biology approach to understand how the transcription factor network is dysregulated in a subtype-dependent manner in breast cancer 4.

Using ENCODE ChIP-seq data from surrogate cell lines together with motif analysis, we observed that these subtypes are determined by distinct repertoires of lineage-specific transcription factors.

Finally, the same idea of combining coexpressed genes with motif analysis allows us to identify the critical regulators of cell identity in the Mouse Cell Atlas single-cell transcriptomic dataset 5.

Each TF together with its target genes, collectively termed a regulon, points to the essential motifs potentially driving the cellular differentiation process.

1. Skene and Henikoff. eLife. 2017
2. Liu, N.1, Hargreaves, V.1, Zhu, Q.1, et al. Cell. 173(2):430-442. 2018
3. Zhu et al. Genome Biology. 2019
4. Zhu et al. Bioinformatics. 2020
5. Suo, S., Zhu, Q., et al. Cell Reports. 2018




Application of our work

The applications of the above work are wide-ranging, from dissecting the tumor microenvironment in cancer (with scRNA-seq and spatial approaches) to dissecting, at a nuclear level, the role of enhancers and regulatory elements in disease (with CUT&RUN).

We are actively exploring different application areas for spatial transcriptomics and CUT&RUN analysis.

Sickle cell disease

For example, BCL11A is a protein responsible for γ-globin regulation in erythroid cells in patients with β-thalassemia and sickle cell disease.

Recently, using CUT&RUN, we identified a peak of BCL11A binding at the γ-globin promoter, which points to a simple promoter-based repression mechanism for BCL11A 1 and may provide a therapeutic lead.

Big genomics data integration

In my graduate years, I developed SEEK (search-based exploration of expression compendia), a query-based coexpression search engine for very large transcriptomic data collections, including thousands of human data sets from many different microarray and high-throughput sequencing platforms 2.

SEEK uses a query-level cross-validation–based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify genes, pathways and processes co-regulated with the query.
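The idea of weighting data sets by query-level cross-validation can be sketched in a few lines. The code below is a simplified illustration, not SEEK's implementation: it holds out one query gene at a time, checks how highly the remaining query genes retrieve it within a data set, and uses the mean reciprocal rank as that data set's weight. Reciprocal rank is a stand-in chosen for brevity, and all function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def mean_corr(expr, gene, refs):
    """Mean correlation of `gene` with the reference genes in one data set."""
    return sum(pearson(expr[gene], expr[r]) for r in refs) / len(refs)


def dataset_weight(expr, query):
    """Cross-validated relevance of one data set (gene -> values) to a query."""
    present = [q for q in query if q in expr]
    if len(present) < 2:
        return 0.0  # cannot hold out a gene with fewer than two query genes
    rewards = []
    for held_out in present:
        rest = [q for q in present if q != held_out]
        candidates = [g for g in expr if g not in rest]
        target = mean_corr(expr, held_out, rest)
        # Rank of the held-out gene among all candidates (1 = best).
        rank = 1 + sum(1 for g in candidates
                       if mean_corr(expr, g, rest) > target)
        rewards.append(1.0 / rank)
    return sum(rewards) / len(rewards)


def search(datasets, query):
    """Rank non-query genes by data-set-weighted coexpression with the query."""
    weights = [dataset_weight(d, query) for d in datasets]
    total = sum(weights)
    if total == 0:
        return weights, []
    scores = {}
    for expr, w in zip(datasets, weights):
        refs = [q for q in query if q in expr]
        if w == 0 or not refs:
            continue
        for gene in expr:
            if gene not in query:
                scores[gene] = scores.get(gene, 0.0) + w * mean_corr(expr, gene, refs)
    ranked = sorted(((g, s / total) for g, s in scores.items()),
                    key=lambda item: item[1], reverse=True)
    return weights, ranked


# Toy compendium: the query genes are coexpressed in data set A but not in B.
dataset_a = {"q1": [1, 2, 3, 4, 5], "q2": [2, 4, 6, 8, 10],
             "g1": [1, 2, 3, 4, 6], "g2": [5, 1, 4, 2, 3]}
dataset_b = {"q1": [1, 2, 3, 4, 5], "q2": [5, 1, 4, 2, 3],
             "g1": [3, 1, 2, 5, 4], "g2": [1, 2, 3, 4, 6]}

weights, ranked = search([dataset_a, dataset_b], query=["q1", "q2"])
```

In this toy compendium, data set A receives the larger weight because its query genes retrieve one another, so the final ranking is driven mostly by coexpression observed in A.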

We are interested in collecting all datasets in GEO to explore the possible use of big genomics data for precision medicine. Ultimately, developing novel therapies for cancer relies on mining the collective biomedical knowledge in an unbiased, data-driven way.

1. Liu, N.1, Hargreaves, V.1, Zhu, Q.1, et al. Cell. 173(2):430-442. 2018
2. Zhu et al. Nature Methods. 12:211-214. 2015