Selected Publications

  • Giotto, a toolbox for integrative analysis and visualization of spatial expression data. Genome biology. 2021. link. spatialgiotto.com.
    spatial transcriptomics spatial pattern mining cell-cell interaction cancer software
    Dries, R.¹, Zhu, Q.¹, Eng, C-H.L., Sarkar, A., Bao, F., George, R.E., Pierson, N., Cai, L., Yuan, G.C.
    Abstract
    The rapid development of novel spatial transcriptomics technologies has provided new opportunities to investigate the interactions between cells and their native microenvironment. However, effective use of such technologies requires the development of innovative computational algorithms and pipelines. Here we present Giotto, a comprehensive, flexible, robust, and open-source pipeline for spatial transcriptomic data analysis and visualization. The data analysis module implements a wide range of algorithms ranging from basic tasks such as data pre-processing to innovative approaches for cell-cell interaction characterization. The data visualization module provides a user-friendly workspace that allows users to interactively visualize, explore and compare multiple layers of information. These two modules can be used iteratively for refined analysis and hypothesis development. We illustrate the functionalities of Giotto by using the recently published seqFISH+ dataset for mouse brain. Our analysis highlights the utility of Giotto for characterizing tissue spatial organization as well as for the interactive exploration of multi-layer information in spatial transcriptomic and imaging data. We find that single-cell resolution spatial information is essential for the investigation of ligand-receptor mediated cell-cell interactions. Giotto is generally applicable and can be easily integrated with external software packages for multi-omic data integration.


  • Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature. 2019. link.
    spatial transcriptomics multiplexed RNA FISH seqFISH+ imaging technology
    Eng, C-H.L., Lawson, M., Zhu, Q., Dries, R., Koulena, N., Takei, Y., Yun, J., Cronin, C., Karp, C., Yuan, G.C., Cai, L.
    Abstract
    Imaging the transcriptome in situ with high accuracy has been a major challenge in single-cell biology, which is particularly hindered by the limits of optical resolution and the density of transcripts in single cells. Here we demonstrate an evolution of sequential fluorescence in situ hybridization (seqFISH+). We show that seqFISH+ can image mRNAs for 10,000 genes in single cells—with high accuracy and sub-diffraction-limit resolution—in the cortex, subventricular zone and olfactory bulb of mouse brain, using a standard confocal microscope. The transcriptome-level profiling of seqFISH+ allows unbiased identification of cell classes and their spatial organization in tissues. In addition, seqFISH+ reveals subcellular mRNA localization patterns in cells and ligand–receptor pairs across neighbouring cells. This technology demonstrates the ability to generate spatial cell atlases and to perform discovery-driven studies of biological processes in situ.


  • Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nature biotechnology. 2018. link. spatial.rc.fas.harvard.edu/install.html.
    spatial transcriptomics hidden markov random field spatial domain cell type mapping
    Zhu, Q., Shah, S., Dries, R., Cai, L., Yuan, G.C.
    Abstract
    How intrinsic gene-regulatory networks interact with a cell's spatial environment to define its identity remains poorly understood. We developed an approach to distinguish between intrinsic and extrinsic effects on global gene expression by integrating analysis of sequencing-based and imaging-based single-cell transcriptomic profiles, using cross-platform cell type mapping combined with a hidden Markov random field model. We applied this approach to dissect the cell-type- and spatial-domain-associated heterogeneity in the mouse visual cortex region. Our analysis identified distinct spatially associated, cell-type-independent signatures in the glutamatergic and astrocyte cell compartments. Using these signatures to analyze single-cell RNA sequencing data, we identified previously unknown spatially associated subpopulations, which were validated by comparison with anatomical structures and Allen Brain Atlas images.


  • Subtype-specific transcriptional regulators in breast tumors subjected to genetic and epigenetic alterations. Bioinformatics. 2019. link.
    gene regulatory network cancer epithelial-mesenchymal transition PhD
    Zhu, Q., Tekpli, X., Troyanskaya, O.G., Kristensen, V.
    Abstract
    Motivation: Breast cancer consists of multiple distinct tumor subtypes, and results from epigenetic and genetic aberrations that give rise to distinct transcriptional profiles. Despite previous efforts to understand transcriptional deregulation through transcription factor networks, the transcriptional mechanisms leading to subtypes of the disease remain poorly understood.
    Results: We used a sophisticated computational search of thousands of expression datasets to define extended signatures of distinct breast cancer subtypes. Using ENCODE ChIP-seq data of surrogate cell lines and motif analysis we observed that these subtypes are determined by a distinct repertoire of lineage-specific transcription factors. Furthermore, specific pattern and abundance of copy number and DNA methylation changes at these TFs and targets, compared to other genes and to normal cells were observed. Overall, distinct transcriptional profiles are linked to genetic and epigenetic alterations at lineage-specific transcriptional regulators in breast cancer subtypes.


  • Revealing the critical regulators of cell identity in the mouse cell atlas. Cell reports. 2018. link. regulon.rc.fas.harvard.edu.
    gene regulatory network regulon
    Suo, S., Zhu, Q., Saadatpour, A., Fei, L., Guo, G., Yuan, G.C.
    Abstract
    Recent progress in single-cell technologies has enabled the identification of all major cell types in mouse. However, for most cell types, the regulatory mechanism underlying their identity remains poorly understood. By computational analysis of the recently published mouse cell atlas data, we have identified 202 regulons whose activities are highly variable across different cell types, and more importantly, predicted a small set of essential regulators for each major cell type in mouse. Systematic validation by automated literature and data mining provides strong additional support for our predictions. Thus, these predictions serve as a valuable resource that would be useful for the broad biological community. Finally, we have built a user-friendly, interactive web portal to enable users to navigate this mouse cell network atlas.


  • CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis. Genome biology. 2019. link. bitbucket.org/qzhudfci/cutruntools/.
    CUT&RUN motif footprinting transcription factor binding technology
    Zhu, Q., Liu, N., Orkin, S., Yuan, G.C.
    Abstract
    We introduce CUT&RUNTools as a flexible, general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&RUN primary cleavage data. CUT&RUNTools extracts endonuclease cut site information from sequences of short read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&RUN. CUT&RUNTools is available at https://bitbucket.org/qzhudfci/cutruntools/.


  • Direct promoter repression by BCL11A controls the fetal to adult hemoglobin switch. Cell. 2018. link.
    CUT&RUN globin gene switch gamma globin repression protein-DNA mapping
    Liu, N.¹, Hargreaves, V.¹, Zhu, Q.¹, Kurland, J.V., Hong, J., Kim, W., Sher, F., Macias-Trevino, C., Rogers, J.M., Kurita, R., Nakamura, Y., Yuan, G.C., Bauer, D.E., Xu, J., Bulyk, M., Orkin, S.H.
    Abstract
    Fetal hemoglobin (HbF, α₂γ₂) level is genetically controlled and modifies severity of adult hemoglobin (HbA, α₂β₂) disorders, sickle cell disease, and β-thalassemia. Common genetic variation affects expression of BCL11A, a regulator of HbF silencing. To uncover how BCL11A supports the developmental switch from γ- to β-globin, we use a functional assay and protein binding microarray to establish a requirement for a zinc-finger cluster in BCL11A in repression and identify a preferred DNA recognition sequence. This motif appears in embryonic and fetal-expressed globin promoters and is duplicated in γ-globin promoters. The more distal of the duplicated motifs is mutated in individuals with hereditary persistence of HbF. Using the CUT&RUN approach to map protein binding sites in erythroid cells, we demonstrate BCL11A occupancy preferentially at the distal motif, which can be disrupted by editing the promoter. Our findings reveal that direct γ-globin gene promoter repression by BCL11A underlies hemoglobin switching.


  • Targeted exploration and analysis of large cross-platform human transcriptomic compendia. Nature methods. 2015. link. seek.princeton.edu.
    data integration search engine big data coexpression genomics gene expression PhD
    Zhu, Q., Wong, A.K., Krishnan, A., Aure, M.R., Tadych, A., Zhang, R., Corney, D.C., Greene, C.S., Bongo, L.A., Kristensen, V.N., Charikar, M., Li, K., Troyanskaya, O.G.
    Abstract
    We present SEEK (search-based exploration of expression compendia; http://seek.princeton.edu/), a query-based search engine for very large transcriptomic data collections, including thousands of human data sets from many different microarray and high-throughput sequencing platforms. SEEK uses a query-level cross-validation–based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify genes, pathways and processes co-regulated with the query. SEEK provides multigene query searching with iterative metadata-based search refinement and extensive visualization-based analysis options.


  • Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinformatics. 2013. link.
    data integration cell type prediction gene expression PhD
    Lee, Y.-S., Krishnan, A., Zhu, Q., Troyanskaya, O.G.
    Abstract
    Motivation: Leveraging gene expression data through large-scale integrative analyses for multicellular organisms is challenging because most samples are not fully annotated to their tissue/cell-type of origin. A computational method to classify samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies and hundreds of diverse human tissues and cell-types. We present Unveiling RNA Sample Annotation (URSA) that leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods, irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next-generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications.


  • Generalized gene adjacencies, graph bandwidth, and clusters in yeast evolution. IEEE Transactions on computational biology and bioinformatics. 2009.
    comparative genomics yeast gene order analysis undergraduate
    Zhu, Q., Adam, Z., Choi, V., Sankoff, D.
    Abstract
    We present a parametrized definition of gene clusters that allows us to control the emphasis placed on conserved order within a cluster. Though motivated by biological rather than mathematical considerations, this parameter turns out to be closely related to the maximum bandwidth parameter of a graph. Our focus will be on how this parameter affects the characteristics of clusters: how numerous they are, how large they are, how rearranged they are and to what extent they are preserved from ancestor to descendant in a phylogenetic tree. We infer the latter property by dynamic programming optimization of the presence of individual edges at the ancestral nodes of the phylogeny. We apply our analysis to a set of genomes drawn from the Yeast Gene Order Browser.


  • The collapse of gene complement following whole genome duplication. BMC genomics. 2010.
    comparative genomics whole-genome duplication undergraduate
    Sankoff, D., Zheng, C., Zhu, Q.
    Abstract
    Background: Genome amplification through duplication or proliferation of transposable elements has its counterpart in genome reduction, by elimination of DNA or by gene inactivation. Whether loss is primarily due to excision of random length DNA fragments or the inactivation of one gene at a time is controversial. Reduction after whole genome duplication (WGD) represents an inexorable collapse in gene complement.
    Results: We compare fifteen genomes descending from six eukaryotic WGD events 20-450 Mya. We characterize the collapse over time through the distribution of runs of reduced paralog pairs in duplicated segments. Descendant genomes of the same WGD event behave as replicates. Choice of paralog pairs to be reduced is random except for some resistant regions of contiguous pairs. For those paralog pairs that are reduced, conserved copies tend to concentrate on one chromosome.
    Conclusions: Both the contiguous regions of reduction-resistant pairs and the concentration of runs of single copy genes on a single chromosome are evidence of transcriptional co-regulation, dosage sensitivity or other functional interaction constraining the reduction process. These constraints and their evolution over time show a consistent pattern across evolutionary domains and a highly reproducible pattern, as replicates, for the several descendants of a single WGD.


Other and Earlier Publications

  • Enhancer-dependence of gene expression increases with developmental age. Proceedings of the national academy of sciences. 2020.
    Cai, W., Huang, J., Zhu, Q., Li, B.E., Seruggia, D., Zhou, P., Nguyen, M., Fujiwara, Y., Xie, H., Yang, Z., Hong, D., Ren, P., Xu, J., Pu, W.T., Yuan, G.C., Orkin, S.H.

  • Accurate estimation of cell-type composition from gene expression data. Nature communications. 2019.
    Tsoucas, D., Dong, R., Chen, H., Zhu, Q., Guo, G., Yuan, G.C.

  • A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data. PLoS computational biology. 2018.
    Rangan, A.V., McGrouther, C.C., Kelsoe, J., Schork, N., Stahl, E., Zhu, Q., Krishnan, A., Yao, V., Troyanskaya, O., Bilaloglu, S., Raghavan, P., Bergen, S., Jureus, A., Landen, M.

  • IFNγ-dependent tissue-immune homeostasis is co-opted in the tumor microenvironment. Cell. 2017.
    Nirschl, C., Suarez-Farinas, M., Izar, B., Prakadan, S., Dannenfelser, R., Tirosh, I., Liu, Y., Zhu, Q., Devi, S., Carroll, S.L., Chau, D., Rezaee, M., Kim, T.-G., Huang, R., Fuentes-Duculan, J., Song-Zhao, G., Gulati, N., Lowes, M.A., King, S., Quintana, F.J., Lee, Y.-S., Krueger, J.G., Sarin, K.Y., Yoon, C.H., Garraway, L., Regev, A., Shalek, A.K., Troyanskaya, O., Anandasabapathy, N.

  • Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms. Bioinformatics. 2014.
    Park, C.Y., Krishnan, A., Zhu, Q., Wong, A.K., Lee, Y.S., Troyanskaya, O.G.

  • Individual and combined effects of DNA methylation and copy number alterations on miRNA expression in breast tumors. Genome biology. 2013.
    Aure, M.R., Leivonen, S.-K., Fleischer, T., Zhu, Q., Overgaard, J., Alsner, J., Tramm, T., Louhimo, R., Alnaes, G.G., Perala, M., Busato, F., Touleimat, N., Tost, J., Borresen-Dale, A.-L., Hautaniemi, S., Troyanskaya, O.G., Lingjaerde, O.C., Sahlberg, K.K., Kristensen, V.N.

  • Defining cell-type specificity at the transcriptional level in human disease. Genome research. 2013.
    Ju, W., Greene, C.S., Eichinger, F., Nair, V., Hodgin, J.B., Bitzer, M., Lee, Y., Zhu, Q., Kehata, M., Li, M., Jiang, S., Rastaldi, M.P., Cohen, C.D., Troyanskaya, O.G., Kretzler, M.

  • Scaffold filling, contig fusion and comparative gene order inference. BMC bioinformatics. 2010.
    Munoz, A., Zheng, C., Zhu, Q., Albert, V.A., Rounsley, S., Sankoff, D.

  • Descendants of whole genome duplication within gene order phylogeny. Journal of computational biology. 2008.
    Zheng, C., Zhu, Q., Sankoff, D.

  • Guided genome halving: hardness, heuristics and the history of the Hemiascomycetes. Bioinformatics. 2008.
    Zheng, C., Zhu, Q., Adam, Z., Sankoff, D.

  • Removing noise and ambiguities from comparative maps in rearrangement analysis. IEEE Transactions on computational biology and bioinformatics. 2007.
    Zheng, C., Zhu, Q., Sankoff, D.

  • Parts of the problem of polyploids in rearrangement phylogeny. RECOMB comparative genomics. 2007.
    Zheng, C., Zhu, Q., Sankoff, D.

  • Algorithms for the extraction of synteny blocks from comparative maps. International workshop on algorithms in bioinformatics. 2007.
    Choi, V., Zheng, C., Zhu, Q., Sankoff, D.

  • Genome halving with an outgroup. Evolutionary bioinformatics. 2006.
    Zheng, C., Zhu, Q., Sankoff, D.