The ultimate goal of our research is to understand how genomic DNA sequence specifies gene regulation. We are currently focused on 1) developing computational tools to identify functional regulatory elements in non-coding DNA, and 2) experimentally testing and characterizing how these elements function.
In our computational work, we use microarray gene expression data, genome-wide location analysis, and whole-genome DNA sequence to systematically identify functional DNA elements and infer combinatorial regulatory logic. We use pattern recognition algorithms to identify over-represented and phylogenetically conserved DNA sequence elements (putative transcription factor binding sites). We then use a Bayesian network to find the most likely functional constraints on the position, spacing, orientation, and combinations of these DNA sequence elements. This methodology has generated a large set of high-confidence predictions for regulatory interactions, and is in principle applicable to any organism with microarray and genome sequence data.
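To make the idea of an "over-represented" sequence element concrete, here is a minimal sketch of one common way to test for it. It is not our actual pipeline: the hypergeometric k-mer test is a stand-in, and the function names, the k-mer length, the significance threshold, and the toy sequences are all assumptions chosen for illustration. It asks whether a short sequence occurs in more promoters of a co-expressed gene set than expected by chance, given how often it occurs across all promoters considered.

```python
from math import comb

def kmer_presence(seqs, k):
    """Map each k-mer to the number of sequences containing it at least once."""
    counts = {}
    for seq in seqs:
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in seen:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts

def hypergeom_tail(x, n, K, N):
    """P(X >= x) when n sequences are drawn from N, of which K contain the k-mer."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return total / comb(N, n)

def enriched_kmers(foreground, background, k=6, alpha=0.01):
    """Return (kmer, foreground hits, population hits, p-value), sorted by p-value.

    foreground: promoters of a co-expressed gene set (hypothetical input)
    background: all other promoters under consideration (hypothetical input)
    """
    population = foreground + background
    fg_counts = kmer_presence(foreground, k)
    pop_counts = kmer_presence(population, k)
    N, n = len(population), len(foreground)
    hits = []
    for kmer, x in fg_counts.items():
        p = hypergeom_tail(x, n, pop_counts[kmer], N)
        if p <= alpha:
            hits.append((kmer, x, pop_counts[kmer], p))
    return sorted(hits, key=lambda h: h[3])

# Toy data (invented): four "co-expressed" promoters sharing TGATAA,
# and twelve background promoters that lack it.
foreground = ["CCCTGATAAGGG", "AAATGATAACCC", "GGGTGATAATTT", "TTTTGATAAAAA"]
background = ["ACGTACGTACGT", "CCCCGGGGCCCC", "ATATATATATAT"] * 4

for kmer, x, K, p in enriched_kmers(foreground, background):
    print(kmer, x, K, p)
```

A real analysis would also need to correct for testing many k-mers at once, handle reverse complements and degenerate motifs, and filter for phylogenetic conservation, none of which this sketch attempts.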
In our experimental work, we are testing these computational predictions by rapid generation of transgenic GFP reporter strains in C. elegans via microparticle bombardment. C. elegans is an attractive model system for several reasons:
- Relevance to human disease: About 60% of C. elegans genes have a human homologue (Harris et al., NAR 2004), and 80% of genes implicated in human cancer have a worm homologue (Futreal et al., Nat Rev Cancer 2004; Poulin et al., Oncogene 2004).
- The high quality of the genome sequence data and microarray tools.
- Rapid and effective transformation techniques and GFP reporter assays.
- Availability of a bacterial feeding library for genome-wide RNAi screens to further characterize regulatory interactions.
- Relative ease and low cost of strain maintenance.