Current Predoctoral Trainees

Alexandra Signoriello 

NLM Predoctoral Trainee: 9/1/15 –

Department:  Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor:  Corey O’Hern, PhD (Mech Engineering, Physics, Applied Physics)

Research: Alexandra Signoriello is in the third year of her graduate training. Alexandra is developing a model that will illustrate the spatial structure of a melanoma tumor microenvironment. Melanoma tumors contain a broad spectrum of cell types that play an important role in development. Her research seeks to understand the spatial organization of these cells in the tumor. She is investigating which cell types tend to cluster, the spatial patterns surrounding blood vessels, how structures change in different locations, and how this affects the cell network structure across the entire tumor.

Kevin Lopez

NLM Predoctoral Trainee: 9/1/15 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research: He is developing a bedside aid for concussion and brain injury decisions in the emergency department while rotating through labs and attending CBB coursework.


Mohammed Khan

NLM Predoctoral Trainee: 9/1/14 –

Department: Molecular, Cellular, and Developmental Biology

Research Supervisor: Michael Krauthammer

Research: Mohammed Khan is currently in the third year of his graduate training. He is designing computational tools to further understand the role of different classes of RNA and how they may contribute to different diseases, especially cancer. The classes of RNA he is investigating include protein-encoding mRNA and long non-coding RNA (lncRNA). Greater insight into the biological function of these RNA classes could help identify potential disease markers and targets to improve disease diagnosis and medical treatment.


*Peter Williams

NLM Predoctoral Trainee: 7/1/14 – (on leave; 12/16/14 – 1/16/16)

Department: Applied Physics

Research Supervisor: Corey O’Hern, PhD (Mech Engineering, Physics, Applied Physics)

Research: His research focuses on computational studies of crowding in the cytoplasm of bacterial cells. He has performed Langevin dynamics simulations to measure the growth of the structural relaxation time in model systems that mimic the dense environment of the cell.


Stefan Avey

NLM Predoctoral Trainee: 9/1/13 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Steven Kleinstein, PhD (Pathology, Immunobiology)

Research: Stefan Avey is in the third year of his graduate training. Stefan's research focuses on understanding the early innate immune response to influenza infection and vaccination. He is interested in discovering immune signatures of response as well as developing methods to more accurately model immunological systems.


Christopher Fragoso

NLM Predoctoral Trainee: 9/1/13 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Hongyu Zhao, PhD (Biostatistics), Stephen Dellaporta, PhD (MCDB)

Research: Chris Fragoso is in the third year of his graduate training in computational biology and bioinformatics. He is interested in developing new methods to study genetic variation in populations. Knowledge of genetic variation, gleaned from genotyping-by-sequencing (GBS) data, facilitates the association of traits with variants found in populations of interest. He has assisted in developing a computational pipeline for a novel GBS method. Additionally, he has developed a genotype imputation method, for which a manuscript is in preparation.


Jennifer Gaines

NLM Predoctoral Trainee: 9/1/13 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Corey O’Hern, PhD (Mech Engineering, Physics, Applied Physics)

Research: Jennifer Gaines is in the second year of her graduate training. Jennifer's research focuses on using a hard sphere, atomistic model to study protein structure and protein-protein interactions. The goal of her research is to predict the effect of mutations on protein structure and binding properties. This research is in collaboration with Prof. Lynne Regan from the Department of Molecular Biophysics and Biochemistry at Yale.


Michael Klein

NLM Predoctoral Trainee: 9/1/13 – (on leave; 1/1/14 – 8/31/14)

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Hongyu Zhao, PhD (Biostatistics), David Stern, PhD (Pathology, Yale Cancer Center)

Research: Michael Klein is currently in the third year of training. His research focuses on pharmacogenomic drug screens. He is searching for new ways to relate genomic information of cancer cell lines to drug sensitivity. This research can ultimately lead to improved patient-specific treatments in the clinic and generate hypotheses about new biological targets for drug design.


Namita Gupta

NLM Predoctoral Trainee: 9/1/11 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Steven Kleinstein, PhD (Pathology)

Research: Namita Gupta is currently in the fourth year of her graduate training. She is developing computational methods to analyze large repertoires of immunoglobulin sequences generated by new sequencing technologies. She is interested in determining whether the nucleotide sequence of an antibody can predict the virus or bacterium to which it binds. She has completed her required coursework and her qualifying exam.

Namita will participate in the May 2015 Community Meeting: Analysis, Management, and Sharing of Antigen Receptor Repertoire Sequence Data, in Vancouver, BC.


Mate Nagy

NLM Predoctoral Trainee: 9/1/11 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Michael Krauthammer, MD, PhD (Pathology Informatics), Kei Cheung, PhD (Yale Center for Medical Informatics, Genetics, Computer Science)

Research: Mate Nagy is currently in the fourth year of his graduate training. Mate is working in data mining. More specifically, he is using convolutional networks and other deep learning methods on images from PubMed publications to classify them and to find deeper information in them. He is also working on expanding the capabilities of Yale Image Finder by developing new tools for a variety of tasks by using semantic technologies. Additionally, Mate is working with the CORE at Yale to cluster patient trajectories for diagnostic and predictive purposes. He is currently working with Heart Failure patient information.


*Jason Vander Heiden

NLM Predoctoral Trainee: 9/1/11 –

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research Supervisor: Steven Kleinstein, PhD (Pathology)

Research: Jason Vander Heiden is in the fourth year of his graduate training. He is currently working on developing computational tools to analyze high-throughput sequencing data of lymphocyte receptors. Jason is applying these tools to studies characterizing B cell receptor diversity in multiple sclerosis, myasthenia gravis, and inflammatory myopathies.


*Jason Vander Heiden received Training Grant support from September 1, 2015 while Peter Williams was on leave. Jason will no longer be supported on the Grant as of July 1, 2016.




Recent Predoctoral Students

Pedro Alves (Mentor: Mark Gerstein)


Research: Pedro Alves is studying yeast genes that are essential to its quiescent state by integrating various data sources, such as gene expression studies, sub-cellular protein localization, and protein-protein interaction networks. He is also working on demonstrating the important relationships between quiescence and human neurodegenerative diseases. A second project that he is working on involves analyzing the utility of protein networks in gene expression studies. (After two months of NLM support, in the fall of 2007, Pedro was awarded a SACNAS fellowship, which supported him through 8/31/08, after which he was again supported on the NLM Training grant. NLM staff approved this unusual schedule.)

Current position: Chief Data Scientist, Banjo


Raymond Auerbach (Mentor: Mark Gerstein)

Research: Raymond Auerbach is working on several projects including: 1) analyzing results from ChIP-Sequencing experiments to locate novel transcription factor binding site patterns in the human and worm genomes, 2) developing new methods to analyze the large datasets inherent in next-generation sequencing experiments, and 3) integrating existing, public data with next-generation sequencing experiments to answer novel biological questions.

Current position: Data Scientist, United States Government


David Ballard (Mentor: Hongyu Zhao)

Research: David Ballard developed statistical/computational methods to better identify gene expression quantitative trait loci (eQTL) underlying complex disease. Once the eQTL are identified, he related the eQTL to clinical traits in order to establish (causative) gene networks. He also worked on applying biological/pathway information to genome wide association studies. The objective was to develop new methods to prioritize SNPs for selection for association testing using the known biological information.

Freudenberg J, Lee AT, Siminovitch KA, Amos CI, Ballard D, Li W, Gregersen PK. Locus category based analysis of a large genome-wide association study of rheumatoid arthritis. Hum Mol Genet. 2010 Oct 1;19(19):3863-72. PMID: 20639398; PMCID: PMC2935861.

Ferraro TN, Smith GG, Ballard D, Zhao H, Schwebel CL, Gupta A, Rappaport EF, Ruiz SE, Lohoff FW, Doyle GA, Berrettini WH, Buono RJ. Quantitative trait loci for electrical seizure threshold mapped in C57BLKS/J and C57BL/10SnJ mice. Genes Brain Behav. 2010 Dec 3. doi: 10.1111/j.1601-183X.2010.00668.x. PMID: 21129161.

Current position: Research Scientist, Celgene


Paul Baranay

NLM Predoctoral Trainee: 9/1/12 – 5/31/14

Department: Interdepartmental Program in Computational Biology and Bioinformatics

Research: Paul Baranay studied ChIP-seq data from hematopoietic stem cells to identify differential expression of transcription factors. He completed the requirements for the MS degree in Computational Biology and Bioinformatics, receiving the degree in December 2014.

Current position: Lead Data Engineer, Wiser

Christopher Bolen (Mentor: Steven Kleinstein)

Research: Christopher Bolen is studying the rates of response to treatment of HCV in different individuals by using a combination of pretreatment gene expression data and patient demographic data to predict the patient's response. He is also designing a method to be used to predict the specific cell subset responsible for the upregulation of a set of genes in a mixed cell population.

Bolen CR, Robek MD, Brodsky L, Schulz V, Lim JK, Taylor MW, Kleinstein SH, (2013), The blood transcriptional signature of chronic hepatitis C virus is consistent with an ongoing interferon-mediated antiviral response. Journal of Interferon & Cytokine Research : The Official Journal of The International Society for Interferon and Cytokine Research. 33(1): 15-23. PMCID: PMC3539252.

Bolen CR, Ding S, Robek MD, Kleinstein SH, (2013), Dynamic expression profiling of type I and type III interferon-stimulated hepatocytes reveals a stable hierarchy of gene expression. Hepatology (Baltimore, Md.). 59(4): 1262-72. PMCID: PMC3938553.

Current position: Associate Scientist, Genentech, Inc.


Timothy Burbridge (Mentor: Michael Crair)

Research: Tim Burbridge is interested in the development of sensory circuits and the role that activity plays in determining the precise connections that are largely set at early stages in brain development. He is working on defining the roles of axon guidance and cell adhesion molecules in the development of the retinotopic map in the mouse superior colliculus, using wet-lab and neuroinformatics approaches.

Current position: PostDoc Fellow, Yale University


Christine DeLorenzo (Mentor: James Duncan)

Research: Christine DeLorenzo developed a Bayesian probability model to update preoperative brain images such that they reflect intraoperative brain deformation. This model tracks the brain shift that occurs during neurosurgery and is guided by sparse intraoperative cortical surface information, as obtained by OR stereo cameras. A game theoretic model formulation overcomes the problem of non-ideal camera calibration information, which plagues almost every stereo system, while still achieving high surface tracking accuracy. The algorithm, which has been tested in vivo and on a realistic brain phantom, calculates the most rational compromise between the competing variables of surface deformation and camera calibration parameters.

Delorenzo C, Milak MS, Brennan KG, Kumar JS, Mann JJ, Parsey RV. In vivo positron emission tomography imaging with [(11)C]ABP688: binding variability and specificity for the metabotropic glutamate receptor subtype 5 in baboons. Eur J Nucl Med Mol Imaging. 2011 Jan 29. [Epub ahead of print] PMID: 21279350.

DeLorenzo C, Kumar JS, Zanderigo F, Mann JJ, Parsey RV. Modeling considerations for in vivo quantification of the dopamine transporter using [(11)C]PE2I and positron emission tomography. J Cereb Blood Flow Metab. 2009 Jul;29(7):1332-45 PMCID: PMC2757108.

Current position: Assistant Professor, Stony Brook Medicine


Valentin Dinu (Mentor: Hongyu Zhao and Perry Miller)

Research: Valentin Dinu was the first graduate of the CBB program, receiving his PhD in May 2007. His research focused on informatics issues involved in the analysis of SNP data as it relates to helping determine the genetic basis of disease. He developed statistical algorithms for association analysis of genomic data focused on pathway-based analysis. Valentin developed and investigated the performance of algorithms for pivoting clinical data stored in Entity-Attribute-Value modeled databases. He also investigated the genomic coverage and copy number polymorphism capabilities of multiple microarray platforms.

Li C, Li Y, Zhang X, Stafford P, Dinu V. ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites. BMC Bioinformatics. 2009 Sep 11;10:286. PMID: 19747395; PMCID: PMC2746817.

Kriseman J, Busick C, Szelinger S, Dinu V. BING: biomedical informatics pipeline for Next Generation Sequencing. J Biomed Inform. 2010 Jun;43(3):428-34. PubMed PMID: 19925883.

Current position: Associate Professor, Biomedical Informatics, Arizona State U, Phoenix








Jamie Duke (Mentor: Steven Kleinstein)

Research: Jamie Duke’s research focuses on understanding the targeting mechanisms of activation-induced cytidine deaminase (AID), which is responsible for somatic hypermutation in germinal center B cells. She has recently been working to identify cis-regulatory modules that are responsible for recruiting AID to the immunoglobulin loci and other recently identified genes. The goal is to identify why some genes are targets of AID and others are not, and additionally why some of the mutated genes are repaired in an error-free manner while others are repaired in an error-prone manner.

Current position: Bioinformatics Scientist, Children’s Hospital of Philadelphia


Daniel Gadala-Maria (Mentor: Steven Kleinstein)

NLM Predoctoral Trainee: 9/1/09 – 8/31/14

Department: Interdepartmental Program in Computational Biology and Bioinformatics Research Supervisor: Steven Kleinstein, PhD (Pathology)

Research: Daniel Gadala-Maria is in the sixth year of his graduate training and is studying the antibody repertoire of humans. He is currently working on computational and graph-theoretic methods to sort immunoglobulin data obtained from next-generation sequencing of blood samples into clonal lineages. He also aims to study mutation patterns and track changes in the selection pressures acting upon these clones, which may arise in response to antigen exposure.

Current position: In training


Matthew Holford (Mentors: Michael Krauthammer and Kei Cheung)

Research: Matt Holford is building an integrative data warehouse containing different types of data related to cancer from different sources. He is using this database to explore research issues related to semantic data mining using Semantic Web technologies.

Current position: Senior Consulting Engineer, Attivio


Jia Kang (Mentor: Hongyu Zhao)

Research: Jia Kang’s research mainly comprises two parts: 1) establishing a risk prediction model for complex disease traits in a high-dimensional data setting, and 2) building a sensible genetic regulatory network by integrating various sources of data.

Current position: Principal Biometrician/Biostatistician at Merck & Co. in Rahway, NJ.


Kevin Keating (Mentor: Anna Pyle)

Research: Kevin developed a method to determine atomic coordinates for backbone atoms from low-resolution RNA crystal structures. To do this, he used both a reduced representation of RNA developed by the Pyle lab and an RNA backbone rotamer library developed by the Richardson lab at Duke University. The goal was to get accurate information about the reduced representation from the electron density, and then determine the appropriate rotamer from this reduced representation data.

Current position: Associate Principal Developer, Schrodinger


Hugo Lam (Mentor: Mark Gerstein)

Research: Hugo Lam’s research focused on analyzing the human genomic sequences of different individuals so as to understand the formation mechanism of structural variation, including segmental duplication and copy number variation, which could result in different genomic impacts such as gene duplication, copy number polymorphism, disease, as well as pseudogenization. His research also focused on predicting protein domain binding sites by using a data mining approach which involves comparative and structural genomics, and in investigating the evolution of these binding sites.

Lam HY, Mu XJ, Stütz AM, Tanzer A, Cayting PD, Snyder M, Kim PM, Korbel JO, Gerstein MB., (2010), Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nature Biotechnology. 28(1): 47-55. PMCID: PMC2951730.

Lam HY, Kim PM, Mok J, Tonikian R, Sidhu SS, Turk BE, Snyder M, Gerstein MB., (2010), MOTIPS: automated motif analysis for predicting targets of modular protein domains. BMC Bioinformatics. 11: 243. PMCID: PMC2882932.

Current position: Senior Director of Bioinformatics, Bina Technologies



Yin Liu (Mentor: Hongyu Zhao)

Research: Yin Liu worked on developing mathematical and statistical tools for investigating signal transduction pathways in eukaryotes. Goals included: 1) to develop statistical methods to predict protein-protein interactions in eukaryotes by incorporating large-scale genomics and proteomics data, 2) to apply protein domain information and the evolutionary connections among model organisms to improve protein-protein interaction prediction, and 3) to reconstruct the signal transduction networks in S. cerevisiae by using the protein-protein interaction information.

Kim, I., Liu, Y., and Zhao, H. Sparsity priors for protein-protein interaction predictions. In: Bayesian Modeling in Bioinformatics (Dey, D., Ghosh S., and Mallick, B., eds.), Chapman & Hall (in press).

Current position: Assistant Professor, University of Texas at Houston School of Medicine


Karen Lostritto (Mentor: Annette Molinaro)

Research: Karen Lostritto’s research focuses on developing methods for discovering patterns in high-dimensional data, specifically in survival data. She is studying non-parametric algorithms for partitioning observations based on their covariate values with the aim of minimizing the residual sum of squares for each partition. She has extended the partDSA (partitioning Deletion Substitution Addition algorithm) to accommodate censored survival data by implementing the Inverse Probability Censoring weighting scheme.

Current position: Software Engineer, Google


ThaiBinh Luong (Mentor: Michael Krauthammer)

Research: ThaiBinh Luong’s work involves mining through biomedical literature in order to map instances of gene strings and diseases to a repository of gene/disease identifiers. The aim of this process is to more easily classify research papers and identify relevant papers for researchers.

Current position: Data Scientist, University of Pennsylvania Health System


John Murray (Mentor: Xiao-Jing Wang)

Research: John Murray's research is focused on building, running, testing, and refining large-scale neural circuit models for cognitive processes that are based on anatomical and physiological data.

Current position: Assistant Professor, Yale University


Laura Mustavich (Mentor: Hongyu Zhao and Kenneth Kidd)

Research: Laura Mustavich worked on elucidating a gene regulatory network consisting of genes that regulate and genes that are regulated by candidate genes for alcohol dependence. She built a computational model of alcohol consumption and metabolism, incorporating alternative versions of key genes, and used this model to drive the biostatistical analysis. She received her PhD in December 2010.

Current position: Lecturer, LA area



Sara Nichols (Mentor: William Jorgensen)

Research: Sara Nichols' work focused on protein structure sampling algorithms for applications such as structure determination as well as protein-protein and protein-ligand interactions. More specifically, she explored the potential energy surface for ab initio folding and side chain prediction, and multiple receptors for docking and screening, where it is important to consider the dynamic nature of the protein's backbone. Sara also explored how implicit solvation can be applied in computationally complex systems.

Current position: Scientist, Celgene Corporation


Darryl Reeves (Mentor: TBA)

Research: Darryl Reeves is in his first year of graduate study, doing research rotations to help identify which faculty member’s lab he wants to join.

Current position: PhD Student, Cornell University


Jonathan Reichel (Mentor: Not Applicable)

Graduate progress: Jonathan Reichel withdrew from Yale University after extensive ongoing discussions with Dr. Perry Miller (and other faculty and staff at Yale) about his interests, goals, etc., shortly before the end of his first semester of graduate studies.

Current position: Lead Bioinformatics Scientist, Memorial Sloan Kettering Cancer Center


Rebecca Robilotto (Mentor: Mark Gerstein)

Research: Rebecca Robilotto is studying comparative genomics among multiple species, including worm, fly, human, and yeast. This work includes identification of pseudogenes in the multiple organisms, as well as comparing them to other functional elements in the genome. One goal is to determine whether pseudogenes may have a type of regulatory role.

Current position: Data Scientist, True&Co


Thomas Royce (Mentor: Mark Gerstein)

Research: The goal of Tom Royce’s research was to understand the effects that give rise to measured fluorescent intensities on oligonucleotide microarrays. The effects studied included the probes' nucleotide composition, the location of those nucleotides within a probe, and cross-hybridization (the binding of non-target nucleic acids to an unintended probe). These effects, combined, dwarf the desired effect we wish to measure (that of target nucleic acid binding) by orders of magnitude. By first understanding how nucleic acids (both desired targets and undesired cross-hybridizing molecules) bind to array features, and then developing statistical methods to diminish the undesirable effects mentioned above, the goal was to improve both the accuracy and precision of tiling microarray experiments.

Current position: Assistant Vice President, Bioinformatics, Ashion Analytics


Jill Rubinstein (Mentors: Michael Krauthammer and Paul Lizardi)

Research: Jill Rubinstein expects to complete her MD/PhD training in 2012. For her CBB PhD (which she received in 2010), she studied whole-genome methylation and expression patterns in melanoma using high-throughput sequencing techniques. The goals of this research were to better characterize the relationship between methylation levels and transcription and to identify epigenetic biomarkers of potential clinical utility.

Current position: Resident, Yale


Emmett Sprecher (Mentors: David Tuck and Lyndsay Harris)

Research: Emmett Sprecher is using translational bioinformatics approaches to study trastuzumab resistance in HER2+ breast cancer. He is using linear modeling and gene set approaches to examine gene expression data, looking for biomarkers of resistance and response to treatment, and investigating copy number variations in different patients as well as in HER2+ breast cancer cell lines.

Current position: Bioinformatics Scientific Consultant, Seleventa


Kelly Stanton (Mentors: Yuval Kluger and David Tuck)

Research: Kelly Stanton is studying erythrocytes and leukemia, and in particular a cohort of RNA-sequence data. He is trying to identify alternative splice events and fusion proteins and determine their role in pathogenesis. He is also exploring techniques for non-linear dimensionality reduction of point cloud data and signal processing. Finally, he is continuing work on an invasive micropapillary data set, and has determined leads for characteristic genes of the disease that are currently in experimental validation.

Current position: PostDoc Associate, Yale University


Sebastian Szpakowski (Mentors: Michael Krauthammer and Paul Lizardi)

Research: Sebastian Szpakowski focused his research on the design and analysis of microarray chips that detect patterns of genomic methylation, e.g., in cancer versus normal tissues. The goal was to functionally annotate the uncharted genomic regions known as "junk" DNA.

Current position: Investigator, Novartis Institutes for BioMedical Research

Mohamed Uduman (Mentor: Steven Kleinstein)

Research: Mohamed Uduman’s research involves computational analysis of the immune system. Specifically, he is studying Immunoglobulin (Ig) receptor sequences and lineage trees.

Current position: Self Employed