Sandra Safo, PhD


Dr. Safo is a biostatistician whose long-term goal is to develop and apply advanced statistical methods and computational tools for big biomedical data to advance clinical translational research and precision medicine in women’s health. Drs. Waller and Long are serving as Dr. Safo’s primary mentors.

The analysis of big biomedical data is statistically challenging because the data are complex and heterogeneous, and the number of features (on the order of tens of thousands) far exceeds the number of samples (on the order of hundreds). Novel statistical methods are needed to account for these challenges. Recent statistical and translational work suggests that integrating different data types, such as transcriptomic, genomic, metabolomic, and epigenetic data, can help elucidate the mechanisms underlying complex diseases. In her doctoral work at the University of Georgia, Dr. Safo therefore developed innovative statistical methods for jointly analyzing epigenetic and transcriptomic data to identify genomic biomarkers highly associated with breast cancer. Her findings suggested that these biomarkers could be incorporated into cancer risk prediction algorithms to improve cancer diagnosis and prognosis. She has also developed sample size methods for big data so that clinicians planning cancer studies can determine the sample size required to achieve a desired classification accuracy. Her postdoctoral research builds on her work in integrative data analysis. She is also the lead statistician on several research projects in obesity and cardiovascular disease (CVD), which have resulted in four manuscripts under review.

In the current BIRCWH proposal, Dr. Safo will develop novel statistical methods that integrate genomics, metabolomics, and subclinical CVD biomarkers. Her overall goal is to help define genomic and metabolic susceptibility to CVD in HIV-infected women. This may ultimately help clinicians identify HIV-infected women who are likely to benefit from targeted therapeutic CVD interventions.

BIRCWH Accomplishments 2016-2017 


2016: Travel Award to participate in ENAR Diversity Workshop, Austin, TX

2016-2018: Program to Increase Diversity among Individuals Engaged in Health-Related Research in Cardiovascular Genetic Epidemiology (PRIDE-CGE) Scholar, Washington University in St. Louis

Grant Submitted

Title: Advancing Methods in Integrative Analysis of Big Data: Assessing Genomics and Metabolomics Influences on Subclinical CVD in HIV-infected men and women.

Funding Agency: NIH NHLBI Mentored Career Development Award to Promote Faculty Diversity in Biomedical Research (K01).

Role: PI

Professional Service and Presentations

2016: Session Organizer of “Recent Advances in Integrative Analysis of Omics Data”, ICSA 2016, Atlanta, GA

2016: Participant in Diversity Workshops, ENAR 2016

2017: Invited speaker for session “Integrative omics analyses”, 2017 Summer Research Conference (SRC) of the Southern Regional Council on Statistics, Jekyll Island, GA

2017: Poster presentation at the PRIDE-CGE Annual Meeting, NIH, Bethesda, MD. Title: Identifying Genomic Biomarkers of Cardiovascular Diseases in HIV-Infected Women


August 21, 2017: Tenure-track faculty appointment in Biostatistics, Division of Biostatistics, University of Minnesota, MN


  1. Safo SE and Long Q (2016). Sparse Linear Discriminant Analysis in Structured Covariates Space. Peer-reviewed and accepted for publication in the 3rd IEEE International Conference on Data Science and Advanced Analytics Conference Proceedings (acceptance rate: 23%).
  2. Safo SE and Ahn J (2016). General Sparse Multi-class Linear Discriminant Analysis. Computational Statistics and Data Analysis, Volume 99, Pages 81-90.
  3. Safo SE, Li S, and Long Q (2016). Integrative analysis of transcriptomic and metabolomic data via sparse CCA with incorporation of biological information. Invited Revision for Biometrics.
  4. Safo SE, Ahn J, Jeon Y, and Jung S (2016). Sparse Generalized Eigenvalue Problem with Application to Canonical Correlation Analysis for Integrative Analysis of Methylation and Gene Expression Data. Invited Revision for Biometrics.
  5. Li Z, Safo SE, and Long Q (2016). Incorporating biological information in sparse principal component analysis with application to genomic data. Invited Revision for BMC Bioinformatics.
  6. Jackson SL, Safo SE, Staimez LR, Olson DE, Narayan KMV, Long Q, Lipscomb J, Rhee MK, Wilson P, Tomolo AM, and Phillips LS (2016). Glucose challenge test screening for prediabetes and early diabetes. Accepted. Diabetic Medicine. DOI: 10.1111/dme.13270 PMID: 27727467
  7. Jackson SL, Safo SE, Staimez LR, Long Q, Rhee MK, Cunningham SA, Olson DE, Tomolo AM, Ramakrishnan U, Narayan KMV, and Phillips LS (2016). Reduced Cardiovascular Disease Incidence with a National Veterans Health Administration Lifestyle Change Program. Accepted. American Journal of Preventive Medicine. DOI 10.1016/j.amepre.2016.10.


2016: Atlanta Society of Mentoring (ASOM) Series

2016-2017: PRIDE Cardiovascular Genetic Epidemiology Summer Conference

2017: Team Science Training