Updating Local Birth Weight Percentiles and Statistical Methods Motivated by Challenges in Analysis of Microbiome Studies

  • Author / Creator
    Hajihosseini, Morteza
  • This dissertation contains two main parts. In part one, I updated the current reference for sex-specific birth weight percentiles by gestational age, overall and for specific ethnic groups, based on data from all singleton live-birth deliveries from 2005 to 2014 in Alberta, Canada. Crude and corrected estimates of birth weight percentiles, including cut-off values for large for gestational age (LGA) and small for gestational age (SGA), were calculated by gestational age for each sex and each sex-ethnic group. Birth weights were modelled by gestational age using generalized additive modelling with non-constant variance. The findings show that LGA and SGA cut-offs were lower for females than for males at all gestational ages. The SGA and LGA percentiles were greater for both male and female very preterm infants in Alberta compared to previous national references. Ethnicity-specific LGA and SGA cut-offs for term Chinese infants and for preterm and term South Asian infants were consistently lower than those for the general population in Alberta and the previous national reference. South Asian infants had lower birth weights than the other groups at almost all gestational ages. The updated birth weight percentiles presented in this study highlight the differences in SGA and LGA cut-offs among South Asian infants, Chinese infants, and the general population, which may be essential for clinical perinatal care.
    Part two focused on statistical methods motivated by challenges such as zero inflation, over-dispersion, high dimensionality, and within-sample correlation in analyzing infants' gut microbiome data. I first evaluated the performance of three distribution-based models and discussed their ability to accommodate the challenges of gut microbiome data in a comprehensive simulation study with 27 scenarios. In addition, I used each model to analyze a real data set. In sixty-seven percent of our simulation scenarios, the zero-inflated negative binomial model had a lower mean squared error than the other methods, and the zero-inflated Gaussian mixture model had better statistical power. The real data application on the SKOT (the Danish abbreviation for "Dietary habits and wellbeing of young children") Cohort I and II dataset showed the effect of maternal obesity on the taxon abundance of infants at the 9- and 18-month assessments. Our study showed that the univariate zero-inflated negative binomial model and the negative binomial-based ManyGLM model could, depending on the goal of the study, adequately accommodate the challenges in gut microbiome data without requiring data transformation or normalization.
    Following our comprehensive review study, we proposed a Bayesian Marginal Zero-inflated Negative Binomial (BAMZINB) model that addresses complexities associated with the multivariate structure of the data, inter-variability, heteroscedastic variations, fluctuating library size, high dimensionality, and zero inflation in microbiome data. Furthermore, we compared the performance of BAMZINB, with and without a random intercept, against two alternative models: the Genewise Negative Binomial Generalized Linear Models (glmFit) implemented in edgeR and the Bayesian hierarchical Generalized Linear Model (BhGLM) implemented in the BhGLM package in R. The results of 32 simulation scenarios showed that the BAMZINB models performed as well as BhGLM and glmFit in estimating the average abundance change between groups of interest. The average deviance of the models differed across simulation scenarios. The application of the BAMZINB model to the real dataset showed the average abundance change over time in a specific list of bacteria for infants born to healthy mothers and those born to obese mothers.
    The results of the second part of my dissertation will help other research groups working on human gut microbiome data better understand the underlying challenges in analyzing infants' gut microbiome data. I believe that rather than asking for the best method available for studying the effect of a covariate on taxon abundance, we should make an effort to understand the underlying structure of microbiome data and adapt an existing method to address the statistical assumptions called for by the data.
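    As a rough illustration of the zero inflation and over-dispersion discussed in part two, the following is a minimal sketch of a zero-inflated negative binomial probability mass function. It is not the dissertation's implementation. The function names and the parameterization (mean `mu`, dispersion `theta`, structural-zero probability `pi`) are assumptions chosen for illustration.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion theta > 0.

    Variance is mu + mu**2 / theta, so a smaller theta means more over-dispersion.
    """
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, theta, pi):
    """Zero-inflated NB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial component."""
    base = (1.0 - pi) * nb_pmf(k, mu, theta)
    return pi + base if k == 0 else base


def zinb_mean(mu, pi):
    """Marginal mean of the zero-inflated count."""
    return (1.0 - pi) * mu


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from the dissertation.
    mu, theta, pi = 5.0, 0.5, 0.3
    print("P(Y = 0) under NB:  ", nb_pmf(0, mu, theta))
    print("P(Y = 0) under ZINB:", zinb_pmf(0, mu, theta, pi))
    print("Marginal mean:      ", zinb_mean(mu, pi))
```

    Under these assumed parameters, the mixture places more mass at zero than the plain negative binomial does, which is the behaviour that motivates zero-inflated models for sparse taxon counts.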

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.