This study was reported in accordance with the STREGA reporting guidelines (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1000022), which are available in Supplement S1.
The data source
This analysis used the MalariaGEN dataset and was approved by MalariaGEN IDAC (Application: 71). Clinical phenotyping (clinical parameters, parasite counts) was available for a subset of this cohort, the Kenya cohort, and was organized by Professor Tom Williams of the KEMRI-Wellcome Unit.
This study was a meta-analysis of 11 separate studies of resistance to severe malaria, each of which contributed to a larger meta-analysis of resistance to severe malaria. Every site used the same genetic analysis pipeline (detailed in the post above): analysis was performed at each site, and the site-level results were then meta-analyzed to produce the summary results.
HMOX1 STR genotyping and STR length definition
MalariaGEN SNP matrix data were downloaded from EGA (dataset EGAD00010001799). Imputation was performed with Beagle v4.2, using a recently developed haplotype reference panel based on 1000 Genomes data. Previous work has shown this imputation to be reliable and has shown no benefit from filtering the imputed genotypes on predicted genotype probability. In this study, data from the 1000 Genomes Project and the Human Genome Diversity Project were used to compare the imputed STR length with the actual STR length, as called by whole genome sequencing. In general, the correlation between true and imputed STR length in global populations was good (Pearson's R > 0.8), although imputation performance was worse in some ethnic backgrounds. Previous work provides details on the imputation approach and quality control.
Previous literature on HMOX1 STR length in malaria used a wide range of reference lengths for the HMOX1 STR. Given the lack of consensus on, and evidence for, any one definition, summed repeat length was used as the primary definition in this study, where STR length is defined by the total number of GT repeats. For further analysis, each allele was classified as short (S), medium (M), or long (L), using the definitions below, consistent with previous literature and with the trimodal distribution of this STR. Genotypes were then defined by the combination of these alleles (e.g., SS, SM), in accordance with previous literature, giving six genotypes at this locus.
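The comparison of imputed and sequenced STR lengths described above rests on a Pearson correlation. A minimal sketch of that calculation follows; the function name and the toy repeat counts are hypothetical and purely illustrative, not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical repeat counts for six alleles (illustration only)
sequenced_lengths = [23, 30, 30, 37, 23, 30]  # called from sequencing
imputed_lengths = [23, 30, 31, 36, 24, 30]    # imputed from SNP data

r = pearson_r(sequenced_lengths, imputed_lengths)
print(f"Pearson R = {r:.3f}")
```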
Malaria outcomes were extracted from the MalariaGEN dataset, along with gender, ethnicity, and country data. Outcomes included case status and type of severe malaria (cerebral malaria (CM) and/or severe malarial anemia (SMA)). Details of definitions are in the original MalariaGEN publication.
For one MalariaGEN collection site (Kenya), more detailed clinical phenotyping of malaria cases was available (blood pressure, platelet count, hemoglobin, CVD, parasite count, severe kidney disease, and mortality). Details of definitions can be found in the relevant publication.
The main method of analysis was logistic regression of severe malaria case status at each individual study site, with the total length of STR repeats as the explanatory variable. Site-specific estimates were then meta-analyzed in a random-effects model to generate summary estimates over the entire cohort. For each severe malaria subtype (CM, SMA, or both), a similar analysis was performed by comparing cases of that subtype to a) all other cases and b) all controls. Subsequently, a second definition of repeat length was generated by dividing each allele, according to the trimodal distribution, into short (S), medium (M), and long (L) alleles. The cutpoints were short (< 27 repeats), medium (27–32 repeats), and long (> 32 repeats), and these alleles were used in a logistic regression against severe malaria case status.
As a third alternative model, the actual genotype (e.g., short-short, short-medium, medium-long, long-long) was included in a logistic regression model, with medium-medium as the reference genotype. These analyses were again performed for each outcome in each country and meta-analyzed in a random-effects model. To ensure some precision in the estimates, models were only run if they included more than 30 cases and more than 150 patients in total.
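The allele cutpoints above can be sketched as a small classification routine. This is an illustration only: the function names are hypothetical, and it assumes each allele's length is available as an integer count of GT repeats.

```python
from itertools import combinations_with_replacement

# Rank used to write genotypes in a fixed order (S before M before L)
ALLELE_ORDER = {"S": 0, "M": 1, "L": 2}


def classify_allele(repeats):
    """Map a GT repeat count to S (< 27), M (27-32), or L (> 32)."""
    if repeats < 27:
        return "S"
    if repeats <= 32:
        return "M"
    return "L"


def classify_genotype(repeats_a, repeats_b):
    """Combine the two allele classes into an unordered genotype label."""
    classes = [classify_allele(repeats_a), classify_allele(repeats_b)]
    classes.sort(key=ALLELE_ORDER.get)
    return "".join(classes)


# The six possible genotypes at the locus: SS, SM, SL, MM, ML, LL
ALL_GENOTYPES = ["".join(pair) for pair in combinations_with_replacement("SML", 2)]

print(classify_genotype(35, 23))  # one long and one short allele
print(ALL_GENOTYPES)
```

Sorting the two allele classes before joining them means that a short-long and a long-short individual receive the same label, which is what yields six genotypes rather than nine.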
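The site-level filtering and pooling described above can be sketched as follows. The text does not state which random-effects estimator was used, so the DerSimonian-Laird estimator below is an assumption, as are the function names; inputs are taken to be per-site log odds ratios and their standard errors.

```python
import math


def is_eligible(n_cases, n_total, min_cases=30, min_total=150):
    """A model is run only with more than 30 cases and more than 150 in total."""
    return n_cases > min_cases and n_total > min_total


def dersimonian_laird(estimates, std_errors):
    """Pool per-site estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, pooled standard error, tau^2).
    """
    k = len(estimates)
    if k == 0 or k != len(std_errors):
        raise ValueError("need matching, non-empty estimates and std_errors")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q and the method-of-moments between-site variance
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Random-effects weights add tau^2 to each site's variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2
```

When the site estimates agree closely, the between-site variance is truncated at zero and the result coincides with a fixed-effect inverse-variance pooled estimate.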
The first 10 principal components and gender were included as covariates in our models. These principal components were provided by MalariaGEN and calculated at the site level, and therefore represent the genetic variation at each site, not across the entire meta-analysis.
Associations and interactions with clinical variables
Only one dataset (Kenya) had clinical data. At this site, linear regression was performed with STR length on clinical biomarkers of severity: Hb, platelet count, blood pressure, leukocyte count, CVD, and parasite count (recorded to improve model fit).
Subsequently, an assessment was made of any potential interaction between STR length and clinical variables. In these models, logistic regression was performed on severe malaria case status with an interaction term between repeat length and the clinical variable (e.g., case ~ STR length * platelet count). This analysis was motivated by laboratory data suggesting that HMOX1 variation may only have clinical impact in certain subtypes of severe malaria (e.g., those with high parasite counts).
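Written out, the example formula above (case ~ STR length * platelet count) corresponds to a logistic model with two main effects and a product term. The symbols below are introduced here for illustration, and the covariate term assumes that the same covariates as in the main models (principal components and gender) were also included in the interaction models.

```latex
\operatorname{logit} \Pr(\text{case}_i = 1)
  = \beta_0
  + \beta_1 L_i
  + \beta_2 X_i
  + \beta_3 L_i X_i
  + \sum_{k} \gamma_k C_{ik}
```

Here $L_i$ is the STR length of individual $i$, $X_i$ is the clinical variable (for example, platelet count), and $C_{ik}$ are the covariates. The interaction is assessed through $\beta_3$: a non-zero value means that the association between STR length and case status changes with the level of the clinical variable.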