INTRODUCTION
Prenatal diagnosis of genetic diseases can improve prenatal and postnatal care for at-risk pregnancies. Advancements in genomics techniques, such as chromosomal microarray during the early 2000s and the more recent development of next-generation sequencing (NGS), have significantly increased the prenatal diagnostic yield [1]. Amniocentesis and chorionic villus sampling are the current gold standards for diagnostic prenatal testing; however, these procedures are expensive, require at minimum a tertiary-care setting, and carry a risk of inducing miscarriage [2]. The demand for safe, inexpensive, and noninvasive sampling techniques has increased substantially in recent years. Cell-free noninvasive prenatal testing or screening (NIPT/NIPS) strategies are currently in clinical use and can be used to identify at-risk patients who may benefit from additional testing.
The discovery of circulating cell-free fetal DNA (cffDNA) in the plasma of pregnant women forms the foundation of NIPT approaches [3]. The development of massively parallel shotgun sequencing (MPSS) methods facilitated the use of cffDNA analyses in NIPT. By combining MPSS with cffDNA fragment counts, the relative overrepresentation or underrepresentation of chromosomes can be estimated [4, 5]. Currently, NIPT is primarily used to detect common aneuploidies, such as trisomy 13, 18, and 21, as these approaches have higher sensitivity and specificity for these disorders than traditional maternal serum screening [5]. Cell-free NIPT has been incorporated into standard prenatal care, with more than 6 million women estimated to have undergone NIPT [5, 6]. NIPT is regularly used to screen for chromosomal abnormalities, such as aneuploidy [7, 8] and copy number variations (CNVs); however, because cffDNA primarily originates from the placenta, the detection of disease-related single-gene variants can be complicated by placental mosaicism [9]. Although NIPT is used in clinical practice to screen for monogenic conditions with autosomal dominant inheritance patterns caused by paternally inherited mutations or de novo mutations [10, 11], these tests have only been applied to a limited number of autosomal recessive and maternally inherited conditions.
More than 2000 autosomal recessive disorders (ARDs) have been identified [12]. Most of these present with symptom onset early in life and have no cure, with treatment focused primarily on symptom management. ARDs are associated with considerable disease burden, affecting approximately 1.7 to 5 of every 1000 neonates [12], and the prevalence of ARDs is often higher among populations with high rates of consanguineous unions [12]. Consanguineous unions, defined as the union of two blood-related individuals who are second cousins or closer, are particularly common in Middle Eastern, South and West Asian, and sub-Saharan African societies, with over 1 billion people worldwide living in communities that view consanguineous unions as a traditional and respected social trend [13–15]. ARDs are also common in specific population groups [13–15]. For example, approximately one in four individuals with Ashkenazi Jewish heritage is a carrier for one of the following genetic conditions: Gaucher disease, cystic fibrosis, Tay-Sachs disease, familial dysautonomia, or Canavan disease [16]. In the United States, cystic fibrosis carrier frequencies are 1 in 28 for Caucasian populations and 1 in 59 for Hispanic populations. The incidence of sickle cell disease is approximately 1 in every 365 Black births [17]. Since the 1970s, carrier screening has been recognized as a cost-effective measure for reducing the reproductive risk of having children with severe ARDs, particularly among populations at high risk of specific genetic diseases, such as Tay-Sachs in the Ashkenazi Jewish population [18, 19].
NIPT analyzes chromosomal abnormalities, CNVs, and microdeletions [20, 21] using techniques typically used to analyze full genomic sequences, including chromosome-selective sequencing, MPSS, NGS, single-nucleotide polymorphism (SNP) analysis, DNA microarray technologies, and digital polymerase chain reaction (PCR). In MPSS, cffDNA and maternal cfDNA fragments are assessed and assigned to chromosomes, and the number of fragments is compared. Comparing fragment numbers is an effective method for identifying trisomy, as plasma from pregnancies with fetal trisomy will contain a higher number of fragments from the affected chromosome than expected [22]. Rather than sequencing the whole genome, sequencing may target specific chromosomes prone to aneuploidy (e.g., 21, 18, 13, X, and Y), although targeted sequencing can lead to a higher failure rate (~2% higher) than whole-genome MPSS [23]. NGS is a high-throughput, easily scalable, and rapid method used to sequence targeted DNA regions. By using unique molecular indexes (UMIs) to label maternal plasma cfDNA prior to PCR amplification and sequencing, the UMI distribution can be used to identify single-nucleotide variants in the cfDNA with >99% sensitivity and specificity [22, 24]; however, the sensitivity of detecting small insertions and deletions is lower. Two key analytical approaches are used to detect ARDs: relative mutation dosage, in which the counts at a specific mutation site are compared between the cffDNA sequence and the parent genotype [25], and relative haplotype dosage, in which multiple informative heterozygous single-nucleotide polymorphisms are identified and used to compare haplotypes in maternal plasma, which increases accuracy and reproducibility.
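The fragment-counting principle can be illustrated with a short calculation. The sketch below assumes a simplified model in which each genome contributes fragments in proportion to chromosome copy number, so that a trisomic fetus at fetal fraction `ff` raises the affected chromosome's relative dosage by a factor of 1 + ff/2. The function name and the 1.3% baseline share used in the example are illustrative assumptions, not values from this study.

```python
def expected_chr_fraction(baseline: float, ff: float, fetal_copies: int = 2) -> float:
    """Expected share of plasma cfDNA fragments from one chromosome.

    baseline      -- share of fragments from this chromosome in a euploid pregnancy
    ff            -- fetal fraction (0 to 1)
    fetal_copies  -- fetal copy number of the chromosome (3 for trisomy)

    Simplified model (assumption): the mother contributes 2 copies with
    weight (1 - ff) and the fetus contributes `fetal_copies` with weight ff.
    """
    dosage = ((1 - ff) * 2 + ff * fetal_copies) / 2  # 1.0 when euploid
    # Renormalize because the total amount of DNA also changes slightly.
    return baseline * dosage / (1 - baseline + baseline * dosage)


if __name__ == "__main__":
    euploid = expected_chr_fraction(0.013, ff=0.10, fetal_copies=2)
    trisomy = expected_chr_fraction(0.013, ff=0.10, fetal_copies=3)
    print(f"euploid share: {euploid:.5f}, trisomy share: {trisomy:.5f}")
```

At a 10% fetal fraction the expected increase is only about 5%, which is why large fragment counts are needed to separate trisomic from euploid pregnancies.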
Currently, only pathogenic or likely pathogenic variants (based on current American College of Medical Genetics and Genomics variant classification guidelines [26]) are reported, and these variants are then confirmed in a second cfDNA aliquot using an amplicon-based NGS assay, which uses gene-specific primers to enrich the targeted region, followed by deep sequencing (>10,000×).
In the current study, we expanded the design of an amplicon-based NGS assay to measure allele imbalance in a target mutation and tested its ability to predict fetal genotypes among couples at high reproductive risk for ARDs. This method uses oligonucleotide probes designed to target and amplify specific regions of interest within a gene, followed by NGS. Our NIPT requires only maternal peripheral blood, from which plasma cfDNA is extracted. NGS technology, a proprietary bioinformatics algorithm, and population-specific data are then used to identify potential ARDs among these pregnancies.
MATERIALS AND METHODS
Patient recruitment and sample selection
Participants were recruited from King Fahad Medical City in Saudi Arabia and were enrolled during prenatal care because they were known carriers of an autosomal recessive variant, had a child with an autosomal recessive variant, or had a family history of ARD. A total of 98 patients were recruited for this study, and all tests were conducted between 10 and 18 weeks of gestation (for details, see Supplementary Table S1). Inclusion criteria were (i) pregnant women older than 18 years; (ii) a confirmed carrier status of any ARD; (iii) singleton pregnancy; and (iv) the provision of informed consent. Exclusion criteria were (i) multiple gestation pregnancies; (ii) gestational age less than 10 weeks; (iii) gestational age more than 18 weeks; or (iv) age older than 35 years.
According to the approved institutional review board protocol (21-240), we obtained written informed consent from all participants to draw blood and store maternal germline DNA and plasma samples. All participants underwent invasive testing (amniocentesis, with collection of 10 ml of amniotic fluid) as part of the standard of care. Whole blood (5 ml) was obtained by venipuncture from each patient, and samples were transported to the laboratory for processing (Supplementary Table S1). A total of 50 samples that met quality control parameters, corresponding to 50 individual participants, were included in the study (Supplementary Table S2).
Sample processing and cfDNA extraction
Blood samples were collected from King Fahad Medical City between 2021 and 2023. Maternal blood samples were collected into Cell-Free DNA BCT® tubes (Streck, La Vista, Nebraska, USA) and transported to the laboratory for DNA extraction. Plasma was separated from peripheral blood, and cfDNA was extracted from plasma using a Thermo Fisher KingFisher Flex instrument, following the manufacturer’s instructions. Genomic DNA (gDNA) was extracted from white blood cells, as described previously [27]. A workflow diagram is provided in Figure 1.

Figure 1. Workflow diagram illustrating the methodology. (a) Pregnant women pursuing prenatal testing and at high risk of autosomal recessive disease were recruited to the study. (b) Participants underwent invasive testing (amniocentesis) as part of the standard of care, in addition to supplying a blood sample for use in this study. (c) Blood samples were processed for plasma separation, and cfDNA was isolated from plasma. Maternal DNA was obtained from white blood cells. (d) Amplicon-based next-generation sequencing was performed on paired sets of maternal and fetal cfDNA for analysis.
Amplicon NGS
We developed an autosomal recessive noninvasive prenatal testing (AR-NIPT) assay using an amplicon-based NGS method and a fetal fraction (FF)-based data analysis algorithm. Amplicon NGS was used to detect known familial variants; this method uses primers designed to target and amplify specific regions of interest encompassing the genomic loci of known familial variants and internal reference polymorphisms, followed by NGS. To efficiently amplify short cfDNA fragments, the amplicons were designed to be approximately 100 bp (range: 98-101 bp). A primer panel targeting 91 common SNPs and 1 target mutation was used to amplify each sample. All primers were first tested individually using gDNA and plasma DNA and then combined to a final concentration of 5 μM for multiplex PCR.
Multiplex PCR was conducted using 3-5 ng cfDNA and the primer panel. After amplification, all PCR products were ligated to an adapter with an index sequence to generate a library that was sequenced on an Illumina NovaSeq 6000, with a target yield of 50-100 Mb per library. NGS data were analyzed using open-source software and in-house scripts. The maternal variant allele frequency (VAF) was determined by amplicon-based NGS assay using maternal gDNA, whereas the plasma VAF was determined using the same assay on plasma cfDNA.
The plasma cfDNA allele number was calculated as follows:
The relationship among plasma VAF, maternal VAF, and fetal zygosity can be described as follows:
where n refers to the number of fetal mutant alleles and can range from 0 to 2, as follows: 0, wild-type (WT); 1, heterozygous mutation (HET); 2, homozygous mutation (HOM).
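As an illustration of how these quantities relate, the sketch below assumes the standard two-component mixture model, in which plasma cfDNA is a blend of maternal DNA (weight 1 − FF) and fetal DNA (weight FF), so that plasma VAF = (1 − FF) × maternal VAF + FF × n/2. This form is an assumption consistent with the definition of n above and is not necessarily the exact expression used in the assay; the function name is hypothetical.

```python
def expected_plasma_vaf(maternal_vaf: float, ff: float, n: int) -> float:
    """Expected plasma VAF under a two-component mixture model (assumption).

    maternal_vaf -- VAF measured in maternal gDNA (about 0.5 for a carrier)
    ff           -- fetal fraction (0 to 1)
    n            -- number of fetal mutant alleles: 0 (WT), 1 (HET), 2 (HOM)
    """
    if n not in (0, 1, 2):
        raise ValueError("n must be 0, 1, or 2")
    fetal_vaf = n / 2
    return (1 - ff) * maternal_vaf + ff * fetal_vaf


if __name__ == "__main__":
    # Heterozygous mother, 10% fetal fraction.
    for n, label in enumerate(("WT", "HET", "HOM")):
        print(label, round(expected_plasma_vaf(0.5, 0.10, n), 3))
```

For a heterozygous mother, the three fetal genotypes differ from one another by only FF/2 in plasma VAF, which is why deep sequencing and an accurate FF estimate are needed to separate them.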
Fetal fraction calculation
FF values were calculated using a previously published method with modification [27]. Briefly, the FF was calculated from the median VAF values from two populations of informative loci in SNPs using the following formula:
Here, p_median refers to the median VAF of SNPs when maternal DNA is homozygous WT, and m_median refers to the median VAF of SNPs when maternal DNA is homozygous mutant.
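One plausible form of this calculation is sketched below. It assumes that the informative loci are those at which the fetus is heterozygous while the mother is homozygous, so that p_median ≈ FF/2 and m_median ≈ 1 − FF/2, and it combines the two estimates as FF ≈ p_median + (1 − m_median). This combination is an assumption derived from the definitions above, not necessarily the published formula, and the function name is hypothetical.

```python
from statistics import median


def estimate_fetal_fraction(vafs_maternal_hom_wt, vafs_maternal_hom_mut):
    """Estimate FF from informative SNP loci (assumed form, see lead-in).

    vafs_maternal_hom_wt  -- plasma VAFs at loci where the mother is homozygous
                             WT and the fetus is heterozygous (expected ~FF/2)
    vafs_maternal_hom_mut -- plasma VAFs at loci where the mother is homozygous
                             mutant and the fetus is heterozygous (expected ~1 - FF/2)
    """
    p_median = median(vafs_maternal_hom_wt)
    m_median = median(vafs_maternal_hom_mut)
    # Each term estimates FF/2, so their sum estimates FF.
    return p_median + (1.0 - m_median)


if __name__ == "__main__":
    ff = estimate_fetal_fraction([0.04, 0.05, 0.06], [0.94, 0.95, 0.96])
    print(f"estimated FF: {ff:.3f}")
```

Using medians rather than means limits the influence of individual loci with poor amplification.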
Data processing and analysis
Sequencing data were analyzed using a custom data analysis pipeline. Sequencing data were obtained as FASTQ files, a text-based format that stores sequence reads together with a quality score for each base. Variant call format (VCF) files, which are used to store information about genetic variants, were generated from FASTQ files using open-source software: Burrows-Wheeler Aligner, a software tool used to align short DNA sequences with a reference genome using the Burrows-Wheeler transform algorithm; and Freebayes, an open-source Bayesian genetic variant caller that identifies SNPs, insertions, deletions, and other small variants from aligned sequencing data. Custom scripts were used to analyze the VCF files and call the fetal genotypes. We modified a statistical method [27] to determine the variant, the zygosity status of the variant, and the FF based on the NGS data. A group of samples with known FF, fetal mutation, and fetal mutation type (i.e., WT, HET, or HOM) was used to train a statistical model. The sample mutation and zygosity status were calculated using NGS data and FF. In accordance with the research consent, the cffDNA results were not disclosed to the patients tested.
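The step from VCF records to allele frequencies can be sketched as follows. This is a minimal illustration rather than the in-house script: it assumes single-sample records whose INFO column carries the reference and alternate observation counts as `RO` and `AO`, as Freebayes reports them, and the function name and example record are hypothetical.

```python
def vafs_from_vcf_line(line: str) -> list[float]:
    """Return one VAF per alternate allele for a single VCF data line.

    Assumes the INFO column (8th field) contains RO (reference observations)
    and AO (alternate observations, comma-separated if multi-allelic).
    """
    fields = line.rstrip("\n").split("\t")
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    ref_count = int(info["RO"])
    alt_counts = [int(a) for a in info["AO"].split(",")]
    depth = ref_count + sum(alt_counts)
    if depth == 0:
        return [0.0 for _ in alt_counts]
    return [a / depth for a in alt_counts]


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "chr1\t1000\t.\tA\tG\t50\t.\tDP=1000;RO=900;AO=100\tGT\t0/1"
    print(vafs_from_vcf_line(record))
```

Header lines (those beginning with `#`) would need to be skipped before calling this function on a full VCF file.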
Kernel density estimation (KDE) in data analysis
KDE is a non-parametric statistical method used to estimate the probability density function of a random variable. In the context of this study, KDE was employed to analyze the distribution of allele imbalance values obtained from sequencing data [28, 29]. This method provided a robust framework for predicting fetal genotypes (e.g., WT, HET, or HOM) based on the allele imbalance observed in cfDNA. KDE analysis can model the distribution of data points without assuming any specific underlying distribution, making this approach particularly useful for distinguishing among genotypic categories with overlapping probability distributions. Using KDE analysis enhanced the confidence levels of genotype calling by categorizing all results into high-confidence and low-confidence groups. KDE analysis also supported the interpretation of ambiguous cases by providing probability scores for multiple genotypic outcomes, which were essential for accurate clinical reporting. The flexibility and precision of KDE in handling complex datasets, such as those involving high GC content genomic regions, were crucial for the success of the AR-NIPT method.
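The idea of KDE-based genotype calling can be sketched with a small, self-contained example. The code below is not the proprietary algorithm: it assumes a Gaussian kernel with a fixed bandwidth, equal prior weight for each genotype, and a one-dimensional allele imbalance value per sample; the function names and training values are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from training samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def genotype_probabilities(x, training, bandwidth=0.01):
    """Probability of each genotype given an observed allele imbalance x.

    training -- dict mapping genotype label to a list of allele imbalance
                values from samples with known fetal genotype.
    Equal priors are assumed, so probabilities are normalized densities.
    """
    densities = {g: gaussian_kde(vals, bandwidth)(x) for g, vals in training.items()}
    total = sum(densities.values())
    if total == 0:
        raise ValueError("x lies too far from all training distributions")
    return {g: d / total for g, d in densities.items()}


if __name__ == "__main__":
    # Illustrative training values only (plasma VAF minus maternal VAF).
    training = {
        "WT": [-0.052, -0.050, -0.048],
        "HET": [-0.002, 0.000, 0.002],
        "HOM": [0.049, 0.050, 0.051],
    }
    print(genotype_probabilities(0.03, training))
```

An observation that falls between two training distributions receives appreciable probability for both genotypes, which is the situation that leads to a low-confidence call.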
Baseline normalization and the training and optimization of machine learning models were conducted using 70% of the sample population, and model testing was conducted using the remaining 30% of the sample population. The receiver operating characteristic curves and their area under the curve scores were calculated and compared between the training and testing populations to evaluate the generalization ability of our methodology. We optimized genotype calling by employing multiple reference genomes in our statistical algorithm, which allows our model to account for population-specific and locus-specific high-resolution genomic CNV data and reduces the likelihood of false-positive or false-negative results.
The PCR for the target mutation should perform similarly to the PCRs for the common SNPs in the multiplex reaction. When the performance of a mutation PCR is not comparable to that of the common SNP PCRs, the variant cannot be assessed using our algorithm and is labeled “Mutation not compatible.” For example, in Table 1, the VAF indicates that the amplification efficiency for the mutant allele differs greatly from the WT allele frequency, which serves as a proxy for the common SNP PCR amplification efficiency. Due to the size of this difference, this sample cannot be used in our KDE analysis.
Table 1. Example of VAF amplification efficiency.
Case | Gene | Mutation | cfDNA mutation VAF |
---|---|---|---|
NIPT-033 | TBCE | c.155_166del | 0.329 |
Abbreviations: cfDNA, cell-free DNA; VAF, variant allele frequency. cfDNA mutation VAF is a proxy for similarity to WT allele frequency of common single-nucleotide polymorphisms. For kernel density estimate, the cfDNA mutation VAF should be close to 1.
For each variant, if the probability of minor content is >10%, the result is considered low confidence. Results with >90% probability of major content are considered high confidence.
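These thresholds can be expressed as a short decision rule. The sketch below takes "major content" to mean the most probable genotype and "minor content" the second most probable, which is an interpretation of the rule above; the function name is hypothetical, and the `"unclassified"` fallback for results that meet neither threshold is an assumption, since that case is not described here.

```python
def classify_confidence(probabilities: dict) -> tuple:
    """Apply the >10% minor / >90% major thresholds to genotype probabilities.

    Returns (label, genotypes): low-confidence results are named with the two
    most probable genotypes, high-confidence results with the major genotype.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (major, p_major), (minor, p_minor) = ranked[0], ranked[1]
    if p_minor > 0.10:
        return ("low confidence", (major, minor))
    if p_major > 0.90:
        return ("high confidence", (major,))
    # Neither threshold met; this fallback is an assumption.
    return ("unclassified", (major,))


if __name__ == "__main__":
    print(classify_confidence({"HOM": 0.95, "HET": 0.04, "WT": 0.01}))
    print(classify_confidence({"HOM": 0.60, "HET": 0.38, "WT": 0.02}))
```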
RESULTS
AR-NIPT assay development
We developed an AR-NIPT assay using an amplicon-based NGS method and an FF-based data analysis algorithm. Of the 98 samples collected, 50 passed quality control and were included in the analysis (Supplementary Table S2). The remaining samples were excluded due to damage incurred during shipment caused by mishandling by the carrier. We processed and analyzed all 50 samples by extracting cfDNA and gDNA for amplicon-based NGS. We obtained conclusive results for 38 of 50 samples, and the NGS results for 29 of these 38 samples (76.3%) were concordant with the invasive results. Reasons for inconclusive results included insufficient cfDNA and mutations located in difficult genomic regions, such as those with a high GC content [high levels of guanine (G) and cytosine (C) bases in a DNA region]. Due to the strong hydrogen bonds that form between G and C bases, regions with a high GC content can be difficult to denature and amplify using typical PCR approaches, leading to reduced amplification efficiency and low sequencing coverage [30]. These factors can complicate accurate variant detection, impacting the reliability of genotypic analysis.
KDE analysis results
KDE analysis of all plasma variants (the 91 SNPs and 1 target mutation) enabled the genotype of fetal DNA to be determined, and samples were grouped depending on the maternal cfDNA and cffDNA genotypes (Table 2, Fig. 2). These groups provide insights into the inheritance pattern of the target mutation and its implications for fetal health: group 1 (HOM/HOM) includes samples with HOM maternal cfDNA and HOM fetal cfDNA, indicating that the fetus has inherited the mutation from both parents; group 2 (HOM/HET) includes samples with HOM maternal cfDNA and HET fetal cfDNA, indicating inheritance of a maternal mutated allele and a paternal WT allele; group 3 (HET/HOM) includes samples with HET maternal cfDNA and HOM fetal cfDNA, suggesting the inheritance of the maternal mutated allele and a paternal allele with a mutation at the same locus; group 4 (HET/HET) includes samples with HET maternal cfDNA and HET fetal cfDNA, indicating that the fetus is a carrier, like the mother, without providing information regarding which parent the mutant allele was inherited from; group 5 (HET/WT) includes samples with HET maternal cfDNA and WT fetal cfDNA, indicating that the fetus has not inherited the maternal mutation; group 6 (WT/HET) includes samples with WT maternal cfDNA and HET fetal cfDNA, indicating that the fetus has inherited a paternal mutation; and group 7 (WT/WT) includes samples with WT maternal cfDNA and WT fetal cfDNA, indicating that the fetus is unaffected and not a carrier. This categorization supports diagnostic accuracy and targeted clinical interventions.
Table 2. Samples were grouped according to maternal and fetal genotype combinations.
Maternal cfDNA genotype | Fetal cfDNA genotype | Plasma cfDNA VAF groups |
---|---|---|
HOM | HOM | Group 1 |
HOM | HET | Group 2 |
HET | HOM | Group 3 |
HET | HET | Group 4 |
HET | WT | Group 5 |
WT | HET | Group 6 |
WT | WT | Group 7 |
Abbreviations: cfDNA, cell-free DNA; HOM, homozygous; HET, heterozygous; WT, wild-type; VAF, variant allele frequency.
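The grouping in the table above amounts to a lookup on the pair of maternal and fetal genotypes, as sketched below. The two combinations absent from the table (HOM/WT and WT/HOM) are not expected under Mendelian inheritance, because the fetus receives one allele from the mother; returning `None` for them is a choice made for this illustration, and the names are hypothetical.

```python
# (maternal genotype, fetal genotype) -> plasma cfDNA VAF group
GENOTYPE_GROUPS = {
    ("HOM", "HOM"): 1,
    ("HOM", "HET"): 2,
    ("HET", "HOM"): 3,
    ("HET", "HET"): 4,
    ("HET", "WT"): 5,
    ("WT", "HET"): 6,
    ("WT", "WT"): 7,
}


def vaf_group(maternal: str, fetal: str):
    """Return the group number, or None for combinations not in the table."""
    return GENOTYPE_GROUPS.get((maternal, fetal))


def is_ar_nipt_target(maternal: str, fetal: str) -> bool:
    """Groups 3-5 (heterozygous mother) are the focus of the AR-NIPT analysis."""
    return vaf_group(maternal, fetal) in (3, 4, 5)


if __name__ == "__main__":
    print(vaf_group("HET", "HOM"), is_ar_nipt_target("HET", "HOM"))
```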

Figure 2. KDE analysis results enable the grouping of samples according to maternal cfDNA and cffDNA genotypes.
For AR-NIPT, we focused on groups 3-5 (n = 20), which include all samples with HET maternal cfDNA, because maternal heterozygosity is critical for analyzing allele imbalance and predicting fetal genotypes. All three groups displayed a normal distribution, with possible overlap between two distributions; therefore, each variant may belong to one of two genotypes.
We conducted a second KDE analysis focusing on the 20 samples (groups 3-5, Supplementary Table S3) for which the maternal genotype was determined to be heterozygous and found that four could not be analyzed by KDE, owing to insufficient cfDNA yield or mutations located in regions with high sequence complexity. The genotypes for 10 samples could be predicted with high confidence, whereas the genotypes for the remaining 6 samples could only be predicted with low confidence (Table 3, Fig. 3).
Table 3. Genotype prediction confidence for groups 3-5 using KDE.
Calling categories | Sample count |
---|---|
High confidence | 10 |
Low confidence | 6 |
Mutation not compatible | 4 |
Total | 20 |
Abbreviation: KDE, kernel density estimation.

Figure 3. KDE analysis enables genotype prediction for cffDNA. (A) If the probability of minor content is >10%, the result is considered low confidence and named with the probabilities of two groups. (B) If the probability of major content is >90%, the result is considered high confidence and named with the major content.
Clinical analysis results
For clinical purposes, both WT and HET were considered negative samples that did not require immediate clinical attention. We attempted to group all 20 samples into positive (HOM) and negative (WT/HET) groups, which revealed three samples with low-confidence results (Table 4).
DISCUSSION AND CONCLUSION
Since its implementation in the 1970s, carrier screening for a limited number of ARDs, such as cystic fibrosis, hemoglobinopathies, and spinal muscular atrophy, has been a cost-effective approach for reducing ARD risk in high-risk populations [18, 19]. Because the bulk of all circulating cfDNA is of maternal origin, paternally inherited fetal variants and de novo variants that are absent from the maternal genome can be readily detected by investigating polymorphic regions using parental haplotypes as a reference [31]. This approach is particularly beneficial when pregnancy is unplanned, for late-gestational prenatal carrier testing, when the partner is unavailable for testing, or when patients cannot afford other methods.
NIPT has been developed to detect monogenic disorders with autosomal dominant inheritance patterns, which are primarily caused by de novo mutations and comprise approximately 40% of all severe postnatal single-gene disorders [32]. A novel cffDNA-based NIPT can be performed as early as 9 weeks for singleton pregnancies [27, 33]. Designed to screen for diseases associated with a single-gene etiology, high rates of de novo mutations associated with disease incidence, and categorized with either an autosomal dominant or X-linked inheritance pattern, the gene panel includes 30 causative genes associated with conditions such as Noonan syndrome, Cornelia de Lange syndrome, and osteogenesis imperfecta, which have a fairly high combined cumulative prevalence [27, 33].
Although NIPT for paternally inherited monogenic conditions is common in clinical practice, these tests have only been applied to detect a limited number of maternally inherited conditions, including ARDs and X-linked disorders. Maternally inherited conditions are more challenging to detect because they require assessing whether the fetal genome contains the maternal allele, which is more difficult but still feasible [34–36] if the inherited allele is genetically identical to the maternal allele. However, ongoing research using target haplotyping has returned promising results [37]. Establishing tests for the detection of monogenic conditions is also limited by the relatively small cohorts of positive cases, and additional studies with larger cohorts are necessary to identify the full clinical impact of NIPT for monogenic diseases [33, 34, 38].
We tested a new amplicon-based NGS NIPT that requires only maternal peripheral blood for the screening of monogenic ARDs. This process uses an amplicon-based NGS approach to measure the allele imbalance of a target mutation, comparing the VAF of maternal gDNA with the VAF of plasma cfDNA. We used a KDE method to calculate the probability of fetal genotypes from NGS data based on the FF and allele imbalance. In a recently reported proof-of-concept case series, we successfully applied this approach to identify the fetal status of Niemann–Pick disease, type C1 variants in three cases during the first trimester, demonstrating the potential of this approach to identify high-risk pregnancies early [39]. The AR-NIPT method provides a noninvasive alternative to traditional invasive techniques, such as amniocentesis and chorionic villus sampling, which carry miscarriage risks and require specialized facilities. In addition, by utilizing amplicon-based NGS and advanced bioinformatics to detect ARDs, the AR-NIPT method addresses key limitations of existing NIPT methods that focus on aneuploidies and paternally inherited dominant disorders.
The AR-NIPT assay showed 76.3% concordance (29 of 38) with invasive testing among the samples with conclusive results. High-confidence genotype predictions were achieved for 10 of 20 samples with HET maternal cfDNA, highlighting the potential of this method to detect fetal genotypes using a noninvasive approach. This NIPT method provides several advantages: (i) because only a maternal blood sample is necessary, this test can easily be integrated into routine prenatal care while reducing the risks posed to the fetus; (ii) the KDE method leverages a suite of bioinformatics tools to provide a statistical framework for predicting the fetal genotype based on allele imbalances, potentially increasing the accuracy of the test relative to more traditional methods; (iii) this test allows for molecular identification of pregnancies at high risk for ARDs, which can be critical for timely decision-making and intervention; and (iv) the use of amplicon-based NGS and open-source tools offers a cost-effective alternative to more traditional methods. The limitations include insufficient cfDNA yield in some samples and mutations in regions with high GC content, both of which contributed to inconclusive results, highlighting the need for improved sequencing methods. These limitations must be addressed for the AR-NIPT method to approach the nearly 99% accuracy achieved by NIPTs for targeted conditions.
This novel NIPT method has the potential to improve prenatal care by offering expectant parents a noninvasive, cost-effective, accessible, and accurate option for screening ARDs. However, as with any new medical test, further validation and comparison with existing methods will be necessary to fully understand its efficacy and limitations. We recommend conducting further validation studies with larger and more diverse cohorts, encompassing various genetic backgrounds and consanguinity rates, to enhance the robustness, accuracy, and generalizability of the AR-NIPT method while also addressing current limitations such as cfDNA yield and sequencing challenges in high GC content regions.