MGI Sequencing Platform and Sentieon Machine Learning Model Help Improve Small Variant Calling Accuracy for Whole Genome Sequencing Data

DNAscope, a germline variant calling pipeline from San Jose-based analytics firm Sentieon, achieved superior SNP and indel accuracy compared with other standard datasets, according to a recent preprint from the company. The study authors paired DNAscope with MGI’s StandardMPS chemistry, training a new DNAscope model for the MGI DNBSEQ-G400* sequencing platform.

The Genome Analysis Toolkit (GATK) HaplotypeCaller is highly accurate and is the industry-standard small variant caller. However, existing short-read variant callers, including HaplotypeCaller, do not perfectly match high-confidence variant calls, especially in clinically relevant, complex genomic regions. Improving accuracy at these sites matters more, the study authors said, as next-generation sequencing data is increasingly used in the clinic.

Sentieon’s DNAscope combines the established methods of haplotype-based variant callers with machine learning for greater accuracy. The system improves active region detection and local assembly for higher sensitivity and robustness, especially in high-complexity regions. DNAscope produces candidate variants with informative annotations, and these candidates are passed to a machine learning model for variant genotyping.

DNAscope MGI Model v0.5 training on the DNBSEQ-G400 sequencing platform

The researchers developed an MGI DNAscope model to analyze the accuracy of current paired-end 150 bp (PE150) whole genome sequencing (WGS) reads generated on MGI’s DNBSEQ-G400 mid-throughput benchtop sequencer, which they say was selected for its comprehensiveness and flexibility.

Flowchart showing the model training process from data QC, through model training, accuracy calculation, and performance comparison.
Overview of the model training and benchmarking pipeline.

The platform, combined with the trained DNAscope model, provided greater accuracy than previously published benchmarks and other mainstream platforms. It also processed data faster while producing fewer false negative and false positive variant calls, as Sentieon’s DNAscope was able to model systematic error patterns more accurately.
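The article does not say which accuracy metrics the preprint reports. As an illustration only, the sketch below assumes the precision, recall and F1 measures commonly used in variant calling benchmarks, computed from true positive, false positive and false negative counts. The `CallCounts` container and the example counts are hypothetical and are not figures from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical container for a call set compared against a truth set."""
    true_positives: int   # called variants that are in the truth set
    false_positives: int  # called variants that are not in the truth set
    false_negatives: int  # truth-set variants that were not called


def precision(c: CallCounts) -> float:
    """Fraction of called variants that are correct."""
    called = c.true_positives + c.false_positives
    return c.true_positives / called if called else 0.0


def recall(c: CallCounts) -> float:
    """Fraction of truth-set variants that were recovered."""
    truth = c.true_positives + c.false_negatives
    return c.true_positives / truth if truth else 0.0


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration only; not results from the preprint.
example = CallCounts(true_positives=990, false_positives=10, false_negatives=20)
print(f"precision={precision(example):.4f} "
      f"recall={recall(example):.4f} "
      f"F1={f1_score(example):.4f}")
```

Under these definitions, fewer false positives raise precision and fewer false negatives raise recall, so a caller that reduces both improves its F1 score.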

“MGI provides high-quality sequencing products, giving the genomics community more choice in sequencing platforms. At Sentieon, we are very pleased to optimize DNAscope for MGI sequencers and jointly launch this secondary analysis solution for MGI customers,” said Jun Ye, CEO of Sentieon. “This optimized pipeline helps MGI and its customers achieve high performance computing efficiency, ease of use and high accuracy. We look forward to continuing our collaborations in the future to provide high quality solutions to the genomics industry.”

With high accuracy and low cost, the DNBSEQ-G400 can make high-performance sequencing projects more accessible to researchers and clinicians, especially those working on human disease and population diversity using large sets of samples. Combined with Sentieon’s DNAscope, the system can provide enhanced variant calling, including in complex genomic regions that are clinically relevant, potentially facilitating more relevant clinical discoveries and treatment strategies.

The DNBSEQ-G400 sequencer displayed on a table with a playback strip
The DNBSEQ-G400 at ASHG.

As evidenced by this study, MGI’s DNBSEQ-G400 optimizes day-to-day sequencing while demonstrating higher accuracy in detecting SNPs and indels, according to Yongwei Zhang, CEO of MGI Americas. The system has a daily data output of up to 1440 GB, can run one or two flow cells of two types (550M reads vs. 1800M reads), and offers read length options from SE50 to SE400 or PE300. Built with a new flow cell system and optimized optical and biochemical systems, the DNBSEQ-G400 can flexibly and quickly support a variety of sequencing and data analysis applications in areas such as basic research, clinical research, forensic medicine and agriculture. With the DNBSEQ-G400 launched in the United States in August and the first instruments placed in customer labs within a week, Zhang said, MGI is demonstrating its commitment to local customers and partners to make sequencing more accurate, flexible and affordable.

“MGI will compete based on our advantages of DNBSEQ technologies, comprehensive workflow solutions including upstream sample/library preparation automation, downstream BioIT solutions, excellent customer support from our local team, and cost,” Zhang said. “Our goal is to enable more and more customers to have the choice between higher quality data and lower cost when purchasing sequencing platforms. We’ll help them do more cutting-edge research or offer more accurate and affordable clinical tests.”

*Unless otherwise stated, StandardMPS and CoolMPS sequencing reagents and sequencers for use with these reagents are not available in Germany, Spain, UK, Hong Kong, Sweden, Belgium, Italy, Finland, Czech Republic, Switzerland, Portugal, Austria and Romania. No purchase orders for StandardMPS products will be accepted in the United States before January 1, 2023.
