New refget version improves reliability of genomic data analysis

19 JULY, 2023

CRAM is a popular and efficient file format for storing DNA sequences that can reduce storage costs by up to 50%. It achieves that staggering compression by relying on a reference sequence: a bit of DNA considered typical.

Compare your own genes against the reference, and you will begin to see variation: genetic differences that could lead to everything from freckles to a high risk of breast cancer.

Instead of storing all three billion base pairs of the reference sequence alongside the DNA being studied, CRAM files simply hold onto the reference sequence's name.
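To make the idea concrete, here is a toy sketch of reference-based encoding. It is not the actual CRAM codec, which handles insertions, deletions, quality scores and much more. It only shows why storing differences against a shared reference saves space, and why decoding depends on having exactly the right reference. The function names and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (start position on the reference, read length, {offset in read: substituted base})
Encoded = Tuple[int, int, Dict[int, str]]


def encode_read(reference: str, read: str, start: int) -> Encoded:
    """Store only where the read disagrees with the reference (toy: substitutions only)."""
    mismatches = {
        offset: base
        for offset, base in enumerate(read)
        if reference[start + offset] != base
    }
    return start, len(read), mismatches


def decode_read(reference: str, encoded: Encoded) -> str:
    """Rebuild the read by copying the reference and re-applying the mismatches."""
    start, length, mismatches = encoded
    bases: List[str] = list(reference[start:start + length])
    for offset, base in mismatches.items():
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCATG"
read = "GTACCTTA"
encoded = encode_read(reference, read, start=2)
assert decode_read(reference, encoded) == read
```

Here the eight-base read differs from the reference in a single base, so it is stored as `(2, 8, {4: "C"})`. Decoding the same record against a different reference silently produces a different read, which is exactly the danger of using the wrong reference.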

When it's time to decompress the data, refget steps in, helping you "get" the "reference" you need.
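In practice, a refget lookup is an HTTP request. The sketch below assumes the `GET /sequence/{id}` endpoint with optional 0-based `start` (inclusive) and `end` (exclusive) query parameters, as described in the GA4GH refget specification. The server URL and identifier in the usage comment are placeholders, and error handling and content negotiation are omitted.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_sequence_url(base_url: str, seq_id: str,
                       start: Optional[int] = None,
                       end: Optional[int] = None) -> str:
    """Build GET {base_url}/sequence/{seq_id}, optionally limited to [start, end)."""
    url = f"{base_url.rstrip('/')}/sequence/{quote(seq_id, safe=':')}"
    params: Dict[str, int] = {}
    if start is not None:
        params["start"] = start  # 0-based, inclusive
    if end is not None:
        params["end"] = end  # 0-based, exclusive
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_sequence(base_url: str, seq_id: str,
                   start: Optional[int] = None,
                   end: Optional[int] = None,
                   timeout: float = 30.0) -> str:
    """Download the (sub)sequence; assumes the server answers with plain-text bases."""
    with urlopen(build_sequence_url(base_url, seq_id, start, end),
                 timeout=timeout) as response:
        return response.read().decode("ascii")


# Hypothetical server and identifier, for illustration only:
# fetch_sequence("https://refget.example.org", "ga4gh:SQ.<digest>", start=0, end=100)
```

Keeping URL construction separate from the network call makes the request easy to inspect or test without contacting a server.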

"The consequences of comparing genomic data to incorrect or misaligned reference sequences are serious. Genetic variants may be classified as pathogenic or harmless incorrectly, and patients could receive improper care. Being exact matters," he said.

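By assigning each reference sequence a unique identifier computed from the sequence itself, refget solves a tricky naming problem in genomics: the same sequence always receives the same identifier, whatever it happens to be called.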

Central authorities like the International Nucleotide Sequence Database Collaboration (INSDC), Ensembl, and the University of California, Santa Cruz (UCSC) Genome Browser use different naming conventions for the same reference sequence. Human chromosome 1, for example, is "chr1" at UCSC but simply "1" at Ensembl.
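Because refget identifiers are derived from the sequence rather than from its name, differently named copies of one sequence resolve to the same identifier. The sketch below computes two checksum-style identifiers: a plain MD5 hex digest and a GA4GH-style digest (SHA-512 truncated to 24 bytes, base64url-encoded, prefixed with `SQ.`). It assumes the sequence is normalized by uppercasing and dropping non-letter characters before hashing; consult the refget specification for the exact rules. The toy sequences and record keys are hypothetical.

```python
import base64
import hashlib


def normalize(sequence: str) -> str:
    """Assumed normalization: uppercase and keep only the letters A-Z."""
    return "".join(ch for ch in sequence.upper() if "A" <= ch <= "Z")


def md5_digest(sequence: str) -> str:
    """Lowercase hex MD5 of the normalized sequence."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def ga4gh_digest(sequence: str) -> str:
    """SHA-512 of the normalized sequence, truncated to 24 bytes, base64url, 'SQ.' prefix."""
    raw = hashlib.sha512(normalize(sequence).encode("ascii")).digest()[:24]
    return "SQ." + base64.urlsafe_b64encode(raw).decode("ascii")


# Toy data: one hypothetical sequence stored under three naming styles
# (UCSC "chr1", Ensembl "1", an INSDC accession), with different formatting.
records = {
    ("UCSC", "chr1"): "ACGTnnACGT\nTAGC",
    ("Ensembl", "1"): "acgtnnacgttagc",
    ("INSDC", "CM000663.2"): "ACGTNNACGTTAGC",
}
digests = {key: ga4gh_digest(seq) for key, seq in records.items()}
assert len(set(digests.values())) == 1  # one identifier, three names
```

Truncating to 24 bytes gives exactly 32 base64url characters with no padding, so every GA4GH-style identifier in this sketch has the same length.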


Why you need refget for any genomic analysis

For the initial refget release in 2018, the GA4GH Large Scale Genomics Work Stream tailored the API to support CRAM.

But Yates and team quickly realized that refget could smooth over issues in other genomic data formats and models. VCF and SAM also support refget identifiers, with growing community interest in using them.
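SAM (and therefore CRAM) headers can carry an MD5 checksum for each reference in the `M5` tag of an `@SQ` line, and VCF `##contig` header lines can carry an `md5` field. Since refget servers accept MD5 checksums as sequence identifiers, either value can serve as a lookup key. The minimal parsers below are a sketch that assumes well-formed, single-line headers; they are not a full SAM or VCF parser.

```python
import re
from typing import Optional

# Matches md5=<32 hex digits> as one field inside ##contig=<...>
_VCF_MD5 = re.compile(r"[<,]md5=([0-9a-fA-F]{32})[,>]")


def sam_sq_md5(header_line: str) -> Optional[str]:
    """Return the M5 tag of a SAM/CRAM '@SQ' header line (lowercased), or None."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "@SQ":
        return None
    # Header fields are TAG:VALUE pairs; split on the first colon only.
    tags = dict(field.split(":", 1) for field in fields[1:] if ":" in field)
    md5 = tags.get("M5")
    return md5.lower() if md5 else None


def vcf_contig_md5(header_line: str) -> Optional[str]:
    """Return the md5 field of a VCF '##contig=<...>' header line (lowercased), or None."""
    if not header_line.startswith("##contig=<"):
        return None
    match = _VCF_MD5.search(header_line)
    return match.group(1).lower() if match else None
```

Lowercasing both results means a checksum read from a SAM header and one read from a VCF header can be compared directly.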

"refget is a fundamental building block for GA4GH standards," said Yates. "It can solve problems beyond CRAM, for any file format or data model that requires a reference sequence. With refget, you know exactly what sequence you're talking about."

Source: https://www.news-medical.net/news/20230719/New-refget-version-improves-reliability-of-genomic-data-analysis.aspx
