Study produces most complete look at the human genome, especially in regions previously thought too complex to resolve

A human reference genome serves as a roadmap for researching genetic variations linked to diseases, understanding evolution and advancing personalized medicine.

A team of scientists — including researchers from the Clemson University Center for Human Genetics and the Department of Genetics and Biochemistry — has produced the most complete and accurate look at the human genome to date, especially in regions that have long been considered too complex to resolve.

The researchers, members of the Human Genome Structural Variation Consortium, assembled near-complete genomes from 65 individuals representing a variety of the world’s populations, advancing the scientific exploration of complex genetic structural variation.

“The greatest advancement in this work is that we can get into these really difficult regions and build the structure and show the genetic variation within them, and now investigate how they are contributing to phenotypes or disease,” said Miriam Konkel, an assistant professor in the Department of Genetics and Biochemistry and a member of the University’s Center for Human Genetics.

Genome sequencing and assembly involve breaking down a DNA sample into smaller fragments, determining the sequence of each fragment and then piecing those fragments back together to reconstruct the complete genome.
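
The fragment-and-reassemble idea can be illustrated with a toy sketch. The Python below is not the consortium's pipeline, which this article does not describe. It is a minimal greedy overlap merger, and the function names (`overlap`, `greedy_assemble`), the `min_len` parameter and the example fragments are hypothetical, chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap.

    Returns the remaining contigs; a single contig means every fragment
    was joined into one sequence.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left overlaps; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments cut from the made-up sequence ATGCGTACGTTAGC
fragments = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(fragments)
print(contigs)  # ['ATGCGTACGTTAGC']
```

When a sequence is highly repetitive, many fragments overlap equally well, so a simple merge like this one can join the wrong pieces. That is one intuition for why repeat-rich regions were so hard to assemble.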

Structural variations are genetic code alterations that span more than 50 base pairs, the rungs of the DNA ladder. They occur in several ways: deletions, inversions, duplications, transpositions, mobile element insertions and more intricate rearrangements. Scientists study these variations to see whether they significantly affect gene function or gene expression. 
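
As a rough sketch of that size cutoff, the Python below keeps only the variants longer than 50 base pairs and counts them by type. The `Variant` class, the type labels and the example records are hypothetical; only the 50-base-pair threshold comes from the text, and treating it as a strict "more than" is an assumption.

```python
from dataclasses import dataclass

# Threshold from the text: structural variants span more than 50 base pairs.
SV_MIN_BP = 50


@dataclass(frozen=True)
class Variant:
    kind: str        # e.g. "deletion", "inversion", "duplication" (illustrative labels)
    length_bp: int   # number of base pairs the alteration spans


def is_structural(variant: Variant) -> bool:
    """True when the variant is long enough to count as structural."""
    return variant.length_bp > SV_MIN_BP


def count_structural_by_kind(variants: list[Variant]) -> dict[str, int]:
    """Count structural variants per type, ignoring smaller alterations."""
    counts: dict[str, int] = {}
    for v in variants:
        if is_structural(v):
            counts[v.kind] = counts.get(v.kind, 0) + 1
    return counts


# Made-up example records, not data from the study
example = [
    Variant("deletion", 12),
    Variant("deletion", 300),
    Variant("inversion", 5000),
    Variant("deletion", 51),
    Variant("duplication", 50),
]
print(count_structural_by_kind(example))  # {'deletion': 2, 'inversion': 1}
```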

These changes were hard to detect until the recent advent of newer sequencing technologies and analytical algorithms, as well as larger collections of more complete, diverse genomes.

“By closing those gaps, we’re improving our ability to study human health across all populations. It’s a major step toward more inclusive and precise genomic medicine,” said Mark Loftus, formerly a postdoctoral fellow in Konkel’s lab. He is now an associate computational scientist at The Jackson Laboratory.

Major milestone

“This work marks a major milestone in the field of genetics, moving from studying incomplete or isolated fragments of the human genome to being able to fully sequence, assemble, and analyze many complete, phased human genomes. It’s also not simply a technological advance; it’s a shift toward building a more complete reference library that better captures the full spectrum of human genetic diversity, with meaningful clinical and scientific implications,” Loftus continued. “It’s exciting to think about how much more we’ll be able to discover now that we are beginning to finally see the whole picture.”

Although complex structural variations have been difficult to spot and analyze, they are important finds because they are much more likely to alter the expression of genes. With such variation identified between and within populations, it is now easier to determine whether the differences result in disease or in other traits, such as those that helped our ancestors adapt to their environments.

“Most structural variants occur in just one of two copies of a gene, making it hard to discern their effect. We can now trace altered expression back to the specific copy of DNA responsible. It opens the door to a deeper understanding of genetic disease,” said Gianni Martino, a Ph.D. candidate in the Konkel Lab.

Genome sequencing by the consortium closed 92% of all the gaps in previous assemblies, most of which corresponded to these complex variants. In analyzing this set of diverse human genomes, the international collaboration of scientists uncovered up to 26,115 structural variants per individual, for a total of more than 175,000 sequence-resolved events that were seen at least once.

Other highlights included:

Improved assemblies of several Y chromosomes. Y chromosomes are difficult to assemble because they contain many highly repetitive sequences. Using several new Y chromosome assemblies, the researchers investigated one of the most extensive densely packed regions of the human genome, known as Yq12. While acknowledging that the Yq12 region remains challenging to probe, the researchers have begun making inroads in measuring its variation. Their findings show that it is among the most variable portions of the human Y chromosome.

New look at the major histocompatibility complex. This complicated region, highly relevant to disease research, is associated with immune function and autoimmunity dysfunction. One location examined for variations in this complex was important to vaccine response and autoimmune diseases. Other studies of this complex region examined variations in areas responsible for coding cell surface receptors that sense and signal the presence of viruses and other invaders. 

Centromere variations. Genome regions associated with centromeres are among the most prone to mutations. The lengths of more than one-fifth of centromeres vary by more than 1.5-fold, and about one-third vary in structure. Not surprisingly, the researchers found a large number of new variants: more than 4,000 based on their complete sequence of 1,246 centromeres. 

Survival motor neuron genes (SMN1/SMN2). These genes are in a structurally complex region of biomedical interest. Mutations in, or lack of, the SMN1 gene are linked to spinal muscular atrophy (caused by the lack of a protein needed for muscle movement). SMN2 is a less powerful backup gene but a target of one of the most successful gene therapies. These genes are embedded in a region of long, repeated DNA sequences. This has made full sequencing nearly impossible until now. Through their assemblies of this region, researchers obtained the structure and copy number of these and a few other genes among several of the individuals in their study. They distinguished functional copies of SMN1 and SMN2. Their analysis also suggested potential disease-risk sites in a few of the genomes analyzed. 

Senior scientists and institutions heading the project, in addition to Konkel, include Evan Eichler at the University of Washington, Jan Korbel at the European Molecular Biology Laboratory in Germany, Tobias Marschall at Heinrich Heine University in Germany, and Charles Lee and Christine R. Beck at The Jackson Laboratory for Genomic Medicine in Connecticut. 

Detailed findings were published in the scientific journal Nature in an article titled “Complex genetic variation in nearly complete human genomes.”

Several agencies provided funding for the research, including the National Institutes of Health through National Institute of General Medical Sciences grant 1P20GM139769.

Adapted from a release from the University of Washington School of Medicine
