An international team of researchers — including Clemson University scientists Julia George and David Clayton — has published a paper in the prestigious journal Nature updating their efforts to produce high-quality genome sequences for all the Earth’s nearly 72,000 vertebrate species.
The genomes generated by the Vertebrate Genomes Project (VGP) will allow scientists to address fundamental questions in biology and disease and to identify species most genetically at risk for extinction. The research is crucial because the planet is in the midst of its sixth mass extinction event, the worst since the die-off of dinosaurs 66 million years ago, with one in eight vertebrate species at risk of extinction.
The team, which includes scientists from over 50 institutions in 12 countries, has worked over the past five years to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes.
The researchers outlined lessons learned from generating genome assemblies for 16 species representing six major vertebrate lineages in the paper titled “Towards complete and error-free genome assemblies of all vertebrate species.”
The researchers confirmed that long-read sequencing technologies are essential for maximizing genome quality. Long-read sequencing produces longer pieces of contiguous genetic data than short-read sequencing, making it easier to assemble the DNA sequences into whole chromosomes. The VGP used a state-of-the-art approach that combines long-read sequencing and long-range chromosome scaffolding with novel algorithms that put the pieces of the genome assembly puzzle together.
Key insights came from attempts to improve the genome assembly for one species, the zebra finch, a songbird that has been the focus of intensive research by George and Clayton for 30 years. In 2010, the zebra finch, whose vocal learning process is similar to that of humans, became the second bird in the world to have an initial assembly of its sequenced genome. The chicken, an important agricultural species, was the first.
Applying their experience, George, an associate professor in Clemson’s Department of Biological Sciences, and Clayton, who chairs the College of Science’s Department of Genetics and Biochemistry, recognized problems in the prior assemblies and helped to advance improved methods for assembling genomes.
“To sequence a genome, you have to sequence a lot of pieces of that genome. They’re like pieces of a puzzle that you have to fit together,” George said. “But there are two copies of every gene in a typical genome, so it’s like you have two jigsaw puzzles you have to assemble, but all the pieces are together in one pile. You have to figure out how the pieces fit together and the puzzle to which they belong. There were problems in the methods that were leading to pieces ending up in the wrong puzzle.
“With our efforts with quality control in the zebra finch, we recognized they had misassembled some chromosomes in the zebra finch,” George continued. “The assembly group fixed that, and they fixed their methods so it wouldn’t happen again with other species.”
A first
The VGP produced the first sequence of the zebra finch’s W chromosome, which is female-specific.
“For the first time, we have all the genes for both the male and female zebra finch,” Clayton said.
Besides the zebra finch, the species sequenced by the VGP were the pale spear-nosed bat, greater horseshoe bat, Canada lynx, platypus, kakapo, Anna’s hummingbird, Goode’s thornscrub tortoise, two-lined caecilian, zig-zag eel, climbing perch, flier cichlid, eastern happy, channel bull blenny, blunt-snouted clingfish and thorny skate.
The researchers said that despite remaining imperfections, the VGP’s reference genomes are the most complete and highest quality to date for each species sequenced. When the VGP began generating genomes beyond Anna’s hummingbird in 2017, only eight vertebrate species in GenBank had genomes that met its target continuity metrics and none were haplotype phased. Now, there are 130.
The research is part of the VGP’s first phase, which seeks to sequence representatives of approximately 260 vertebrate orders. Orders are defined as lineages separated by 50 million or more years of divergence from each other. There are three additional phases.
To sequence all vertebrates within 10 years, 125 genomes would have to be completed each week.
“In some ways, this is a proof of concept,” Clayton said. “What would have been a completely ludicrous idea even 15 years ago — to say that we can sequence everything on the planet — is now a viable target.”
Comparing the DNA sequences of all vertebrates will enable understanding of how genes have contributed to the evolution and survival of these species.
“It will pay off in ways that we can’t anticipate now, so besides the things we know we can do, there are probably a lot of things that we will discover that we weren’t planning to discover,” George said.
The College of Science pursues excellence in scientific discovery, learning and engagement that is both locally relevant and globally impactful. The life, physical and mathematical sciences converge to tackle some of tomorrow’s scientific challenges, and our faculty are preparing the next generation of leading scientists. The College of Science offers high-impact transformational experiences such as research, internships and study abroad to help prepare our graduates for top industries, graduate programs and health professions. clemson.edu/science