SNPs, Tag SNPs & Haplotypes
By David Kattenburg
Tom Hudson is a busy guy. Few have contributed more to our understanding of how the human genome works, for good or ill, than this congenial, 53-year-old native of Arvida, Quebec.
A clinical immunologist by training, Hudson has played a lead role in the discovery and mapping of haplotypes, a feature of the human genome that has become instrumental in identifying the genetic origins of disease.
As a postdoctoral fellow at the Massachusetts Institute of Technology in the early 1990s, with the Human Genome Project only just underway, Hudson led a team engaged in mapping the genome’s transcriptional units: those base-pair sequences that get transcribed into messenger RNA, which in turn gets translated into functional proteins. With the aid of a robot called a Genomatron, the group identified 10,000 functional genes.
Hudson returned to Montreal in 1996. Over the course of the following years (as the director of the Montreal Genome Center, then the McGill University and Genome Quebec Innovation Centre) Hudson and his team would identify genes associated with asthma, leprosy, multiple sclerosis, type 2 diabetes, and Crohn’s disease. It was the 2001 publication of findings on a gene linked to Crohn’s that led to the creation of the HapMap, and the launch of the International HapMap Project.
To understand what a haplotype is, one must first understand the concept of the single nucleotide polymorphism, or SNP (pronounced snip). DNA, life’s molecular source code, consists of two very long threads wound around each other in double-helical fashion. Each thread consists of a series of building blocks called nucleotides containing one of four different nitrogenous bases: adenine (A), thymine (T), guanine (G), or cytosine (C). The sequence of nucleotides on one thread complements the sequence on the other, according to a precise base-pairing rule: Adenines always pair with thymines, and guanines with cytosines (All Toes Go Crunch).
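For readers who think in code, the pairing rule can be written as a simple lookup table. What follows is a minimal Python sketch, not anything from Hudson's lab: the function name and the four-base example strand are made up for illustration, and the sketch ignores the fact that the two threads of real DNA run in opposite directions.

```python
# Base-pairing rule from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand (hypothetical helper)."""
    return "".join(PAIRING[base] for base in strand.upper())


example = "ATGC"  # made-up four-base snippet
print(complement(example))  # prints TACG
```

Applying the function twice returns the original strand, which is another way of saying that each thread carries enough information to rebuild the other.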
The DNA molecule, in turn, is condensed into those structures we call chromosomes. Humans have 23 pairs of chromosomes in each of their body cells, one of each pair from mum, the other from dad (i.e. 46 in total; our diploid number). The diploid genome is six billion base pairs long. (The human genome is often described in its haploid dimensions: the three billion base pairs in the 23 chromosomes of a sperm or egg cell.)
Now, most of the six billion base pairs making up the human genetic code are exactly the same from individual to individual. At roughly one position in every thousand, however, there’s a slight difference: an A has been substituted for a G (or vice versa), or a T for a C. This is a single nucleotide polymorphism, or SNP. An estimated twenty million SNPs exist in the human genome, the sum total of genetic variability accumulated over 150,000 years of modern human evolution.
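Spotting a SNP amounts to lining up the same stretch of DNA from two people and looking for the positions where the letters differ. Here is a rough Python sketch of that comparison. The two ten-base sequences and the function name are invented for illustration, and the sketch assumes the sequences are already aligned and of equal length, which real sequencing data rarely hands you for free.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the zero-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Made-up sequences: identical except for a G/A difference at position 3.
person_1 = "ATGGCTAAGC"
person_2 = "ATGACTAAGC"
print(snp_positions(person_1, person_2))  # prints [3]
```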
Pondering these SNPs, something nifty occurred to researchers like Tom Hudson: Single nucleotide polymorphisms could be recruited as signposts for nearby genes that cause disease — genes a few thousand or tens of thousands of base pairs away to which one or another SNP is intimately associated.
To understand how genetic intimacies of this sort might occur, you have to understand the mechanics of a phenomenon called “crossing over,” in which each of the 23 chromosome pairs in a primordial sex cell delicately shuffles, or recombines, its nucleotide morsels before the partners say bye-bye and get partitioned into haploid sperm or egg cells.
Think of a deck of cards, where the Queen of Hearts and King of Diamonds have been placed next to each other. In an average act of shuffling, the couple will tend to stay close to each other, if not together. Gene linkage is kind of the same.
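The card analogy can be turned into a back-of-the-envelope calculation. The Python sketch below rests on a deliberately crude toy model that is an assumption of this example, not a description of real recombination: exactly one crossover happens per chromosome, and its break point is equally likely to fall in any gap between adjacent positions. The function name and the 1,001-position chromosome are likewise hypothetical.

```python
from fractions import Fraction


def separation_probability(pos_a: int, pos_b: int, length: int) -> Fraction:
    """Chance that a single crossover falls between two positions.

    Toy model (an assumption, not real biology): exactly one crossover,
    with the break point chosen uniformly among the length - 1 gaps
    between adjacent positions.
    """
    if not (0 <= pos_a < length and 0 <= pos_b < length):
        raise ValueError("positions must lie on the chromosome")
    gaps_between = abs(pos_a - pos_b)
    return Fraction(gaps_between, length - 1)


length = 1001  # made-up chromosome with 1,000 gaps
print(separation_probability(100, 110, length))  # prints 1/100
print(separation_probability(100, 900, length))  # prints 4/5
```

Under these assumptions, neighbours ten positions apart are split up one time in a hundred, while a pair 800 positions apart is split four times in five. That is the intuition behind linkage: the closer two spots sit, the more likely they travel together.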
So, Tom Hudson and others hypothesized, armed with a complete SNP map, they could search for SNPs that people with a disease like Crohn’s all tend to have, but that healthy people don’t. In this case, they could conclude that that SNP is very close (linked) to the culprit gene and narrow down their search accordingly, scrutinizing nearby segments of the genome for “open reading frames” (genes that get transcribed/translated into proteins).
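The logic of that search can be sketched in a few lines of Python. Everything here is hypothetical: the SNP names, the eight-person groups, and the helper functions are invented, each person is reduced to a single letter per SNP, and a simple difference in frequency stands in for the statistical tests and far larger cohorts that a real study would need.

```python
def carrier_fraction(alleles: list[str], allele: str) -> float:
    """Fraction of a group showing the given letter at one SNP."""
    return alleles.count(allele) / len(alleles)


def largest_gap(case_alleles: list[str], control_alleles: list[str]) -> tuple[str, float]:
    """Return the letter most over-represented in cases, and by how much."""
    observed = sorted(set(case_alleles) | set(control_alleles))
    gaps = {
        a: carrier_fraction(case_alleles, a) - carrier_fraction(control_alleles, a)
        for a in observed
    }
    best = max(observed, key=lambda a: gaps[a])
    return best, gaps[best]


# Made-up data: one observed letter per person at each of three SNPs.
cases = {"snp_1": list("AAAAAAGA"), "snp_2": list("CTCTCTCT"), "snp_3": list("GGGAGGAG")}
controls = {"snp_1": list("GGAGGGGG"), "snp_2": list("TCTCCTCT"), "snp_3": list("GAGGGGAG")}

# Rank SNPs by how strongly one letter is enriched among people with the disease.
ranked = sorted(cases, key=lambda s: largest_gap(cases[s], controls[s])[1], reverse=True)
print(ranked[0], largest_gap(cases[ranked[0]], controls[ranked[0]]))  # prints snp_1 ('A', 0.75)
```

In this invented data set, the letter A at snp_1 turns up in seven of eight cases but only one of eight controls, so snp_1 is where a researcher would start looking for a nearby culprit gene.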
Given the daunting task of scanning twenty million SNPs, it naturally generated excitement, back in the early 2000s, when Tom Hudson and others discovered that SNPs cluster together in groups, or “haplotypes,” as these islands of variation came to be called. Haplotype mapping began forthwith. The first iteration of the HapMap was completed in October 2005.
Soon, Hudson and others clued in to something else amazing: each haplotype existed in relatively few variants, and human populations around the world could be compared on the basis of these. The International HapMap Project was born.
As the HapMap became more and more detailed, yet another discovery was added to the mix: Haplotype variants could be rapidly distinguished from each other on the basis of SNPs unique to each variant. Tag SNPs, they called these. Facilitated by the latest in high-throughput DNA sequencing technologies, tag SNP mapping heralded the emergence of the genome-wide association study, and made direct-to-consumer genetic testing a subject of popular debate, spit tests and all.
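One way to picture tag SNPs is as the smallest set of positions you need to read in order to tell a haplotype's variants apart. The Python sketch below takes that interpretation and solves it by brute force, trying ever-larger sets of positions until one works. The four five-letter variants and the function name are made up, and brute force is an assumption chosen for clarity here; it says nothing about how HapMap researchers actually picked their tags.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Return a smallest set of positions that distinguishes every variant."""
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Read each variant only at the candidate positions.
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return sites
    raise ValueError("haplotypes are not all distinct")


# Made-up haplotype variants, each listing the letters at five SNP positions.
variants = ["AGCTA", "AGTTG", "CATTA", "CACCG"]
print(find_tag_snps(variants))  # prints (0, 2)
```

Reading just positions 0 and 2 is enough to identify which of the four invented variants a person carries, so the other three positions need not be examined at all. Scaled up to millions of SNPs, that kind of shortcut is what makes scanning a whole genome practical.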
Given the magnitude of all this science, Tom Hudson would seem to be the sort of person to place bets on for a Nobel Prize. At the same time, the president and scientific director of the Ontario Institute for Cancer Research, editor-in-chief of the journal Human Genetics and freshly minted Officer of the Order of Canada (2013) is as unassuming and hospitable as a top-flight scientist, or anyone else, can be, as the above audio doc reveals.
Dr. Tom Hudson and I sat down for an interview in his office back in the winter, prior to touring his DNA sequencing lab and computer facility. Listen via the link at the top.