Recombinational loss of heterozygosity (RecLOH).

I have seen some comments about explanations of RecLOH being clear as mud so here is my attempt. Firstly, as with any new field of study, everyone is keen to establish their own language if only to keep outsiders out so some degree of decoding is necessary. Secondly this may appear too simple to some so please bear with the intent of writing this.

Genetic information is encoded by the order of the nucleotide bases of DNA with the four bases being adenine (A), thymine (T), guanine (G), and cytosine (C). G and C pair together and T and A pair together. As DNA is a double helix one strand of it mirrors the other strand so if one strand has three bases in line (say ATG) then the other strand will have TAC to mirror it.
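The pairing rule is mechanical enough to write down as a few lines of Python. This is only a minimal sketch of the rule as described above; the names PAIRS and complement are my own and not from any library.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


# The three-base example from the paragraph above.
print(complement("ATG"))
```

Applying the function twice gives back the original strand, which is just another way of saying that each strand mirrors the other.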

A mutation represents some sort of change in that order and mutations may occur in both somatic (body) and sex cells.

Haplogroups.

One type of mutation is the single point mutation or single nucleotide polymorphism (SNP). This occurs when during replication a substitution occurs in one pair of nucleotide bases. So if the original sequence contained the pair AT and the replicated sequence contained the pair GC we have an SNP. It simply means we have a substitution of one pair of bases for another pair. On their own they are not much use to genealogists tracing recent ancestry but they can have dire effects for the organism.

The relative mutation rate for an SNP is extremely low in humans which makes them ideal for marking the history of the human genetic tree. SNPs are named Xn, that is a letter and a number. The letter indicates the lab or research team that discovered the SNP and the number indicates the order in which it was discovered. For example, M170 is the 170th SNP documented by the Human Population Genetics Laboratory at Stanford University, which uses the letter M.
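As a small illustration of the naming convention, a name can be split into its lab prefix and its number. This is a sketch under the assumption that a name is simply letters followed by digits; the function name parse_snp_name is my own invention.

```python
import re


def parse_snp_name(name: str) -> tuple[str, int]:
    """Split an SNP name such as 'M170' into (lab prefix, discovery number)."""
    # Assumption: the name is one or more letters followed by one or more digits.
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not in letter-plus-number form: {name!r}")
    prefix, number = match.groups()
    return prefix.upper(), int(number)


print(parse_snp_name("M170"))
```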

In genealogy SNP results are used to determine haplogroup and, to date, haplogroups are lettered from A to S. The original haplogroup (arising in Africa) is A and all others derive from this, but most haplogroups have sub-groups designated with a number, such as R1 and R2, and further with a letter, such as R1a and R1b.

Insertions and Deletions.

Other types of mutation are the insertions and deletions (or INDELs). A frameshift mutation involves the insertion or deletion of a number of bases that is not a multiple of three. Take the sentence “the dog barked at the moon” and read it three letters at a time, as the cell reads codons: “the dog bar ked att hem oon”. Now delete the “g” and read it again in threes to get “the dob ark eda tth emo on”. Every group after the deletion has changed, which is pretty dramatic and makes little sense. If an embryo has DNA that looks like that then it would most likely die.
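For anyone who wants to play with the frameshift idea, here is a rough Python sketch. It treats the sentence as a string of letters read three at a time, which is only an analogy for codons; the function names are my own.

```python
def read_in_triplets(text: str) -> str:
    """Strip the spaces and re-read the letters in groups of three."""
    letters = text.replace(" ", "")
    groups = [letters[i:i + 3] for i in range(0, len(letters), 3)]
    return " ".join(groups)


def delete_first(text: str, letter: str) -> str:
    """Remove the first occurrence of a letter, like a one-base deletion."""
    return text.replace(letter, "", 1)


sentence = "the dog barked at the moon"
print(read_in_triplets(sentence))
print(read_in_triplets(delete_first(sentence, "g")))
```

Deleting three letters in a row instead of one would leave the later groups in step again, which is why deletions in multiples of three are less disruptive.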

Insertions that add an extra three nucleotides to the DNA strand are called three nucleotide insertions appropriately enough. As three nucleotides represent a codon, the units of the cellular alphabet, they can have quite unexpected effects at the molecular level. The DNA code when read correctly causes the cells to produce quite complex molecules and if there is a change in that code then something other than the original and expected will be produced.

For example, the triplet codons for the amino acid isoleucine are AUU, AUC and AUA. As the messenger RNA copy of the DNA is being translated, the cell’s machinery “knows” that each of these sequences is the code for it to insert the amino acid isoleucine at this point in the protein it is making. Substitution of the last nucleotide of AUU with a C or an A would result in no change in the resulting protein because isoleucine would be inserted in the protein chain in each case.

On the other hand, an error which changed the first base of the codon to either an U, C, or G would cause the wrong amino acid to be inserted in place of isoleucine. For example, a substitution of a G for the first A in the codon would result in insertion of the amino acid valine instead of isoleucine.
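The two kinds of substitution can be checked against a small lookup table. The sketch below holds only a handful of entries from the standard genetic code, just enough for this example, and the names CODONS and substitute are my own.

```python
# A small excerpt of the standard genetic code (mRNA codons), not the full table.
CODONS = {
    "AUU": "isoleucine", "AUC": "isoleucine", "AUA": "isoleucine",
    "GUU": "valine", "GUC": "valine", "GUA": "valine", "GUG": "valine",
}


def substitute(codon: str, position: int, base: str) -> str:
    """Return the codon with the base at the given position (0-2) replaced."""
    return codon[:position] + base + codon[position + 1:]


original = "AUU"
silent = substitute(original, 2, "C")    # last base changed
changed = substitute(original, 0, "G")   # first base changed

for codon in (original, silent, changed):
    print(codon, CODONS[codon])
```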

The insertion of the wrong amino acid in a functional region of a protein may cause the protein to be so severely misshapen that it cannot function, even to the point of causing the death of the organism. For example, swapping an A for a T in one of the codons in the gene for hemoglobin results in the insertion of valine instead of glutamic acid in the protein molecule, causing the disease sickle cell anemia.

Short Tandem Repeats (STR).

But not all parts of a DNA strand are functional in this sense and these parts are sometimes referred to as “junk DNA”. INDELs in this DNA have no noticeable effects on the organism but these mutations are of great value to genealogists for a number of reasons, not the least of which is that the person, and their descendants, survive and pass the mutation on.

Short tandem repeat (STR) sequences are short (2-10 bps) and may be repeated as many as 100 times at a given location on a chromosome. An STR occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. If for example the base sequence CATG is repeated a number of times on a piece of “junk DNA”, expressed as (CATGCATGCATG) or (CATG)3 in this case, it would be an STR with a value of 3. Being of no life-threatening consequence it would be passed on to subsequent generations.
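Counting the repeats is easy to sketch in Python. This is a toy illustration of what an STR value means, not how a testing lab actually measures it, and the function name is my own.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif at a time while the copies are adjacent.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Three adjacent copies of CATG, i.e. (CATG)3.
print(count_tandem_repeats("CATGCATGCATG", "CATG"))
```

Only directly adjacent copies count towards the run, so a stray base between two copies breaks the repeat.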

The human genome contains hundreds of thousands of these STRs distributed across all the chromosomes. It is the occurrence of STRs on the Y chromosome, where they are often referred to as Y-STRs, that is of particular interest to genealogists seeking to identify the male line ancestry of a person.

The study of these short tandem repeats on the Y chromosome has given rise to the discovery of specific markers that were termed DYSnnnn where DYS stands for DNA Y-chromosome Segment followed by a number. A list of DYS markers commonly used in genealogy can be found on the Familypedia site which also gives the length of the sequence as well as the bases involved.

To date over 100 of these markers have been recognised on the Y chromosome and common genetic testing involves from 10 to 67 of these markers. Measurements are made at each marker to determine the number of short tandem repeats there are at each marker. For example at marker DYS455 the results can show values from 8 through to 12. If two people’s tests show exactly the same result at each of 67 markers then it is reasonably assumed that they share a common male ancestor in fairly recent times (within 2 to 3 hundred years with high probability). Tests on a smaller number of markers are less conclusive, with the 10 marker test usually producing many hundreds of exact matches. Anyone seriously thinking of using Y chromosome DNA testing for genealogical purposes should be tested at the 67 marker level.

Mechanisms for differences.

If at a particular marker site a person’s results show that there are 18 repeats then by way of an insertion or a deletion that number can change up or down respectively. Delete one repeat and the result drops to 17. If for example it is known that a particular marker mutates on average once every 3 hundred years or so and you match someone on all other markers exactly but differ on this one marker by one then it would be fair to assume you have a common ancestor somewhere in the last three hundred years or so.

Some sites are more prone to mutations than others and so the difference in scores on a particular marker is some indication of time. The greater the difference in scores on a marker then the further back in time you might expect to find a common ancestor. Also if two people’s scores are different over several markers then it is assumed that a great deal of time has passed since their shared common ancestor walked and talked.
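Comparing two sets of results marker by marker can be sketched as follows. The marker values here are made up for illustration, and simply adding up the differences is one crude way of scoring a comparison, not a calibrated dating method.

```python
def differing_markers(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Return the size of the difference at each shared marker that differs."""
    shared = sorted(a.keys() & b.keys())
    return {m: abs(a[m] - b[m]) for m in shared if a[m] != b[m]}


# Hypothetical results for two people (illustrative values only).
person_a = {"DYS393": 13, "DYS390": 24, "DYS455": 11}
person_b = {"DYS393": 13, "DYS390": 25, "DYS455": 11}

differences = differing_markers(person_a, person_b)
total_steps = sum(differences.values())
print(differences, total_steps)
```

A fuller treatment would weight each marker by how often it mutates, since a one-step difference on a slow marker says more about elapsed time than the same difference on a fast one.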

Recombinational Loss of Heterozygosity (RecLOH).

Stripped of all of its jargon this simply means the loss of difference between chromosomes by the process of recombination. One form of recombination occurs when one piece of the DNA breaks off and goes off to join up with another piece of DNA. It is an extremely important process and in effect it is a way an organism can shuffle its genetic material and come up with something different.

One example of specialised recombination is genetic engineering where DNA from different species can be recombined leading to an organism with a very different gene set to its “parents”. We have probably all heard about the cats, monkeys and rabbits that glow in the dark as a result of genetic engineering, some red and some green.

A special type of recombination occurs when the two chromosomes of a pair break at the same point and the pieces that have broken off cross over to the opposite chromosome of the pair. Here we have a reciprocal exchange of genetic material between the two chromosomes.

But what happens with RecLOH is that there is a non-reciprocal exchange of genetic material between the chromosomes of a pair and the genetic code on one chromosome is copied to the other. Suppose that during cell division a piece of DNA breaks off and disappears. The cell sees this and to repair the damage simply copies the corresponding strand that didn’t disappear onto the chromosome that is missing a piece in place of the piece that disappeared. The end result is two chromosome segments that are now identical so we have a loss of difference (or heterozygosity) brought about by recombination.

So how can this happen with the Y chromosome, which does not have a “pair” in the conventional sense? Well, of course it can’t, so what we have is a process that copies a piece of the chromosome to itself, and in order to do that we need a palindromic strand, which simply means a strand that is looped back on itself. Such a thing would look like this:

[Figure: RecLOH 1]

 

where the blue part is the forward strand, the yellow is the loop that produces the “hairpin” and the red is the backward strand. Now suppose a deletion occurs in the backward strand as follows:

 

[Figure: RecLOH 2]

 

Now the cell is not likely to leave this alone and by the normal recombination repair processes it could copy the information that is on the forward strand onto the backward strand yielding something like this:

 

 

At the point the deletion was repaired we now have identical DNA strands or a loss of heterozygosity brought about by recombination (RecLOH). What is important for genealogists is what was in the region of the deletion and repair. If there were Y-STR markers in that region which were different before the deletion, now they are not.
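The delete-and-repair idea can be mimicked with a toy model in Python. Here each arm of the hairpin is just a list of made-up marker values at matching positions; the real chemistry is far messier, and the function names and values are my own assumptions.

```python
def heterozygous_sites(forward: list, backward: list) -> int:
    """Count the positions where the two arms carry different values."""
    return sum(1 for f, b in zip(forward, backward) if f != b)


def delete_and_repair(forward: list, backward: list, start: int, end: int) -> list:
    """Lose backward[start:end], then refill the gap by copying the forward arm."""
    damaged = backward[:start] + [None] * (end - start) + backward[end:]
    return [f if b is None else b for f, b in zip(forward, damaged)]


# Hypothetical repeat counts at four matching positions on the two arms.
forward = [9, 14, 15, 36]
backward = [10, 16, 17, 37]

repaired = delete_and_repair(forward, backward, 0, 2)
print(repaired)
print(heterozygous_sites(forward, backward), heterozygous_sites(forward, repaired))
```

The copy runs in one direction only: the forward arm is left untouched and the backward arm ends up matching it across the repaired stretch.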

For example DYS459a and DYS459b may have values 9 and 10, but if DYS459a is on the forward strand over the deletion point (see above diagram) and is then copied to the backward strand where DYS459b is normally found (but in this example was deleted), the result will be that the measure for DYS459b will now be a 9. But how do we know that was not a simple deletion, reducing the 10 to a 9?

This raises the obvious question about what length of deletions might be considered “normal” as it is quite possible that several markers could be overwritten if the deletion is large enough. I have seen a discussion involving a RecLOH event that included these markers with these values: 459 (10-10), 464 (14-14-14-14) and CDY (36-36), which is one heck of a deletion. It also demonstrates how to identify a RecLOH event: look for duplicate values across a number of markers that are positioned on palindromes.
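That rule of thumb is simple enough to sketch in Python. The first three entries are the values from the discussion mentioned above; the DYS385 entry is a made-up contrast, and the function name is my own. Identical copies are only a hint, since on a single marker a simple deletion could produce the same picture.

```python
def recloh_candidates(results: dict[str, list[int]]) -> list[str]:
    """Return the multi-copy markers whose copies all show the same value."""
    return sorted(
        name
        for name, copies in results.items()
        if len(copies) > 1 and len(set(copies)) == 1
    )


results = {
    "DYS459": [10, 10],
    "DYS464": [14, 14, 14, 14],
    "CDY": [36, 36],
    "DYS385": [11, 14],  # illustrative values, added only for contrast
}

print(recloh_candidates(results))
```

The more multi-copy markers that show up in the list together, the harder it is to explain the pattern by separate one-step mutations.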

The Y chromosome has several of these hairpin structures designated P1, P2 and so on to P10 currently but the exact structure is still a matter of debate. One possible structure can be found on the DNA Fingerprint site.

The different palindromes have different recombination frequencies with most events seeming to happen on palindrome 1 (P1) where several markers are situated. RecLOH events seem to happen more frequently than SNP mutations but less frequently than STR mutations which is kind of logical. This gives us a chance to study the time-gap in between SNPs and STRs. But it is early days as yet and any genealogist studying this needs to remember we are talking about measures in the many hundreds and probably thousands of years, well before recorded time.

As we have seen, small insertions and deletions of short tandem sequences are not likely to threaten function. Larger deletions, even involving several markers, can be dealt with in the cell by copying the missing part from the part that is still there. Although the end result has lost information it is still viable and cell function has not been disturbed. In all likelihood however a much larger deletion may render the Y chromosome inoperable, which means that the male can only produce daughters and so the mutation “daughters out”.