How thick is “blood”? Am I really related to my 5th cousin?

Here’s a picture of my great grandparents John Henry Hattersley and Bertha Herrmann at Niagara Falls in 1910:

Photo of two well dressed people posing on rocks in front of Niagara falls

I’m presuming this is a real photograph and not staged with a backdrop or something, but I really don’t know. I think this was taken on Luna Island on the American Side.

I don’t know much of anything about them except that Bertha is the only non-English-origin great-grandparent of mine I’m aware of; I don’t remember my grandfather talking about them. At one point I really got into tracking my family tree: I even discovered that my lineage can be traced back to Boston Colony (via John Viall through my paternal grandmother’s father Clifford Viall Thomas). But I long ago stopped imagining that I was somehow learning about myself once I got to people beyond living memory.

It’s always bothered me when people say siblings share half of their genes. Similarly, it bothers me when people who can trace their descent to someone famous (Charlemagne, Jefferson) seem to think this reflects well on their genes or something. There are a few reasons this can’t be right, and these are the ones I think about in particular:

  1. We share 98.8% of our DNA with chimpanzees—we must share much more than that with our siblings!  In fact it seems that the “average pairwise diversity” of randomly selected strangers is around 0.1%.
  2. There is some level of discretization with DNA inheritance.  Obviously it can’t be at the base-pair or codon level, or else we wouldn’t be able to reliably inherit entire genes.  If the “chunks” we inherit from each parent are large enough, small number statistics will push the number significantly away from 50%.
  3. Mutations slowly change genes down lineages
  4. Combinations of genes and epigenetic factors have strong effects on traits

Point 2 is not something I really understand yet, except that talking to biologists I think the number of “chunks” that get passed down on each side is ~hundreds, but is also random making the problem quite tricky.  Still, ~hundreds means that we are probably close enough to ~50% inheritance from grandparents on each side (+/- 10% or less) that we can get a rough idea of how related we really are to people on our tree in terms of shared DNA.

So let’s take a closer look at point 1 above:

Let’s assume the amount of identical DNA we get from each ancestor is given by 2^(-n) where n is the number of generations back they are (grandparents: n=2, so 25% inherited DNA, ignoring discrete “chunks” and mutations). This makes sense: except for (many) details like the X/Y chromosomes, mitochondria, and probably a bunch of other things, each ancestor a given number of generations back has an equal chance of having contributed a bit of DNA to you.

Finding the amount of shared inheritance is thus a matter of going back to the first shared ancestor and counting all of the shared ancestors at that level (which will be 1 in the case of half siblings and 2 for full siblings, except for details coming later).

So cousins share 2/4 grandparents, each of whom had a 2^(-2) chance of contributing a given bit of DNA to each cousin, so they have 2 × 1/4 × 1/4 = 1/8 shared bits of DNA, or around 12.5%.

Second cousins (the children of first cousins) share 2/8 great-grandparents, so the number is 1/32, or about 3.1%. Each additional cousin degree gives us a factor of 1/4: a factor of 1/2 on each of the two lines, since each line has one more opportunity to “lose” that bit of DNA.

Now we get into the fun of “removed cousins”, which just counts the generation gap between cousins. You don’t usually get big numbers of “removals” among living people because it requires generations to happen much faster along one line than another—big numbers like “1st cousins 10 times removed” are usually only seen when relating people to their distant ancestors.

So my kids are my first cousins’ “first cousins once removed”, and all of their kids would be “second cousins once removed”. The rule is that if you have “c cousins r removed” (so c=2 r=1 means “second cousins once removed”) then you have to go back n=c+r+1 generations from the one and n=c+1 from the other to find the common ancestors.  So removals count the number of opportunities to “lose” a bit of DNA that occur on only one side of the tree.

Putting it all together: the fraction of DNA we share with a cousin is 2^(-(2c+r+1)) (siblings have c=r=0; halve the result, i.e. add one to 2c+r+1, if the connection is via half siblings).
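For anyone who wants to play with the numbers, here is a minimal Python sketch of the formula above. The function name `shared_fraction` and the `half` flag are my own labels for illustration, and the whole thing assumes the idealized model from this post (no discrete “chunks”, no mutations, no other related ancestors).

```python
def shared_fraction(c: int, r: int = 0, half: bool = False) -> float:
    """Expected shared-DNA fraction for 'c-th cousins, r times removed'.

    Idealized model from the post: 2^-(2c + r + 1), with one extra
    halving when the connection runs through half siblings.
    Siblings are the c = r = 0 case.
    """
    if c < 0 or r < 0:
        raise ValueError("c and r must be non-negative")
    exponent = 2 * c + r + 1
    if half:
        exponent += 1  # only one shared ancestor instead of two
    return 2.0 ** -exponent


if __name__ == "__main__":
    examples = [
        ("siblings", 0, 0),
        ("first cousins", 1, 0),
        ("first cousins once removed", 1, 1),
        ("second cousins", 2, 0),
        ("second cousins once removed", 2, 1),
    ]
    for label, c, r in examples:
        print(f"{label:30s} {shared_fraction(c, r):.4%}")
```

Calling `shared_fraction(0, half=True)` gives the half-sibling case, which under these assumptions is half the full-sibling value.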

But there’s a limit: this only works if none of the other ancestors are related, but in the end we’re all related. If cousins have children, this increases the number of shared ancestors and raises the commonalities. And, of course, mutations work the other way, lowering the amount of identical bits.

So why is this interesting? Because the “we’re all related” thing is true at the 0.1% level in DNA, meaning that if you make c high enough, you’ll get an answer that’s below the baseline for humans. Since log2(0.1%) ≈ -10, we have that if 2c+r+1 > 10, the DNA connection is no stronger than we’d expect for random strangers.
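Here is a rough sketch of that comparison in Python. The `BASELINE` constant is just the ~0.1% figure quoted above, and the function names are mine; everything still rests on the idealized 2^(-(2c+r+1)) model, so treat the output as a back-of-the-envelope check rather than real genetics.

```python
import math

# The post's ~0.1% "average pairwise diversity" figure, used as the baseline.
BASELINE = 0.001


def exponent(c: int, r: int = 0) -> int:
    """Number of halvings for 'c-th cousins, r times removed': 2c + r + 1."""
    return 2 * c + r + 1


def below_baseline(c: int, r: int = 0, baseline: float = BASELINE) -> bool:
    """True if the idealized shared fraction falls under the baseline."""
    return 2.0 ** -exponent(c, r) < baseline


if __name__ == "__main__":
    print(f"log2(baseline) = {math.log2(BASELINE):.2f}")
    for c in range(1, 7):
        frac = 2.0 ** -exponent(c)
        flag = "below baseline" if below_baseline(c) else "above baseline"
        print(f"cousin degree {c}: {frac:.4%} ({flag})")
```

In this model, 4th cousins (exponent 9) come out just above the baseline and 5th cousins (exponent 11) just below it.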

This means that if you meet your 4th cousins (i.e. your great-grandparents were cousins) your genealogical relationship is mostly academic and barely based on “blood”!  By 5th cousins, you’re no more related than you are to the random person on the street in terms of common DNA.

Even worse, if we have hundreds of “chunks” we randomly inherit from parents, then it’s even possible (and here I’m a bit less sure of myself) that you share no commonly inherited genetic material with someone as distantly related as a 5th cousin!
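To get a feel for how plausible that is, here is a toy calculation in Python. It makes some big assumptions that are mine and not established biology: there are exactly `n_chunks` independent chunks (300 is a placeholder for “hundreds”), and each one is shared with probability 2^(-(2c+r+1)). Real inheritance is messier than this, so this is only a sketch of the small-number argument.

```python
def prob_no_shared_chunks(c: int, r: int = 0, n_chunks: int = 300) -> float:
    """Toy estimate of the chance that two relatives share no chunk at all.

    Assumptions (hypothetical, for illustration only):
      * n_chunks independent chunks, 300 standing in for "hundreds"
      * each chunk is shared with probability 2^-(2c + r + 1)
    """
    p_shared = 2.0 ** -(2 * c + r + 1)
    return (1.0 - p_shared) ** n_chunks


if __name__ == "__main__":
    for c in range(1, 7):
        p_none = prob_no_shared_chunks(c)
        print(f"cousin degree {c}: P(no shared chunk) ~ {p_none:.3f}")
```

With these placeholder numbers, 5th cousins come out with roughly an 86% chance of sharing no chunk at all, while for first cousins the chance is negligible. Changing `n_chunks` moves the numbers around, but not the overall picture.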

Again, this calculation makes a lot of assumptions about genes from different ancestors being uncorrelated, and in particular communities that have been rather insular for a very long time must have at least a bit more kinship with each other than they do with similar communities on different continents.  But from what I’ve gathered this effect isn’t that large: the variance in genetics within a community, even an insular one, is still usually larger than the difference across communities.  That is, the average person from one place is more similar genetically to the average from another place than they are to a random person in their own place.

And also, this doesn’t mean you can’t prove descent from someone more than 10 generations past via DNA—that might indeed be possible by looking at where common bits of DNA are in the chromosomes and similar sorts of correlations (I would guess).

Anyway, the bottom line is that it’s fun to do family trees and learn about our ancestors where we can, but we definitely shouldn’t get too hung up on the idea that we’re learning about the origins of our genes and kinship via biology—even setting aside the fact of old family trees being full of adopted and “illegitimate” children, the actual genetics dilute out so fast it hardly matters past great-grandparents.

 

One thought on “How thick is “blood”? Am I really related to my 5th cousin?”

  1. Jason Melancon

    Considering “blood” ties in terms of DNA makes sense because it’s quantifiable, at least in theory. But I suspect the picture remains largely the same when considering other senses of connection, such as the transmission of culture and other soft-science relationships that are harder to measure. There is a background level of influence that non-relatives have on our psyche, which is usually exceeded by our immediate family. I would expect that increasing distance (c) dilutes these similarities too, down to the background level.
