The Venn Diagram Effect in DNA Clusters

Who is in a cluster? And how are they related to you?  I recently wrote about “Who Is in a Cluster?” and introduced the concept of Mountains and Valleys. I think a better term is the Venn Diagram Effect. This post continues to explain and illustrate that concept.

As I mentioned in my last post, when we cluster our DNA matches using either the Leeds Method or an automated adaptation, we usually expect all of the matches in a cluster to be descended from one couple or individual, and we expect all of the matches to be related to each other. But this is not always the case. And yet, even when it is not, the results are still incredibly helpful!

For now, let’s talk about the original range of DNA matches when doing a Leeds Method chart: 90 to 400 centiMorgans (cM). In this range, most people’s DNA matches will be 2nd and 3rd cousins (including those once removed, twice removed, etc.).
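If you keep your match list in a spreadsheet or a script, the 90 to 400 cM window can be sketched as a simple filter. This is only an illustration: the match names and cM values are invented, the helper name `in_leeds_range` is hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
# Hypothetical match list: (name, shared centiMorgans). All values are invented.
matches = [
    ("Match A", 812.0),
    ("Match B", 356.5),
    ("Match C", 143.0),
    ("Match D", 90.0),
    ("Match E", 41.2),
]

# The original Leeds Method range described above (inclusive ends are an assumption).
LOW_CM, HIGH_CM = 90, 400


def in_leeds_range(shared_cm, low=LOW_CM, high=HIGH_CM):
    """Return True when a match falls inside the low-to-high cM window."""
    return low <= shared_cm <= high


# Keep only the matches that fall inside the window.
leeds_matches = [(name, cm) for name, cm in matches if in_leeds_range(cm)]

for name, cm in leeds_matches:
    print(f"{name}: {cm} cM")
```

In this made-up list, Match A shares too much DNA and Match E too little, so only the three matches in between would go onto the chart.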

Below is an example of a diagrammed cluster in which all of the DNA matches are descended from one couple – the test taker’s great, great grandparents – and all of the DNA matches are related to each other.

But sometimes, we get a cluster like the one diagrammed below. In this case, not all of the DNA matches are related to each other. The 3rd cousin (3C) on the far left is not related to the 3rd cousin once removed (3C 1R) and the 3rd cousin twice removed (3C 2R) on the right. Additionally, not all of the matches are descended from one couple. Instead, they descend from two sets of the test taker’s great, great grandparents. These two couples are the parents of each of the two great grandparents shown.
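One way to spot this situation in your own cluster is to check whether every member shows up in every other member's shared-match list. The sketch below is a rough illustration under stated assumptions: the labels and shared-match lists are invented (including a 2nd cousin, "2C", who connects the two sides), the function name `unrelated_pairs` is hypothetical, and a missing shared match is only a hint that two people may not be related, not proof.

```python
from itertools import combinations

# Hypothetical shared-match lists for one cluster. Each key is a match and each
# value is the set of other cluster members who appear as shared matches.
shared_matches = {
    "3C (left)": {"2C"},
    "2C": {"3C (left)", "3C 1R", "3C 2R"},
    "3C 1R": {"2C", "3C 2R"},
    "3C 2R": {"2C", "3C 1R"},
}


def unrelated_pairs(shared):
    """Return pairs of cluster members who do not list each other as shared matches."""
    pairs = []
    for a, b in combinations(sorted(shared), 2):
        if b not in shared[a] and a not in shared[b]:
            pairs.append((a, b))
    return pairs


missing = unrelated_pairs(shared_matches)
if missing:
    print("Not everyone in this cluster matches each other:")
    for a, b in missing:
        print(f"  {a} and {b}")
else:
    print("Every member of this cluster matches every other member.")
```

With these invented lists, the 3C on the left is flagged against both the 3C 1R and the 3C 2R, which is the pattern described above.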

Instead of relationships, the cluster diagram below adds names to those ancestors. This cluster contains 2nd cousins, but it also contains 3rd cousins from both sides of that family line. In this example, the DNA matches include not only descendants of Mike & Sally – the test taker’s 2nd cousins – but also descendants of Mike’s parents and descendants of Sally’s parents. These descendants are the test taker’s 3rd cousins.

Below is a Venn Diagram. Both circles represent 3rd cousins, though one is on the paternal side of the great grandparent couple and the other is on the maternal side of the great grandparent couple. The intersection includes the 2nd cousins. But, it also includes the test taker. This can be valuable for those working with unknown parentage who can now easily identify a likely great grandparent couple!
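The two circles and their overlap can also be written out as sets. This is a minimal sketch using the Mike & Sally example: every person's name below is invented for illustration, and each set simply lists the descendants of one set of parents.

```python
# Hypothetical people; only "Mike" and "Sally" come from the example above.
# One circle: descendants of Mike's parents.
mikes_side = {"Test taker", "2C Anna", "2C Ben", "3C Carl", "3C Dee"}
# The other circle: descendants of Sally's parents.
sallys_side = {"Test taker", "2C Anna", "2C Ben", "3C Eve", "3C Finn"}

# Descendants of Mike & Sally belong to both circles, so they land in the overlap.
overlap = mikes_side & sallys_side
only_mikes_side = mikes_side - sallys_side
only_sallys_side = sallys_side - mikes_side

print("In both circles:", sorted(overlap))
print("Only on Mike's side:", sorted(only_mikes_side))
print("Only on Sally's side:", sorted(only_sallys_side))
```

The overlap holds the 2nd cousins and the test taker, while each remaining group holds 3rd cousins from just one side.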

Lastly, I’m combining a cluster diagram with the Venn Diagram circles. If you diagram a cluster and see this Venn Diagram Effect, hopefully you will now have a better idea of what it represents!

Please let me know if you have any questions!

 

15 thoughts on “The Venn Diagram Effect in DNA Clusters”

  • Donde Smith

    Thanks, Dana. This really helps me to visualize results.

    • Thanks, Donde! I’m glad you found it helpful.

  • Christine

    Great explanation…thanks! Amazing to see how far this has come since you first published!

  • Jeanne

    Hi! I am working through this process and have 2 basic questions. First, when we do the first clustering of relatives under 400 cM, how low do we go, and then, when extending, how much further in terms of cMs? Secondly, can the process be used with MyHeritage and FTDNA relatives? FTDNA has the ICW category, but MyHeritage is a little awkward in showing “shared matches.” Thanks for your help!

    • Hi, Jeanne. When you do the first clustering, you go down to 90 cM. (The goal is to include your 2nd and 3rd cousins.) And, yes, you can use it with both MyHeritage and FTDNA.

  • Ruben

    Hey Dana,

    When you are of unknown parentage, and have 3rd cousins that have the highest of 27 cM in the cluster (with 17 clusters in total), the report is thus ineffective and should be ignored due to extremely faraway common ancestors?

    Also, does the largest (first) cluster indicate the most recent branch of your tree since it makes it possible for more combinations of dna match relationships in it (makes it a longer ancestral line)? I take it maybe likewise the smallest cluster gives fewer options of dna match relationships since your branch for it can be further back in time and matches usually do not pop up so much that far back in time (thus creating shorter ancestral line)?

    Do you think this interpretation is valid?

    • Hi, Ruben. First of all, if you are of unknown parentage, you mentioned 3rd cousins who share 27 cM of DNA. How do you know they’re 3rd cousins? If they are truly 3rd cousins, though, you can use them.

      As far as your other suggestion, often people using the Leeds Method create 4 clusters and they could be any size and they are all the same distance back in time. But DNA matches who share more DNA are more likely to “grab” more matches as they have more DNA they can share with others.

      Hope this helps!
      Dana

  • Ruben

    Sorry, Dana, forgot to mention I am getting a MyHeritage cluster and not doing a manual one.

    Does it change the game at all? I was told the company adjusts the threshold themselves according to what they think is best for a kit in question.

    That got me thinking whether the largest cluster could be really close. For example, my first cluster has 14 people and they are clearly of German heritage (nearly all from Germany too or German migrants back into Germany). The highest there is 27 cM (she is labeled 3rd-5th cousin). Is it at all possible for this cluster to be one grandparent’s line? It looks like lots of other clusters – Finnish, Turkish, and Lithuanian for example – get split into smaller ones as more people are added to these clusters, whereas the top German one never gets split and only gets bigger and bigger as more people come in. I wonder what you think about this difference of cluster behavior.

    Thank you so much!
    Ruben

    • Ruben, Interesting observation on that cluster that keeps getting bigger and bigger! I’d just recommend you keep working with it.

      Dana

      • Ruben

        Dana, I am so glad I am not wasting your time and perhaps getting you engaged as well on these interesting issues!

        I will follow the first cluster for sure. Thanks for the tip!

        I have 751 matches in MyHeritage, of which only 76 matches are used in the generated AutoCluster.

        I have 16 clusters – do you think these clusters cover all of my biological family branches (all 4 grandparents)? I know the 4 groups would be split (since 16 > 4), but wonder if MyHeritage considers matches for all sides of the family when providing the report?

        I guess my concern is whether there are users who had missing branches in clustering reports. I can imagine a great-great-great grandparent cluster missing (which is very distant), but can an entire grandparent cluster or branch be theoretically missing in the report (or several missing splits into smaller clusters for that missing grandparent)?

        If not, can I assume that these 16 clusters I have are my 16 great-great-grandparents’ branches?

        Thank you!

        Ruben.

        • Hi, Ruben. As far as MyHeritage AutoCluster, my understanding is that they choose what they think creates your best clusters from those matches who share 400 cM or less. They use about 100 matches, so I’m not sure why yours is only using 76. (Maybe they’ve changed something recently or maybe your matches just worked out differently – I do not know the specifics of what the program does.) Unfortunately, for most of us this means a lot of clusters of only 3 people who are distantly related, making clusters that are quite difficult to work with.

          MyHeritage would not know which part of your family the matches come from, so they are not taking that into account. And, the clusters likely cover your 4 grandparents in some way but it really depends on who has tested at that site. And, yes, an entire grandparent line could be missing and there are definitely people who experience this. For example, their ancestors might be from countries where testing is illegal.

          Lastly, I think it is almost impossible that you would have clusters specifically relating to your 16 great-great-grandparent lines. Your best goal is to look at each cluster and figure out how the members are related to each other and then how they’re related to you. (Not an easy task!)

          Dana

  • AnonyMouse

    Wow Dana you amaze us yet again, this is so insightful

  • Pingback: Who Is in a SuperCluster? - Dana Leeds
