Automated Clustering

Understanding Cluster Matrices

When using an automated clustering tool such as Genetic Affairs’ AutoCluster or DNAGedcom’s Collins Leeds Method, the output is in the form of a matrix. Here are some screenshots to help you better understand these clusters in this matrix format.

The Names

In the matrix, the names are listed in the same order from left to right (along the top) as they are from top to bottom (along the left side). Each cell represents the intersection of two people.
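For readers who like to see the idea in code, here is a minimal Python sketch of that layout. It is not how AutoCluster or the Collins Leeds Method builds its output; the `build_matrix` function and the short list of names and pairs are assumptions made up for illustration.

```python
# Illustrative data only: a handful of names and Shared Match pairs.
names = ["Abraham", "Betsey A", "Eileen V", "Ryne Q"]
shared_pairs = [
    ("Abraham", "Betsey A"),
    ("Ryne Q", "Eileen V"),
    ("Abraham", "Eileen V"),
]


def build_matrix(names, shared_pairs):
    """Build a square grid whose rows and columns use the same name order."""
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[False] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = True  # a person always intersects with themself
    for a, b in shared_pairs:
        # Fill both cells where the two people intersect.
        matrix[index[a]][index[b]] = True
        matrix[index[b]][index[a]] = True
    return matrix


matrix = build_matrix(names, shared_pairs)
for name, row in zip(names, matrix):
    print(f"{name:<10}", " ".join("X" if cell else "." for cell in row))
```

Because the same list of names is used for both the rows and the columns, the cell in row 1, column 2 and the cell in row 2, column 1 describe the same two people.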

The Color Clusters

As in the Leeds Method, color clusters group the people who have a close connection to each other. In this example, we have three clusters: orange, green, and red.

Colored Cells

When two people show up as Shared Matches or In Common With, the cell at the intersection becomes a colored cell. In this example, the cell would become orange, green, or red. For example, Abraham and Betsey A show up as Shared Matches. Kevin Day and Alfred show up as Shared Matches. And Ryne Q and Eileen V show up as Shared Matches.

Dark-Colored Cells

The dark-colored cells form a diagonal line from the upper left to the lower right. This is the midline, and it forms in the cells where a person intersects with themself. The portion of the matrix above and to the right of the midline is a mirror image of the portion below and to the left of the midline. It might be easiest to look at the gray cells in the upper-right and lower-left corners to see these mirror images.
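The mirror-image idea can be checked with a short Python sketch. The small grid below is made up for illustration (1 means a filled cell, 0 an empty one), and the helper names `is_mirrored` and `upper_half` are my own, not part of any clustering tool.

```python
# Hypothetical 4x4 grid: 1 = filled cell (Shared Match, or self on the midline).
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
]


def is_mirrored(matrix):
    """True if every cell equals its reflection across the midline."""
    size = len(matrix)
    return all(
        matrix[r][c] == matrix[c][r] for r in range(size) for c in range(size)
    )


def upper_half(matrix):
    """Filled (row, column) positions above and to the right of the midline."""
    size = len(matrix)
    return [
        (r, c) for r in range(size) for c in range(r + 1, size) if matrix[r][c]
    ]


print(is_mirrored(matrix))  # True
print(upper_half(matrix))   # [(0, 1), (0, 3), (2, 3)]
```

Since the grid is mirrored, the three positions returned by `upper_half` carry all of the match information; the lower-left half only repeats it.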

“White” (or Very Light Gray) Cells

The “white” (or very light gray) cells indicate intersections where two people are NOT Shared Matches. In this example, Abraham and Reese W do NOT show up as Shared Matches. The same is true of George Z and Betsey A and of Virginia R and Alfred. You will have a lot of these white cells outside of the clusters, but you also can have white cells within clusters as in the example of George Z and Betsey A.

Gray Cells

Gray cells are incredibly important. These appear at intersections where one person was clustered into one color, but the other person was clustered into another color. And yet the two people show up as Shared Matches. In this example, Abraham sorted into the orange cluster, but Eileen V sorted into the green cluster. However, they show up as Shared Matches to each other and are likely related.
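The cell types described so far can be summed up as a small decision rule. The Python sketch below is only an illustration: the `classify_cell` function is hypothetical, and the cluster assignments are read off the examples in this post (Betsey A and George Z as orange and Ryne Q as green are inferred from the text rather than stated outright).

```python
# Cluster colors taken or inferred from the examples in this post.
cluster_of = {
    "Abraham": "orange",
    "Betsey A": "orange",
    "George Z": "orange",
    "Eileen V": "green",
    "Ryne Q": "green",
}

# Pairs that show up as Shared Matches; frozenset ignores the order of names.
shared = {
    frozenset(("Abraham", "Betsey A")),
    frozenset(("Ryne Q", "Eileen V")),
    frozenset(("Abraham", "Eileen V")),
}


def classify_cell(a, b, cluster_of, shared):
    """Return the kind of cell found where person a and person b intersect."""
    if a == b:
        return "dark"  # the midline
    if frozenset((a, b)) not in shared:
        return "white"  # not Shared Matches
    if cluster_of[a] == cluster_of[b]:
        return cluster_of[a]  # Shared Matches in the same cluster
    return "gray"  # Shared Matches sorted into different clusters


print(classify_cell("Abraham", "Betsey A", cluster_of, shared))   # orange
print(classify_cell("Abraham", "Eileen V", cluster_of, shared))   # gray
print(classify_cell("George Z", "Betsey A", cluster_of, shared))  # white
```

The last line shows a white cell inside a cluster: George Z and Betsey A sorted into the same color, but they are not Shared Matches with each other.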

Significant Gray Cells Between Clusters

When you have significant gray cells between clusters, that represents a probable connection between those clusters. In this example, there are a lot of gray cells between the brown, pink, and dark gray clusters. People who sorted into the brown cluster, for instance, had Shared Matches with people who sorted into the pink and dark gray clusters. This shows a likely connection between all three of these clusters. These three clusters are probably from the same section of your family tree.
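One way to put a number on "significant" is to count the gray cells between each pair of clusters. The Python sketch below assumes made-up people (P1 through P6), made-up Shared Match pairs, and a hypothetical `gray_counts` helper; it is not taken from any clustering tool. Each pair is counted once, as if reading only the half of the matrix above the midline.

```python
from collections import Counter

# Hypothetical people and the cluster color each one sorted into.
cluster_of = {
    "P1": "brown",
    "P2": "brown",
    "P3": "pink",
    "P4": "pink",
    "P5": "dark gray",
    "P6": "blue",
}

# Hypothetical Shared Match pairs, each listed once.
shared = [
    ("P1", "P2"),
    ("P3", "P4"),
    ("P1", "P3"),
    ("P2", "P4"),
    ("P2", "P5"),
    ("P3", "P5"),
]


def gray_counts(cluster_of, shared):
    """Count Shared Match pairs whose two people sorted into different clusters."""
    counts = Counter()
    for a, b in shared:
        color_a, color_b = cluster_of[a], cluster_of[b]
        if color_a != color_b:
            # Sort the two colors so (brown, pink) and (pink, brown) share a key.
            counts[tuple(sorted((color_a, color_b)))] += 1
    return counts


counts = gray_counts(cluster_of, shared)
for (first, second), total in sorted(counts.items()):
    print(f"{first} / {second}: {total} gray cell(s)")
```

In this toy data, brown, pink, and dark gray are all linked by gray cells, while the blue cluster has none and stands apart.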

Questions?

I appreciate a reader asking me to explain these cluster matrices, and I hope this helps many of you. If you have any questions about the matrix format or anything to do with clustering, please ask!

26 thoughts on “Understanding Cluster Matrices”

  • Thank you so much for this, it explains it so clearly and makes other things so much more understandable.

  • Sue Muspratt

    Great explanation thank you

  • Jean Baird

    Just tried genetic affairs. Would matches be in the graph not be related to folks? Is there any way this method could help us find an unnamed parent?

    • Hi, Jean. I’m not sure what you mean about matches “in the graph not be related to folks.” Everyone in the graph is a DNA match to you so they should be related to you and connected to the other people in whichever cluster they are in. And, yes! This can definitely help people find an unnamed parent and many people are using it to do just that!

  • Carol Hehemann

    This is the first explanation that makes sense. Just hope I will be able to read mine.

    • I’m glad it makes sense, and best wishes, Carol!

  • Patricia Peoples

    Thank you so much! This helps tremendously. Just one question. Not exactly sure what you mean by “the portions of the matrix above and below this midline are mirror images”?

    • Great question and I’ll try to clarify (and update my post). The dark-colored cells run from the upper left to the lower right of the matrix and can be called the midline. If you look at the cells “above” – meaning above and to the right – of this midline, they form the same pattern as those “below” – meaning below and to the left – of this midline. In other words, you could fold this image in half along that line and the filled-in cells would match up cell by cell, color for color. It might be easiest to look at the gray cells and see that the ones in the upper right-hand corner are a mirror image of the ones in the lower left-hand corner. You could actually work with only half of this chart – the half above and to the right of the midline or the half below and to the left of it – and have all of the information. But seeing the clusters as “squares” instead of triangles is probably helpful to most. This was a difficult thing to explain in more detail, so please let me know if you still have any questions and I will reword my reply.

  • June Chan

    Wonderful explanation. Have been totally lost regarding the grey cells until reading this. Thank you

  • Thank you Dana, for the best explanation I’ve read, about grey cells. I knew the basics of what they mean, but now it has really sunk in. I believe that the answers to two of my longtime and frustrating brick walls will tumble once I figure all this out.

    • Thanks, Diane. I’m glad it helped! 🙂 And, please let me know if that brick wall crumbles!

  • Wendy Schultz

    This is so interesting, Dana – thank you so much! I’m curious about why some of my closer DNA matches (approx 150 cM) don’t show up in my clusters.

    • Hi, Wendy. Those closer matches won’t show up if they don’t match enough people in any specific cluster. I’m not the programmer and don’t know the specifics, but that’s the general reason. If you change your cM range, they might show up in a cluster somewhere. Hope this helps!

  • Malcolm Bruce

    Hi Dana,

    Great explanation. I have four adjacent clusters that I believe belong to the same family. The gray cells also suggest this. Is there any significance to having a cluster split into groups like that instead of appearing in a single cluster?

    Thanks

    • There can be! But not necessarily. The clusters are based, of course, on who shares DNA with whom. So, if you have 4 related clusters, they might show some significance. For example, one cluster might be descendants of one child while the other cluster is descendants of another child or several children. But, it’s usually not quite so clearly separated. If you have enough members in the cluster and enough of them with trees, hopefully you can learn more about why those particular clusters formed.

  • Travis Riley

    Yes indeed, a great explanation. Thanks. Any suggestions for more in-depth reading? Or, is it just that simple?

    • Hi, Travis. I think it’s really just that simple. 🙂 But, if you have any other questions, please feel free to ask! Best wishes!

  • Wendy Miles

    Hi Dana, I do appreciate these samples of yours; however, when I downloaded my MH clusters, the coloured graphs soon disappeared and left only numbers. [The Ancestry lot did the same but kept the colours also.]
    Can you explain what those numbers actually represent please? Is it to do with the closeness of the match to me?
    Wendy

    • Hi, Wendy. I think you might be experiencing a glitch – it’s certainly not something I’ve seen before. Can you email me a screenshot? I’d be interested in seeing it. My email is leeds_dana@yahoo.com

  • Wendy Miles

    Thank you Dana, screenshots on the way. I hope I’m using the same email address I gave yesterday. Computer hiccups are causing a few worries.
    Wendy Miles

    • Hi, Wendy. I didn’t realize you had used the Shared Clustering tool. Yes, those clusters have numbers on them, with cells being either empty or having a 0, 1, or 2. There are also decimal places, but we cannot see those. The darker the red, the higher the number. I think he has more information in his “white pages,” although I cannot find it right now.

  • Sabrina Torrez

    I am still lost on how you can figure out a missing parent.

    • Hi, Sabrina. If you haven’t read about diagramming clusters – like here https://www.danaleeds.com/visualizing_clusters_2nd_3rd_cousins/ – you might try that. Each cluster represents a part of your family. If you can figure out how the people in the cluster are related to each other, you can hopefully figure out how you (or the test taker) are related to those matches.

      Best wishes!
      Dana
