Automated Clustering

Clustering Without Downloading Data Using Shared Clustering

Shared Clustering is an automated tool created by Jonathan Brecher that clusters your DNA matches. Like the Leeds Method and other clustering tools, it creates clusters showing how your DNA matches are related to each other. These clusters help you visualize genetic relationships. Best of all, this is a free tool that can be used with any testing site!

Jonathan just released an update that lets you create clusters without downloading your data. Instead, you enter the information into a spreadsheet by hand. It's more time-consuming, but it works well!

Before you get started, you’ll need to download the Shared Clustering tool here. (You can also find Jonathan’s instructions for “clustering without downloading DNA” here. Also, please note that this is available for MS Windows only.)

Entering Your Data

You enter your data pretty much the same way you start a Leeds Method chart. Here are the basic steps according to Jonathan:

    1. Open a new file in Excel.
    2. Label the first two columns with these exact labels: “Name” and “Shared Centimorgans.”
    3. Fill in the names and the number of shared centimorgans in the first two columns.
    4. Enter the names in the same order along the top.
    5. Enter the "shared matches" data: in each match's row, put an "X" in the column of every shared match. (Per Jonathan, any non-blank value works.)
    6. Save your spreadsheet as a .xlsx file.
Example of data entered for Shared Clustering. In this example, Abe had the following shared matches: Gary, Don, Paul, Deb, Frances, Dan, Jan, etc.
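If you keep your match notes in a script instead of typing the grid by hand, the layout from the steps above can be sketched in Python. This is a minimal sketch under a few assumptions: the names and cM values are made up, and it assumes the names across the top start in the third column, on the same row as the "Name" and "Shared Centimorgans" labels. It only builds the rows and does not save a file.

```python
# Hypothetical example data (not real matches): each match's shared cM
# and the names the testing site lists as that match's shared matches.
matches = {
    "Ann": (212.0, ["Ben", "Cal"]),
    "Ben": (150.5, ["Ann"]),
    "Cal": (98.0, ["Ann"]),
}

def build_grid(matches):
    """Return spreadsheet rows in the layout described in the steps.

    Assumption: the names across the top begin in the third column,
    on the same header row as "Name" and "Shared Centimorgans".
    """
    names = list(matches)  # one fixed order, used for rows and columns
    rows = [["Name", "Shared Centimorgans"] + names]
    for name in names:
        cm, shared = matches[name]
        # "X" under every name that appears in this match's shared list
        marks = ["X" if other in shared else "" for other in names]
        rows.append([name, cm] + marks)
    return rows

for row in build_grid(matches):
    print(row)
```

To turn these rows into the .xlsx file the tool expects, one option is the third-party openpyxl package: create a `Workbook()`, call `append(row)` on its active worksheet for each row, then call `save()` with a .xlsx filename. It's worth opening the result in Excel to check it before running it through Shared Clustering.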

Running Your Report

    1. Open the Shared Clustering application on your computer.
    2. Go to the “cluster” tab.
    3. In the “Saved data file” field, select the .xlsx file you just created. (The “Cluster output file” will automatically be filled in.)
    4. Click the appropriate “Cluster completeness” option based on the cM you used.
    5. Click on “Process Saved Data” and your new heat map will open automatically!
Shared Clustering Heat Map
Shared Clustering Heat Map – Notes Added

When I first looked at this chart, I saw two clusters: a small one in the upper left with 5 people, and a larger one in the lower right with 15 people. Even seeing two clusters is helpful! But there are actually four clusters, and they overlap.

(Note: Paul, the great-nephew, shows up as a shared match with everyone on both the mom's and dad's sides except Britney. This makes sense, since he is related on both sides. He just doesn't happen to share any DNA with Britney.)

Seeing the Clusters

These heat maps can show more detail than other tools, but the clusters are a little harder to see at first. It helps to run your first report using mostly matches you've already identified. This will help you learn how to "see" and identify the clusters.

Shared Clustering with outlines showing the “clusters”

In the example above, you can see that the clusters overlap. For example, some matches are on mom's dad's side, some are on mom's mom's side, and some are on both.
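Once you've outlined the clusters, it can help to write down who is in each one. As a small illustration, the Python sketch below lists anyone who appears in more than one of your outlined clusters, like the matches who are on both sides in the example above. The cluster labels and member names are hypothetical, and the clusters themselves still come from reading the heat map, not from this code.

```python
from collections import Counter

# Hypothetical clusters, written down after outlining them on the heat map.
clusters = {
    "Cluster 1 (mom's dad's side)": ["Ann", "Ben", "Cal", "Eve"],
    "Cluster 2 (mom's mom's side)": ["Dee", "Fay", "Eve"],
}

def overlapping_matches(clusters):
    """Return the names that appear in more than one cluster, sorted."""
    # set() so a name listed twice within one cluster is only counted once
    counts = Counter(name for members in clusters.values() for name in set(members))
    return sorted(name for name, n in counts.items() if n > 1)

print(overlapping_matches(clusters))  # ['Eve']
```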

Give It a Try!

Give it a try and let me know what you think! Shared Clustering also has a Facebook group you can join: “Shared Clustering User Group.”

10 thoughts on “Clustering Without Downloading Data Using Shared Clustering”

  • Susan Howard

    For MS Windows only, not for Mac O/S

    • Sorry! I use Windows and didn’t realize that.

  • Catherine

    Hi!
    Just double checking… In the entering data list you say to put a ‘1’ in box between matches… In the image it has an ‘x’…. ?
    Kind regards,
    Catherine.

    • Oops! Yes, you put an “X” in the box. I’ll fix that. Thanks!

      • Jonathan Brecher

        Actually, either will work, by design. 🙂
        (almost any non-blank value will also work fine.)

  • Thank you for this tutorial! Using this method, could you include matches from multiple sites in the same spreadsheet? For example, could I use the same spreadsheet for Ancestry matches AND FTDNA matches? Or do the different sites use different calculations that make it difficult to combine into 1 sheet?

    • Interesting idea. I think it could work to a limited degree. It would work best if some of your matches had tested at both sites. For example, let's say Bob matches Adam and Carl at Ancestry, and Bob and Adam have also tested at FTDNA. As you set up the spreadsheet, you would show that Bob matches both Adam and Carl, as well as anyone else Bob matches on Ancestry or FTDNA. That could be helpful! But let's say Bob doesn't have any matches that show up on both sites. Bob is still "in common," so those clusters might still appear next to each other. I'm kind of thinking "aloud" but am going to have to try this. 🙂

    • Hi, Lisa. I’ve thought about this before, but not specifically using this system where you’re entering your own data. It might work if you could identify enough matches that were the same on more than one site. I will be looking into this when I get some time, but let me know if you give it a try!

  • Will this also work with Google Sheets or does it absolutely have to be Excel?
    Thank you!

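    • It looks like it needs to be an .xlsx file, since that's the format Jonathan specifies. Google Sheets can download a sheet in that format (File > Download > Microsoft Excel (.xlsx)), so that may be worth trying. You can also reach out to the programmer, Jonathan Brecher: https://bit.ly/2YTNDLu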

