Understanding your genetic raw data

Let’s get into some of the details.

We’ve talked about how genes code for proteins, and how proteins can act as enzymes, signaling molecules, and structural pieces for the cell.

I’ve explained a few ways you can use your genetic data to optimize your diet or prevent chronic conditions.

Now it is time to get to the nitty-gritty.

What are you actually looking at with the data from 23andMe or AncestryDNA?

First things first, everyone should download their genetic raw data file and store it safely. This is your data…  you own it and you should use it.  (Need help? Download directions for 23andMe or AncestryDNA).

The genetic raw data file downloads as a ZIP file. Double-click to unzip it, and move the .txt file to a safe location on your hard drive.

If you open the file using a text editor, it will look like this:

The first thing you will probably notice is that the file has a lot of data in it — about 650,000 rows. 

While this may seem large, it is actually less than 1% of your genome. 
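To put that number in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the row count is the approximate figure mentioned above, and the genome size of roughly 3.1 billion base pairs is an assumed round number, not something stated in your file.

```python
# Rough estimate of how much of the genome a raw data file covers.
# Both numbers are approximations: the row count comes from the text above,
# and the genome size (~3.1 billion base pairs) is an assumed round figure.
ROWS_IN_FILE = 650_000
GENOME_BASE_PAIRS = 3_100_000_000


def percent_of_genome(rows: int, genome_size: int) -> float:
    """Return the share of genome positions covered, as a percentage."""
    return rows / genome_size * 100


coverage = percent_of_genome(ROWS_IN_FILE, GENOME_BASE_PAIRS)
print(f"About {coverage:.2f}% of the genome")
```

Under these assumptions the result is a small fraction of one percent, which is why a specific SNP you read about may not be in your file.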

What do the columns mean? 

The rsid column gives a unique identifier for each genetic variant listed. These identifiers start with ‘rs’, which stands for reference SNP, and the same rs id is used worldwide to identify that particular change in the genome.

Sometimes in 23andMe data you will see an identifier that starts with an “i” instead of “rs”. These come from an internal numbering system used only by 23andMe.

The chromosome column lets you know which chromosome the SNP is located on. (Recap: 23 pairs of chromosomes, numbered 1-22 and then X and Y.)

The position is where on the chromosome the SNP is located.  Researchers start the numbering from one end of the chromosome and mark the positions all along to the other end. 

Finally, you see the “genotype” column. This will show you which nucleotide bases you have on each chromosome. For the first line of this data, on chromosome 1 at position 734462, I inherited an A from Mom and an A from Dad.  

If you are looking at AncestryDNA data, it will look very similar, but you’ll notice that the genotype is divided up into two separate columns: allele1 and allele2. Together these two letters make up your genotype.
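If you are comfortable with a little Python, the column layout described above can be read with a few lines of code. This is a minimal sketch, not an official parser. It assumes the file is tab-separated and that header or comment lines start with "#" or with the word "rsid". The sample rows are made up for illustration (the rs ids are placeholders, not real entries), and the function name read_raw_data is my own.

```python
import csv
import io

# Made-up sample rows for illustration only; the rs ids are placeholders.
# Assumed layout: comment lines start with "#", columns are tab-separated.
SAMPLE_23ANDME = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t734462\tAA\n"
    "i0000002\t1\t752721\tAG\n"
)

SAMPLE_ANCESTRY = (
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs0000001\t1\t734462\tA\tA\n"
)


def read_raw_data(text):
    """Parse raw data text into a list of dicts, one per SNP."""
    snps = []
    lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0] == "rsid":  # skip blank lines and a header row
            continue
        rsid, chromosome, position = row[0], row[1], int(row[2])
        # 23andMe: one genotype column. AncestryDNA: allele1 + allele2.
        genotype = "".join(row[3:])
        snps.append({"rsid": rsid, "chromosome": chromosome,
                     "position": position, "genotype": genotype})
    return snps


for snp in read_raw_data(SAMPLE_23ANDME) + read_raw_data(SAMPLE_ANCESTRY):
    print(snp)
```

To try it on your own file, read the .txt file into a string first and pass that string to read_raw_data. Joining the two AncestryDNA allele columns gives the same two-letter genotype that 23andMe shows in a single column.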

Example time: When you look through the 650,000 rows of rs ids and genotypes, you’ll notice that mixed in among the As, Cs, Gs, and Ts there will sometimes be a “D” or an “I”. D stands for deletion. This means that at that particular position on a chromosome, you are missing a nucleotide base (or multiple nucleotide bases). I stands for insertion. Similarly, at some spots in the genome, people have extra nucleotide bases inserted. It isn’t just your data that has the extras or is missing a chunk. At that spot in the genome, everyone will see a DD, a DI, or an II.
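Here is one way to spell out the letters in a genotype, including the D and I codes described above. It is a small sketch with names I made up (LETTER_MEANINGS, describe_genotype, is_indel), and any letter not covered in this lesson is simply reported as unrecognized.

```python
# Meaning of each letter that can appear in a genotype, per the text above.
LETTER_MEANINGS = {
    "A": "adenine",
    "C": "cytosine",
    "G": "guanine",
    "T": "thymine",
    "D": "deletion",
    "I": "insertion",
}


def describe_genotype(genotype):
    """Spell out each letter of a genotype such as 'AG' or 'DI'."""
    return [LETTER_MEANINGS.get(letter, "unrecognized") for letter in genotype]


def is_indel(genotype):
    """Return True if the genotype uses the D or I codes."""
    return any(letter in ("D", "I") for letter in genotype)


for example in ["AG", "DD", "DI", "II"]:
    print(example, describe_genotype(example), is_indel(example))
```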

So what is a chromosome, anyway? 

You’ll notice that your genetic data is listed by chromosome number. A chromosome is just the packaged up DNA in the nucleus of the cell.  While we often picture chromosomes as separate and neat (below), that image is only true for when the cell is getting ready to divide. At other times, the chromosomes are more loosely jumbled together in the nucleus. 

Your genetic data will list chromosomes 1-22 (you have two copies of each, one from mom and one from dad) and the sex chromosomes: either two X chromosomes (female) or an X and a Y (male). In AncestryDNA data, the X chromosome information is labeled as chromosome 23 and the Y chromosome information is labeled as chromosome 24.
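If you ever compare files from both companies, the different chromosome labels can trip you up. The sketch below converts the AncestryDNA numbers 23 and 24 to X and Y as described above. The function name normalize_chromosome is my own, and any other label is passed through unchanged rather than guessed at.

```python
# AncestryDNA labels X as 23 and Y as 24, as described above.
ANCESTRY_TO_STANDARD = {"23": "X", "24": "Y"}


def normalize_chromosome(label):
    """Return 'X' or 'Y' for AncestryDNA's 23 or 24; leave other labels alone."""
    label = str(label).strip()
    return ANCESTRY_TO_STANDARD.get(label, label)


for raw_label in ["1", "22", "23", "24", "X"]:
    print(raw_label, "->", normalize_chromosome(raw_label))
```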

Mitochondrial DNA

You’ll also find at the end of your genetic raw data file some SNPs from your mitochondrial DNA. Cells contain organelles called mitochondria, which produce ATP for energy. The mitochondria have their own unique DNA. Mitochondria are inherited almost entirely from mom, so the mitochondrial DNA is often used in genealogy for determining maternal ancestry.

Orientation

Your DNA is a double-stranded helix. This means that for each position on a chromosome, you have a nucleotide on one strand that is bound to a nucleotide on the opposite strand.

Researchers define which strand of the DNA they are looking at by direction, called either the forward and reverse strands or the plus and minus strands.

Why is orientation important?

Everything in your genetic raw data file is given in the forward or plus orientation. 

But not every SNP is defined by researchers on the plus strand. Some research studies refer to SNPs on the minus strand.

When reading through research on SNPs defined on the minus strand, you may need to mentally convert between the strands to match with your genetic data. 

Just remember that…

A (adenine) always binds to T (thymine).  Likewise, G (guanine) always binds to C (cytosine). 

A = T       and        C = G
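If you would rather not do the conversion in your head, the pairing rule above can be written as a tiny lookup. This is a minimal sketch: the name flip_strand is my own, and leaving letters such as D and I unchanged is my assumption, since the pairing rule only covers A, T, C, and G.

```python
# Base pairing from the rule above: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Convert a genotype from one strand to the other, letter by letter."""
    # Letters outside A/C/G/T (for example D or I) are left as they are.
    return "".join(COMPLEMENT.get(base, base) for base in genotype)


print(flip_strand("AG"))  # TC
print(flip_strand("CC"))  # GG
```

So if a study describes a genotype of AG on the minus strand, you would look for TC (in either order) in your raw data file.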

For Genetic Lifehacks articles, everything is going to be given on the plus strand to match with your genetic data. 

What’s up with the different version numbers?

One of the most frequent questions that I get from readers is ‘why isn’t rs __ in my genetic data’? 

Your genetic data file from 23andMe or AncestryDNA covers only a tiny part of your genome: less than 1%.

Periodically, the companies change which SNPs they include in your genetic data.

Here’s a quick breakdown of versions:

  • 23andMe v5: mid-2017 to now
  • 23andMe v4: late 2013 to mid-2017
  • 23andMe v3: late 2010 to 2013
  • AncestryDNA v2: 2016 to now
  • AncestryDNA v1: prior to early 2016

You’ll notice that all Genetic Lifehacks articles tell you which version of the data each rs id is found in.
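A quick way to answer the "why isn’t this rs id in my data" question for yourself is to check a list of rs ids against your file. The sketch below uses a small made-up dictionary (rs id mapped to genotype) standing in for a parsed raw data file. The rs ids are placeholders and the function name find_rsids is my own.

```python
def find_rsids(snps_by_rsid, wanted):
    """Split wanted rs ids into those present in the data and those missing."""
    found = {rsid: snps_by_rsid[rsid] for rsid in wanted if rsid in snps_by_rsid}
    missing = [rsid for rsid in wanted if rsid not in snps_by_rsid]
    return found, missing


# Made-up data standing in for a parsed raw data file (rs id -> genotype).
my_data = {"rs0000001": "AA", "rs0000002": "CT"}

found, missing = find_rsids(my_data, ["rs0000001", "rs0000003"])
print("Found:", found)
print("Not in this data version:", missing)
```

An rs id that shows up in the missing list is not necessarily a problem with your file. It may simply not be included in the version of the data that you have.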

Wrapping this up!

Thanks so much for joining in on the Genetic Lifehacks “Getting Started with Genetics” course! 

My hope is that the background information in this email course has helped you understand a little more about your genetic data and how you can use it to optimize your health.