Data Analysis Codelab

A collection of variant analyses is documented in the codelab Data Analysis using Google Genomics.

In this codelab, you will use Google Genomics, Google BigQuery, Apache Spark, and R to explore the 1000 Genomes dataset. Specifically, you will:

  • run a principal component analysis (either from scratch or using pre-computed results)
  • use BigQuery to explore population variation
  • zoom in on specific genome regions, using the Genomics API to drill all the way down to raw reads
  • run a genome-wide association study (GWAS) over the variants within BRCA1
  • visualize and annotate results using various R packages, including Bioconductor
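
To make the GWAS step above more concrete, here is a minimal sketch of one common per-variant association test: a Pearson chi-squared test on a 2x2 table of allele counts (cases vs. controls, reference vs. alternate allele). This page does not say which test the codelab itself uses, so treat the choice of test as an assumption; the function name `allelic_chi_squared` and the allele counts are hypothetical and purely illustrative.

```python
import math


def allelic_chi_squared(case_ref, case_alt, control_ref, control_alt):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (statistic, p_value). If any row or column total is zero the test
    is undefined, so (0.0, 1.0) is returned as a conservative fallback.
    """
    observed = [[case_ref, case_alt], [control_ref, control_alt]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of phenotype and allele.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared survival function reduces to
    # erfc(sqrt(x / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical allele counts for a single variant (illustrative numbers only).
statistic, p_value = allelic_chi_squared(
    case_ref=10, case_alt=20, control_ref=20, control_alt=10
)
print(f"chi2={statistic:.3f} p={p_value:.4f}")
```

In a real study the same test would be repeated for every variant in the region, and the resulting p-values would need a multiple-testing correction.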

To run these analyses on your own data:

  1. Load your data into Google Genomics and export your variants to BigQuery. See Load Data into Google Genomics for details.
  2. Update the BigQuery table name, variant set ID, and read group set ID in the example to match those of your data.
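
One way to keep the three identifiers from step 2 in a single place is sketched below. Everything here is an assumption made for illustration: the `CodelabInputs` class, its field names, the `YOUR_...` placeholder values, and the row-count query are not part of the codelab. The sketch only shows how to detect values that have not yet been replaced and how to substitute the table name into a query string.

```python
from dataclasses import dataclass, fields

PLACEHOLDER_PREFIX = "YOUR_"


@dataclass(frozen=True)
class CodelabInputs:
    """Hypothetical container for the identifiers that must match your data."""

    bigquery_table: str = "YOUR_PROJECT.YOUR_DATASET.YOUR_VARIANTS_TABLE"
    variant_set_id: str = "YOUR_VARIANT_SET_ID"
    read_group_set_id: str = "YOUR_READ_GROUP_SET_ID"


def unset_fields(inputs):
    """Return the names of fields that still hold a placeholder value."""
    return [
        f.name
        for f in fields(inputs)
        if getattr(inputs, f.name).startswith(PLACEHOLDER_PREFIX)
    ]


def row_count_query(inputs):
    """Build a generic SQL row-count query against the configured table."""
    return f"SELECT COUNT(*) AS row_count FROM `{inputs.bigquery_table}`"


inputs = CodelabInputs()
missing = unset_fields(inputs)
if missing:
    print("Still using placeholders for:", ", ".join(missing))
else:
    print(row_count_query(inputs))
```

Checking for leftover placeholders before running anything helps avoid querying the wrong table or requesting reads from a read group set that does not belong to your data.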

Have feedback or corrections? All improvements to these docs are welcome! You can click on the “Edit on GitHub” link at the top right corner of this page or file an issue.

Need more help? Please see https://cloud.google.com/genomics/support.