Bioconductor Annotation

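Bioconductor makes it easy to annotate the variants in a small region of the genome. The example below retrieves the variants in a region of chromosome 17 around BRCA1 from Google Genomics, converts them to a GRanges object, finds those that fall in coding regions, and predicts their effect on the encoded protein.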

```r
# Load the packages used below; library() stops with an error if one is missing.
library(GoogleGenomics)
library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
```
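If any of the packages loaded above is not installed, a minimal sketch like the following can report which ones are missing and install only those. It assumes the CRAN package BiocManager as the installer and that your Bioconductor release still provides each of these packages; `annotationPackages`, `missingPackages` and `installMissing` are hypothetical names, not part of any package.

```r
# Packages used by the annotation example on this page.
annotationPackages <- c("GoogleGenomics", "VariantAnnotation",
                        "BSgenome.Hsapiens.UCSC.hg19",
                        "TxDb.Hsapiens.UCSC.hg19.knownGene")

# Hypothetical helper: return the subset of `pkgs` whose namespace
# cannot be loaded in this R session.
missingPackages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Hypothetical helper: install only the missing packages via BiocManager
# (ASSUMPTION: BiocManager is the installer), fetching BiocManager from
# CRAN first if needed. Returns the names it tried to install, invisibly.
installMissing <- function(pkgs = annotationPackages) {
  todo <- missingPackages(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(todo)
  invisible(todo)
}
```

Sourcing this block only defines the helpers; nothing is downloaded until you call `installMissing()`.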

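```r
# Authorize access to Google Genomics; replace the path with the location
# of your own client_secrets.json file.
GoogleGenomics::authenticate("/PATH/TO/YOUR/client_secrets.json")
```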

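```r
# Fetch the variants in a region of chromosome 17 from the dataset with this
# ID, then convert them to a GRanges object for use with Bioconductor.
variants <- getVariants(datasetId="10473108253681171589", chromosome="17",
                        start=41196311, end=41277499)
granges <- variantsToGRanges(variants)
```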
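The `start` and `end` arguments of `getVariants()` are positions on the chromosome. Assuming, as in the Google Genomics v1 API, that they are 0-based with an exclusive end, whereas Bioconductor's GRanges objects and genome browsers use 1-based, closed intervals, a region copied from a browser has to be shifted before it is passed in. A minimal sketch of that conversion, with `oneBasedToApiRange` as a hypothetical helper name:

```r
# Hypothetical helper. ASSUMPTION: getVariants() expects a 0-based start and
# an exclusive end, as in the Google Genomics v1 API.
# Converts a 1-based, closed interval [first, last] (genome-browser and
# GRanges convention) into that 0-based, half-open form.
oneBasedToApiRange <- function(first, last) {
  stopifnot(first >= 1, last >= first)
  # Only the start shifts: the exclusive 0-based end equals the inclusive
  # 1-based last position, so the interval length is unchanged.
  list(start = first - 1, end = last)
}
```

Under that assumption, the call above covers the 1-based interval 41196312 to 41277499 on chromosome 17, and `oneBasedToApiRange(41196312, 41277499)` returns exactly `start = 41196311` and `end = 41277499`.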

```r
# Find which variants fall in coding regions of the UCSC known-gene transcripts.
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
codingVariants <- locateVariants(granges, txdb, CodingVariants())
codingVariants
```
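`locateVariants()` can report one row per overlapping transcript, so a single input variant (identified by `QUERYID`) may appear several times for the same gene. A minimal sketch for counting distinct variants per gene from the `QUERYID` and `GENEID` columns of its result; `variantsPerGene` is a hypothetical helper name.

```r
# Hypothetical helper: count distinct input variants per gene ID.
# Each QUERYID is counted once per gene, however many transcripts it hits.
# Rows with no gene ID (NA) are dropped by tapply().
variantsPerGene <- function(queryId, geneId) {
  counts <- tapply(queryId, geneId, function(ids) length(unique(ids)))
  # Turn the 1-d array from tapply() into a plain named vector.
  counts <- setNames(as.vector(counts), names(counts))
  sort(counts, decreasing = TRUE)
}
```

With the objects above, this would be called as `variantsPerGene(codingVariants$QUERYID, codingVariants$GENEID)`.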

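```r
# Predict the amino-acid consequence of each alternate allele. A variant can
# have several ALT alleles, so each range is repeated once per allele to line
# up with the flattened allele vector. lengths() replaces the deprecated
# elementLengths().
coding <- predictCoding(rep(granges, lengths(granges$ALT)),
                        txdb,
                        seqSource=Hsapiens,
                        varAllele=unlist(granges$ALT, use.names=FALSE))
coding
```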
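The `CONSEQUENCE` column of the `predictCoding()` result classifies each allele. Assuming the labels documented for `predictCoding()` ("synonymous", "nonsynonymous", "frameshift", "nonsense", "not translated"), a minimal sketch of a filter that keeps only protein-altering changes; `isProteinAltering` is a hypothetical helper name, and which labels count as "altering" is a choice you may want to adjust.

```r
# Hypothetical helper: TRUE for consequences that change the protein.
# ASSUMPTION: labels follow the predictCoding() documentation; the default
# `keep` set excludes "synonymous" and "not translated".
isProteinAltering <- function(consequence,
                              keep = c("nonsynonymous", "nonsense", "frameshift")) {
  # CONSEQUENCE is a factor, so compare on its character labels.
  as.character(consequence) %in% keep
}
```

Applied to the result above, `coding[isProteinAltering(coding$CONSEQUENCE)]` keeps only those ranges, and `table(coding$CONSEQUENCE)` gives an overview of all consequence classes.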

A more extensive example of variant annotation with Bioconductor is documented towards the end of the codelab Data Analysis using Google Genomics.

To use this on your own data:

  1. First, load your data into Google Genomics. See Load Data into Google Genomics for details.
  2. If you do not have them already, install the necessary Bioconductor packages. See Using Bioconductor for details.
  3. Update the parameters of the getVariants call in the example above to match your dataset and the genomic region you want to annotate.
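For step 3, it can help to turn the example into a function of the dataset and region. A minimal sketch, assuming the packages of the example are attached, `GoogleGenomics::authenticate()` has already been called, and the region is on hg19; the name `annotateRegion` and its argument defaults are illustrative, not part of any package.

```r
# Hypothetical helper: the steps of the example above with the dataset and
# region as arguments. ASSUMPTIONS: the example's library() calls have run,
# authentication is done, and coordinates are passed as in the example.
# Defaults are evaluated lazily, so defining the function needs no packages.
annotateRegion <- function(datasetId, chromosome, start, end,
                           txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
                           genome = Hsapiens) {
  # Fetch the variants and convert them to a GRanges object.
  variants <- getVariants(datasetId = datasetId, chromosome = chromosome,
                          start = start, end = end)
  granges <- variantsToGRanges(variants)

  # Variants overlapping coding regions.
  locations <- locateVariants(granges, txdb, CodingVariants())

  # Amino-acid consequences, one range per alternate allele.
  coding <- predictCoding(rep(granges, lengths(granges$ALT)),
                          txdb,
                          seqSource = genome,
                          varAllele = unlist(granges$ALT, use.names = FALSE))

  list(locations = locations, coding = coding)
}
```

For instance, `result <- annotateRegion("10473108253681171589", "17", 41196311, 41277499)` should reproduce the two results of the example as `result$locations` and `result$coding`.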

Need more help? Please see https://cloud.google.com/genomics/support.