Variant Calling

NGS variant calling is the process of identifying and characterizing genetic variants within a sequenced DNA (or, less frequently, RNA) sample.

Variant calling is a broad, multifaceted topic within genomics and bioinformatics, and the approach depends on the type of genetic variant being analyzed. Common types of NGS variant calling, with some of the tools often used for each, include:

  • SNP Calling (SNVs): GATK, FreeBayes, VarScan...
  • Indel Calling (Indels): GATK, Samtools, VarScan, Pindel...
  • Structural Variant (SV) Calling: Lumpy, Delly, Manta, GRIDSS...
  • CNV Calling (CNVs): CNVkit, CNVnator, Control-FREEC...
  • Somatic Variant Calling: Mutect2, VarScan, Strelka, Manta, ASCAT, CNVkit...
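Small variants and SVs are reported differently in a VCF, and the REF/ALT alleles alone can roughly tell them apart. The sketch below illustrates this under some assumptions. It assumes simple, normalized alleles. The 50 bp cut-off between indels and SVs is a common convention, not a fixed rule. The function name is hypothetical. CNVs are inferred from read depth rather than from REF/ALT alleles, so they are not covered here.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Roughly classify one REF/ALT allele pair from a VCF record.

    Assumes simple, normalized alleles. sv_min_len is a commonly used
    (but arbitrary) size cut-off between indels and structural variants.
    """
    # Symbolic (<DEL>, <DUP>, ...) and breakend (e.g. G]17:198982]) alleles
    # are typically emitted by SV callers
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    size_change = abs(len(alt) - len(ref))
    if size_change >= sv_min_len:
        return "SV"  # very large indels are usually treated as SVs
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if size_change == 0:
        return "MNV"  # multi-nucleotide substitution, e.g. AC>GT
    return "insertion" if len(alt) > len(ref) else "deletion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("ACG", "A"), ("AC", "GT"), ("N", "<DEL>")]:
        print(f"{ref}>{alt}: {classify_variant(ref, alt)}")
```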

The choice of variant calling approach and tool depends on the specific research goals, the nature of the data (e.g., DNA-seq, RNA-seq, whole-genome, exome), and the desired balance of sensitivity and specificity. A combination of tools and strategies is often used to improve the accuracy of the final callset. This is particularly true for structural variants, where individual callers frequently disagree. Best practices and quality control measures, including filtering out low-quality variants, are also critical for reliable variant calls in NGS data analysis.
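The basic idea of filtering can be shown with a minimal sketch that reads plain-text VCF lines. It keeps header lines and keeps only records whose FILTER column is PASS and whose QUAL reaches a threshold. The `min_qual=30` default is an arbitrary placeholder, and the helper name is hypothetical. Some callers leave QUAL as ".", so a missing QUAL is not treated as failing here. In practice, use the caller's own filtering tools (e.g. FilterMutectCalls for Mutect2) before any custom thresholds.

```python
import io

def filter_vcf(lines, min_qual=30.0):
    """Yield VCF header lines unchanged; keep records with FILTER == PASS
    and QUAL >= min_qual (a missing QUAL of '.' is not treated as failing)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual, flt = fields[5], fields[6]  # VCF columns 6 (QUAL) and 7 (FILTER)
        if flt != "PASS":
            continue
        if qual != "." and float(qual) < min_qual:
            continue
        yield line


# Toy, made-up VCF content for illustration only
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t12\tPASS\t.\n"
    "chr1\t300\t.\tG\tA\t80\tLowQual\t.\n"
    "chr1\t400\t.\tT\tC\t.\tPASS\t.\n"
)

if __name__ == "__main__":
    for kept in filter_vcf(io.StringIO(TOY_VCF)):
        print(kept, end="")
```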

After obtaining the variant call format (VCF) file with the variants, the next crucial step is to annotate the identified variants to gain insights into their potential functional significance and relevance in the context of your study. Usually this is done with tools such as Funcotator, ANNOVAR, or the Variant Effect Predictor (VEP) that provide annotations based on known genomic databases, including information about gene names, coding consequences, allele frequencies in populations, and potential functional impact.
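VEP can write its annotations into the VCF INFO column as a `CSQ` field. The field names are listed after "Format:" in the corresponding `##INFO` header line. Each annotated transcript or feature is separated by "," and its values by "|". The minimal sketch below, with hypothetical helper names and toy data, turns that field into one dictionary per feature. Check the header of your own VEP output for the exact field list.

```python
import re

def csq_fields(header_line: str) -> list[str]:
    """Extract the pipe-separated field names from VEP's ##INFO=<ID=CSQ,...> line."""
    match = re.search(r'Format: ([^"]+)', header_line)
    if match is None:
        raise ValueError("not a VEP CSQ header line")
    return match.group(1).split("|")

def parse_csq(info: str, fields: list[str]) -> list[dict[str, str]]:
    """Return one dict per annotated feature in a record's CSQ value."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            blocks = item[len("CSQ="):].split(",")  # one block per feature
            return [dict(zip(fields, block.split("|"))) for block in blocks]
    return []  # the record carries no CSQ annotation


# Toy header and INFO values (shortened field list, made-up gene names)
TOY_HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
              'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
TOY_INFO = "DP=30;CSQ=T|missense_variant|MODERATE|GENE_A,T|downstream_gene_variant|MODIFIER|GENE_B"

if __name__ == "__main__":
    for ann in parse_csq(TOY_INFO, csq_fields(TOY_HEADER)):
        print(ann["SYMBOL"], ann["Consequence"], ann["IMPACT"])
```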

Finally, you can use specialized packages like maftools to generate summary statistics and visualizations of the mutation data. This is often useful for assessing the results: e.g., if the mutation load is higher than expected, you might want to go back and tweak the parameters used in Mutect2, or apply some additional filtering.
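Mutation load is commonly expressed as mutations per megabase of callable sequence. The minimal sketch below computes this and flags samples above a user-chosen expectation. The 30 Mb callable size, the per-sample counts, and the `max_expected` threshold are hypothetical placeholders. Use the callable size of your own capture kit or genome, and an expected range that fits your cancer type and study.

```python
def mutations_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Mutation burden as mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)

def flag_high_burden(counts: dict[str, int], callable_bases: int,
                     max_expected: float) -> list[str]:
    """Return (sorted) samples whose burden exceeds the chosen expectation."""
    return sorted(sample for sample, n in counts.items()
                  if mutations_per_mb(n, callable_bases) > max_expected)


if __name__ == "__main__":
    # Hypothetical: ~30 Mb callable exome and per-sample counts of PASS variants
    counts = {"S1": 120, "S2": 900}
    for sample, n in counts.items():
        print(sample, mutations_per_mb(n, 30_000_000), "mut/Mb")
    print("worth a second look:", flag_high_burden(counts, 30_000_000, max_expected=20.0))
```

A flagged sample is not necessarily wrong, since hypermutated tumours exist. It is simply a prompt to revisit calling parameters and filtering.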

NGS variant calling is a complex and substantial topic. We don't usually recommend it to beginners, as it demands a significant commitment of time and effort. Nevertheless, if you are determined, we have included a non-comprehensive list of resources below to help you with NGS variant calling, particularly in the context of somatic analysis.


SNVs

The "GATK Best Practices" are a set of guidelines and recommended workflows developed by the Broad Institute for the analysis of NGS data. They have become a widely adopted standard in the genomics community and are probably the best place to start. The workflows cover somatic and germline SNVs, somatic CNVs, RNA-seq variant calling, and more. We specifically recommend the "Somatic SNV best practices", which use Mutect2, and the "Germline SNV best practices", which use the GATK HaplotypeCaller.
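The sketch below shows the core calling commands of these two workflows as argument lists built in Python, ready to pass to `subprocess.run`. It assumes the `gatk` wrapper script is on your PATH. All file paths and sample names are hypothetical, and the helper function names are our own. The full somatic best-practices workflow adds further steps, such as contamination estimation with GetPileupSummaries and CalculateContamination. The germline workflow typically continues with joint genotyping of the GVCFs. Consult the GATK documentation for the complete pipelines.

```python
import shlex
import subprocess

def mutect2_cmd(ref, tumor_bam, normal_bam, normal_sample, out_vcf,
                germline_resource=None, pon=None):
    """Tumor/normal somatic calling with Mutect2."""
    cmd = ["gatk", "Mutect2", "-R", ref,
           "-I", tumor_bam, "-I", normal_bam,
           "-normal", normal_sample,  # sample name (SM tag) of the normal BAM
           "-O", out_vcf]
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]  # e.g. a gnomAD AF-only VCF
    if pon:
        cmd += ["--panel-of-normals", pon]
    return cmd

def filter_mutect_cmd(ref, in_vcf, out_vcf):
    """Apply Mutect2's own filters to the raw calls."""
    return ["gatk", "FilterMutectCalls", "-R", ref, "-V", in_vcf, "-O", out_vcf]

def haplotypecaller_gvcf_cmd(ref, bam, out_gvcf):
    """Per-sample germline calling in GVCF mode."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", out_gvcf, "-ERC", "GVCF"]

def run(cmd):
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical inputs; print the commands instead of running them
    print(shlex.join(mutect2_cmd("ref.fa", "tumor.bam", "normal.bam", "NORMAL",
                                 "somatic.vcf.gz", germline_resource="gnomad.vcf.gz",
                                 pon="pon.vcf.gz")))
    print(shlex.join(filter_mutect_cmd("ref.fa", "somatic.vcf.gz", "filtered.vcf.gz")))
    print(shlex.join(haplotypecaller_gvcf_cmd("ref.fa", "sample.bam", "sample.g.vcf.gz")))
```

Building argument lists, rather than shell strings, avoids quoting problems with unusual file names.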

To fully understand each step involved in these workflows, we recommend the "Variant Analysis with GATK Course" lectures available on YouTube.


Annotations

Once you have a final list of variants, typically you will use a tool like Funcotator or VEP to annotate them. Since Funcotator is part of the GATK, if you have followed their best practices for somatic variant calling, that might be a good option for the annotations too. Check their website for details on how to use it. 
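As with the calling steps, a Funcotator run can be sketched as an argument list. This assumes the `gatk` wrapper is on your PATH and that the Funcotator data sources have already been downloaded. The paths are hypothetical, and the helper name is our own. `--ref-version` must match the reference build used for calling. `--output-file-format` selects VCF or MAF output.

```python
import shlex

def funcotator_cmd(ref, vcf, out, data_sources, ref_version="hg38", output_format="MAF"):
    """Annotate a (filtered) VCF with GATK Funcotator."""
    if output_format not in ("MAF", "VCF"):
        raise ValueError("output_format must be 'MAF' or 'VCF'")
    return ["gatk", "Funcotator",
            "-R", ref, "-V", vcf, "-O", out,
            "--data-sources-path", data_sources,  # local copy of the data sources
            "--ref-version", ref_version,         # e.g. hg19 or hg38
            "--output-file-format", output_format]


if __name__ == "__main__":
    print(shlex.join(funcotator_cmd("ref.fa", "filtered.vcf.gz", "annotated.maf",
                                    "funcotator_dataSources/")))
```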

One additional advantage of using Funcotator is that it can output the annotated variants in MAF format, a tabular text file that can be used with maftools. This is a really useful R package that provides a collection of functions and visualization tools for the analysis of variants, allowing you to generate, among other things, summary plots and oncoplots, or to compare the mutation burden of your cohort against other TCGA studies. Check the maftools website for more information on this package.
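To show what an oncoplot summarizes, here is a minimal Python sketch rather than maftools itself. It reads a MAF and counts how many distinct samples carry a non-synonymous mutation in each gene. It relies only on the standard MAF columns `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`, and it skips leading "#" comment lines. The class list is meant to resemble maftools' default non-synonymous set. The helper name and the toy data are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Variant classes counted as non-synonymous (similar to maftools' default set)
NON_SYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}

def samples_per_gene(lines, classes=NON_SYNONYMOUS):
    """Count distinct samples carrying a qualifying mutation in each gene."""
    # MAF files may begin with '#' comment lines before the column header
    rows = csv.DictReader((l for l in lines if not l.startswith("#")), delimiter="\t")
    carriers = defaultdict(set)
    for row in rows:
        if row["Variant_Classification"] in classes:
            carriers[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return Counter({gene: len(samples) for gene, samples in carriers.items()})


# Toy MAF with made-up genes and samples
TOY_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "GENE_A\tMissense_Mutation\tS1\n"
    "GENE_A\tNonsense_Mutation\tS2\n"
    "GENE_A\tMissense_Mutation\tS2\n"
    "GENE_B\tSilent\tS1\n"
    "GENE_C\tFrame_Shift_Del\tS1\n"
)

if __name__ == "__main__":
    for gene, n in samples_per_gene(io.StringIO(TOY_MAF)).most_common():
        print(gene, n)
```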

CNA

ASCAT is a well-established method for calling copy number alterations in NGS data, particularly in cancer genomics. Check their website to learn how to use it.
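ASCAT works from two per-SNP signals: the log R ratio (logR) and the B-allele frequency (BAF), which is usually evaluated at germline heterozygous SNPs. The minimal sketch below shows what these quantities are, using made-up read counts. It is a simplified illustration, not ASCAT's own preprocessing: real pipelines add further normalization such as GC-content correction. The helper names are hypothetical.

```python
import math

def b_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads supporting the B (alt) allele at a SNP; NaN if no coverage."""
    depth = ref_count + alt_count
    return float("nan") if depth == 0 else alt_count / depth

def log_r(tumor_depths: list[int], normal_depths: list[int]) -> list[float]:
    """log2 of tumor/normal depth ratio per SNP, after scaling each sample
    by its total depth (a crude library-size normalization)."""
    t_total, n_total = sum(tumor_depths), sum(normal_depths)
    values = []
    for t, n in zip(tumor_depths, normal_depths):
        if t == 0 or n == 0:
            values.append(float("nan"))  # undefined without coverage
        else:
            values.append(math.log2((t / t_total) / (n / n_total)))
    return values


if __name__ == "__main__":
    print(b_allele_frequency(30, 10))
    print(log_r([20, 10, 10], [10, 10, 20]))
```

A BAF far from 0.5 at a heterozygous SNP, together with a shifted logR, is the kind of signal ASCAT uses to infer allele-specific copy number and tumour purity.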