Analysis of Next Generation Sequencing Data

The course begins with an introduction to DNA sequencing, particularly next-generation sequencing (NGS), and its typical applications in bioinformatics and biology. Students will learn algorithms that are used in modern software for analyzing NGS data and that are not covered in the “Algorithms in Bioinformatics” course. Alongside the algorithm-focused lectures, students will learn to work with various bioinformatics tools and will write scripts of varying complexity for data analysis.

  • Introduction to NGS.
    Introduction to Next Generation Sequencing from a bioinformatics point of view; brief history of DNA sequencing; data formats and modern sequencing platforms; simple scripts for analyzing NGS data; technical details: working on a Linux server via SSH, Python, scp.
  • Short read alignment; Bowtie.
    Burrows-Wheeler transform and its application to the short read alignment problem; analyzing Illumina reads using the Bowtie aligner.
  • Analyzing longer reads; BWA-SW.
    Aligning longer reads with both indels and mismatches to a reference genome; analyzing Roche 454 reads using the BWA-SW aligner.
  • Correcting sequencing errors; Quake.
    Typical Illumina sequencing errors; coverage-based and quality-aware error correction.
  • De novo genome assembly; Velvet assembler.
    De Bruijn graph approach to whole-genome de novo assembly of short reads; removing typical errors; repeat resolution.
  • Assessment of genome assemblies.
    Contig alignment; assessing quality of genome assemblies; detecting contaminants; NCBI archive.
  • Different sequencing applications.
    Metagenomics; single-cell sequencing; RNA-Seq; de novo transcriptome assembly; reference-based transcriptome assembly; alternative splicing and gene expression.
  • Lecturer: Andrey Prjibelski
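
To give a flavour of the short read alignment topic listed above, here is a minimal Python sketch of the Burrows-Wheeler transform and of exact pattern counting by backward search. It is an illustration only and not course material: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built naively by sorting all rotations, and mismatches and indels are ignored. Real aligners such as Bowtie use far more compact index structures.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text, with '$' as end marker."""
    if "$" in text:
        raise ValueError("text must not contain the end marker '$'")
    text += "$"
    # Naive construction: sort all rotations and take the last column.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact occurrences of a non-empty pattern by backward search."""
    counts = Counter(bwt_string)

    # first[c]: number of characters in the text strictly smaller than c,
    # i.e. the row where the block of suffixes starting with c begins.
    first = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in bwt_string[:i].
    occ = {c: [0] * (len(bwt_string) + 1) for c in counts}
    for i, ch in enumerate(bwt_string):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Half-open range [lo, hi) of sorted rotations prefixed by the
    # pattern suffix processed so far; extend it one character at a time.
    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` returns `"annb$aa"`, and `count_occurrences(bwt("banana"), "ana")` returns 2, since "ana" occurs twice (overlapping) in "banana". The full occurrence table is kept here for readability; it costs memory proportional to the text length times the alphabet size, which is acceptable for a toy example but not for a genome-scale index.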