Bioinformatics
About this page
I first entered the world of bioinformatics in August 2020, with zero background in computational work or statistics. The learning curve in computational biology is steep, and resources are scattered and not always well documented. Since then, I have been involved in over 10 projects at various stages of maturity, including published work, and have written new methods to analyse the data types I deal with routinely.
I still consider myself a very junior bioinformatician and am learning more every day. However, as I go through this process, I thought it might be helpful to compile a list of the resources I have found most useful so far.
Here, you will find a list of documents, packages, software and tutorials that I have used and/or routinely use (links included). I have included these resources because I have found them to be well documented and their outputs biologically useful and meaningful. There are thousands of resources that I have yet to explore, and this list will continue to grow. But hopefully, this will be useful, from one junior bioinformatician to another! Please feel free to get in touch with suggestions or clarifications!
Getting started with R / Python
Udemy R/Python A-Z for Data Science by Kirill Eremenko
- Much more affordable than equivalent courses
- Short and engaging
- Provides the bare minimum to get started with R / Python syntax
Managing Conda and Jupyter: Quick tutorial here, documentation for Conda environments here.
Bulk RNA sequencing
Differential expression analysis
DESeq2
- Staple for DE analysis in RNAseq
- My personal 'go-to' for bulk RNAseq analysis
edgeR
- Alternative method for DE analysis; uses trimmed mean of M-values (TMM) normalisation
limma
- Useful for microarray DE analysis; uses quantile normalisation
For a detailed read on the differences between the normalisation methods these tools use, see Dillies et al., Briefings in Bioinformatics, 2013.
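To make quantile normalisation concrete, here is a minimal pure-Python sketch: every sample (column) is forced to share the same value distribution by replacing each value with the mean, across samples, of the values at the same rank. It is for intuition only; the function name is hypothetical, ties are broken arbitrarily rather than averaged, and in practice you would use limma's own normalizeBetweenArrays(..., method = "quantile") in R.

```python
def quantile_normalise(matrix):
    """Quantile-normalise the columns (samples) of a genes x samples matrix.

    Hypothetical helper for illustration. Each column is ranked, the mean
    value at each rank across all columns is computed, and every value is
    replaced by the mean for its rank. Ties are broken arbitrarily.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Row indices of each column, ordered from smallest to largest value
    orders = [sorted(range(n_genes), key=lambda i: col[i]) for col in columns]
    # Mean value at each rank, taken across all samples
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_samples)) / n_samples
        for r in range(n_genes)
    ]
    # Write the rank means back to each gene's original position
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            result[i][j] = rank_means[r]
    return result


if __name__ == "__main__":
    counts = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]  # toy data
    for row in quantile_normalise(counts):
        print([round(x, 2) for x in row])
```

After normalisation, sorting any column gives the same vector of rank means, which is the defining property of the method.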
Enrichment analysis
Gene set enrichment analysis (GSEA)
- It is important to understand the principles; read the original paper here.
- GSEA website: software and curated molecular signatures.
- Quick implementation (in R) with fgsea and msigdbr.
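The core GSEA statistic is the enrichment score (ES): walk down the ranked gene list, step up at each gene in the set (weighted by its ranking metric) and step down at each gene outside it, then take the maximum deviation of this running sum from zero. A minimal sketch, assuming a pre-ranked list and the weighting exponent p = 1; the function name is hypothetical, and the permutation-based p-values and normalised ES (NES) that fgsea computes are deliberately left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return the GSEA running-sum enrichment score of gene_set.

    Hypothetical helper for illustration.
    ranked_genes: gene names ordered from most up- to most down-regulated
    scores: ranking metric for each gene, in the same order
    gene_set: set of gene names belonging to the pathway
    """
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must overlap some, but not all, ranked genes")
    # Hits are weighted by |score|^p, normalised to sum to 1 over the set
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a score of zero")
    # Misses are weighted equally, also summing to 1
    miss_step = 1.0 / (n - n_hits)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_norm if h else -miss_step
        # Keep the largest deviation from zero, positive or negative
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["A", "B", "C", "D", "E", "F"]   # toy ranked list
    metric = [3, 2, 1, -1, -2, -3]
    print(enrichment_score(genes, metric, {"A", "B"}))
    print(enrichment_score(genes, metric, {"E", "F"}))
```

A set concentrated at the top of the list gives a positive ES; one concentrated at the bottom gives a negative ES.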
Single-sample GSEA
- GSEA without contrasts, e.g. if you want to score gene-set enrichment in each sample independently.
- Read more here.
- Web tool to implement ssGSEA available on GenePattern.
Gene ontology (GO)-based hypergeometric test
- topGO
- Note that this type of functional enrichment analysis can give misleading results when implemented incorrectly. Read more here.
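The hypergeometric test asks how surprising the overlap between your gene list and a GO term is, given a background gene universe. A minimal sketch of the upper-tail probability (the same quantity as a one-sided Fisher's exact test); the function name is hypothetical, and it ignores the GO graph structure that topGO's elim and weight algorithms take into account. N should be the genes that could actually have appeared in your list (e.g. all expressed, tested genes), not the whole genome; choosing the wrong background is one common way this analysis goes astray.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n). Hypothetical helper.

    N: genes in the background (universe)
    K: background genes annotated to the GO term
    n: genes in your list of interest (e.g. DE genes)
    k: genes in your list that are annotated to the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k or more annotated genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10,000 background genes, 200 annotated to the term,
    # 300 DE genes of which 20 are annotated to the term
    print(hypergeometric_pvalue(k=20, n=300, K=200, N=10_000))
```

Python's integer comb keeps the counts exact, so the only rounding happens in the final division.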
Single-cell RNA sequencing
Analysis
Scanpy
- Basic toolkit for analysing scRNAseq data in Python.
Seurat
- Basic toolkit for analysing scRNAseq data in R.
Integration
scVI
- Probabilistic models for scRNAseq analysis
- Also very useful for reference mapping and label transfer
Others
- BBKNN
- Scanorama
- Harmony
Which tool should I use? It depends on the data and the computational setup available!
Read Luecken et al. Nature Methods (2022) for benchmarking of different scRNAseq integration methods.
Useful tools
Trajectory analysis
- Palantir
- Slingshot
Cell-cell interaction analysis
- CellphoneDB
Curated database for published single-cell transcriptomic datasets
Svensson, Beltrame and Pachter, Database 2020. https://doi.org/10.1093/database/baaa073
Interfacing single-cell and bulk sequencing
Pseudobulk scRNAseq analysis
- Increases robustness and reduces the false-discovery rate in single-cell differential expression.
- Read more about DE testing in scRNAseq here.
- Method by the Marioni lab: pseudoBulkDGE.
- My personal method: CLpseudobulk.
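Pseudobulking sums the raw counts of all cells of a given type within each biological sample, so that differential expression is tested across replicates rather than treating individual cells as independent observations. A minimal sketch with a hypothetical function name; it shows only the generic aggregation step, not pseudoBulkDGE or CLpseudobulk, and the resulting profiles would then go to a bulk tool such as DESeq2 or edgeR.

```python
def pseudobulk(counts, cell_samples, cell_types):
    """Sum single-cell counts into one profile per (sample, cell type).

    Hypothetical helper for illustration.
    counts: list of per-cell raw count vectors (cells x genes)
    cell_samples: biological sample/donor of each cell
    cell_types: annotated cell type of each cell
    Returns a dict mapping (sample, cell_type) -> summed gene counts.
    """
    profiles = {}
    for vec, sample, ctype in zip(counts, cell_samples, cell_types):
        key = (sample, ctype)
        if key not in profiles:
            profiles[key] = [0] * len(vec)
        # Add this cell's counts gene by gene to its group total
        for g, c in enumerate(vec):
            profiles[key][g] += c
    return profiles


if __name__ == "__main__":
    cells = [[1, 0, 2], [3, 1, 0], [0, 5, 1], [2, 2, 2]]  # toy counts
    donors = ["d1", "d1", "d2", "d2"]
    types = ["T", "T", "T", "B"]
    for key, profile in sorted(pseudobulk(cells, donors, types).items()):
        print(key, profile)
```

Raw counts are summed rather than averaged so that the output keeps count-like properties, which is what count-based bulk DE tools expect.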
Deconvolution
- CIBERSORTx: widely used and well-accepted deconvolution tool.
- MuSiC: one of my favourite tools for single-cell-directed bulk deconvolution.
- DWLS: another useful tool for single-cell-directed bulk deconvolution.
Benchmarking deconvolution pipelines: Cobos et al. Nature Communications, 2020.
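Reference-based deconvolution models a bulk profile as a non-negative mix of cell-type signature profiles and solves for the mixing weights. A minimal sketch using plain non-negative least squares (NNLS) by projected gradient descent. This is a simplified, swapped-in technique for intuition, not the estimator used by CIBERSORTx, MuSiC or DWLS, each of which adds its own weighting and modelling on top. The function name and toy signature matrix are hypothetical.

```python
def nnls_deconvolve(signature, bulk, n_iter=5000):
    """Estimate cell-type proportions by non-negative least squares.

    Hypothetical helper for illustration.
    signature: genes x cell types matrix of reference expression profiles
    bulk: expression of the same genes in one bulk sample
    Minimises ||S w - b||^2 subject to w >= 0 by projected gradient
    descent, then rescales w so the proportions sum to one.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / ||S||_F^2 bounds 1 / (largest eigenvalue of S^T S): a safe step
    step = 1.0 / sum(x * x for row in signature for x in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual S w - b for each gene
        resid = [sum(signature[g][k] * w[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        # Gradient S^T (S w - b) for each cell type
        grad = [sum(signature[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then project back onto w >= 0
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


if __name__ == "__main__":
    S = [[10, 1], [1, 10], [5, 5]]  # toy: 3 genes x 2 cell types
    b = [3.7, 7.3, 5.0]             # toy bulk = 0.3 * type 1 + 0.7 * type 2
    print([round(x, 3) for x in nnls_deconvolve(S, b)])
```

Inputs are plain nested lists so the sketch has no dependencies; for real data, use one of the dedicated tools above.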
Spatial transcriptomics
Analysis
- Built-in functions in Scanpy and Seurat (see Single-cell RNA sequencing) are sufficient for most analyses.
- Squidpy provides additional tools and is built on Scanpy.
Deconvolution and spatial mapping of cell types
- Cell2location
- Tangram
Data visualisation
ggplot2 in R - a staple! The R course on Udemy gives a good crash course on ggplot (see above).
ktplots: Useful for visualising single-cell RNAseq data and cell-cell interactions.
dittoSeq: Visualising single-cell and bulk RNAseq data, with colour-blind accessibility considerations.
ggpubr: ggplot2-based functions for publication-ready plots.
EnhancedVolcano: Publication-ready volcano plots.
A Twitter thread on useful considerations when visualising data: here.