scRNA-seq analysis

Conventional bulk RNA-seq and single-cell RNA-seq differ in how the data are collected and analyzed, as follows.

Genomics: The single life, https://doi.org/10.1038/491027a

Single-cell RNA-seq reveals cellular heterogeneity that is masked by bulk RNA-seq methods, https://www.10xgenomics.com/blog/single-cell-rna-seq-an-introductory-overview-and-tools-for-getting-started

(A) While bulk gene expression assays provide an average read-out of transcription over many cells, single-cell RNA-seq allows the assaying of gene expression in individual cells. (B) Single-cell approaches facilitate working with complex systems such as embryos, where groups of cells with radically different expression profiles can be analysed without contamination from neighbouring tissues. https://www.embopress.org/doi/full/10.15252/msb.20178046

Process for barcoding single-cell data

https://doi.org/10.1038/s41581-018-0021-7

https://www.nature.com/articles/s12276-018-0071-8

Doublet removal

https://www.sciencedirect.com/science/article/pii/S2405471219300730

https://doi.org/10.1016/j.cels.2018.11.005

Gene cell-type annotation

- CancerSEA : http://biocc.hrbmu.edu.cn/CancerSEA/home.jsp

Software

- SCSA (python) : https://github.com/bioinfo-ibms-pumc/SCSA

- scMatch (python) : https://github.com/asrhou/scMatch

scMatch uses the FANTOM5 reference data, https://fantom.gsc.riken.jp/5/

To analyze the data, one must first understand how it is structured.

Data structure

{ Cell barcode | UMI (Unique Molecular Identifier) | cDNA }

http://data-science-sequencing.github.io/Win2018/lectures/lecture16/ (left), https://medium.com/biosupermarket/%EA%B0%99%EC%9D%B4%EC%8B%A4%EC%8A%B5-single-cell-rna-sequencing-1-technologies-1-fa42e23b336e (right)
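The read layout above can be illustrated with a minimal sketch. It assumes a 16 bp cell barcode followed by a 12 bp UMI (hypothetical lengths; the real values depend on the library chemistry), and it takes the gene name as already known, standing in for the alignment of the cDNA read. The function names `split_read` and `count_umis` are made up for this example.

```python
from collections import defaultdict

# Assumed layout: 16 bp cell barcode followed by a 12 bp UMI (hypothetical lengths).
BARCODE_LEN = 16
UMI_LEN = 12


def split_read(read1):
    """Split the barcode read into (cell barcode, UMI)."""
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_umis(records):
    """Count distinct UMIs per (cell barcode, gene).

    `records` is an iterable of (read1 sequence, gene name) pairs, where the
    gene name stands in for the result of aligning the cDNA read.
    """
    seen = defaultdict(set)
    for read1, gene in records:
        barcode, umi = split_read(read1)
        seen[(barcode, gene)].add(umi)  # reads sharing a UMI collapse to one molecule
    return {key: len(umis) for key, umis in seen.items()}


records = [
    ("AAAACCCCGGGGTTTT" + "ACGTACGTACGT", "GeneA"),
    ("AAAACCCCGGGGTTTT" + "ACGTACGTACGT", "GeneA"),  # duplicate of the first read
    ("AAAACCCCGGGGTTTT" + "TTTTGGGGCCCC", "GeneA"),
    ("TTTTGGGGCCCCAAAA" + "ACGTACGTACGT", "GeneB"),
]
counts = count_umis(records)
```

The first cell ends up with two GeneA molecules even though three reads were observed, which is the point of the UMI: amplification duplicates are counted once.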

Quality control

Normalization

sctransform (UMI counts)

: Regularized negative binomial regression; the normalized values are no longer influenced by technical characteristics such as sequencing depth

Start with a generalized linear model (GLM) fitted to each gene,

$$ \log(E(x_i)) = \beta_0 + \beta_1 \log_{10} m $$

where $ x_i $ is the vector of UMI counts assigned to gene $ i $ and $ m $ is the vector of total molecules assigned to each cell, i.e. $ m_j = \sum_i x_{ij} $.

The counts are modeled with a negative binomial (NB) distribution with mean $ \mu $ and variance $ \mu + \frac{\mu^2}{\theta} $, where $ \theta $ is the dispersion parameter.

Pearson residuals:

$ r_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}} $

$ \mu_{ij} = \exp(\beta_{0i} + \beta_{1i} \log_{10} m_j) $

$ \sigma_{ij} = \sqrt{\mu_{ij} + \frac{\mu_{ij}^2}{\theta_i} } $

The regularized NB regression model captures and removes variance driven by technical differences while retaining biologically relevant signal.
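The Pearson residual formulas can be checked numerically with a small sketch. It assumes the per-gene parameters $ \beta_{0i} $, $ \beta_{1i} $ and $ \theta_i $ have already been fitted and regularized; the values used here are invented for illustration and do not come from any dataset. The fitting and regularization steps themselves are not shown.

```python
import math


def pearson_residual(x, m, beta0, beta1, theta):
    """Pearson residual of one UMI count under the NB regression model.

    x      : observed UMI count of the gene in the cell
    m      : total UMI count of the cell
    beta0, beta1, theta : per-gene parameters, assumed already fitted
    """
    mu = math.exp(beta0 + beta1 * math.log10(m))  # expected count at this depth
    sigma = math.sqrt(mu + mu ** 2 / theta)       # NB standard deviation
    return (x - mu) / sigma


# Hypothetical parameters for one gene, chosen only for illustration.
beta0, beta1, theta = -6.0, 2.3, 10.0
cells = [(3, 1000), (3, 10000)]  # (count, total UMIs of the cell) pairs
residuals = [pearson_residual(x, m, beta0, beta1, theta) for x, m in cells]
```

With these made-up parameters, the same raw count of 3 gives a positive residual in the shallow cell and a negative residual in the deep cell, because the expected count grows with sequencing depth.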

- Tools : scran, SCnorm, sctransform, bayNorm

Batch effect correction

A batch effect is a non-biological factor that can arise when scRNA-seq data are collected at different sites, at different times, or by people with different levels of experience.

- Tools : ComBat, mnnCorrect, Seurat (Canonical correlation analysis)
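To make the idea concrete, the sketch below removes a constant per-batch offset from one gene by shifting each batch to the overall mean. This is a deliberately naive illustration and is not the algorithm used by ComBat, mnnCorrect, or Seurat; the function name `center_by_batch` and the data are made up.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so its mean matches the overall mean (one gene)."""
    overall = mean(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Log-expression of one gene in six cells; batch "B" carries a made-up offset of +2.
values = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
batches = ["A", "A", "A", "B", "B", "B"]
corrected = center_by_batch(values, batches)
```

Plain centering like this also erases real biological differences whenever the batches contain different mixtures of cell types, which is one reason dedicated tools exist.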

Imputation and smoothing

scRNA-seq data contain a large number of zeros.

- Tools : scImpute, DrImpute, SAVER, MAGIC, scVI, SAVER-X, netNMF-sc
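A quick way to see how sparse a count matrix is, before deciding whether imputation is worth considering, is to measure the fraction of zero entries. The sketch below uses a toy genes x cells matrix with made-up values; `zero_fraction` is a hypothetical helper and not part of any of the tools listed.

```python
def zero_fraction(matrix):
    """Fraction of entries equal to zero in a genes x cells count matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0)
    return zeros / total


# Toy count matrix: 3 genes (rows) x 4 cells (columns), values made up.
counts = [
    [0, 0, 5, 0],
    [1, 0, 0, 0],
    [0, 2, 0, 0],
]
sparsity = zero_fraction(counts)
```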

Cell cycle assignment

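When the results of an analysis are affected by the cell-cycle phase of individual cells, or when the study itself concerns the cell cycle, a cell-cycle phase must be assigned to each cell.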

- Tools : cyclone, Seurat

Feature selection

This step selects the informative genes that matter for the purpose of the analysis.

- Tools : GiniClust
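GiniClust is built around the Gini index, which measures how unevenly a gene is expressed across cells. The sketch below computes only the plain Gini index for a single gene; it is not the GiniClust procedure itself, and the expression values are invented.

```python
def gini(values):
    """Gini index of non-negative values: 0 = uniform, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up expression of two genes across five cells.
evenly_expressed = [4, 4, 4, 4, 4]  # same level in every cell
single_cell_only = [0, 0, 0, 0, 20]  # expressed in one cell only

gini_even = gini(evenly_expressed)
gini_rare = gini(single_cell_only)
```

A gene expressed in only a few cells scores high, so ranking genes by this index favors candidates that mark rare populations.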

Dimensionality reduction and visualization

- Tools : PCA, UMAP, t-SNE

Unsupervised clustering

- Tools : k-means algorithm, Phenograph, Louvain algorithm
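As a reference point for the first tool in the list, here is a minimal sketch of k-means (Lloyd's algorithm) on a toy 2-D embedding. For reproducibility it takes the first k points as initial centers, which is a simplification made for this example; practical implementations usually use random or k-means++ initialization. The coordinates are made up.

```python
def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm on a list of equal-length tuples; returns (labels, centers)."""
    centers = list(points[:k])  # simplification: first k points as initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Six cells in a made-up 2-D embedding (for example, two principal components).
cells = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(cells, k=2)
```

Unlike k-means, Phenograph and the Louvain algorithm operate on a cell-cell neighbor graph and do not require the number of clusters in advance.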

Pseudotime

Even after clustering, this step is used to show the trajectory along which cells differentiated, ordered by a pseudotime axis such as chemical concentration or a time course.

(A) By observing similarities between the expression profiles of cells, it is possible to order cells along an axis of pseudotime that recapitulates developmental processes. (B) Having established this ordering, genes that show significant changes in expression along the developmental pathway may be identified. https://www.embopress.org/doi/full/10.15252/msb.20178046

- Tools : Monocle, DPT, TSCAN, Mpath, RNA velocity, scVelo

Differential expression

- Tools : non-parametric Wilcoxon test, MAST (Gaussian hurdle model), MetaCell (bootstrapping)
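The Wilcoxon rank-sum test, the first entry in the list, can be sketched with the standard library alone. This version uses the normal approximation, assigns average ranks to ties, and applies no tie or continuity correction, so it is only an illustration; a statistics package should be preferred for real analyses, especially with small groups. The expression values are made up.

```python
from statistics import NormalDist


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    Returns (z statistic, p-value).
    """
    pooled = sorted(
        [(v, 0) for v in group_a] + [(v, 1) for v in group_b], key=lambda t: t[0]
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    expected = n_a * (n_a + n_b + 1) / 2
    sd = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (rank_sum_a - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up expression of one gene in two clusters of six cells each.
cluster_1 = [5, 6, 7, 8, 9, 10]
cluster_2 = [0, 0, 1, 2, 3, 4]
z, p = rank_sum_test(cluster_1, cluster_2)
```

In a real analysis the test is run once per gene, so the resulting p-values need a multiple-testing correction before genes are called differentially expressed.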

Reference

- Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data, https://www.nature.com/articles/s41596-020-00409-w

- The triumphs and limitations of computational methods for scRNA-seq, https://www.nature.com/articles/s41592-021-01171-x

- Data Science for High-Throughput Sequencing, http://data-science-sequencing.github.io/

- https://www.10xgenomics.com/blog/single-cell-rna-seq-an-introductory-overview-and-tools-for-getting-started

- https://github.com/seandavi/awesome-single-cell

- https://youtu.be/qgasqiiEA1g
