
scRNA-seq analysis

by wycho 2021. 8. 19.

Conventional bulk RNA-seq and single-cell RNA-seq differ in how the data are collected and analyzed, as follows.

Genomics: The single life, https://doi.org/10.1038/491027a
Single-cell RNA-seq reveals cellular heterogeneity that is masked by bulk RNA-seq methods, https://www.10xgenomics.com/blog/single-cell-rna-seq-an-introductory-overview-and-tools-for-getting-started
(A) While bulk gene expression assays provide an average read-out of transcription over many cells, single-cell RNA-seq allows the assaying of gene expression in individual cells. (B) Single-cell approaches facilitate working with complex systems such as embryos, where groups of cells with radically different expression profiles can be analysed without contamination from neighbouring tissues. https://www.embopress.org/doi/full/10.15252/msb.20178046

 

Process for barcoding single-cell data

https://doi.org/10.1038/s41581-018-0021-7

 

https://doi.org/10.1038/nprot.2016.154

 

https://www.nature.com/articles/s12276-018-0071-8

 

Doublet removal

https://www.sciencedirect.com/science/article/pii/S2405471219300730
https://doi.org/10.1016/j.cels.2018.11.005

 

Gene cell-type annotation

 

 

To analyze the data, we first need to understand its structure.

 

Data structure

{ Cell barcode | UMI (Unique Molecular Identifiers) | cDNA }

http://data-science-sequencing.github.io/Win2018/lectures/lecture16/ (left), https://medium.com/biosupermarket/%EA%B0%99%EC%9D%B4%EC%8B%A4%EC%8A%B5-single-cell-rna-sequencing-1-technologies-1-fa42e23b336e (right)
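To make the read structure concrete, here is a minimal Python sketch of how reads become a count matrix. It assumes the 10x Genomics Chromium v3 layout, in which Read 1 carries a 16 bp cell barcode followed by a 12 bp UMI; the gene label for each read is assumed to come from aligning the cDNA read, and the reads are toy data. Counting distinct UMIs per (cell, gene), rather than reads, collapses PCR duplicates of the same molecule. Real pipelines such as Cell Ranger also error-correct barcodes and UMIs, which this sketch skips.

```python
from collections import defaultdict

# Assumption: 10x Chromium v3 Read 1 layout (16 bp barcode + 12 bp UMI).
BARCODE_LEN = 16
UMI_LEN = 12


def split_read1(read1: str) -> tuple[str, str]:
    """Split a Read 1 sequence into (cell barcode, UMI)."""
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_umis(reads):
    """reads: iterable of (read1_sequence, gene) pairs, where the gene was
    assigned by aligning the cDNA read (assumed already done).
    Returns {cell_barcode: {gene: number_of_distinct_UMIs}}."""
    seen = defaultdict(set)  # (barcode, gene) -> set of UMIs observed
    for read1, gene in reads:
        barcode, umi = split_read1(read1)
        seen[(barcode, gene)].add(umi)
    counts = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)  # duplicate reads of one molecule count once
    return counts
```

For example, two reads that share a barcode, gene and UMI contribute a count of 1, not 2.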

 

Quality control

 

Normalization


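sctransform (UMI counts)

Regularized negative binomial regression: after normalization, the values are no longer influenced by technical characteristics such as sequencing depth.

Start with a generalized linear model (GLM) for each gene \( i \):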

$$ \log(E(x_i)) = \beta_0 + \beta_1 \log_{10} m $$

where \( x_i \) is the vector of UMI counts assigned to gene \( i \) and \( m \) is the vector of molecules assigned to the cells, i.e. \( m_j = \sum_i x_{ij} \).

 

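Then model the counts with a negative binomial (NB) distribution with mean \( \mu \) and variance \( \mu + \frac{\mu^2}{\theta} \), where \( \theta \) is a gene-specific dispersion parameter. The per-gene parameters are regularized by pooling information across genes with similar mean expression.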

 

Pearson residuals:

\( r_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}} \)

\( \mu_{ij} = \exp(\beta_{0i} + \beta_{1i} \log_{10} m_j) \)

\( \sigma_{ij} = \sqrt{\mu_{ij} + \frac{\mu_{ij}^2}{\theta_i}} \)

 

Regularized NB regression model captures and removes variance driven by technical differences, while retaining biologically relevant signal.
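The Pearson-residual formulas above translate directly into numpy. This is a minimal sketch that assumes the per-gene parameters \( \beta_{0i}, \beta_{1i}, \theta_i \) have already been fitted and regularized (the hard part of sctransform, not shown here); `pearson_residuals` is an illustrative name, not the sctransform API. The matrix is genes x cells, matching \( x_{ij} \), and every cell is assumed to have at least one UMI so that \( \log_{10} m_j \) is finite.

```python
import numpy as np


def pearson_residuals(x, beta0, beta1, theta):
    """x: genes x cells UMI count matrix (x[i, j] = count of gene i in cell j).
    beta0, beta1, theta: one value per gene, assumed already estimated
    and regularized (not done here)."""
    x = np.asarray(x, dtype=float)
    # Reshape per-gene parameters to column vectors so they broadcast over cells.
    beta0, beta1, theta = (np.asarray(p, dtype=float)[:, None] for p in (beta0, beta1, theta))
    m = x.sum(axis=0)                          # m_j: total UMIs in cell j
    mu = np.exp(beta0 + beta1 * np.log10(m))   # mu_ij, shape genes x cells
    sigma = np.sqrt(mu + mu ** 2 / theta)      # NB standard deviation sigma_ij
    return (x - mu) / sigma                    # r_ij
```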

 

 

- Tools : scran, SCnorm, sctransform, bayNorm

 

Batch effect correction


A batch effect is a non-biological source of variation that can arise when scRNA-seq data are collected at different sites, at different times, or by people with different levels of experience.

 

- Tools : ComBat, mnnCorrect, Seurat (Canonical correlation analysis)
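A minimal sketch of the location/scale idea behind ComBat, assuming a cells x genes matrix of log-normalized values and one batch label per cell. It is deliberately simplified: ComBat itself shrinks the per-batch parameters with empirical Bayes, and methods such as mnnCorrect and Seurat exist largely because a plain location/scale shift assumes every batch has the same cell-type composition. `location_scale_correct` is a hypothetical helper name.

```python
import numpy as np


def location_scale_correct(x, batches):
    """x: cells x genes (e.g. log-normalized); batches: one label per cell.
    Simplified ComBat-like adjustment WITHOUT empirical-Bayes shrinkage or
    covariates: each gene is standardized within each batch, then mapped back
    onto the overall gene mean and the pooled within-batch SD."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    batch_mean = np.empty_like(x)
    for b in np.unique(batches):
        idx = batches == b
        batch_mean[idx] = x[idx].mean(axis=0)       # per-batch gene means
    resid = x - batch_mean
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0))  # pooled within-batch SD
    grand_mean = x.mean(axis=0)
    out = np.empty_like(x)
    for b in np.unique(batches):
        idx = batches == b
        sd_b = resid[idx].std(axis=0)
        sd_b[sd_b == 0] = 1.0                       # constant gene in this batch: avoid 0/0
        out[idx] = resid[idx] / sd_b * pooled_sd + grand_mean
    return out
```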

 

Imputation and smoothing

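scRNA-seq data contain a large number of zeros, partly because transcripts present in a cell are often not captured or amplified (dropouts). Imputation and smoothing methods try to recover these missing values.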

 

- Tools : scImpute, DrImpute, SAVER, MAGIC, scVI, SAVER-X, netNMF-sc
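As a sketch of the smoothing idea, the snippet below replaces each cell's profile with the average of its nearest neighbours, so a zero in one cell is filled in from similar cells. This is plain kNN averaging chosen for illustration, not the algorithm of any tool listed above (MAGIC, for instance, diffuses values over an adaptive-kernel affinity graph); `knn_smooth` and the use of Euclidean distance are assumptions. The pairwise distance matrix is O(n²) in memory, so it only suits small data.

```python
import numpy as np


def knn_smooth(x, k=3):
    """x: cells x genes matrix (e.g. log-normalized).
    Each cell is replaced by the mean of the k cells closest to it in
    Euclidean distance; the cell itself is at distance 0 and is normally
    among them."""
    x = np.asarray(x, dtype=float)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    nn = np.argsort(d, axis=1)[:, :k]   # indices of the k closest cells per row
    return x[nn].mean(axis=1)           # average their profiles
```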

Cell cycle assignment

If the results of a study may be influenced by where each cell is in its cell cycle, or if the study itself concerns the cell cycle, each cell should be assigned a cell-cycle phase.

 

- Tools : cyclone, Seurat
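Score-based assignment can be sketched as follows, loosely following the rule Seurat uses: if both the S and G2/M scores are negative the cell is called G1, otherwise it takes the phase with the higher score. As a simplification, each score here is the mean expression of the marker genes minus the cell's mean over all genes, whereas Seurat compares against expression-matched control gene sets. The marker lists are inputs you supply (Seurat ships such lists as `cc.genes`); `assign_cell_cycle_phase` is a hypothetical name and the gene names in the test are placeholders.

```python
import numpy as np


def assign_cell_cycle_phase(x, genes, s_genes, g2m_genes):
    """x: cells x genes (log-normalized); genes: column names of x.
    s_genes / g2m_genes: marker lists, each must overlap `genes`."""
    x = np.asarray(x, dtype=float)
    col = {g: i for i, g in enumerate(genes)}
    s_idx = [col[g] for g in s_genes if g in col]
    g2m_idx = [col[g] for g in g2m_genes if g in col]
    baseline = x.mean(axis=1)  # simplified control: mean over all genes per cell
    s_score = x[:, s_idx].mean(axis=1) - baseline
    g2m_score = x[:, g2m_idx].mean(axis=1) - baseline
    phases = []
    for s, g in zip(s_score, g2m_score):
        if s < 0 and g < 0:
            phases.append("G1")         # neither S nor G2/M program is active
        elif s > g:
            phases.append("S")
        elif g > s:
            phases.append("G2M")
        else:
            phases.append("Undecided")  # equal, non-negative scores
    return phases
```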

Feature selection

This step selects informative genes that suit the purpose of the analysis.

 

- Tools : GiniClust
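GiniClust ranks genes by the Gini index, which is high for genes expressed strongly in only a few cells, the pattern expected for markers of rare cell types. The sketch below computes the raw Gini coefficient per gene and keeps the top genes; to my understanding the published method additionally corrects the Gini index for its dependence on expression level, which is omitted here. `gini` and `top_gini_genes` are illustrative names.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values: 0 when all cells are equal,
    (n - 1) / n when all signal is in a single cell."""
    v = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = v.size
    total = v.sum()
    if total == 0:
        return 0.0
    i = np.arange(1, n + 1)
    return float(((2 * i - n - 1) * v).sum() / (n * total))


def top_gini_genes(x, genes, n_top):
    """x: cells x genes count matrix. Return the n_top genes with the highest raw Gini."""
    x = np.asarray(x, dtype=float)
    scores = np.array([gini(x[:, j]) for j in range(x.shape[1])])
    order = np.argsort(-scores, kind="stable")[:n_top]
    return [genes[j] for j in order]
```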

Dimensionality reduction and visualization


- Tools : PCA, UMAP, t-SNE
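PCA can be written in a few lines via the singular value decomposition of the gene-centered matrix. In a typical workflow the top principal components, rather than all genes, are then passed to UMAP or t-SNE and to clustering. This sketch assumes a cells x genes matrix that has already been normalized; the sign of each component is arbitrary.

```python
import numpy as np


def pca(x, n_components=2):
    """x: cells x genes (normalized). Returns (cell coordinates on the top
    principal components, explained-variance ratio of those components)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                            # center each gene
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]    # equals xc @ vt[:n_components].T
    var = s ** 2
    return coords, var[:n_components] / var.sum()
```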

Unsupervised clustering


- Tools : k-means algorithm, Phenograph, Louvain algorithm
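The k-means algorithm listed above alternates between assigning each cell to its nearest centre and moving each centre to the mean of its cells (Lloyd's algorithm). A minimal sketch, assuming the input is cells x PCs and that k is chosen by the user; Phenograph and Louvain instead cluster a k-nearest-neighbour graph and do not need the number of clusters in advance.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means. x: cells x features. Returns (labels, centers)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # k distinct cells as initial centres
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # squared distances
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = x[labels == c]
            if len(members):                       # keep the old centre if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels, centers
```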

Pseudotime

Even after clustering, it is useful to show the trajectory along which cells differentiate, by ordering them along a pseudotime that reflects a process such as a chemical concentration gradient or a time course.

 

(A) By observing similarities between the expression profiles of cells, it is possible to order cells along an axis of pseudotime that recapitulates developmental processes. (B) Having established this ordering, genes that show significant changes in expression along the developmental pathway may be identified. https://www.embopress.org/doi/full/10.15252/msb.20178046

 

- Tools : Monocle, DPT, TSCAN, Mpath, RNAvelocity, scVelo
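A bare-bones way to order cells is to build a k-nearest-neighbour graph and take each cell's shortest-path distance from a chosen root cell as its pseudotime. This is an illustrative sketch under that assumption, not how Monocle, DPT or TSCAN work internally (DPT, for example, uses diffusion-based distances); the root cell must be supplied, e.g. a known progenitor, and `knn_graph_pseudotime` is a hypothetical name.

```python
import heapq

import numpy as np


def knn_graph_pseudotime(x, root, k=3):
    """x: cells x features (e.g. top PCs); root: index of the start cell.
    Builds an undirected kNN graph with Euclidean edge weights and returns the
    shortest-path distance from the root, scaled to [0, 1].
    Cells not connected to the root get inf."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    neighbours = [dict() for _ in range(n)]
    for i in range(n):
        # Position 0 of the sort is normally the cell itself, so skip it.
        for j in map(int, np.argsort(d[i])[1:k + 1]):
            neighbours[i][j] = d[i, j]
            neighbours[j][i] = d[i, j]   # make the graph undirected
    dist = np.full(n, np.inf)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:                          # Dijkstra's algorithm
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                     # stale heap entry
        for v, w in neighbours[u].items():
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    span = dist[np.isfinite(dist)].max()
    return dist / span if span > 0 else dist
```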

Differential expression


- Tools : non-parametric Wilcoxon test, MAST (Gaussian hurdle model), MetaCell (bootstrapping)
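The non-parametric Wilcoxon rank-sum test compares the ranks of a gene's expression in two groups of cells (e.g. one cluster versus the rest). The sketch below implements it from scratch with the normal approximation and a tie correction (ties are common because of the many zeros), without continuity correction; in practice one would call a library implementation and correct the p-values for testing many genes. The function names are illustrative.

```python
import math

import numpy as np


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(a, b):
    """Two-sided rank-sum (Mann-Whitney U) test, normal approximation with
    tie correction, no continuity correction. Returns (U for a, p-value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = np.concatenate([a, b])
    ranks = rank_with_ties(pooled)
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts ** 3 - counts).sum() / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u, 1.0                  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
```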


Reference

- Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data, https://www.nature.com/articles/s41596-020-00409-w

- The triumphs and limitations of computational methods for scRNA-seq, https://www.nature.com/articles/s41592-021-01171-x

- Data Science for High-Throughput Sequencing, http://data-science-sequencing.github.io/

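- Single-cell RNA-seq: an introductory overview and tools for getting started (10x Genomics blog), https://www.10xgenomics.com/blog/single-cell-rna-seq-an-introductory-overview-and-tools-for-getting-started

- awesome-single-cell (GitHub), https://github.com/seandavi/awesome-single-cell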

- https://youtu.be/qgasqiiEA1g
