Conventional bulk RNA-seq and single-cell RNA-seq differ in how the data are collected and analyzed, as follows.
Process for barcoding single-cell data
Remove Doublet
Gene cell-type annotation
DB
- CellMarker : http://biocc.hrbmu.edu.cn/CellMarker/
- CancerSEA : http://biocc.hrbmu.edu.cn/CancerSEA/home.jsp
Software
- SCSA (python) : https://github.com/bioinfo-ibms-pumc/SCSA
- scMatch (python) : https://github.com/asrhou/scMatch
uses FANTOM5, https://fantom.gsc.riken.jp/5/
To analyze the data, we first need to understand its structure.
Data structure
{ Cell barcode | UMI (Unique Molecular Identifiers) | cDNA }
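As a minimal sketch of this layout, the snippet below splits the barcode read into its cell barcode and UMI, then collapses duplicate (barcode, UMI, gene) observations into per-cell UMI counts. The 16 bp barcode and 12 bp UMI lengths are assumptions (they correspond to the 10x Genomics Chromium v3 layout), the reads and gene assignments are made-up toy data, and `split_read` and `count_umis` are hypothetical helper names.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed cell-barcode length (10x Chromium v3 layout)
UMI_LEN = 12      # assumed UMI length (10x Chromium v3 layout)


def split_read(read1):
    """Split the barcode read into (cell barcode, UMI)."""
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_umis(records):
    """Count distinct UMIs per (cell barcode, gene).

    records: iterable of (read1, gene) pairs, where `gene` is the gene the
    cDNA read was aligned to (alignment itself is outside this sketch).
    """
    seen = defaultdict(set)
    for read1, gene in records:
        barcode, umi = split_read(read1)
        # PCR duplicates of one molecule share a UMI, so a set collapses them.
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy data (hypothetical sequences)
cell_a = "AAACCCAAGAAACACT"
cell_b = "TTTGGGTTCTTTGTGA"
records = [
    (cell_a + "ACGTACGTACGT", "GAPDH"),
    (cell_a + "ACGTACGTACGT", "GAPDH"),  # PCR duplicate of the read above
    (cell_a + "TTTTCCCCGGGG", "GAPDH"),
    (cell_b + "ACGTACGTACGT", "ACTB"),
]
counts = count_umis(records)
```

The resulting dictionary is a sparse form of the gene-by-cell UMI count matrix that the later steps operate on.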
Quality control
Normalization
sctransform (UMI counts)
: Regularized negative binomial regression; the normalized values are no longer influenced by technical characteristics.
Start with a generalized linear model (GLM) fitted for each gene,
$$ \log(E(x_i)) = \beta_0 + \beta_1 \log_{10} m $$
where \( x_i \) is the vector of UMI counts assigned to gene \( i \) and \( m \) is the vector of molecules assigned to the cells, i.e. \( m_j = \sum_i x_{ij} \).
The counts are then modeled with a negative binomial (NB) error distribution with mean \( \mu \) and variance \( \mu + \frac{\mu^2}{\theta} \).
Pearson residuals:
\( r_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}} \)
\( \mu_{ij} = \exp(\beta_{0i} + \beta_{1i} \log_{10} m_j) \)
\( \sigma_{ij} = \sqrt{\mu_{ij} + \frac{\mu_{ij}^2}{\theta_i} } \)
Regularized NB regression model captures and removes variance driven by technical differences, while retaining biologically relevant signal.
- Tools : scran, SCnorm, sctransform, bayNorm
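The Pearson residual formulas above can be sketched directly in code. This is only an illustration of the arithmetic for a single gene and cell: the parameter values for \( \beta_{0i} \), \( \beta_{1i} \) and \( \theta_i \) are made up, the regularization step that sctransform applies to the parameters is not shown, and `pearson_residual` is a hypothetical helper name.

```python
import math


def pearson_residual(x, m, beta0, beta1, theta):
    """Pearson residual of UMI count x in a cell with m total molecules.

    mu    = exp(beta0 + beta1 * log10(m))
    sigma = sqrt(mu + mu^2 / theta)
    r     = (x - mu) / sigma
    """
    mu = math.exp(beta0 + beta1 * math.log10(m))
    sigma = math.sqrt(mu + mu ** 2 / theta)
    return (x - mu) / sigma


# Made-up parameters for one gene (assumptions, not fitted values)
beta0, beta1, theta = 0.0, 1.0, 10.0
m = 1000  # total molecules in the cell

expected = math.exp(beta0 + beta1 * math.log10(m))  # model mean for this cell
r_high = pearson_residual(40, m, beta0, beta1, theta)  # count above the mean
r_low = pearson_residual(5, m, beta0, beta1, theta)    # count below the mean
```

A residual near zero means the count is what the model expects for a cell of that depth; large positive or negative residuals are the values carried forward as normalized expression.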
Batch effect correction
A batch effect is a non-biological factor that can arise when scRNA-seq data are collected at different sites, at different times, or by people with different levels of experience.
- Tools : ComBat, mnnCorrect, Seurat (Canonical correlation analysis)
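To make the idea concrete, here is a deliberately simple sketch that removes a per-batch shift in one gene by centering each batch on the overall mean. This is plain mean-centering, not ComBat, mnnCorrect, or CCA, and it assumes the batches contain comparable cell populations; if they do not, it would also remove real biological differences. The expression values and the helper name `center_by_batch` are hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so that its mean equals the overall mean."""
    overall = mean(values)
    batch_means = {
        b: mean([v for v, vb in zip(values, batches) if vb == b])
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Toy log-expression of one gene; batch "B" is shifted upward by 2.0
values = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
batches = ["A", "A", "A", "B", "B", "B"]

corrected = center_by_batch(values, batches)
mean_a = mean(corrected[:3])
mean_b = mean(corrected[3:])
```

After correction both batches have the same mean while the within-batch differences between cells are preserved.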
Imputation and smoothing
scRNA-seq data contain many zeros.
- Tools : scImpute, DrImpute, SAVER, MAGIC, scVI, SAVER-X, netNMF-sc
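A quick way to see how sparse a count matrix is before deciding whether imputation is worthwhile is to measure the fraction of zero entries. The sketch below does this for a made-up genes-by-cells matrix; `zero_fraction` is a hypothetical helper name, and the snippet only measures sparsity, it does not impute anything.

```python
def zero_fraction(matrix):
    """Fraction of entries equal to zero in a genes x cells count matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0)
    return zeros / total


# Toy UMI counts: 3 genes (rows) x 4 cells (columns)
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 2, 0, 5],
]
sparsity = zero_fraction(counts)

# Per-gene detection rate: fraction of cells in which the gene was observed
detection = [sum(1 for x in row if x > 0) / len(row) for row in counts]
```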
Cell cycle assignment
Cell cycle phases should be assigned when the analysis results are affected by the cell cycle of individual cells, or when the study itself concerns the cell cycle.
- Tools : cyclone, Seurat
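The general idea of score-based phase assignment can be sketched as follows: score each cell on S-phase and G2/M-phase marker genes relative to its background expression, call the cell G1 when neither score is positive, and otherwise pick the higher score. This is a simplified illustration, not the exact procedure of cyclone or Seurat; the marker lists are a small illustrative subset of commonly used markers, and the expression values and `assign_phase` helper are hypothetical.

```python
from statistics import mean

# Small illustrative marker subsets (assumption: a real analysis uses longer lists)
S_GENES = ["MCM5", "PCNA", "TYMS"]
G2M_GENES = ["TOP2A", "MKI67", "CDK1"]


def assign_phase(expr):
    """Assign a phase to one cell from a dict of gene -> normalized expression."""
    background = mean(list(expr.values()))
    s_score = mean([expr[g] for g in S_GENES]) - background
    g2m_score = mean([expr[g] for g in G2M_GENES]) - background
    if s_score <= 0 and g2m_score <= 0:
        return "G1"  # neither marker set is above background
    return "S" if s_score > g2m_score else "G2M"


# Toy cells (made-up expression values)
cell_s = {"MCM5": 5, "PCNA": 6, "TYMS": 4, "TOP2A": 0, "MKI67": 0,
          "CDK1": 1, "GAPDH": 2, "ACTB": 2}
cell_g2m = {"MCM5": 0, "PCNA": 1, "TYMS": 0, "TOP2A": 6, "MKI67": 5,
            "CDK1": 4, "GAPDH": 2, "ACTB": 2}
cell_g1 = {"MCM5": 0, "PCNA": 0, "TYMS": 0, "TOP2A": 0, "MKI67": 0,
           "CDK1": 0, "GAPDH": 4, "ACTB": 4}

phases = [assign_phase(c) for c in (cell_s, cell_g2m, cell_g1)]
```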
Feature selection
This step is needed to select the genes that are most informative for the purpose of the analysis.
- Tools : GiniClust
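GiniClust builds on the Gini index, which measures how unevenly a gene's expression is distributed across cells. The sketch below computes only the raw Gini index for each gene and ranks genes by it; it omits the additional normalization and thresholding a full tool would apply. The count values and the `gini` helper are hypothetical.

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly even, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, ranks starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy counts across 4 cells (made-up values)
expression = {
    "housekeeping": [5, 5, 5, 5],  # evenly expressed in every cell
    "rare_marker": [0, 0, 0, 10],  # expressed in a single cell
    "moderate": [1, 2, 3, 4],
}
scores = {gene: gini(vals) for gene, vals in expression.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Genes expressed in only a few cells rank highest, which is why this kind of score is useful for finding markers of rare cell types.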
Dimensionality reduction and visualization
- Tools : PCA, UMAP, t-SNE
Unsupervised clustering
- Tools : k-means algorithm, Phenograph, Louvain algorithm
Pseudotime
Even after clustering, this step shows the trajectory along which cells differentiated, ordered by a pseudotime such as chemical concentration or time course.
- Tools : Monocle, DPT, TSCAN, Mpath, RNA velocity, scVelo
Differential expression
- Tools : non-parametric Wilcoxon test, MAST (Gaussian hurdle model), MetaCell (bootstrapping)
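As an illustration of the non-parametric option, the sketch below runs a Wilcoxon rank-sum (Mann-Whitney U) test on one gene's expression in two groups of cells. It uses the normal approximation without tie or continuity correction, so for real analyses an established statistics library is preferable. The expression values and the `rank_sum_test` helper are hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Return (U statistic for group a, two-sided p-value by normal approximation)."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Toy expression of one gene in two clusters (made-up values)
cluster_1 = [1.0, 2.0, 3.0]
cluster_2 = [4.0, 5.0, 6.0]
u_stat, p_value = rank_sum_test(cluster_1, cluster_2)
```

In practice the test is repeated for every gene, and the resulting p-values need multiple-testing correction.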
Reference
- Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data, https://www.nature.com/articles/s41596-020-00409-w
- The triumphs and limitations of computational methods for scRNA-seq, https://www.nature.com/articles/s41592-021-01171-x
- Data Science for High-Throughput Sequencing, http://data-science-sequencing.github.io/
- https://github.com/seandavi/awesome-single-cell
- https://youtu.be/qgasqiiEA1g