GSEA (Gene Set Enrichment Analysis)
: A library that, once omics data analysis has produced a final gene set, finds which pathways those genes appear in most often.
- Homepage : https://www.gsea-msigdb.org/gsea/index.jsp
- Paper : https://doi.org/10.1073/pnas.0506580102
- Github : https://github.com/zqfang/GSEApy
- Tutorial : https://gseapy.readthedocs.io/en/latest/introduction.html
$ pip install gseapy
- Gene set library : https://maayanlab.cloud/Enrichr/#libraries
- Reactome : https://reactome.org/download/current/ReactomePathways.gmt.zip
There are two ways to run it: from the shell, or programmatically.
# An example to reproduce figures using replot module.
$ gseapy replot -i ./Gsea.reports -o test
# An example to run GSEA using gseapy gsea module
$ gseapy gsea -d exptable.txt -c test.cls -g gene_sets.gmt -o test
# An example to run Prerank using gseapy prerank module
$ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test
# An example to run ssGSEA using gseapy ssgsea module
$ gseapy ssgsea -d expression.txt -g gene_sets.gmt -o test
# An example to use enrichr api
# see details of -g below, -d is optional
$ gseapy enrichr -i gene_list.txt -g KEGG_2016 -d pathway_enrichment -o test
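The prerank command above expects a ranked gene list file (gsea_data.rnk). As a rough sketch of how such a file could be prepared, the snippet below scores each gene with a simple signal-to-noise ratio and formats the result as tab-separated "gene, score" lines sorted from highest to lowest. The gene names, the expression values, and the helper names (signal_to_noise, to_rnk_lines) are made up for illustration, and the score is a simplified version of the signal-to-noise metric, not necessarily what GSEA itself computes.

```python
import statistics


def signal_to_noise(group_a, group_b):
    """Simplified signal-to-noise: difference of means over sum of standard deviations.

    Raises ZeroDivisionError if both groups have zero variance.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
    return (mean_a - mean_b) / (sd_a + sd_b)


def to_rnk_lines(scores):
    """Sort genes by score (descending) and format as 'gene<TAB>score' lines."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{gene}\t{score:.4f}" for gene, score in ranked]


# Hypothetical expression values: (treatment samples, control samples) per gene.
expression = {
    "GENE1": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "GENE2": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
    "GENE3": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
}

scores = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
rnk_lines = to_rnk_lines(scores)
print("\n".join(rnk_lines))
```

Writing rnk_lines to a file, one line each, should give something usable with the -r option, but check the format against the GSEApy documentation for your version.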
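# Programmatic use in Python
import os
import re

import gseapy as gp


def enrichment(glist):
    # Save the names of all available Enrichr libraries to a text file (once).
    names = gp.get_library_name()
    gname = 'gene_set_list.txt'
    if not os.path.isfile(gname):
        with open(gname, 'w') as f:
            for name in names:
                f.write(name + '\n')

    # Pick one gene set library: an Enrichr library name or a local GMT file.
    # gset = 'Reactome_2016'
    # gset = 'KEGG_2021_Human'
    gset = 'ReactomePathways.gmt'

    # outdir must be a string, so use the first token of gset,
    # e.g. 'ReactomePathways' or 'KEGG'.
    dname = re.split('[_.,]', gset)[0]

    enr = gp.enrichr(gene_list=glist,
                     gene_sets=gset,
                     organism='Human',
                     # 'description' was accepted by older gseapy releases;
                     # drop it if your version rejects the argument.
                     description='Pathway in gene set library of ' + gset,
                     outdir=dname,
                     # no_plot=True,
                     cutoff=0.5)
    print(enr.results)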
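The function above points gene_sets at a local GMT file (ReactomePathways.gmt). A GMT file is a tab-separated text file where each line holds a gene set name, a description, and then the member genes. As a minimal sketch, assuming that layout, the hypothetical helper read_gmt below loads such a file into a plain dict, which is handy for checking what is actually inside the file before running the enrichment. The demo file contents are made up.

```python
import os
import tempfile


def read_gmt(path):
    """Parse a GMT file into {gene_set_name: [genes]}.

    Each line is expected to be: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            name, _description, *genes = fields
            gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Demo with a tiny made-up GMT file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gmt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("PATHWAY_A\tmade-up description\tTP53\tBRCA1\n")
        handle.write("PATHWAY_B\tmade-up description\tEGFR\tKRAS\tMYC\n")
    demo_sets = read_gmt(demo_path)

print(demo_sets)
```

As far as I know, gseapy also accepts a dict like this for gene_sets, but that is worth confirming against the documentation of the installed version.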
https://gseapy.readthedocs.io/en/latest/faq.html
Q: What is the difference between ssGSEA and Prerank?
A: In short,
- Prerank is used for comparing two groups of samples (e.g. control and treatment), where the gene ranking is defined by your own ranking method (such as the t-statistic or signal-to-noise ratio).
- ssGSEA is used for comparing each individual sample to all the others, to find gene signatures that samples share (use ssGSEA when you have a lot of samples).
The statistics used by Prerank (GSEA) and ssGSEA differ. Assume we have calculated the running enrichment score at each position of the ranked input genes; then
ES for GSEA: max(running enrichment scores) or min(running enrichment scores), whichever deviates more from zero
ES for ssGSEA: sum(running enrichment scores)
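To make the two statistics concrete, here is a small sketch in plain Python. It assumes an unweighted running sum (step up by 1/hits when a ranked gene is in the gene set, step down by 1/misses otherwise), which is a simplification: the real tools weight the steps by the ranking metric and normalize the result. The function names and the toy gene list are made up for illustration.

```python
def running_enrichment_scores(ranked_genes, gene_set):
    """Unweighted running sum over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss in the ranked list")
    running, total = [], 0.0
    for is_hit in hits:
        total += 1.0 / n_hits if is_hit else -1.0 / n_miss
        running.append(total)
    return running


def es_gsea(running):
    """GSEA-style ES: the running score with the largest deviation from zero."""
    return max(running, key=abs)


def es_ssgsea(running):
    """ssGSEA-style ES: the sum of all running scores."""
    return sum(running)


# Toy example: genes already sorted by some ranking metric, best first.
ranked = ["A", "B", "C", "D", "E", "F"]
gene_set = {"A", "B", "E"}

running = running_enrichment_scores(ranked, gene_set)
print("GSEA ES:  ", es_gsea(running))
print("ssGSEA ES:", es_ssgsea(running))
```

The GSEA-style score only looks at the single most extreme point of the curve, while the ssGSEA-style score accumulates the whole curve, so a gene set whose members are spread over the top half of the list can score well in ssGSEA even without one sharp peak.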