
GSEA

by wycho 2021. 10. 1.

GSEA ( Gene Set Enrichment Analysis )

: A library that takes the final gene set produced by an omics data analysis and finds the pathways in which those genes are over-represented.

 

- Homepage : https://www.gsea-msigdb.org/gsea/index.jsp

- Paper : https://doi.org/10.1073/pnas.0506580102

- Github : https://github.com/zqfang/GSEApy

- Tutorial : https://gseapy.readthedocs.io/en/latest/introduction.html

$ pip install gseapy

- Gene set library https://maayanlab.cloud/Enrichr/#libraries

- Reactome : https://reactome.org/download/current/ReactomePathways.gmt.zip
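Gene set files such as ReactomePathways.gmt use the GMT format: one gene set per line, tab-separated, with the set name first, a description or identifier second, and the member genes after that. A minimal parsing sketch follows; the pathway names and genes in the example are made up for illustration, and `parse_gmt` is a hypothetical helper, not part of gseapy.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict {set_name: [genes]}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Made-up example lines (hypothetical pathways, for illustration only).
example = [
    "PATHWAY_A\tdescription of A\tTP53\tBRCA1\tEGFR\n",
    "PATHWAY_B\tdescription of B\tMYC\tTP53\n",
]
sets = parse_gmt(example)
print(sets["PATHWAY_B"])
```

For a real file, the same function can be fed an open file handle, e.g. `parse_gmt(open("ReactomePathways.gmt"))`, assuming the archive has been unzipped first.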

 

 

There are two ways to run it: from the shell, or programmatically.

# An example to reproduce figures using replot module.
$ gseapy replot -i ./Gsea.reports -o test


# An example to run GSEA using gseapy gsea module
$ gseapy gsea -d exptable.txt -c test.cls -g gene_sets.gmt -o test

# An example to run Prerank using gseapy prerank module
$ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test

# An example to run ssGSEA using gseapy ssgsea module
$ gseapy ssgsea -d expression.txt -g gene_sets.gmt -o test

# An example to use enrichr api
# -g names the gene set library; -d is optional
$ gseapy enrichr -i gene_list.txt -g KEGG_2016 -d pathway_enrichment -o test
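The prerank command above reads a .rnk file: two tab-separated columns holding the gene name and its ranking metric. The sketch below shows one way such a file could be produced from an expression table using a plain signal-to-noise ratio. The helper names, the gene names, and the values are all hypothetical, and the formula is the simple textbook form without the variance adjustments that the GSEA software applies.

```python
import statistics


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene.

    Simplified: raises ZeroDivisionError if both groups have zero variance.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
    return (mean_a - mean_b) / (sd_a + sd_b)


def to_rnk_lines(expression, n_a):
    """expression: {gene: [values]}; the first n_a values are group A."""
    scores = {g: signal_to_noise(v[:n_a], v[n_a:]) for g, v in expression.items()}
    # Highest score first, as expected for a ranked gene list.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{gene}\t{score:.4f}" for gene, score in ranked]


# Made-up expression values: three samples per group.
expression = {
    "GENE1": [10.0, 12.0, 11.0, 4.0, 5.0, 6.0],
    "GENE2": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
    "GENE3": [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],
}
rnk_lines = to_rnk_lines(expression, n_a=3)
print("\n".join(rnk_lines))
```

Writing `rnk_lines` to a file, one entry per line, gives an input that can be passed to `gseapy prerank -r`.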

 

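import os
import re

import gseapy as gp


def enrichment(glist):
    # Save the names of all Enrichr gene set libraries once, for reference.
    gname = 'gene_set_list.txt'
    if not os.path.isfile(gname):
        names = gp.get_library_name()
        with open(gname, 'w') as f:
            for name in names:
                f.write(name + '\n')

    # Pick one gene set: an Enrichr library name or a local .gmt file.
    # gset = 'Reactome_2016'
    # gset = 'KEGG_2021_Human'
    gset = 'ReactomePathways.gmt'
    # Output directory: first token of the gene set name, e.g. 'ReactomePathways'.
    # (re.split returns a list, so take the first element.)
    dname = re.split('[_.,]', gset)[0]
    enr = gp.enrichr(gene_list   = glist,
                     gene_sets   = gset,
                     organism    = 'Human',
                     # 'description' was accepted when this post was written;
                     # newer gseapy releases may not take it.
                     description = 'Pathway in gene set library of ' + gset,
                     outdir      = dname,
                     # no_plot=True,
                     cutoff      = 0.5
                     )

    print(enr.results)
    return enr.results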
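To give a feel for what "over-represented" means in an enrichment result, here is a small sketch of a hypergeometric over-representation test: the probability of seeing at least the observed overlap between the query gene list and a pathway by chance. This is an illustrative assumption about the kind of statistic involved, not gseapy's or Enrichr's actual implementation, and `hypergeom_pvalue` is a hypothetical helper.

```python
from math import comb


def hypergeom_pvalue(n_background, n_in_set, n_query, n_overlap):
    """P(overlap >= n_overlap) when n_query genes are drawn at random
    from n_background genes, n_in_set of which belong to the pathway."""
    total = comb(n_background, n_query)
    upper = min(n_in_set, n_query)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the pathway, 5 in the query list,
# all 5 query genes fall in the pathway.
p_all_hits = hypergeom_pvalue(10, 5, 5, 5)
print(p_all_hits)
```

A small p-value means the overlap is larger than chance would explain; real tools additionally correct for testing many pathways at once.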

 

https://gseapy.readthedocs.io/en/latest/faq.html

 

Q: What is the difference between ssGSEA and Prerank?
A: In short:
- Prerank compares two groups of samples (e.g. control and treatment), with genes ranked by a metric of your choice (such as a t-statistic or signal-to-noise ratio).
- ssGSEA scores each sample individually against the rest, looking for gene signatures that samples share (use ssGSEA when you have many samples).

The statistics for Prerank (GSEA) and ssGSEA differ. Given the running enrichment scores computed along your ranked input genes:

ES for GSEA: max(running enrichment scores) or min(running enrichment scores), whichever deviates further from zero
ES for ssGSEA: sum(running enrichment scores)
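The difference between the two statistics can be sketched in a few lines. The code below builds a weighted running sum over a ranked gene list (hits step up in proportion to their ranking metric, misses step down by a constant) and then derives both scores from it. It is a simplified illustration under stated assumptions: the function names and toy genes are hypothetical, and real ssGSEA additionally uses rank-based weights and normalization that are omitted here.

```python
def running_scores(ranked_genes, weights, gene_set, p=1.0):
    """Running enrichment score along a ranked gene list.

    ranked_genes: genes ordered by ranking metric, highest first.
    weights: ranking metric of each gene, in the same order.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_total = sum(abs(w) ** p for w, hit in zip(weights, in_set) if hit)
    n_miss = len(ranked_genes) - sum(in_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, score = [], 0.0
    for w, hit in zip(weights, in_set):
        if hit:
            score += abs(w) ** p / hit_total  # step up on a gene in the set
        else:
            score -= 1.0 / n_miss             # step down otherwise
        running.append(score)
    return running


def es_gsea(running):
    """Maximum deviation from zero (may be positive or negative)."""
    return max(running, key=abs)


def es_ssgsea(running):
    """Sum of the running scores."""
    return sum(running)


# Toy ranked list: the gene set sits at the top of the ranking.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
running = running_scores(genes, metric, {"A", "B"})
print(es_gsea(running), es_ssgsea(running))
```

Because the GSEA score keeps only the single most extreme point while the ssGSEA score integrates the whole curve, the two can rank gene sets differently on the same data.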

 

 

 

 
