
TCGA data

by wycho 2021. 8. 30.



TCGA provides DNA and RNA data in several forms.


Download the data you need by following the steps below.


Data information ( aligned to GRCh38 )

: https://docs.gdc.cancer.gov/Encyclopedia/pages/RNA-Seq/


1. Select the data

: In the [ Files ] tab, choose the data category; in the [ Cases ] tab, choose the data type.

2. Click the [ Manifest ] button above the charts to download the list of data files.

: gdc_manifest.2021-08-27.txt
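
Before downloading, the manifest can be checked with a few lines of Python. This is only a sketch: it assumes the usual GDC manifest layout (tab-separated, with `id`, `filename`, `md5`, `size`, and `state` columns), and the two rows below are made-up placeholders rather than real entries.

```python
import csv
import io

# Hypothetical two-row manifest in the assumed tab-separated GDC layout.
SAMPLE_MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\tsample_a.FPKM.txt.gz\t"
    "0123456789abcdef0123456789abcdef\t550816\treleased\n"
    "00000000-0000-0000-0000-000000000002\tsample_b.FPKM.txt.gz\t"
    "fedcba9876543210fedcba9876543210\t500000\treleased\n"
)


def summarize_manifest(handle):
    """Return (number of files, total size in bytes) for an open manifest."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    total_bytes = sum(int(row["size"]) for row in rows)
    return len(rows), total_bytes


n_files, total_bytes = summarize_manifest(io.StringIO(SAMPLE_MANIFEST))
print(f"{n_files} files, {total_bytes / 1e6:.2f} MB in total")
```

For the real file, pass `open("gdc_manifest.2021-08-27.txt", newline="")` instead of the `io.StringIO` object.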


3. Download the file metadata in JSON format.

: files.2021-08-27.json

- Number of entries in the list : 16843

- Data shape : [{'data_format': 'TXT', 'cases': [{'case_id': '1a93dd15-f404-484c-b67c-98169d5522c7', 'project': {'project_id': 'TCGA-LAML'}}], 'access': 'open', 'file_name': '486778af-8f8e-4000-9812-409604e274a5.FPKM.txt.gz', 'data_category': 'Transcriptome Profiling', 'annotations': [{'annotation_id': '7cf91358-0db5-5fe9-9817-6e22dbb460c6'}], 'file_size': 550816}, ... ]
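
Given that shape, a short Python sketch can count files per project and map each file name to its case. The sample below holds just the one record shown above; the function names are illustrative, and a real run would load `files.2021-08-27.json` with `json.load` instead.

```python
import json
from collections import Counter

# The single record from the data shape above, written as valid JSON.
SAMPLE_JSON = """
[{"data_format": "TXT",
  "cases": [{"case_id": "1a93dd15-f404-484c-b67c-98169d5522c7",
             "project": {"project_id": "TCGA-LAML"}}],
  "access": "open",
  "file_name": "486778af-8f8e-4000-9812-409604e274a5.FPKM.txt.gz",
  "data_category": "Transcriptome Profiling",
  "annotations": [{"annotation_id": "7cf91358-0db5-5fe9-9817-6e22dbb460c6"}],
  "file_size": 550816}]
"""


def count_files_per_project(records):
    """Count records per project_id (a file is counted once per listed case)."""
    counts = Counter()
    for record in records:
        for case in record.get("cases", []):
            counts[case["project"]["project_id"]] += 1
    return counts


def map_file_to_case(records):
    """Map file_name to the case_id of the first case listed for that file."""
    return {
        record["file_name"]: record["cases"][0]["case_id"]
        for record in records
        if record.get("cases")
    }


records = json.loads(SAMPLE_JSON)
counts = count_files_per_project(records)
file_to_case = map_file_to_case(records)
print(counts.most_common())
```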


4. Download the GDC Data Transfer Tool.

- https://gdc.cancer.gov/access-data/gdc-data-transfer-tool

- https://github.com/NCI-GDC/gdc-client

$ wget https://gdc.cancer.gov/files/public/file/gdc-client_v1.6.1_Ubuntu_x64.zip


5. Create a working folder and download the files listed in the manifest into it.

$ gdc-client download -m gdc_manifest.2021-08-27.txt --debug --log-file logfile.txt

Each case_id folder contains a two-column [ Transcript ID | FPKM ] data file and an annotation.txt file.
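
A downloaded `.FPKM.txt.gz` file can then be read into a dictionary. The sketch below assumes the two columns are tab-separated with no header line; the demo file and its `ID_0001`-style identifiers are made up for illustration.

```python
import gzip
import tempfile
from pathlib import Path


def read_fpkm(path):
    """Read a gzipped two-column (ID, FPKM) tab-separated file into a dict."""
    values = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            identifier, fpkm = line.split("\t")
            values[identifier] = float(fpkm)
    return values


# Demo on a small made-up file written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.FPKM.txt.gz"
    with gzip.open(demo_path, "wt") as handle:
        handle.write("ID_0001\t0.0\nID_0002\t12.5\n")
    demo = read_fpkm(demo_path)

print(demo)
```

Calling `read_fpkm` on each downloaded file gives one dictionary per sample, which can later be combined into an expression matrix.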



- TCGA-BRCA  ( 1222 )  -  Breast invasive carcinoma

- TCGA-GBM   (  174 )  -  Glioblastoma multiforme
- TCGA-LGG   (  529 )  -  Brain Lower Grade Glioma

- TCGA-ACC   (   79 )  -  Adrenocortical carcinoma
- TCGA-PCPG  (  186 )  -  Pheochromocytoma and Paraganglioma
- TCGA-THCA  (  568 )  -  Thyroid carcinoma

- TCGA-CHOL  (   45 )  -  Cholangiocarcinoma
- TCGA-COAD  (  521 )  -  Colon adenocarcinoma
- TCGA-READ  (  177 )  -  Rectum adenocarcinoma
- TCGA-ESCA  (  173 )  -  Esophageal carcinoma
- TCGA-LIHC  (  424 )  -  Liver hepatocellular carcinoma
- TCGA-PAAD  (  182 )  -  Pancreatic adenocarcinoma
- TCGA-STAD  (  407 )  -  Stomach adenocarcinoma

- TCGA-CESC  (  309 )  -  Cervical squamous cell carcinoma and endocervical adenocarcinoma
- TCGA-UCEC  (  587 )  -  Uterine Corpus Endometrial Carcinoma
- TCGA-UCS   (   56 )  -  Uterine Carcinosarcoma

- TCGA-HNSC  (  546 )  -  Head and Neck squamous cell carcinoma
- TCGA-UVM   (   80 )  -  Uveal Melanoma

- TCGA-DLBC  (   48 )  -  Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
- TCGA-LAML  (  151 )  -  Acute Myeloid Leukemia
- TCGA-THYM  (  121 )  -  Thymoma

- TCGA-SKCM  (  472 )  -  Skin Cutaneous Melanoma

- TCGA-SARC  (  265 )  -  Sarcoma

- TCGA-MESO  (   86 )  -  Mesothelioma
- TCGA-LUAD  (  594 )  -  Lung adenocarcinoma
- TCGA-LUSC  (  551 )  -  Lung squamous cell carcinoma

- TCGA-BLCA  (  433 )  -  Bladder Urothelial Carcinoma
- TCGA-PRAD  (  551 )  -  Prostate adenocarcinoma
- TCGA-TGCT  (  156 )  -  Testicular Germ Cell Tumors
- TCGA-OV    (  379 )  -  Ovarian serous cystadenocarcinoma
- TCGA-KICH  (   89 )  -  Kidney Chromophobe
- TCGA-KIRC  (  611 )  -  Kidney renal clear cell carcinoma
- TCGA-KIRP  (  321 )  -  Kidney renal papillary cell carcinoma



| Cancer Type | Cohort ID | Samples | Death Event | Median OS (yrs) | Permission | Link |
|---|---|---|---|---|---|---|
| Pheochromocytoma and Paraganglioma | TCGA PCPG | 179 | 6 | NA | Y | TCGA, GDAC |
| Prostate adenocarcinoma | TCGA PRAD | 499 | 10 | NA | Y | TCGA, GDAC |
| Adrenocortical carcinoma | TCGA ACC | 92 | 33 | NA | Y | TCGA, GDAC |
| Pancreatic adenocarcinoma | TCGA PAAD | 185 | 99 | 1.66 | Y | TCGA, GDAC |
| Ovarian serous cystadenocarcinoma | TCGA OV | 591 | 335 | 3.81 | Y | TCGA, GDAC, CPTAC |
| Lung squamous cell carcinoma | TCGA LUSC | 504 | 203 | 4.64 | Y | TCGA, GDAC |
| Mesothelioma | TCGA MESO | 87 | 72 | 1.54 | Y | TCGA, GDAC |
| Skin Cutaneous Melanoma | TCGA SKCM | 470 | 221 | 6.58 | Y | TCGA, GDAC |
| Stomach and Esophageal carcinoma | TCGA STES | 628 | 236 | 2.58 | Y | TCGA, GDAC |
| Uterine Corpus Endometrial Carcinoma | TCGA UCEC | 548 | 91 | NA | Y | TCGA, GDAC |
| Uterine Carcinosarcoma | TCGA UCS | 57 | 34 | 2.22 | Y | TCGA, GDAC |
| Uveal Melanoma | TCGA UVM | 80 | 23 | 3.82 | Y | TCGA, GDAC |
| Thymoma | TCGA THYM | 124 | 9 | NA | Y | TCGA, GDAC |
| Thyroid carcinoma | TCGA THCA | 503 | 16 | NA | Y | TCGA, GDAC |
| Lung adenocarcinoma | TCGA LUAD | 522 | 179 | 4.11 | Y | TCGA, GDAC |
| Testicular Germ Cell Tumors | TCGA TGCT | 134 | 3 | NA | Y | TCGA, GDAC |
| Stomach adenocarcinoma | TCGA STAD | 443 | 162 | 2.86 | Y | TCGA, GDAC |
| Sarcoma | TCGA SARC | 261 | 97 | 5.45 | Y | TCGA, GDAC |
| Colorectal adenocarcinoma | TCGA COADREAD | 629 | 124 | 6.94 | Y | GDAC, CPTAC |
| Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | TCGA DLBC | 48 | 9 | 17.60 | Y | TCGA, GDAC |
| Esophageal carcinoma | TCGA ESCA | 185 | 74 | 2.19 | Y | TCGA, GDAC |
| Cholangiocarcinoma | TCGA CHOL | 45 | 19 | 3.84 | Y | TCGA, GDAC |
| Cervical and endocervical cancers | TCGA CESC | 307 | 71 | 8.48 | Y | TCGA, GDAC |
| Bladder urothelial carcinoma | TCGA BLCA | 412 | 178 | 2.84 | Y | TCGA, GDAC |
| Liver hepatocellular carcinoma | TCGA LIHC | 377 | 125 | 4.91 | Y | TCGA, GDAC |
| Glioblastoma multiforme | TCGA GBM | 595 | 472 | 1.21 | Y | TCGA, GDAC |
| Breast invasive carcinoma | TCGA BRCA | 1097 | 151 | 10.81 | Y | TCGA, GDAC, CPTAC |
| Kidney renal papillary cell carcinoma | TCGA KIRP | 291 | 44 | NA | Y | TCGA, GDAC |
| Brain Lower Grade Glioma | TCGA LGG | 515 | 123 | 7.29 | Y | TCGA, GDAC |
| Glioma | TCGA GBMLGG | 1110 | 595 | 2.05 | Y | TCGA, GDAC |
| Kidney renal clear cell carcinoma | TCGA KIRC | 537 | 175 | 7.57 | Y | TCGA, GDAC |
| Acute Myeloid Leukemia | TCGA LAML | 200 | 109 | 1.34 | Y | TCGA, GDAC |
| Pan-kidney cohort | TCGA KIPAN | 941 | 230 | NA | Y | TCGA, GDAC |
| Head and Neck squamous cell carcinoma | TCGA HNSC | 528 | 218 | 4.75 | Y | TCGA, GDAC |
| Kidney Chromophobe | TCGA KICH | 113 | 11 | NA | Y | TCGA, GDAC |




- Systematic pan-cancer analysis of tumour purity (2015), https://www.nature.com/articles/ncomms9971

- https://rnbeads.org/methylomes.html



