https://portal.gdc.cancer.gov/repository
TCGA provides DNA and RNA data in several formats.
Download the data you need by following the steps below.
Data information (aligned to GRCh38)
: https://docs.gdc.cancer.gov/Encyclopedia/pages/RNA-Seq/
1. Select the data
: In the [ Files ] tab, select the data category; in the [ Cases ] tab, select the data type.
2. Click the [ Manifest ] button above the charts to download the list of files.
: gdc_manifest.2021-08-27.txt
3. Download the file metadata in JSON format.
: files.2021-08-27.json
- Number of entries : 16843
- Data shape : [{'data_format': 'TXT', 'cases': [{'case_id': '1a93dd15-f404-484c-b67c-98169d5522c7', 'project': {'project_id': 'TCGA-LAML'}}], 'access': 'open', 'file_name': '486778af-8f8e-4000-9812-409604e274a5.FPKM.txt.gz', 'data_category': 'Transcriptome Profiling', 'annotations': [{'annotation_id': '7cf91358-0db5-5fe9-9817-6e22dbb460c6'}], 'file_size': 550816}, ... ]
4. Download the GDC Data Transfer Tool.
- https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
- https://github.com/NCI-GDC/gdc-client
$ wget https://gdc.cancer.gov/files/public/file/gdc-client_v1.6.1_Ubuntu_x64.zip
5. Create a folder of your choice and download the files listed in the manifest into it.
$ gdc-client download -m gdc_manifest.2021-08-27.txt --debug --log-file logfile.txt
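Before starting a large download, it can help to check how many files the manifest lists and how much disk space they need. The following is a minimal sketch, assuming the manifest is a tab-separated file with the header columns id, filename, md5, size, and state; the function name and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_manifest(lines):
    """Return (number of files, total size in bytes) for a GDC manifest.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes a tab-separated file whose header includes a `size` column.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    count = 0
    total = 0
    for row in reader:
        count += 1
        total += int(row["size"])
    return count, total


# Hypothetical two-row manifest standing in for gdc_manifest.2021-08-27.txt.
sample = io.StringIO(
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\ta.FPKM.txt.gz\tabc\t550816\tvalidated\n"
    "00000000-0000-0000-0000-000000000002\tb.FPKM.txt.gz\tdef\t1000\tvalidated\n"
)
n_files, total_bytes = summarize_manifest(sample)
print(n_files, "files,", total_bytes, "bytes")
```

For the real manifest, pass an open file instead of the in-memory sample, for example `with open("gdc_manifest.2021-08-27.txt", newline="") as f: summarize_manifest(f)`.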
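Each downloaded file holds two columns, [ Transcript ID | FPKM ], and sits together with an annotation.txt file in its own folder. gdc-client names each folder after the file's UUID (the id column of the manifest), not the case_id.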
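A downloaded `.FPKM.txt.gz` file can be read directly without unpacking it first. The sketch below assumes the two columns are tab-separated and that the file has no header line; `read_fpkm` and the demo identifiers are hypothetical names used only for this example.

```python
import gzip
import os
import tempfile


def read_fpkm(path):
    """Parse a gzipped two-column file into a {identifier: FPKM} dict.

    Assumes tab-separated columns and no header line.
    """
    values = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            identifier, fpkm = line.split("\t")
            values[identifier] = float(fpkm)
    return values


# Write a tiny made-up file so the example runs without real TCGA data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.FPKM.txt.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write("ID_A\t0.0\nID_B\t12.5\n")
    demo = read_fpkm(demo_path)

print(demo)
```

Returning a plain dict keeps the example free of third-party dependencies; the same file could also be loaded into a table library if one is already in use.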
Breast
- TCGA-BRCA ( 1222 ) - Breast invasive carcinoma
Brain
- TCGA-GBM ( 174 ) - Glioblastoma multiforme
- TCGA-LGG ( 529 ) - Brain Lower Grade Glioma
Endocrine
- TCGA-ACC ( 79 ) - Adrenocortical carcinoma
- TCGA-PCPG ( 186 ) - Pheochromocytoma and Paraganglioma
- TCGA-THCA ( 568 ) - Thyroid carcinoma
Gastrointestinal
- TCGA-CHOL ( 45 ) - Cholangiocarcinoma
- TCGA-COAD ( 521 ) - Colon adenocarcinoma
- TCGA-READ ( 177 ) - Rectum adenocarcinoma
- TCGA-ESCA ( 173 ) - Esophageal carcinoma
- TCGA-LIHC ( 424 ) - Liver hepatocellular carcinoma
- TCGA-PAAD ( 182 ) - Pancreatic adenocarcinoma
- TCGA-STAD ( 407 ) - Stomach adenocarcinoma
Gynecologic
- TCGA-CESC ( 309 ) - Cervical squamous cell carcinoma and endocervical adenocarcinoma
- TCGA-UCEC ( 587 ) - Uterine Corpus Endometrial Carcinoma
- TCGA-UCS ( 56 ) - Uterine Carcinosarcoma
Head and Neck
- TCGA-HNSC ( 546 ) - Head and Neck squamous cell carcinoma
- TCGA-UVM ( 80 ) - Uveal Melanoma
Hematologic
- TCGA-DLBC ( 48 ) - Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
- TCGA-LAML ( 151 ) - Acute Myeloid Leukemia
- TCGA-THYM ( 121 ) - Thymoma
Skin
- TCGA-SKCM ( 472 ) - Skin Cutaneous Melanoma
Soft Tissue
- TCGA-SARC ( 265 ) - Sarcoma
Thoracic
- TCGA-MESO ( 86 ) - Mesothelioma
- TCGA-LUAD ( 594 ) - Lung adenocarcinoma
- TCGA-LUSC ( 551 ) - Lung squamous cell carcinoma
Urologic
- TCGA-BLCA ( 433 ) - Bladder Urothelial Carcinoma
- TCGA-PRAD ( 551 ) - Prostate adenocarcinoma
- TCGA-TGCT ( 156 ) - Testicular Germ Cell Tumors
- TCGA-OV ( 379 ) - Ovarian serous cystadenocarcinoma
- TCGA-KICH ( 89 ) - Kidney Chromophobe
- TCGA-KIRC ( 611 ) - Kidney renal clear cell carcinoma
- TCGA-KIRP ( 321 ) - Kidney renal papillary cell carcinoma
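A per-project tally in the style of the list above can be computed from the JSON metadata downloaded in step 3. This is only a sketch under the record shape shown in that step (each record has a `cases` list whose items carry `project.project_id`); it does not claim to reproduce the exact numbers above. The function name and the second sample record are made up for illustration.

```python
import json
from collections import Counter


def count_files_per_project(records):
    """Count how many file records belong to each project_id."""
    counts = Counter()
    for record in records:
        # A file may list several cases; count each project once per file.
        projects = {
            case["project"]["project_id"] for case in record.get("cases", [])
        }
        counts.update(projects)
    return counts


# First record follows the shape shown in step 3; the second is hypothetical.
sample_json = """
[
  {"data_format": "TXT",
   "cases": [{"case_id": "1a93dd15-f404-484c-b67c-98169d5522c7",
              "project": {"project_id": "TCGA-LAML"}}],
   "access": "open",
   "file_name": "486778af-8f8e-4000-9812-409604e274a5.FPKM.txt.gz",
   "data_category": "Transcriptome Profiling",
   "file_size": 550816},
  {"data_format": "TXT",
   "cases": [{"case_id": "hypothetical-case",
              "project": {"project_id": "TCGA-BRCA"}}],
   "access": "open",
   "file_name": "hypothetical.FPKM.txt.gz",
   "data_category": "Transcriptome Profiling",
   "file_size": 1000}
]
"""
records = json.loads(sample_json)
project_counts = count_files_per_project(records)
for project_id, n in sorted(project_counts.items()):
    print(project_id, n)
```

For the real metadata, replace `json.loads(sample_json)` with `json.load(f)` on the opened `files.2021-08-27.json` file.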
Cancer Type | Cohort | ID | Samples | Death Event | Median OS (yrs) | Permission | Link | Download |
---|---|---|---|---|---|---|---|---|
Pheochromocytoma and Paraganglioma | TCGA | PCPG | 179 | 6 | NA | Y | TCGA, GDAC | Download |
Prostate adenocarcinoma | TCGA | PRAD | 499 | 10 | NA | Y | TCGA, GDAC | Download |
Adrenocortical carcinoma | TCGA | ACC | 92 | 33 | NA | Y | TCGA, GDAC | Download |
Pancreatic adenocarcinoma | TCGA | PAAD | 185 | 99 | 1.66 | Y | TCGA, GDAC | Download |
Ovarian serous cystadenocarcinoma | TCGA | OV | 591 | 335 | 3.81 | Y | TCGA, GDAC, CPTAC | Download |
Lung squamous cell carcinoma | TCGA | LUSC | 504 | 203 | 4.64 | Y | TCGA, GDAC | Download |
Mesothelioma | TCGA | MESO | 87 | 72 | 1.54 | Y | TCGA, GDAC | Download |
Skin Cutaneous Melanoma | TCGA | SKCM | 470 | 221 | 6.58 | Y | TCGA, GDAC | Download |
Stomach and Esophageal carcinoma | TCGA | STES | 628 | 236 | 2.58 | Y | TCGA, GDAC | Download |
Uterine Corpus Endometrial Carcinoma | TCGA | UCEC | 548 | 91 | NA | Y | TCGA, GDAC | Download |
Uterine Carcinosarcoma | TCGA | UCS | 57 | 34 | 2.22 | Y | TCGA, GDAC | Download |
Uveal Melanoma | TCGA | UVM | 80 | 23 | 3.82 | Y | TCGA, GDAC | Download |
Thymoma | TCGA | THYM | 124 | 9 | NA | Y | TCGA, GDAC | Download |
Thyroid carcinoma | TCGA | THCA | 503 | 16 | NA | Y | TCGA, GDAC | Download |
Lung adenocarcinoma | TCGA | LUAD | 522 | 179 | 4.11 | Y | TCGA, GDAC | Download |
Testicular Germ Cell Tumors | TCGA | TGCT | 134 | 3 | NA | Y | TCGA, GDAC | Download |
Stomach adenocarcinoma | TCGA | STAD | 443 | 162 | 2.86 | Y | TCGA, GDAC | Download |
Sarcoma | TCGA | SARC | 261 | 97 | 5.45 | Y | TCGA, GDAC | Download |
Colorectal adenocarcinoma | TCGA | COADREAD | 629 | 124 | 6.94 | Y | GDAC, CPTAC | Download |
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | TCGA | DLBC | 48 | 9 | 17.60 | Y | TCGA, GDAC | Download |
Esophageal carcinoma | TCGA | ESCA | 185 | 74 | 2.19 | Y | TCGA, GDAC | Download |
Cholangiocarcinoma | TCGA | CHOL | 45 | 19 | 3.84 | Y | TCGA, GDAC | Download |
Cervical and endocervical cancers | TCGA | CESC | 307 | 71 | 8.48 | Y | TCGA, GDAC | Download |
Bladder urothelial carcinoma | TCGA | BLCA | 412 | 178 | 2.84 | Y | TCGA, GDAC | Download |
Liver hepatocellular carcinoma | TCGA | LIHC | 377 | 125 | 4.91 | Y | TCGA, GDAC | Download |
Glioblastoma multiforme | TCGA | GBM | 595 | 472 | 1.21 | Y | TCGA, GDAC | Download |
Breast invasive carcinoma | TCGA | BRCA | 1097 | 151 | 10.81 | Y | TCGA, GDAC, CPTAC | Download |
Kidney renal papillary cell carcinoma | TCGA | KIRP | 291 | 44 | NA | Y | TCGA, GDAC | Download |
Brain Lower Grade Glioma | TCGA | LGG | 515 | 123 | 7.29 | Y | TCGA, GDAC | Download |
Glioma | TCGA | GBMLGG | 1110 | 595 | 2.05 | Y | TCGA, GDAC | Download |
Kidney renal clear cell carcinoma | TCGA | KIRC | 537 | 175 | 7.57 | Y | TCGA, GDAC | Download |
Acute Myeloid Leukemia | TCGA | LAML | 200 | 109 | 1.34 | Y | TCGA, GDAC | Download |
Pan-kidney cohort | TCGA | KIPAN | 941 | 230 | NA | Y | TCGA, GDAC | Download |
Head and Neck squamous cell carcinoma | TCGA | HNSC | 528 | 218 | 4.75 | Y | TCGA, GDAC | Download |
Kidney Chromophobe | TCGA | KICH | 113 | 11 | NA | Y | TCGA, GDAC | Download |
Reference
- Systematic pan-cancer analysis of tumour purity (2015), https://www.nature.com/articles/ncomms9971
- https://rnbeads.org/methylomes.html