
Command

by wycho 2020. 8. 7.

Splitting FASTA

$ faidx -x reference.fa   # pyfaidx CLI: write each sequence to its own file

chr1.fa chr2.fa ... chrY.fa

$ samtools faidx reference.fa chr1 > reference_chr1.fa   # extract a single sequence (chr1) into its own file
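If neither pyfaidx nor samtools is available, the one-file-per-sequence split can also be done in plain Python. This is a minimal sketch. It assumes every header line starts with `>` and takes the sequence name to be the first word after it. `split_fasta` and `write_records` are hypothetical helper names, and the output files follow the `<name>.fa` pattern shown above.

```python
from pathlib import Path


def split_fasta(text):
    """Return {name: record_text}, one entry per '>' record in a FASTA string."""
    records = {}
    name = None
    for line in text.splitlines(keepends=True):
        if line.startswith('>'):
            # Sequence name = first word of the header ('>chr1 assembled' -> 'chr1')
            name = line[1:].split()[0]
            records[name] = [line]
        elif name is not None:
            # Sequence lines belong to the most recent header
            records[name].append(line)
    return {n: ''.join(lines) for n, lines in records.items()}


def write_records(records, outdir='.'):
    """Write each record to <outdir>/<name>.fa (chr1.fa, chr2.fa, ...)."""
    for name, record in records.items():
        Path(outdir, f'{name}.fa').write_text(record)
```

This reads the whole file into memory. That is fine for a single chromosome, but a full reference is better left to the indexed tools above.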

 

Indexing FASTA

$ samtools faidx chr21.fa   # writes the index chr21.fa.fai

$ cat chr21.fa.fai

chr21   46708999        7       60      61

Columns of the .fai index (a FASTA index has the first five; a FASTQ index adds QUALOFFSET):
NAME        Name of this reference sequence
LENGTH      Total length of this reference sequence, in bases
OFFSET      Offset in the FASTA/FASTQ file of this sequence's first base
LINEBASES   The number of bases on each line
LINEWIDTH   The number of bytes in each line, including the newline
QUALOFFSET  Offset of the sequence's first quality within the FASTQ file
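Random access works because of the OFFSET, LINEBASES and LINEWIDTH columns. Together they give the byte position of any base without scanning the file. Below is a minimal sketch that uses a 0-based position within the sequence and the chr21 values above; `fai_byte_offset` is a hypothetical helper name.

```python
def fai_byte_offset(offset, linebases, linewidth, pos):
    """Byte offset in the FASTA file of 0-based base `pos` of one sequence.

    offset:    OFFSET column (byte of the sequence's first base)
    linebases: LINEBASES column (bases per full line)
    linewidth: LINEWIDTH column (bytes per line, including the newline)
    """
    full_lines, col = divmod(pos, linebases)
    # Skip whole lines (bases plus newline bytes), then step within the line
    return offset + full_lines * linewidth + col


# chr21.fa.fai: chr21  46708999  7  60  61
print(fai_byte_offset(7, 60, 61, 0))   # → 7  (first base, right after the 7-byte '>chr21\n' header)
print(fai_byte_offset(7, 60, 61, 60))  # → 68 (61st base = first base of the second line)
```

The difference LINEWIDTH − LINEBASES is the newline size. That is why a file with Windows line endings (61 bases, 63 bytes) still indexes correctly.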

 

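Running a command for chromosomes 1-22 in parallel

$ alias prl="bash -c '(for i in {1..22};do eval echo \$@ ;done) |parallel \"{}\" ' _"
# prints the quoted command once per chromosome with ${i} expanded by eval; GNU parallel runs each printed line as a job

$ prl 'bcftools view -r ${i} data.vcf.gz -Oz -o data_chr${i}.vcf.gz'
# single quotes keep ${i} unexpanded until eval; -r needs an indexed VCF (.tbi or .csi), and the region names must match the file's contig names (1 vs chr1)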
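The same fan-out can be written in Python without the alias. `string.Template` happens to use the same `${i}` placeholder syntax, and a thread pool plays the role of GNU parallel. This is a minimal sketch: `per_chrom_commands` and `run_all` are hypothetical names, `max_workers=4` is an arbitrary choice, and bcftools is only launched if `run_all` is called.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from string import Template


def per_chrom_commands(template, chroms=range(1, 23)):
    """Expand ${i} in `template` once per chromosome, like the prl alias."""
    t = Template(template)
    return [t.substitute(i=c) for c in chroms]


def run_all(commands, max_workers=4):
    """Run each command (without a shell) in a thread pool; return exit codes in order."""
    def run(cmd):
        return subprocess.run(shlex.split(cmd)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


cmds = per_chrom_commands('bcftools view -r ${i} data.vcf.gz -Oz -o data_chr${i}.vcf.gz')
print(cmds[0])  # bcftools view -r 1 data.vcf.gz -Oz -o data_chr1.vcf.gz
# run_all(cmds)  # uncomment to actually launch bcftools
```

Threads are enough here because the real work happens in the bcftools subprocesses. Passing `['X', 'Y']` as `chroms` covers the sex chromosomes.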

 

Sort

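$ (grep ^"#" data.vcf; grep -v ^"#" data.vcf | sort -k1,1V -k2,2g) > sorted.vcf
# header lines stay on top; records are sorted by CHROM in version order (2 before 10), then numerically by POS

$ find path -type f -name "*.txt" -print0 | sort -zV | xargs -0 cat | sort -g -k 9 -o outfilename
# concatenate every .txt file under path (file names in version order), then sort the combined lines numerically by column 9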
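The VCF ordering can also be done in Python when the file is already loaded as lines. This is a minimal sketch. `natural_key` is a hypothetical helper that approximates `sort -V` for typical contig names (chr2 before chr10, numbers before letters); it is not a reimplementation of GNU version sort. `sort_vcf_lines` assumes tab-separated records with POS in the second column.

```python
import re


def natural_key(s):
    """Split 'chr10' into ((1, 'chr'), (0, 10)) so digit runs compare as numbers."""
    return tuple((0, int(tok)) if tok.isdecimal() else (1, tok)
                 for tok in re.split(r'(\d+)', s) if tok)


def sort_vcf_lines(lines):
    """Header lines ('#...') first, in their original order; records by (CHROM, POS)."""
    header = [l for l in lines if l.startswith('#')]
    body = [l for l in lines if not l.startswith('#')]
    body.sort(key=lambda l: (natural_key(l.split('\t')[0]), int(l.split('\t')[1])))
    return header + body
```

Tagging each token with 0 (number) or 1 (text) keeps Python from ever comparing an `int` with a `str`.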

 

Grouping

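import pandas as pd

# df: one row per variant, with columns 'GENE' and 'SNP'
# Collect all SNPs of each gene into one space-separated string
df1 = df.groupby('GENE')['SNP'].apply(' '.join).reset_index()
df2 = df1['GENE'].to_frame()
# Spread the SNPs over columns; genes with fewer SNPs get empty cells
df3 = df1['SNP'].str.split(' ', expand=True).fillna('')
df4 = df2.join(df3, how='right')
# One line per gene: GENE<TAB>SNP1<TAB>SNP2 ... (no header, no index)
df4.to_csv('groupFile_geneBasedtest.txt', header=False, index=False, sep='\t')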

https://stackoverflow.com/questions/36271413/pandas-merge-nearly-duplicate-rows-based-on-column-value
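The same group file can be built with only the standard library, for when pandas is not available. This is a minimal sketch that assumes the input is a list of (GENE, SNP) pairs; `group_snps` and `write_group_file` are hypothetical names. One difference from the pandas version: shorter rows are not padded with empty trailing cells.

```python
import csv
import io
from collections import defaultdict


def group_snps(pairs):
    """[(gene, snp), ...] -> [[gene, snp1, snp2, ...], ...], genes sorted like groupby."""
    groups = defaultdict(list)
    for gene, snp in pairs:
        groups[gene].append(snp)  # keeps the input order of SNPs within a gene
    return [[gene, *groups[gene]] for gene in sorted(groups)]


def write_group_file(rows, fh):
    """Tab-separated, no header; rows may have different lengths."""
    csv.writer(fh, delimiter='\t', lineterminator='\n').writerows(rows)


buf = io.StringIO()
write_group_file(group_snps([('BRCA2', 'rs3'), ('BRCA1', 'rs1'), ('BRCA1', 'rs2')]), buf)
print(buf.getvalue())
```

To write the real file, pass `open('groupFile_geneBasedtest.txt', 'w', newline='')` instead of the `StringIO` buffer.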

 

 

Space-separated to tab-separated

$ awk -v OFS="\t" '{$1=$1}1' data.txt > data_tab.txt

-v OFS="\t"  # set the output field separator to a tab
{$1=$1}      # reassigning a field makes awk rebuild the line with OFS, so each run of spaces becomes one tab
1            # an always-true pattern with no action, so awk prints the (rebuilt) line

https://stackoverflow.com/questions/59472326/what-does-this-mean-awk-ofs-t-1-11-filepath
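The one-liner translates to Python almost directly. With no argument, `str.split()` splits on runs of whitespace and drops leading and trailing blanks, just as awk's default field splitting does. This is a minimal sketch; `spaces_to_tabs` and `convert_file` are hypothetical names.

```python
def spaces_to_tabs(line):
    """Like the awk one-liner: split on runs of blanks, rejoin the fields with tabs."""
    return '\t'.join(line.split())


def convert_file(src, dst):
    """Apply spaces_to_tabs to every line, e.g. data.txt -> data_tab.txt."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(spaces_to_tabs(line) + '\n')
```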

 

 

Zero-padding numbers (Python)

num = 2

a = format(num, '03')       # a = '002'
b = '{0:04d}'.format(num)   # b = '0002'
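The same padding can be written in a few other common ways. This is a short sketch; the variable names are only illustrative.

```python
num = 2

f3 = f'{num:03d}'         # '002'  f-string, same format spec as format(num, '03')
z4 = str(num).zfill(4)    # '0002' pads the string form with leading zeros
neg4 = format(-num, '04') # '-002' the width includes the sign; zeros go after it
```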

 

Split a file

$ split -d -n r/4 data.txt data_   # deal lines round-robin into 4 files with numeric suffixes
data_00 data_01 data_02 data_03
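`split -n r/4` sends line 1 to the first file, line 2 to the second, and so on, wrapping around after the fourth. A minimal Python sketch of that distribution follows; `round_robin` and `write_chunks` are hypothetical names, and the `data_` prefix mirrors the command above.

```python
def round_robin(lines, n):
    """Distribute lines like `split -n r/N`: 0-based line k goes to chunk k % n."""
    chunks = [[] for _ in range(n)]
    for k, line in enumerate(lines):
        chunks[k % n].append(line)
    return chunks


def write_chunks(chunks, prefix='data_'):
    """Write chunk j to <prefix><jj>, mimicking -d numeric suffixes (data_00, data_01, ...)."""
    for j, chunk in enumerate(chunks):
        with open(f'{prefix}{j:02d}', 'w') as fh:
            fh.writelines(chunk)
```

Round-robin keeps the chunk sizes within one line of each other. It does not keep consecutive lines together, so it suits independent records but not blocks of related lines.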

 

Set the default matplotlib backend

$ vi ~/.config/matplotlib/matplotlibrc
backend: TkAgg

 

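Locating xml2-config for R packages

$ rpm -ql libxml2-devel   # list the files installed by the package, including xml2-config

In R:
Sys.setenv(XML_CONFIG="/usr/bin/xml2-config")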
