Splitting FASTA
$ faidx -x reference.fa
chr1.fa chr2.fa ... chrY.fa
$ samtools faidx reference.fa chr1 > reference_chr1.fa
Indexing FASTA
$ samtools faidx chr21.fa
$ cat chr21.fa.fai
chr21 46708999 7 60 61
NAME | Name of this reference sequence |
LENGTH | Total length of this reference sequence, in bases |
OFFSET | Offset in the FASTA/FASTQ file of this sequence's first base |
LINEBASES | The number of bases on each line |
LINEWIDTH | The number of bytes in each line, including the newline |
QUALOFFSET | Offset of sequence's first quality within the FASTQ file |
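The OFFSET, LINEBASES and LINEWIDTH columns are what allow random access into the FASTA file. A minimal Python sketch of that arithmetic, assuming 0-based positions; `FaiRecord`, `parse_fai_line` and `byte_offset` are illustrative names, not part of samtools:

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str
    length: int
    offset: int
    linebases: int
    linewidth: int


def parse_fai_line(line: str) -> FaiRecord:
    """Parse the first five columns of one .fai line."""
    fields = line.split()
    return FaiRecord(fields[0], *(int(x) for x in fields[1:5]))


def byte_offset(rec: FaiRecord, pos0: int) -> int:
    """Byte offset in the FASTA file of the 0-based base position pos0."""
    if not 0 <= pos0 < rec.length:
        raise ValueError("position outside sequence")
    # Every full line before the target costs LINEWIDTH bytes (bases + newline).
    full_lines, within_line = divmod(pos0, rec.linebases)
    return rec.offset + full_lines * rec.linewidth + within_line


rec = parse_fai_line("chr21 46708999 7 60 61")
print(byte_offset(rec, 0))   # 7  (first base sits right after the header line)
print(byte_offset(rec, 60))  # 68 (first base of the second sequence line)
```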
$ alias prl="bash -c '(for i in {1..22};do eval echo \$@ ;done) |parallel \"{}\" ' _"
$ prl 'bcftools view -r ${i} data.vcf.gz -Oz -o data_chr${i}.vcf.gz'
Sort
$ (grep ^"#" data.vcf; grep -v ^"#" data.vcf | sort -k1,1V -k2,2g) > sorted.vcf
$ find path -type f -name "*.txt" -print0 | sort -zV | xargs -0 cat | sort -g -k 9 -o outfilename
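The header-preserving VCF sort above can be sketched in Python as follows. This is only an approximation: `natural_key` is a hypothetical helper that mimics the effect of `sort -V` on chromosome names (chr2 before chr10), not GNU's exact version-sort rules, and POS is assumed to be an integer.

```python
import re


def natural_key(chrom: str):
    # Split into digit and non-digit runs so that "chr2" sorts before "chr10".
    # Tagging each part keeps ints and strings from being compared directly.
    parts = [p for p in re.split(r"(\d+)", chrom) if p]
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def sort_vcf_lines(lines):
    """Keep header lines (starting with '#') first, then sort records by CHROM and POS."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if not line.startswith("#")]

    def record_key(line):
        fields = line.split("\t")
        return natural_key(fields[0]), int(fields[1])

    return header + sorted(records, key=record_key)


example = ["#CHROM\tPOS", "chr10\t5", "chr2\t300", "chr2\t20", "chrX\t1"]
for line in sort_vcf_lines(example):
    print(line)
```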
Grouping
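# df is assumed to have one row per SNP, with columns GENE and SNP
df1 = df.groupby('GENE')['SNP'].apply(' '.join).reset_index()  # one space-joined SNP string per gene
df2 = df1['GENE'].to_frame()
df3 = df1['SNP'].str.split(' ', expand=True).fillna('')  # one SNP per column, padded with ''
df4 = df2.join(df3, how='right')
df4.to_csv('groupFile_geneBasedtest.txt', header=False, index=False, sep='\t')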
Space-separate to Tab-separate
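awk -v OFS="\t" '{$1=$1}1' data.txt > data_tab.txt
-v OFS="\t" # set the output field separator to a tab before the program runs
{$1=$1} # reassign the first field so awk rebuilds the line, joining fields with OFS (runs of spaces collapse)
1 # a pattern that is always true, so awk prints the current line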
https://stackoverflow.com/questions/59472326/what-does-this-mean-awk-ofs-t-1-11-filepath
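For comparison, a rough Python equivalent of the awk one-liner, assuming fields are separated by any run of whitespace; `spaces_to_tabs` is an illustrative name:

```python
def spaces_to_tabs(line: str) -> str:
    # str.split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace, much like awk's default field splitting.
    return "\t".join(line.split())


print(repr(spaces_to_tabs("rs123   1  10583 A    G\n")))
```

As with `{$1=$1}`, leading spaces are dropped and an empty line stays empty.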
Zero-padding a number
num=2
a=format(num, '03')
b='{0:04d}'.format(num)
# a == '002'
# b == '0002'
Split a file
$ split -d -n r/4 data.txt data_ # distribute lines round-robin across 4 output files
data_00 data_01 data_02 data_03
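To see what round-robin distribution means here, a small Python sketch of the same idea (in-memory only; `round_robin` is an illustrative name, not something `split` provides):

```python
def round_robin(lines, n):
    """Deal lines into n chunks in turn: line 1 to chunk 0, line 2 to chunk 1, and so on."""
    chunks = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        chunks[i % n].append(line)
    return chunks


for idx, chunk in enumerate(round_robin(list("abcdefg"), 4)):
    print(f"data_{idx:02d}", chunk)
```

Unlike a contiguous split, consecutive input lines end up in different output files, so the chunks differ in size by at most one line.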
$ vi ~/.config/matplotlib/matplotlibrc
backend: TkAgg
$ rpm -ql libxml2-devel
In R:
Sys.setenv(XML_CONFIG="/usr/bin/xml2-config")