I am using the UCSC liftOver tool and the associated chain file to lift over my GWAS summary statistics file (a tab-separated file) from build 38 to build 37. The GWAS summary statistics file looks like:
1 chr1_17626_G_A 17626 A G 0.016 -0.0332 0.0237 0.161
1 chr_20184_G_A 20184 A G 0.113 -0.185 0.023 0.419
The following are the UCSC tool and the associated chain file I am using:
- liftover: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
- chain file: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz
I want to create a BED-format file from the GWAS summary statistics file, since that is the input the tool requires. The first three columns should be tab-separated, and the remaining columns should be merged into a single column using a non-tab separator such as “.”, so that they are preserved when running liftOver. The first three columns of the input BED file would be:
awk 'BEGIN{OFS="\t"} {print "chr"$1, $3-1, $3}' GWAS_summary_stat_file > ucsc.input.file
# output column 1 = chrX, where X is the chromosome number
# output column 2 = position - 1 for SNPs
# output column 3 = bp position (hg38) for SNPs
The above three are the required columns for the tool.
My questions are:
- How can I use a non-tab separator, say “:”, to merge the rest of the columns of the GWAS summary statistics file into one column?
- After running liftOver, how can I unpack the columns separated by “:”?
Answer
I am not sure if this answers your questions but please take a look.
You can use awk to merge multiple columns with “:” as the separator:
awk '{print $1 ":" $2 ":" $3}' file
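Applied to the GWAS file from the question, the same idea might look like the sketch below. It assumes a tab-separated input and simply carries every original column, joined by “:”, in a fourth BED column; the file names gwas.hg38.tsv and ucsc.input.bed are hypothetical, and the two rows are copied from the sample in the question.

```shell
# Hypothetical two-row input, tab-separated, mirroring the sample rows in the question.
printf '1\tchr1_17626_G_A\t17626\tA\tG\t0.016\t-0.0332\t0.0237\t0.161\n' >  gwas.hg38.tsv
printf '1\tchr_20184_G_A\t20184\tA\tG\t0.113\t-0.185\t0.023\t0.419\n'   >> gwas.hg38.tsv

# Output columns 1-3: chrom, 0-based start, end.
# Output column 4: all original fields joined by ":".
awk 'BEGIN { FS = OFS = "\t" }
{
    merged = $1
    for (i = 2; i <= NF; i++) merged = merged ":" $i
    print "chr" $1, $3 - 1, $3, merged
}' gwas.hg38.tsv > ucsc.input.bed

cat ucsc.input.bed
```

A “:” is probably a safer separator than “.” here, because the numeric columns already contain decimal points.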
Then, say you want to replace each “:” with a tab in $1; you can do:
awk '{gsub(/:/,"\t",$1)}1' file
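For the liftOver case, unpacking could be sketched as follows. This assumes the BED file passed to liftOver had all the original GWAS columns joined by “:” in its fourth column, which liftOver carries through unchanged. The file names are hypothetical, and the single stand-in output line uses made-up coordinates, not a real hg19 position.

```shell
# The liftOver call is left as a comment because it needs the binary and the chain file:
#   liftOver ucsc.input.bed hg38ToHg19.over.chain.gz lifted.hg19.bed unmapped.bed

# Stand-in for one line of liftOver output; 999/1000 are made-up coordinates.
printf 'chr1\t999\t1000\t1:chr1_17626_G_A:17626:A:G:0.016:-0.0332:0.0237:0.161\n' > lifted.hg19.bed

# Split column 4 back into the original fields and replace the old
# position (original column 3) with the lifted end coordinate.
awk 'BEGIN { FS = OFS = "\t" }
{
    n = split($4, f, ":")
    f[3] = $3
    line = f[1]
    for (i = 2; i <= n; i++) line = line OFS f[i]
    print line
}' lifted.hg19.bed > gwas.hg19.tsv

cat gwas.hg19.tsv
```

Note that this sketch only updates the position column; the SNP ID in the second column still encodes the build 38 position, and variants that liftOver could not map end up in the unmapped file rather than in the output.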
Is this of any help?