I am trying to filter a text file with columns based on two conditions. Due to the size of the file, I cannot use the column numbers (as there are thousands and are unnumbered) but need to use the column names. I have searched and tried to come up with multiple ways to do this but nothing is returned to
Tag: bioinformatics
Lifting over GWAS summary statistic file from build 38 to build 37
I am using the UCSC liftOver tool and the associated chain file to lift over the results of my GWAS summary statistic file (a tab-separated file) from build 38 to build 37. The GWAS summary stat file looks like: The following is the UCSC tool and the associated chain file I am using: liftover: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver chain file: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz I want to
How to speed up sed that uses a regex on a very large single-cell BAM file
I have the following simple script that tries to count the tags encoded with “CB:Z” in a SAM/BAM file: Typically it needs to process 40 million lines. The code takes around 1 hour to finish. The line sed 's/.*CB:Z:\([ACGT]*\).*/\1/' is very time-consuming. How can I speed it up? The reason I used the regex is that the “CB” tag column-wise
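One possible way to avoid the greedy `.*` on both sides of the pattern is to let `grep -o` print only the matching tag and strip the prefix afterwards. The sketch below is not the asker's script; it assumes SAM text arrives on stdin (for example from `samtools view`), that each record has at most one `CB:Z:` tag, and that the barcode consists only of A, C, G and T. The function name `count_cb_tags` is made up for illustration. Unlike the sed version, records without a `CB:Z:` tag are skipped rather than passed through unchanged.

```shell
# Count cell-barcode (CB:Z) tags in SAM text read from stdin.
# Assumption: at most one CB:Z tag per record, barcode made of A/C/G/T only.
count_cb_tags() {
    # LC_ALL=C avoids multibyte locale handling, which often speeds up grep and sort.
    LC_ALL=C grep -o 'CB:Z:[ACGT]*' \
        | cut -c6- \
        | LC_ALL=C sort \
        | uniq -c
}

# Hypothetical usage (file names are placeholders):
#   samtools view input.bam | count_cb_tags > cb_counts.txt
```

Whether this is actually faster depends on the grep and sed implementations in use, so it is worth timing on a subset of the file first.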
How can I add a column of ascending numbers for each scaffold in my bed file
So I have a file like this, with each row representing a position in the scaffolds, with some positions omitted. (There are actually a lot more rows for each scaffold): Ultimately, I want to make 100 kb windows for each scaffold separately (the last window on each scaffold would be less than 100 kb). This is what it should look like:
How to make the bash script work with one command after another?
I have a bash script like the one below. First it takes sorted.bam files as input and uses the “stringtie” tool to produce a GTF for each sample. Then the path to each sample GTF is written to mergelist.txt, and “stringtie merge” is run on them to get “stringtie_merged.gtf”. I have 40 sorted.bam files in total. I separated the commands with ; After running
batch extracting data from files, naming new files according to string in input file
With Linux I want to automatically extract data from .dat files and name the new files according to a string in the input files: I have 300 .dat files with a data structure as follows: . . . DE name1, contig1 . . SQ information1 // . . DE name1, contig2 . . SQ information2 // . where the “.”
blast could not create a unit counts container
I built a local BLAST database. However, when I run the blastn command I get this error message: T0 “/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_250088_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1448906370/c++/compilers/unix/../../src/algo/winmask/seq_masker_istat_factory.cpp”, line 170: Error: ncbi::CSeqMaskerIstatFactory::DiscoverStatType() – could not open T0 “/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_250088_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1448906370/c++/compilers/unix/../../src/algo/winmask/seq_masker_istat_factory.cpp”, line 271: Error: ncbi::CSeqMaskerIstatFactory::create() – could not create a unit counts container I am using this command to create the local BLAST database: And this is my command for executing
What is the best way to evaluate two variables representing a single pipeline command in bash?
I have a function produce which determines whether a file is present and if not it runs the following command. This works fine when the command output simply writes to stdout. However in the command below I pipe the output to a second command and then to a third command before it outputs to stdout. In this scenario I get
split text file (Genome data) based on column values keeping header line
I have a big genome data file (.txt) in the format below. I would like to split it based on the chromosome column (chr1, chr2 .. chrX, chrY and so forth), keeping the header line in all of the split files. How can I do this using a Unix/Linux command? genome data result Answer Is this data for the human genome (i.e. always 46 chromosomes)? If so,
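As a rough illustration of the kind of split being asked about, here is a minimal awk sketch wrapped in a shell function. It is not the answer quoted above. It assumes the file is tab-separated, that the first line is the header, and that the chromosome label (chr1, chr2, ...) is in column 1; the function name `split_by_chrom` and the output prefix are hypothetical.

```shell
# Split a tab-separated table into one file per chromosome, repeating the header.
# Assumption: line 1 is the header and the chromosome label is in column 1.
split_by_chrom() {
    input=$1     # path to the genome table
    prefix=$2    # hypothetical output prefix, e.g. "genome_"
    awk -F'\t' -v prefix="$prefix" '
        NR == 1 { header = $0; next }      # remember the header line
        {
            out = prefix $1 ".txt"         # e.g. genome_chr1.txt
            if (!(out in seen)) {          # first row seen for this chromosome
                print header > out
                seen[out] = 1
            }
            print > out
        }
    ' "$input"
}

# Hypothetical usage: split_by_chrom genome.txt genome_
```

If the chromosome is in a different column, `$1` inside the awk program would need to be changed to that column.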