It's hard to explain briefly. Those files are genomic data from different collections of the same material, used to get the gene-expression abundance per "sequence" (a string, in computer terms; the counting is comparable to: grep "ATCG" file.rice | wc) at different times. They are concatenated to get the summed total abundance.
That's roughly what I meant a few steps back, but I'm still far from a clear picture of what you are trying to do.
What I do understand for sure is that you have many gigabytes of strings/sequences.
Regarding this command
I'm sorry to say, but this command and all of its kind are nonsense. What you are doing here is, as I said, concatenating compressed files in a way that cannot be recovered.
Maybe the following is what you are trying to achieve (and this needs CPU power!):
I guess you would like to combine different sequence sets for different projects. If the sequence sets are static, i.e. they do not change (a strict requirement), you may be able to just use the data set you need, dynamically assigned to a project.
Example for dynamic assignment of static data to different projects
The following data/sequence sets are available (directory structure):
...and maybe you have a project structure too, which uses specific files of the data set (the arrows mean the files are symbolic links to the real data files):
So if you want the data for project_01, you can just create the data on standard output (which should be fed into your processing, whatever that is) on the fly with this command:
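A minimal sketch of the idea. All file and directory names here (set_a.seq, set_b.seq, project_01) are made up for illustration:

```shell
#!/bin/sh
# Made-up layout: the real data lives exactly once under data/,
# and each project only holds symbolic links into it.
mkdir -p data project_01
printf 'ATCG\nGGCA\n' > data/set_a.seq      # tiny stand-in sequence files
printf 'TTAA\n'       > data/set_b.seq

ln -sf ../data/set_a.seq project_01/set_a.seq   # links, not copies
ln -sf ../data/set_b.seq project_01/set_b.seq

# Stream project_01's data to standard output on the fly,
# ready to be piped into whatever processing comes next:
cat project_01/*.seq
```

Because the project directories hold only links, assembling a new project costs almost nothing in disk space or I/O.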
The basic idea is: Don't copy data around if not really needed. Have the parts once where needed and keep them forever as they are.
--
To further reduce I/O you can compress the data files. If you have xz available, use it! It compresses far better than gzip. It needs more CPU power too, but if compression speed matters, that can be parallelized quite well in your scenario. And you don't even have to permanently decompress your files: as replacements for cat there are zcat (for gzip) and xzcat (for xz), which decompress to stdout when you need it.
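A small sketch of that workflow, using gzip so it runs anywhere; with xz installed, the pattern is identical with xz/xzcat (and xz -T0 uses all cores for compression). The file names are stand-ins:

```shell
#!/bin/sh
# Keep the sequence files compressed at rest; decompress only on the fly.
printf 'ATCG\nGGCA\n' > set_a.seq
printf 'TTAA\n'       > set_b.seq

gzip -kf set_a.seq set_b.seq    # -k keeps the originals, -f overwrites old .gz

# zcat streams the decompressed data to stdout -- nothing is ever
# permanently decompressed on disk:
zcat set_a.seq.gz set_b.seq.gz
```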
I need to execute 5 jobs at a time in the background and get the exit status of all the jobs. I wrote the small script below, but I'm not sure this is the right way to do it. Any ideas? Please help.
$ cat run_job.ksh
#!/usr/bin/ksh
####################################
typeset -u SCHEMA_NAME=$1
... (1 Reply)
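One common pattern for this (sketched in bash rather than ksh; the dummy sub-shells stand in for the real jobs): start each job with &, remember its PID, then wait on each PID individually, because wait <pid> returns that particular job's exit status.

```shell
#!/bin/bash
# Run 5 jobs in the background and collect every exit status.
pids=()
for i in 1 2 3 4 5; do
    ( sleep 1; exit $(( i % 2 )) ) &   # dummy job: odd-numbered ones fail
    pids+=($!)                         # $! is the PID of the last background job
done

fail=0
for n in "${!pids[@]}"; do
    if wait "${pids[$n]}"; then        # wait <pid> returns that job's status
        echo "job $(( n + 1 )): OK"
    else
        echo "job $(( n + 1 )): FAILED"
        fail=1
    fi
done
echo "failures seen: $fail"
```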
I want to log into a remote server transfer over a new config and then backup the existing config, replace with the new config.
I am not sure if I can do this with BASH scripting.
I have set up password less login by adding my public key to authorized_keys file, it works.
I am a little... (1 Reply)
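This is very doable from bash once key-based login works. A dry-run sketch; the host, file names, and .bak suffix are all assumptions, and run=echo only prints the commands (set run= to actually execute them):

```shell
#!/bin/bash
run=echo                          # dry run; set run= to really execute
HOST=user@server.example.com      # made-up host
NEW=./app.conf.new                # made-up local file with the new config
REMOTE=/etc/app/app.conf          # made-up remote config path

$run scp "$NEW" "$HOST:$REMOTE.incoming"          # 1. push the new config
$run ssh "$HOST" "cp -p $REMOTE $REMOTE.bak"      # 2. back up the existing one
$run ssh "$HOST" "mv $REMOTE.incoming $REMOTE"    # 3. swap in the new one
```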
Hi All,
I am trying to run this script. I have a small problem:
each "./goada.sh" command, when done, produces three files (file1, file2, file3), which are then moved to their respective directories, as can be seen from this script snippet here.
The script goada.sh sends some commands for some... (1 Reply)
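One way to keep the three fixed output names from colliding when several copies run at once (a sketch; the stub below stands in for the real goada.sh): give every run its own working directory instead of moving files afterwards.

```shell
#!/bin/bash
# Stub standing in for the real ./goada.sh, which produces file1..file3:
cat > goada.sh <<'EOF'
#!/bin/sh
touch file1 file2 file3
EOF
chmod +x goada.sh

for run in 1 2 3; do
    mkdir -p "run_$run"
    ( cd "run_$run" && ../goada.sh ) &   # each copy writes in its own
done                                     # directory, so parallel runs
wait                                     # cannot collide
```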
The status quo is: within a web application, which is coded completely in PHP (not by me; I don't know PHP), I have to fill out several fields and execute it manually by clicking the "go" button in my browser, several times a day.
That's because:
The script itself pulls data (textfiles) from a... (3 Replies)
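If that "go" button simply submits an HTML form, the browser step can usually be replaced by a curl POST run from cron. A dry-run sketch; the URL and field names are pure guesses, so read the form's HTML for the real ones:

```shell
#!/bin/bash
run=echo                                  # dry run; set run= to really submit
URL=https://example.com/app/form.php      # made-up URL of the PHP form handler

$run curl -s -X POST \
     -d "field1=value1" \
     -d "field2=value2" \
     "$URL"
```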
I need to find a smarter way to process about 60,000 files in a single directory.
Every night a script runs on each file, generating output in another directory; this used to take 5 hours, but as the data grows it is taking 7 hours.
The files are of different sizes, but there are 16 cores... (10 Replies)
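With 16 cores, the usual fix is to stop handling the files one by one and let xargs -P keep N jobs running at once. A sketch with a stand-in per-file job; process_one and the directory names are placeholders:

```shell
#!/bin/bash
mkdir -p in out
for i in 1 2 3 4; do echo "data $i" > "in/f$i.txt"; done  # tiny sample set

process_one() {                      # stand-in for the real per-file job
    tr 'a-z' 'A-Z' < "$1" > "out/$(basename "$1")"
}
export -f process_one                # make it visible to the bash -c children

# -P 16 keeps 16 jobs running at once; -print0/-0 survives odd file names
find in -type f -print0 | xargs -0 -n 1 -P 16 bash -c 'process_one "$1"' _
```

GNU parallel does the same with progress reporting and per-job logs, if it happens to be installed.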
Hello,
I am running GNU bash, version 3.2.39(1)-release (x86_64-pc-linux-gnu). I have a specific question pertaining to waiting on jobs run in sub-shells, based on the max number of parallel processes I want to allow, and then wait... (1 Reply)
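Bash 3.2 has no "wait -n", but a batch-wise pattern works there: fill the pool, wait for the whole batch, refill. A sketch with dummy sub-shell jobs; MAX and the job bodies are placeholders:

```shell
#!/bin/bash
MAX=3                # at most 3 parallel sub-shells (placeholder value)
: > done.log         # results file, truncated at the start
count=0
for i in 1 2 3 4 5 6; do
    ( sleep 1; echo "job $i done" >> done.log ) &
    count=$(( count + 1 ))
    if [ "$count" -ge "$MAX" ]; then
        wait         # portable on bash 3.2: wait for the whole batch
        count=0
    fi
done
wait                 # catch the final partial batch
```

The cost is that each batch runs as long as its slowest job; "wait -n" in bash 4.3+ removes that limitation.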
Dear all,
I'm a newbie in programming and I would like to know if it is possible to parallelize the script:
for l in {1..1000}
do
cut -f$l quase2 |tr "\n" "," |sed 's/$/\
/g' |sed '/^$/d' >a_$l.t
done
I tried:
for l in {1..1000}
do
cut -f$l quase2 |tr "\n" "," |sed 's/$/\
/g' |sed... (7 Replies)
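Yes, each column's pipeline is independent of the others, so they can be pushed into the background in throttled batches. A sketch on a tiny stand-in quase2 (the real loop would run to 1000), keeping the poster's pipeline unchanged:

```shell
#!/bin/bash
printf '1\t2\t3\n4\t5\t6\n' > quase2   # tiny stand-in input

MAX=4                                  # pipelines per batch
for l in 1 2 3; do
    ( cut -f"$l" quase2 | tr '\n' ',' | sed 's/$/\
/g' | sed '/^$/d' > "a_$l.t" ) &
    [ $(( l % MAX )) -eq 0 ] && wait   # throttle after each full batch
done
wait                                   # wait for the last partial batch
```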
Hello,
the bulk of my work is run by scripts. An example is as follows:
#!/bin/bash
awk '{print first line}' Input.in > Intermediate.ter
awk '{print second line}' Input.in > Intermediate_2.ter
command Intermediate.ter Intermediate_2.ter > Output.out
It works the way I want it to, but it's not... (1 Reply)
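Since the two awk steps read the same input and are independent of each other, they can run simultaneously; only the final command has to wait for both. A sketch with stand-in awk programs and input:

```shell
#!/bin/bash
printf 'alpha\nbeta\n' > Input.in               # stand-in input

awk 'NR == 1' Input.in > Intermediate.ter   &   # stand-in for "first line"
awk 'NR == 2' Input.in > Intermediate_2.ter &   # stand-in for "second line"
wait                                            # both must finish first

cat Intermediate.ter Intermediate_2.ter > Output.out  # stand-in final command
```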
I have multiple jobs, and each job depends on another job.
Each job generates a log. If a job completed successfully, its log file ends with a JOB ENDED SUCCESSFULLY message, and if it failed, it ends with JOB ENDED with FAILURE.
I need help with how to start.
Attaching the JOB dependency... (3 Replies)
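One way to start (a sketch; only the two log messages come from the post, everything else is a stand-in): each job writes its log, and a dependent job is launched only after grep finds the success message in its predecessor's log.

```shell
#!/bin/bash
run_job() {                 # $1 = job name, $2 = exit code of the stand-in work
    if ( exit "$2" ); then
        echo "JOB ENDED SUCCESSFULLY" > "$1.log"
    else
        echo "JOB ENDED with FAILURE" > "$1.log"
    fi
}

run_job job_a 0                                   # first job in the chain
if grep -q "JOB ENDED SUCCESSFULLY" job_a.log; then
    run_job job_b 0                               # dependent job
else
    echo "job_a failed; not starting job_b"
fi
```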
How can I run several bash commands from the bash command line, without needing a script file?
I'm actually a Windows guy and new here, so for illustration, something like:
$ bash "echo ${PATH} & echo have a nice day!"
would output, for example:... (4 Replies)
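The missing piece is -c, which tells bash to take the next argument as a command string. Separate commands with ; (or && to run the second only if the first succeeds); a single & would push the first command into the background instead. A sketch:

```shell
#!/bin/bash
# Several commands in one command string, no script file needed:
bash -c 'echo "$PATH"; echo "have a nice day!"'

# && chains them conditionally: the second runs only if the first succeeds
bash -c 'echo first && echo second'
```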