Hello,
I have a bunch of jobs (cp, cat, or ln -s) on big files (tens of GB each):
One command per line!
I can send them to the background by adding & at the end of each line, but then the server (24 cores) gets overloaded with all 100 of my jobs at once.
Or I can run the bash script so the jobs execute one by one, but then the cores my admin assigned me (normally 8 to 16) are not fully used.
The challenge is that the file names on each line are so irregular that it is hard to fit them into a tidy loop or regex.
I was thinking something like GNU parallel might work, but I am not sure of the exact options:
How can I use parallel in my case to spread the jobs across the available cores and speed up the work?
Thanks!
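One reply pattern for this situation: if the commands already sit in a file, one per line, a job runner can read them directly and cap concurrency. A minimal sketch, assuming the commands are collected in a file named jobs.txt (the file name and the cap of 8 are illustrative):

```shell
#!/bin/bash
# Sketch: jobs.txt (an assumed name) holds one complete command per line.
# With GNU parallel installed, this is simply:  parallel -j 8 < jobs.txt
# A portable alternative using xargs, also capped at 8 concurrent jobs:
printf '%s\n' 'echo job1' 'echo job2' 'echo job3' > jobs.txt
xargs -P 8 -I{} sh -c '{}' < jobs.txt
```

Because each line is handed whole to `sh -c`, the random file names never need to fit a loop or regex.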
I need to execute 5 jobs at a time in the background and get the exit status of all the jobs. I wrote the small script below, but I'm not sure this is the right way to do it. Any ideas? Please help.
$cat run_job.ksh
#!/usr/bin/ksh
####################################
typeset -u SCHEMA_NAME=$1
... (1 Reply)
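One common answer to this kind of question is to launch jobs in batches and `wait` on each recorded PID to collect its exit status. A bash sketch (the same idea works in ksh); the `( exit 0 )` sub-shells are placeholders for the real jobs:

```shell
#!/bin/bash
# Sketch: run jobs in batches of 5 and check each one's exit status.
fail=0
pids=()
for j in 1 2 3 4 5 6 7; do
  ( exit 0 ) &                  # replace with the real job command
  pids+=("$!")
  if [ "${#pids[@]}" -eq 5 ]; then
    # wait on each PID individually so every exit status is seen
    for p in "${pids[@]}"; do wait "$p" || fail=1; done
    pids=()
  fi
done
for p in "${pids[@]}"; do wait "$p" || fail=1; done   # remaining jobs
echo "overall exit status: $fail"
```

`wait "$pid"` returns that job's exit status, which a plain `wait` with no arguments does not.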
I want to log into a remote server, transfer over a new config, back up the existing config, and replace it with the new one.
I am not sure if I can do this with BASH scripting.
I have set up passwordless login by adding my public key to the authorized_keys file, and it works.
I am a little... (1 Reply)
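This is certainly doable in bash. A local sketch of just the back-up-then-replace step; with passwordless ssh the same commands would run remotely (e.g. `scp` the file up, then `ssh user@host 'cp ... ; mv ...'`). The demo/ paths and file contents below are invented for illustration:

```shell
#!/bin/bash
mkdir -p demo
echo "old settings" > demo/app.conf      # stands in for the existing config
echo "new settings" > demo/new.conf      # stands in for the transferred config
cp demo/app.conf demo/app.conf.bak       # back up the existing config
mv demo/new.conf demo/app.conf           # replace it with the new one
```

Doing the copy before the move means a failed transfer never leaves the server without a usable config.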
Hi All,
I am trying to run this script. I have a small problem:
each "./goada.sh" command, when done, produces three files (file1, file2, file3), which are then moved to their respective directories, as can be seen from this script snippet here.
The script goada.sh sends some commands for some... (1 Reply)
The status quo is that, within a web application coded completely in PHP (not by me; I don't know PHP), I have to fill out several fields and execute it manually by clicking the "go" button in my browser, several times a day.
That's because:
The script itself pulls data (textfiles) from a... (3 Replies)
I need to find a smarter way to process about 60,000 files in a single directory.
Every night a script runs on each file, generating output in another directory; this used to take 5 hours, but as the data grows it now takes 7.
The files are of different sizes, but there are 16 cores... (10 Replies)
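With 16 cores available, the standard answer is to fan the per-file work across parallel workers, e.g. with `xargs -P`. A sketch, where indir/, outdir/, and the tiny `process_one` step are stand-ins for the real nightly job:

```shell
#!/bin/bash
# Sketch: process every file in a directory with up to 16 concurrent workers.
mkdir -p indir outdir
echo "alpha" > indir/a.txt
echo "beta"  > indir/b.txt

process_one() { tr 'a-z' 'A-Z' < "$1" > "outdir/$(basename "$1")"; }
export -f process_one   # make the function visible to the bash -c workers

# -print0/-0 keeps odd file names safe; -P 16 caps concurrency at 16
find indir -type f -print0 | xargs -0 -P 16 -I{} bash -c 'process_one "$1"' _ {}
```

Because the files differ in size, letting xargs hand out files as workers free up balances the load better than splitting the directory into 16 fixed chunks.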
Hello,
I am running GNU bash, version 3.2.39(1)-release (x86_64-pc-linux-gnu). I have a specific question pertaining to waiting on jobs run in sub-shells, based on the max number of parallel processes I want to allow, and then wait... (1 Reply)
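Bash 3.2 predates `wait -n`, so one workable approach is to poll the shell's job table and only launch a new sub-shell when a slot is free. A sketch; MAX and the `sleep` job are placeholders:

```shell
#!/bin/bash
# Sketch for bash 3.2 (no 'wait -n'): cap concurrency by polling the job table.
MAX=4
for i in 1 2 3 4 5 6 7 8; do
  ( sleep 0.1 ) &                       # the real sub-shell work goes here
  # busy-wait until fewer than MAX background jobs are still running
  while [ "$(jobs -r | wc -l)" -ge "$MAX" ]; do
    sleep 0.05
  done
done
wait                                    # block until every job has finished
```

The polling loop costs a little CPU, but it needs nothing beyond what bash 3.2 provides.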
Dear all,
I'm a programming newbie, and I would like to know if it is possible to parallelize this script:
for l in {1..1000}
do
cut -f$l quase2 |tr "\n" "," |sed 's/$/\
/g' |sed '/^$/d' >a_$l.t
done
I tried:
for l in {1..1000}
do
cut -f$l quase2 |tr "\n" "," |sed 's/$/\
/g' |sed... (7 Replies)
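Since each iteration writes its own a_$l.t file, the iterations are independent and can simply be backgrounded, with a cap so 1000 jobs don't start at once. A sketch: quase2 is the poster's input (a two-column stand-in is created here), and `tr` plus `echo` reproduces what the original tr/sed combination emits:

```shell
#!/bin/bash
printf 'a\tb\nc\td\n' > quase2
for l in 1 2; do                        # the real loop is {1..1000}
  ( cut -f"$l" quase2 | tr '\n' ','; echo ) > "a_$l.t" &
  # keep at most 8 column jobs running at any time
  while [ "$(jobs -r | wc -l)" -ge 8 ]; do sleep 0.05; done
done
wait
```

GNU parallel would express the same thing as one line, e.g. `parallel -j 8 'cut -f{} quase2 | tr "\n" "," > a_{}.t' ::: {1..1000}`, if it is installed.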
Hello,
the bulk of my work is run by scripts. For example:
#!/bin/bash
awk '{print first line}' Input.in > Intermediate.ter
awk '{print second line}' Input.in > Intermediate_2.ter
command Intermediate.ter Intermediate_2.ter > Output.out
It works the way I want it to, but it's not... (1 Reply)
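The two awk extractions read the same input and write different files, so they can run concurrently; only the combining step must wait for both. A sketch, where `NR==1`/`NR==2` stand in for the poster's '{print first line}' programs:

```shell
#!/bin/bash
printf 'one\ntwo\nthree\n' > Input.in
awk 'NR==1' Input.in > Intermediate.ter   &   # first extraction
awk 'NR==2' Input.in > Intermediate_2.ter &   # second extraction, in parallel
wait                                          # both must finish first
cat Intermediate.ter Intermediate_2.ter > Output.out
```

The `wait` is the key line: without it, the final command could read half-written intermediates.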
I have multiple jobs, and each job depends on another job.
Each job generates a log. If a job completes successfully, its log file ends with the message JOB ENDED SUCCESSFULLY; if it failed, it ends with JOB ENDED with FAILURE.
I need help on how to start.
Attaching the JOB dependency... (3 Replies)
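A starting point: run each job, then gate its dependent on the success marker in the log. A sketch; `run_job_a` and job_a.log are hypothetical names standing in for the real jobs and their logs:

```shell
#!/bin/bash
# Sketch: start a dependent job only if the predecessor's log shows success.
run_job_a() { echo "JOB ENDED SUCCESSFULLY" > job_a.log; }   # placeholder job
run_job_a
if grep -q "JOB ENDED SUCCESSFULLY" job_a.log; then
  echo "predecessor succeeded; starting dependent job"
else
  echo "predecessor failed (JOB ENDED with FAILURE); stopping" >&2
  exit 1
fi
```

Chaining such checks, one per edge in the dependency list, gives a simple ordered runner; a full DAG scheduler is only needed if independent branches should run in parallel.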
How can I run several bash commands given on the command line, without needing a script file?
I'm actually a Windows guy and new here, so for illustration, something sort of like:
$ bash "echo ${PATH} & echo have a nice day!"
will do output, for example:... (4 Replies)
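The quoted-string form above won't run as written: bash takes a command string only with the `-c` flag, and `;` (not `&`, which backgrounds the first command) separates sequential commands:

```shell
# Several commands on one command line, no script file needed:
bash -c 'echo "$PATH"; echo "have a nice day!"'
```

Single quotes around the string keep `$PATH` from being expanded by the outer shell before the inner bash sees it.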
Discussion started by: abdulbadii
LEARN ABOUT DEBIAN
cd-hit-2d-para
CD-HIT-2D-PARA.PL(1) User Commands CD-HIT-2D-PARA.PL(1)NAME
cd-hit-2d-para.pl - divide a big clustering job into pieces to run cd-hit-2d or cd-hit-est-2d jobs
SYNOPSIS
cd-hit-2d-para.pl options
DESCRIPTION
This script divides a big clustering job into pieces and submits the jobs to remote computers over a network to run them in parallel. After
all the jobs have finished, the script merges the clustering results as if you had run a single cd-hit-2d or cd-hit-est-2d.
You can also use it to divide big jobs on a single computer if your computer does not have enough RAM (with the --L option).
Requirements:
1 When running this script over a network, the directory where you
run the scripts and the input files must be available on all the remote hosts at the identical path.
2 If you choose "ssh" to submit jobs, you must have
passwordless ssh to every remote host; see the ssh manual for how to set up passwordless ssh.
3 I suggest using a queuing system instead of ssh;
PBS and SGE are currently supported.
4 cd-hit-2d, cd-hit-est-2d, cd-hit-div, and cd-hit-div.pl must be
in the same directory as this script.
Options
-i input filename for 1st db in fasta format, required
-i2 input filename for 2nd db in fasta format, required
-o output filename, required
--P program, "cd-hit-2d" or "cd-hit-est-2d", default "cd-hit-2d"
--B filename of a list of hosts, required unless the --Q or --L option is supplied
--L number of cpus on the local computer, default 0. When you are not running over a cluster, you can use this option to divide a big
clustering job into small pieces; I suggest just using "--L 1" unless you have enough RAM for each cpu
--S Number of segments to split 1st db into, default 2
--S2 Number of segments to split 2nd db into, default 8
--Q number of jobs to submit to the queuing system, default 0; by default, the program uses ssh mode to submit remote jobs
--T type of queuing system, "PBS", "SGE" are supported, default PBS
--R restart file, used after a crashed run
-h print this help
More cd-hit-2d/cd-hit-est-2d options can be specified on the command line
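Putting the options together, a typical queue-based invocation might look like the following sketch (the FASTA file names and output name are illustrative, not from the man page):

```
cd-hit-2d-para.pl -i db1.fasta -i2 db2.fasta -o db2_vs_db1 \
    --P cd-hit-est-2d --Q 20 --T SGE --S 2 --S2 8
```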
Questions, bugs, contact Weizhong Li at liwz@sdsc.edu
cd-hit-2d-para.pl 4.6-2012-04-25 April 2012 CD-HIT-2D-PARA.PL(1)