Improve script - slow process with big files Post: 302990411

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

looking for solution to improve process replicate files to remote loc.

looking for solution to replicate 1.5GB files to a remote location... Currently, this process looks like the following: move 1.5GB files into a staging area. compress files. rsync files to remote server. remove compressed files. I have performed some timings, and compress seems more...

2. Shell Programming and Scripting

bash script working for small size files but not for big size files.

Hi, I have one file stat. Stat file contents are as follows: for example. H50768020040913,00260100,507680,13,0000000643,0000000643,00000,0000 H50769520040808,00260100,507695,13,0000000000,0000000000,00000,0000 H50770620040611,00260100,507706,13,0000000000,0000000000,00000,0000 Now i...

3. AIX

How to send big files over slow network?

Hi, I am trying to send Oracle archives over a WAN and it is taking a hell of a lot of time. To reduce the time, I tried to gzip the files before sending them to the other side. That seems to reduce the time. Has anybody experienced this kind of problem, and are there any possible ways to reduce the time? ...

4. Shell Programming and Scripting

egrep is very slow : How to improve performance

We have an egrep search in a while loop. egrep -w "$key" ${PICKUP_DIR}/new_update >> ${PICKUP_DIR}/update_record_new ${PICKUP_DIR}/new_update is a 210 MB file. In each iteration, the egrep takes around 50-60 seconds on average to search. There's nothing significant in the loop other...

5. UNIX for Advanced & Expert Users

sed working slow on big files

Hi Experts, I'm using the following code to remove spaces appearing at the end of the file. sed "s/*$//g" <filename> > <new_filename> mv <new_filename> <filename> This is working fine for volumes up to 20-25 GB. For bigger files it is taking more time than it is required...

6. Shell Programming and Scripting

Very big text file - Too slow!

Hello everyone, suppose there is a very big text file (>800 mb) that each line contains an article from wikipedia. Each article begins with a tag (<..>) containing its url. Currently there are 10^6 articles in the file. I want to take random N articles, eliminate all non-alpharithmetic...

7. UNIX for Dummies Questions & Answers

How do I slow down a process?

Hello, I've been searching for something that slows down a process for some time now. Slow down as in make time pass by slower. I have rarely turned to asking a forum in the past but at this point I've given up. For example: if I made a program that would print "Hello" in 5 seconds, I would use...

8. HP-UX

Script execution is very slow when trying to find all files and their owners on HP-UX box

Hi, I have an HP-UX server where I need to list all the files in the entire file system, their directory path, last modified date, owner and group. I do not need to search the file contents. I created the script given below and I am excluding directories and files of type tmp, temp and log. The...

9. Solaris

Rsync quite slow (using very little cpu): how to improve its speed?

I have "inherited" an OmniOS (illumos based) server. I noticed rsync is significantly slower compared to my reference, FreeBSD 12-CURRENT, running on exactly the same hardware. Using the same hardware, same command with same source and target disks, OmniOS r151026 gives: test@omniosce:~# time...

10. Shell Programming and Scripting

Bash script search, improve performance with large files

Hello, For several of our scripts we are using awk to search patterns in files with data from other files. This works almost perfectly except that it takes ages to run on larger files. I am wondering if there is a way to speed up this process or have something else that is quicker with the...

LEARN ABOUT DEBIAN

cd-hit-para

CD-HIT-PARA.PL(1)						   User Commands						 CD-HIT-PARA.PL(1)

NAME

       cd-hit-para.pl - divide a big clustering job into pieces to run cd-hit or cd-hit-est jobs

SYNOPSIS

       cd-hit-para.pl options

DESCRIPTION

	      This script divides a big clustering job into pieces and submits the jobs to remote computers over a network to run them in parallel.  After
	      all the jobs have finished, the script merges the clustering results as if you had run a single cd-hit or cd-hit-est job.

	      You can also use it to divide big jobs on a single computer if your computer does not have enough RAM (with the --L option).
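
The idea of dividing one large FASTA input into segments can be illustrated with a minimal Python sketch. This is not the code cd-hit-para.pl or cd-hit-div uses; the function names `read_fasta` and `split_into_segments` are hypothetical, and sorting by sequence length before cutting contiguous chunks is an assumption made for illustration only.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line including '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_into_segments(records: List[Record], n_segments: int) -> List[List[Record]]:
    """Sort records longest-first (an assumption) and cut into at most n_segments chunks."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    ordered = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    if not ordered:
        return []
    size = math.ceil(len(ordered) / n_segments)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Each resulting segment could then be written to its own file and clustered separately; merging the per-segment results is the part the real script handles and is not shown here.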

   Requirements:
	      1. When you run this script over a network, the directory where you run the script and the input files must be available on all the remote hosts under an identical path.

	      2. If you choose "ssh" to submit jobs, you must have passwordless ssh to every remote host; see the ssh manual for how to set up passwordless ssh.

	      3. Using a queuing system instead of ssh is recommended; PBS and SGE are currently supported.

	      4. cd-hit, cd-hit-2d, cd-hit-est, cd-hit-est-2d, cd-hit-div and cd-hit-div.pl must be in the same directory as this script.

       Options

       -i input filename in fasta format, required

       -o output filename, required

       --P program, "cd-hit" or "cd-hit-est", default "cd-hit"

       --B filename of list of hosts,

	      required unless the --Q or --L option is supplied

       --L number of cpus on local computer, default 0

	      when you are not running it over a cluster, you can use this option to divide a big clustering job into small pieces; it is suggested that
	      you just use "--L 1" unless you have enough RAM for each cpu

       --S Number of segments to split input DB into, default 64

       --Q number of jobs to submit to the queuing system, default 0

	      by default, the program uses ssh mode to submit remote jobs

       --T type of queuing system, "PBS", "SGE" are supported, default PBS

       --R restart file, used to resume after a crashed run

       -h print this help

       More cd-hit/cd-hit-est options can be specified on the command line

	      Questions, bugs, contact Weizhong Li at liwz@sdsc.edu
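
As a rough illustration of how the options above combine, the following Python sketch assembles an argument list for cd-hit-para.pl. The helper `build_command` and its parameter names are hypothetical; only the flags listed above are used, and the precedence between --L, --Q and --B when several are given is an assumption of this sketch, not documented behaviour.

```python
from typing import List, Optional, Sequence


def build_command(
    input_fasta: str,
    output: str,
    program: str = "cd-hit",
    hosts_file: Optional[str] = None,
    local_cpus: int = 0,
    queue_jobs: int = 0,
    queue_type: str = "PBS",
    segments: int = 64,
    extra: Sequence[str] = (),
) -> List[str]:
    """Return an argv list for cd-hit-para.pl (hypothetical helper)."""
    if program not in ("cd-hit", "cd-hit-est"):
        raise ValueError('program must be "cd-hit" or "cd-hit-est"')
    if queue_type not in ("PBS", "SGE"):
        raise ValueError('queue_type must be "PBS" or "SGE"')
    # --B is required unless --Q or --L is supplied.
    if not (hosts_file or local_cpus or queue_jobs):
        raise ValueError("supply hosts_file, local_cpus or queue_jobs")

    cmd = ["cd-hit-para.pl", "-i", input_fasta, "-o", output,
           "--P", program, "--S", str(segments)]
    if local_cpus:
        cmd += ["--L", str(local_cpus)]      # single-machine mode
    elif queue_jobs:
        cmd += ["--Q", str(queue_jobs), "--T", queue_type]
    else:
        cmd += ["--B", hosts_file]           # ssh mode with a host list
    # Remaining arguments are passed through as cd-hit/cd-hit-est options.
    cmd += list(extra)
    return cmd
```

The returned list could be handed to `subprocess.run(cmd, check=True)` on a machine where the script is installed; the sketch itself only builds the arguments and runs nothing.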

cd-hit-para.pl 4.6-2012-04-25					    April 2012							 CD-HIT-PARA.PL(1)
