Assuming the file is a text file, use split to make smaller files; depending on the average line length, a megabyte holds about 25,000 lines of 40 characters each.
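A minimal sketch of that approach (the file and piece names below are hypothetical; use -l 25000 to get pieces of roughly a megabyte at 40 characters per line — a smaller count is shown so the effect is easy to see):

```shell
# Stand-in for the large text file: 100 numbered lines.
seq 1 100 > bigfile.txt

# Cut it into pieces of 25 lines each; split names them
# piece_aa, piece_ab, ... automatically.
split -l 25 bigfile.txt piece_

ls piece_*            # piece_aa piece_ab piece_ac piece_ad
wc -l < piece_aa      # 25
```

With -l 25000 on a real file, each piece would come out near 1 MB for 40-character lines.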
Changed the line that was list=`ls` to list=`ls x*` so that this script itself is not included in the emails.
Hi,
I need to split a large file into small files based on a string.
At different places in the large file I have the string ^Job.
I need to split the file into different files starting from ^Job to the last character before the next ^Job.
Also all the small files should be automatically named.... (4 Replies)
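One way to sketch this, assuming "^Job" means a line that literally starts with the text "Job": GNU csplit cuts at every matching line and numbers the output files automatically (xx00, xx01, ...); the -z option drops the empty piece before the first match.

```shell
# Stand-in for the large file: three sections, each starting with "Job".
printf 'Job A\ndata1\nJob B\ndata2\ndata3\nJob C\ndata4\n' > bigfile.txt

# Split at every line beginning with "Job"; '{*}' repeats until EOF
# (GNU extension), -z elides the empty leading piece.
csplit -z bigfile.txt '/^Job/' '{*}'

ls xx*    # xx00 xx01 xx02
```

Each xxNN file then runs from one ^Job line up to the last line before the next one.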
Hi
I want to split a file that has 'n' number of records into 16 small files.
Can someone suggest how to do this using a Unix script?
Thanks
rrkk (10 Replies)
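One sketch, using awk to deal the records out round-robin into 16 files so each file gets n/16 records regardless of line length (file names here are hypothetical):

```shell
# Stand-in input: n = 32 records, so each of the 16 files gets 2.
seq 1 32 > records.txt

# Record NR goes to part_((NR-1) mod 16).txt: record 1 and record 17
# land in part_0.txt, record 2 and 18 in part_1.txt, and so on.
awk '{ print > ("part_" ((NR-1) % 16) ".txt") }' records.txt

ls part_*.txt | wc -l    # 16
```

If contiguous blocks are wanted instead of round-robin, GNU split's -n l/16 option produces 16 pieces without breaking lines.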
Dear All,
Could you please help me split a file containing around 240,000,000 lines into 4 files of roughly equal size? Note that each file should start with the start flag (MSISDN) and end with the end flag (End); also, the number of lines between the... (10 Replies)
Hi,
I have an input file like:
111
abcdefgh
asdfghjk
dfghjkl
222
aaaaaaa
bbbbbb
333
djfhfgjktitjhgfkg
444
djdhfjkhfjkghjkfg
hsbfjksdbhjkgherjklg
fjkhfjklsahjgh
fkrjkgnj
I want to read this input file and make separate output files with the header as a numeric value like "111"... (9 Replies)
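A sketch of one way to do this with awk, assuming a header is a line containing only digits: start a new output file at each header and name it after that header (the .txt suffix is an assumption).

```shell
# Stand-in for the input shown above (abbreviated).
printf '111\nabcdefgh\n222\naaaaaaa\nbbbbbb\n' > input.txt

# A purely numeric line opens a new output file named after itself;
# every line (header included) is written to the current file.
awk '/^[0-9]+$/ { if (out) close(out); out = $0 ".txt" }
     { print > out }' input.txt

ls *.txt    # 111.txt 222.txt input.txt
```

closing each file before opening the next avoids running out of file descriptors when there are many headers.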
Hi
extending to one of my previous posted query ....
I am using
nawk -v invar1="$aa" '{print > ("ABS_"((/\|/)?"A_":"B_")invar1"_NETWORKID.txt")}' spfile.txt
to get 2 different files based on split condition i.e. "|"
Similar to invar1 variable in nawk I also need one more variable... (18 Replies)
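awk (nawk on Solaris) accepts multiple -v assignments, so a second shell variable can be passed alongside invar1 and used in the filename the same way. A sketch, with invar2, $bb, and the sample input all hypothetical (plain awk is used here; substitute nawk on Solaris):

```shell
# Hypothetical shell variables feeding the two awk variables.
aa="NET1"; bb="REG7"

# Stand-in input: first line contains "|", second does not.
printf 'a|b\nc d\n' > spfile.txt

# Same split-on-"|" logic as before, with a second -v variable
# (invar2) spliced into the output filename.
awk -v invar1="$aa" -v invar2="$bb" \
  '{ print > ("ABS_" ((/\|/) ? "A_" : "B_") invar1 "_" invar2 "_NETWORKID.txt") }' spfile.txt

ls ABS_*    # ABS_A_NET1_REG7_NETWORKID.txt  ABS_B_NET1_REG7_NETWORKID.txt
```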
Dear shell experts,
I would like to split a txt file into small ones. However, I do not know how to program this in shell. If someone could help, it would be greatly appreciated!
Specifically, suppose there is a file named A.txt. The content of the file looks like this:
Subject run condition ACC time... (3 Replies)
Hello,
I have one file which is around 20 MB in size, and I want to split it into four files of 5 MB each.
ABCD_XYZ_20130302223203.xml.
The requirement is to write a script that works as follows: the first three files should be of size 5 MB each, and the fourth one's content should be in the last... (8 Replies)
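A sketch with GNU split's -b (byte) option: the first pieces come out at exactly the requested size and the last piece holds the remainder. Note that byte splitting can cut in the middle of a line or XML tag. A small demo size is used below so the effect is visible; for the real file it would be -b 5m.

```shell
# 3500-byte stand-in for the 20 MB XML file.
head -c 3500 /dev/zero | tr '\0' 'x' > demo.xml

# Cut into 1024-byte pieces (use -b 5m for 5 MB pieces);
# the last piece gets whatever is left over.
split -b 1024 demo.xml slice_

wc -c slice_*    # three 1024-byte pieces plus a 428-byte remainder
```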
Dear all,
I have a huge txt file with the input files for some setup_code. However, for running my setup_code, I require txt files with a maximum of 1000 input files each.
Please help me by suggesting a way to break down this big txt file into small txt files of 1000 entries each.
thanks and Greetings,
Emily (12 Replies)
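Assuming one entry per line, this is a one-liner with split -l; the -d option (GNU split) gives numeric suffixes. The file names below are hypothetical:

```shell
# Stand-in: a list of 2500 entries, one per line.
seq 1 2500 > big_list.txt

# 1000 lines per piece, numeric suffixes: chunk_00, chunk_01, chunk_02.
split -l 1000 -d big_list.txt chunk_

wc -l chunk_*    # 1000, 1000, and 500 lines
```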
Hello Shell Guru's
I have a requirement to split the source xml file into three different text files, and I need your valuable suggestions to finish this.
Here is my source xml snippet; here I am using only one entry of <jms-system-resource>. There may be multiple entries in the source file.
... (5 Replies)
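Since the snippet is truncated, here is only a sketch under assumptions: each <jms-system-resource> element sits with its open and close tags on their own lines, and each element should land in its own numbered file (the resource_N.txt names are hypothetical). For XML that does not follow this layout, a real XML tool such as xmllint would be safer than line-oriented awk.

```shell
# Stand-in source with two <jms-system-resource> entries.
printf '<a>\n<jms-system-resource>\none\n</jms-system-resource>\n<jms-system-resource>\ntwo\n</jms-system-resource>\n</a>\n' > source.xml

# Open a new numbered file at each opening tag, copy lines while
# inside an element, and stop copying at the closing tag.
awk '/<jms-system-resource>/  { n++; out = "resource_" n ".txt" }
     out                      { print > out }
     /<\/jms-system-resource>/ { out = "" }' source.xml

ls resource_*    # resource_1.txt resource_2.txt
```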
I have a large file with 24 hours of logs in the format below. I need to split the large file into 24 small files, one per hour, e.g. from 09:55 to 10:55, 10:55 to 11:55.
Can anyone help me with this?
... (20 Replies)
Discussion started by: Raghuram717
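Without the actual log format this can only be a sketch: it assumes the timestamp's HH:MM:SS is the 2nd whitespace-separated field (adjust $2 and the substr offsets to the real layout), and it cuts on the clock hour (09:00-09:59) rather than the 09:55-10:55 windows in the question — shifting windows would need comparing HHMM against the boundary times.

```shell
# Stand-in log: field 2 is the timestamp.
printf 'x 09:55:01 a\nx 10:02:00 b\nx 10:55:30 c\n' > day.log

# Route each line to a file named after its hour (hour_09.log, ...).
awk '{ print > ("hour_" substr($2, 1, 2) ".log") }' day.log

ls hour_*.log    # hour_09.log  hour_10.log
```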
LEARN ABOUT DEBIAN
cd-hit-2d-para
CD-HIT-2D-PARA.PL(1)                  User Commands                  CD-HIT-2D-PARA.PL(1)
NAME
cd-hit-2d-para.pl - divide a big clustering job into pieces to run cd-hit-2d or cd-hit-est-2d jobs
SYNOPSIS
cd-hit-2d-para.pl options
DESCRIPTION
This script divides a big clustering job into pieces and submits jobs to remote computers over a network to run them in parallel. After
all the jobs have finished, the script merges the clustering results as if you had just run a single cd-hit-2d or cd-hit-est-2d.
You can also use it to divide big jobs on a single computer if your computer does not have enough RAM (with the -L option).
Requirements:
1 When running this script over a network, the directory where you
run the scripts and the input files must be available on all the remote hosts with an identical path.
2 If you choose "ssh" to submit jobs, you must have
passwordless ssh to every remote host; see the ssh manual for how to set up passwordless ssh.
3 I suggest using a queuing system instead of ssh;
PBS and SGE are currently supported.
4 cd-hit-2d, cd-hit-est-2d, cd-hit-div and cd-hit-div.pl must be
in the same directory as this script.
Options
-i input filename for 1st db in fasta format, required
-i2 input filename for 2nd db in fasta format, required
-o output filename, required
--P program, "cd-hit-2d" or "cd-hit-est-2d", default "cd-hit-2d"
--B filename of list of hosts, required unless the --Q or --L option is supplied
--L number of cpus on local computer, default 0. When you are not running it over a cluster, you can use this option to divide a big
clustering job into small pieces; I suggest you just use "--L 1" unless you have enough RAM for each cpu
--S Number of segments to split 1st db into, default 2
--S2 Number of segments to split 2nd db into, default 8
--Q number of jobs to submit to the queuing system, default 0. By default the program uses ssh mode to submit remote jobs
--T type of queuing system, "PBS", "SGE" are supported, default PBS
--R restart file, used to resume after a crashed run
-h print this help
More cd-hit-2d/cd-hit-est-2d options can be specified on the command line
Questions, bugs, contact Weizhong Li at liwz@sdsc.edu
cd-hit-2d-para.pl 4.6-2012-04-25 April 2012 CD-HIT-2D-PARA.PL(1)