Full Discussion: big file processing
Top Forums > Shell Programming and Scripting > big file processing — Post 35437 by WIntellect on Saturday 12th of April 2003, 09:48 AM
One way to do it in Perl

I'm more of a Perl man, so here it is in Perl:
Code:
#!/usr/bin/perl -w

while ($c = <>) {
    #Process $c here!!!
}

exit 0;

Put the above code in a file called something like lineExtract.pl, make it executable, and then you can do the following to use it:

./lineExtract.pl <big_file_to_process

$c will contain the information one line at a time!
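To make the placeholder a little more concrete, here is one hypothetical way the loop might be filled in — say you only wanted the lines containing the word ERROR, printed with their line numbers (the pattern and the script name are just examples, not something from the original post):

Code:
#!/usr/bin/perl -w
use strict;

# Read the big file one line at a time; <> reads the files named on the
# command line, or standard input if none are given.
while (my $line = <>) {
    chomp $line;                    # strip the trailing newline
    if ($line =~ /ERROR/) {         # example condition -- change to suit
        print "$.: $line\n";        # $. holds the current input line number
    }
}

exit 0;

Because only one line is held in memory at a time, this approach scales to files of any size.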
 

8 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to view a big file (143M big)

1. Thanks to everyone reading this post. 2. I have a log file that is 143M in size; I cannot open it with vi, and I cannot open it with xedit either. How can I view it? And if I only want to view lines 200-300, how can I do that? 3. Thanks. (3 Replies) (see the one-liner sketched after this item)
Discussion started by: chenhao_no1
3 Replies
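In the same spirit as the Perl script above, a hypothetical one-liner for the "view lines 200-300" part of that question could look like this (the filename is just a placeholder); it stops reading as soon as it is past line 300, so the rest of the 143M file is never touched:

Code:
perl -ne 'print if $. >= 200 && $. <= 300; exit if $. > 300' big_logfile.log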

2. Shell Programming and Scripting

batch processing script

I am currently working on a batch processing script and I am stuck. I am not very familiar with the Korn shell. I need to do the following: process an input file with the following information: SOURCE FILE 533650_MSCIEUROPE_AvgWeight_YTD_EXP.XLS/Daily/test/Ceurope/EuropeFactset/YTD/... (1 Reply)
Discussion started by: chambala5
1 Replies

3. Solaris

wtmpx file is too big

Hi, I am using the Sun Solaris 5.9 OS. I have found a file called wtmpx with a size of 5.0 GB. I want to clear this file using :>/var/adm/wtmpx. My query is, would it cause any problem to the running live system? Could anyone suggest the best method to clear the file without causing problems to... (6 Replies) (a small Perl aside follows this item)
Discussion started by: Vijayakumarpc
6 Replies
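Just as an aside, the in-place truncation that :>/var/adm/wtmpx performs can also be expressed in Perl; the sketch below (the path is shown only as an example) truncates the file to zero length without removing it, so anything that already has the file open keeps writing to the same inode:

Code:
#!/usr/bin/perl -w
use strict;

my $file = "/var/adm/wtmpx";    # example path only
truncate($file, 0) or die "Cannot truncate $file: $!\n";
print "$file is now zero bytes\n";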

4. Shell Programming and Scripting

Inserting a column from one file into another big file

Hi, I have two files; one is 1.6 GB. I would like to add one extra column of information to the large file at a specific location (after its 2nd column). For example: File 1 has two columns and more than 1000 rows, like this: MM009987 1 File 2 looks like this: MM00098 MM00076 3 4 2 4 2... (1 Reply) (one possible approach is sketched after this item)
Discussion started by: sogi
1 Replies
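The streaming pattern from this thread also fits that question. Purely as an illustration (assuming the first column of the small file is a key that also appears as the first column of the big file — the thread itself does not spell this out), the small file can be loaded into a hash and the big file rewritten in a single pass:

Code:
#!/usr/bin/perl -w
use strict;

# Hypothetical usage: ./insert_column.pl small_file big_file > merged_file
my ($small, $big) = @ARGV;

# Load the small file into memory: key => extra value.
my %extra;
open my $sfh, '<', $small or die "Cannot open $small: $!\n";
while (my $line = <$sfh>) {
    my ($key, $value) = split ' ', $line;
    $extra{$key} = $value;
}
close $sfh;

# Stream the big file and splice the extra value in after its 2nd column.
open my $bfh, '<', $big or die "Cannot open $big: $!\n";
while (my $line = <$bfh>) {
    chomp $line;
    my @fields = split ' ', $line;
    my $add = exists $extra{$fields[0]} ? $extra{$fields[0]} : "NA";
    splice @fields, 2, 0, $add;   # becomes the new 3rd column
    print join("\t", @fields), "\n";
}
close $bfh;

Only the small file is held in memory; the 1.6 GB file is still processed one line at a time.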

5. UNIX for Dummies Questions & Answers

How big is too big a config.log file?

I have a 5000-line config.log file with several "maybe" errors. Any recommendations on finding solvable problems? (2 Replies)
Discussion started by: NeedLotsofHelp
2 Replies

6. Shell Programming and Scripting

parsing data from a big file using keys from another smaller file

Hi, I have 2 files. The format of file 1 is: a1 b2 a2 c2 d1 f3 The format of file 2 is (tab delimited): a1 1.2 0.5 0.06 0.7 0.9 1 0.023 a3 0.91 0.007 0.12 0.34 0.45 1 0.7 a2 1.05 2.3 0.25 1 0.9 0.3 0.091 b1 1 5.4 0.3 9.2 0.3 0.2 0.1 b2 3 5 7 0.9 1 9 0 1 b3 0.001 1 2.3 4.6 8.9 10 0 1 0... (10 Replies) (a rough sketch follows this item)
Discussion started by: Lucky Ali
10 Replies
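That one, too, is a natural fit for the line-at-a-time approach shown above. A rough sketch, assuming the keys from file 1 are meant to match the first tab-delimited field of file 2:

Code:
#!/usr/bin/perl -w
use strict;

# Hypothetical usage: ./extract_by_key.pl keys_file big_data_file > matches
my ($keyfile, $datafile) = @ARGV;

# Read the small file of keys into a lookup hash.
my %wanted;
open my $kfh, '<', $keyfile or die "Cannot open $keyfile: $!\n";
while (my $line = <$kfh>) {
    $wanted{$_} = 1 for split ' ', $line;
}
close $kfh;

# Stream the big file once, keeping lines whose first field is a wanted key.
open my $dfh, '<', $datafile or die "Cannot open $datafile: $!\n";
while (my $line = <$dfh>) {
    my ($first) = split /\t/, $line;
    print $line if defined $first && $wanted{$first};
}
close $dfh;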

7. Emergency UNIX and Linux Support

Getting VALUE from Big XML File -- That's All

We got data that was supposed to be CSV, but was sent in a huge XML file. I've downloaded xmlstarlet, but I'm darned if I can get it to operate the "sel" feature to look down a path and get any sort of value. I see pieces of what should be paths, but they seem to have extraneous characters, and... (7 Replies)
Discussion started by: gmark99
7 Replies

8. UNIX for Beginners Questions & Answers

How to convert CR to LF in a big file?

Hello friends, I have a big file that was transferred to my UNIX system, and it seems it has CR as the line delimiter. When I run file <filename> I get: <filename>: ASCII text, with CR line terminators. How do I convert the file to one with LF terminators, so that my code that runs on UNIX can... (3 Replies) (see the one-liner after this item)
Discussion started by: mehimadri12
3 Replies
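Sticking with Perl for consistency with the script above, a hypothetical in-place conversion could look like this (the filename is only a placeholder); setting the input record separator to CR in the BEGIN block means the big file is read one CR-terminated record at a time instead of being slurped whole, and -i.bak keeps the original as a .bak copy:

Code:
perl -i.bak -pe 'BEGIN { $/ = "\r" } tr/\r/\n/' file_with_cr_endings.txt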