#1
How to extract a subset from a huge dataset
Hi all,
I have a huge tab-delimited file (450 GB). Its format is as below:
Code:
x1  A  50020  1
x1  B  50021  8
x1  C  50022  9
x1  A  50023  10
x2  D  50024  5
x2  C  50025  7
x2  F  50026  8
x2  N  50027  1
:
:
Now I want to extract a subset from this file: the rows where column 1 is x10 and column 3 is between 600000 and 30000000. I wrote the following Perl script, but it doesn't work:
Code:
#!/usr/bin/perl
$file1 = $ARGV[0]; # Input file
$file2 = $ARGV[1]; # Output file
open (IN, $file1);
while ($line = <IN>)
{
    chomp($line);
    @array = split(/\t/,$line);
    if ($array[0] eq 'x10')
    {
        if (($array[2] >= 600000) && ($array[2] <= 26279795))
        {
            # The output file is reopened in append mode for every matching line
            open (OUT, ">>$file2");
            print OUT "$line\n";
            close OUT;
        }
    }
}
close IN;
exit;

I guess the input and output files are both too big for my script to handle. Does anyone know a good way to do this? Perl or shell scripts are preferred. All your help will be appreciated!
Last edited by Franklin52; 03-13-2010 at 12:47 PM. Reason: Please indent your code and use code tags!!
#2
|
|||
|
|||
|
Code:
nawk -F"[\t]" '$1~/x10/ && $3>600000 && $3<30000000' FILE > SubFILE

Last edited by EAGL€; 03-13-2010 at 11:33 AM. Reason: didn't see it is tab-delimited format.
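One thing to watch in the command above: `$1~/x10/` is an unanchored regular-expression match, so it also selects rows whose first column merely contains x10 (x100, x101, and so on). A minimal sketch of the difference, using plain awk and a tiny made-up sample (the temporary file and its three rows are illustrative assumptions, not taken from the real data):

```shell
# Build a small tab-delimited sample (hypothetical rows, for illustration only).
sample=$(mktemp)
printf 'x10\tA\t700000\t1\nx100\tB\t700001\t2\nx1\tC\t700002\t3\n' > "$sample"

# Unanchored regex: matches any first field that contains "x10".
regex_hits=$(awk -F'\t' '$1 ~ /x10/' "$sample" | wc -l | tr -d ' ')

# Exact string comparison: matches only the field "x10".
exact_hits=$(awk -F'\t' '$1 == "x10"' "$sample" | wc -l | tr -d ' ')

echo "regex: $regex_hits, exact: $exact_hits"
rm -f "$sample"
```

On this sample the regex form counts both the x10 and x100 rows, while the exact comparison counts only x10. Anchoring the pattern as `$1 ~ /^x10$/` should behave like the exact comparison.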
#3
|
|||
|
|||
|
Hi Eagle,
Thanks for your reply. I just tried your command but it failed with: -bash: nawk: command not found. It seems we don't have nawk on our server. Do you have any other ideas? Can I just use awk?
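When nawk is missing, it can help to check which awk implementations the server actually has. A rough sketch, assuming a POSIX shell; the candidate list (nawk, gawk, mawk, awk) is only a guess at common names, and the AWK variable is a hypothetical helper:

```shell
# Pick the first awk implementation found on PATH.
# The candidate names below are assumptions; adjust them for your system.
AWK=""
for candidate in nawk gawk mawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

if [ -n "$AWK" ]; then
    echo "using: $AWK"
else
    echo "no awk found on PATH" >&2
fi
```

The chosen name can then be used in place of nawk, for example `"$AWK" -F'\t' '...' FILE`.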
#4
|
|||
|
|||
|
Try awk instead, or /usr/xpg4/bin/awk on Solaris:
Code:
awk '$1=="x10" && $3>600000 && $3<30000000' FILE > SubFILE
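Note that the original Perl script used inclusive bounds (>= and <=), while the awk one-liners use strict ones. If the endpoints should be included, something along these lines might work. This is only a sketch: extract_subset is a hypothetical helper name, the sample rows are made up, and the separator is set to a tab explicitly because the file is described as tab-delimited.

```shell
# Hypothetical helper: keep rows whose column 1 equals a key and whose
# column 3 lies in an inclusive numeric range.
# Arguments: $1 = key, $2 = lower bound, $3 = upper bound, $4 = input file
extract_subset() {
    awk -F'\t' -v key="$1" -v lo="$2" -v hi="$3" \
        '$1 == key && $3+0 >= lo+0 && $3+0 <= hi+0' "$4"
}

# Tiny made-up sample covering both boundary values.
sample=$(mktemp)
printf 'x10\tA\t599999\t1\nx10\tB\t600000\t8\nx10\tC\t30000000\t9\nx10\tD\t30000001\t2\nx2\tD\t700000\t5\n' > "$sample"

subset=$(extract_subset x10 600000 30000000 "$sample")
printf '%s\n' "$subset"
rm -f "$sample"
```

On the sample, only the two x10 rows with 600000 and 30000000 in column 3 are printed. For the real file the call would be redirected as before, e.g. `extract_subset x10 600000 30000000 FILE > SubFILE`. awk reads the input line by line and writes the output once, so file size should not be a memory problem, though scanning 450 GB will still take a while.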