Extracting 482/300k columns no's with respective info. listed in file2 from file1


 
# 8  
Old 11-28-2009
I had made a mistake.

Now your code runs, but the output is just the column numbers from file2 repeated thousands of times (probably 1411 times). It does not fetch the information underneath the numbers matched from file2.

The FS for my file1 is a tab between columns. (By "columns" I mean not the actual column positions but the headers that I have given the file, from 0 to 300K.) For example:

column 0 in file 1:

1
2
1
2
1
1
1
1
2
1
2
.
.
.
.
etc until row 1411

column 1-300K in file1:

22
21
23
41
23
32
44
21
11
11
22

All these columns, as I wrote above, are tab-separated. Some of the last columns have six-digit numbers as column headers, but the spacing is still one tab.
# 9  
Old 11-28-2009
The / / below tells it to split on a single space character, so it will not split tab-separated data. Change it to ' ' and it will split on any whitespace, tabs included.

Code:
Change this line:

$outline = $outline . (split / /, $line) [$key-1];

to this:

$outline = $outline . (split ' ', $line) [$key-1];

and let me know what happens please.

that's single_quote space single_quote

# 10  
Old 11-28-2009
Quote:
radoulov;

the two perl codes don't return anything
[...]
This is what I tried, based on your original post:

Code:
% head file[12]
==> file1 <==
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22

==> file2 <==
2
3
6

And I get the following output:

Code:
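% perl -le'
    # file2: 1-based column numbers, converted to 0-based indices
    # (replaces $[ = 1, which is a fatal error since Perl 5.30)
    open F1, "<", shift or die "$!\n";
    @cols = map { $_ - 1 } <F1>;
    warn "$!\n" unless close F1;

    $, = " ";
    # file1: split each row once and print the wanted fields
    open F2, "<", shift or die "$!\n";
    print +(split)[@cols] while <F2>;
    warn "$!\n" unless close F2;
  ' file2 file1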
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22

Code:
% perl -lane'
  # single-field lines come from file2: store them as 0-based indices
  push @cols, $_ - 1 and next if @F < 2;
  print "@F[@cols]";
  ' file2 file1
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22

Code:
% awk 'END { print r, "}\47 file1" }
  { r = r ? r ", $" $1 : "awk \47{ print $" $1 }
  ' file2 | sh
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22

Could you please explain what the output should be, given the input files I tried?
# 11  
Old 11-28-2009
jsmithstl;

Thank you for all your help. I tried your code and it seems to work, although it has been running for more than 10 hours already. However, the columns are not preserved. I need them tab-separated, like this:

23 tab separation 12
21 21
22 31
21 21
22 32
21 31
42 21
31 32
22 21
12 22
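
One possible way to get tab-separated output is sketched below with awk. This is only a sketch under assumptions: the miniature file1 and file2 are made up, the column numbers in file2 are taken to be 1-based, and the names work, cols and result are hypothetical. It has not been tried on a 300,000-column file, and some awk implementations may have trouble with that many fields.

```shell
work=$(mktemp -d)
# Hypothetical miniature inputs: file1 is tab-separated data,
# file2 lists the wanted 1-based column numbers, one per line.
printf '1\t23\t21\t24\t12\t22\n1\t33\t31\t34\t13\t32\n' > "$work/file1"
printf '2\n3\n6\n' > "$work/file2"

result=$(awk -F '\t' '
  NR == FNR { cols[++n] = $1; next }   # first file: remember the column numbers
  {
    out = $(cols[1])                   # second file: build one tab-joined row
    for (i = 2; i <= n; i++) out = out "\t" $(cols[i])
    print out
  }' "$work/file2" "$work/file1")

printf '%s\n' "$result"
rm -rf "$work"
```

Each input row is split once by awk itself, and the selected fields are joined with a literal tab, so the column layout survives in the output.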
# 12  
Old 11-28-2009
I'm glad it's working, but 10 hours is terrible. Have you considered putting this data in a database? How many rows are in the large file?
# 13  
Old 11-28-2009
Yes, I have considered that. I'm trying to figure out how to load large files into SQL. I was desperate, which is why I tried your code and the other code suggested here. But yes, the performance with Perl is not good. It has been 16 hours, to be exact.

---------- Post updated at 04:09 PM ---------- Previous update was at 04:05 PM ----------

jsmithstl;

There are 1411 rows and more than 300 thousand columns. Your code hopefully will reduce it to 1411 rows and 482 columns.

Thanks again for all your help and time!
# 14  
Old 11-29-2009
I couldn't stand it. I had to improve the performance. I created a file listing 482 column numbers:

300
800
1300
...
...
239800
240300
240800

and a data file with 20 lines and 300,000 columns per line:

1_1 ... 1_300000
2_1 ... 2_300000
...
...
19_1 ... 19_300000
20_1 ... 20_300000

Then I modified the Perl script to:
Code:
#!/usr/bin/perl

use strict;

my @a_column;
my @a_outcol;
my $outline;
my $line;
my $date_stamp;

$date_stamp = localtime time;
print "START:  $date_stamp\n";

open COLFILE, "<column.lst"
  or die "can't open file: $!";

while(<COLFILE>)
{
   chomp($_);
   push (@a_column, ($_ - 1));
   push (@a_outcol, "$_");
}

close COLFILE
  or die "can't close file: $!";

open DATFILE, "<column.dat"
  or die "can't open file: $!";

open OUTFILE, ">column.out"
  or die "can't open file: $!";

$outline = join("\t", @a_outcol);
print OUTFILE "$outline\n";

undef @a_outcol;

while($line = <DATFILE>)
{
   chomp($line);
   $outline = join( "\t", (split ' ', $line) [@a_column]);
   print OUTFILE "$outline\n";
}

close DATFILE
  or die "can't close file: $!";

close OUTFILE
  or die "can't close file: $!";

$date_stamp = localtime time;
print "END:  $date_stamp\n";

Now when it runs:

./cf.pl
START: Sun Nov 29 05:56:00 2009
END: Sun Nov 29 05:56:03 2009

Compare that with the old code, which took almost a minute just to process and write one line:
./column.pl
START: Sun Nov 29 06:22:07 2009
LINE_1: Sun Nov 29 06:23:02 2009

The output is tab-delimited:

Code:
300     800     1300    1800...239800   240300  240800
1_300   1_800   1_1300  1_1800...1_239800       1_240300        1_240800
2_300   2_800   2_1300  2_1800...2_239800       2_240300        2_240800
3_300   3_800   3_1300  3_1800...3_239800       3_240300        3_240800
...
...
18_300  18_800  18_1300 18_1800...18_239800     18_240300       18_240800
19_300  19_800  19_1300 19_1800...19_239800     19_240300       19_240800
20_300  20_800  20_1300 20_1800...20_239800     20_240300       20_240800

This should perform way better than the original.

Last edited by jsmithstl; 11-29-2009 at 09:25 AM.. Reason: added time comparisons
 