Extracting 482/300k columns no's with respective info. listed in file2 from file1

11-28-2009

I had made a mistake.

Now your code is running, BUT the output is just the column numbers from file2, repeated thousands of times (probably 1411 times). It does not fetch the information underneath the matched numbers from file 2.

The FS for my file 1 is a tab between columns. (By columns I mean not the actual columns but the headers that I have given the file, from 0 to 300K.) For example:

column 0 in file 1:

1
2
1
2
1
1
1
1
2
1
2
.
.
.
.
etc until row 1411

column 1-300K in file1:

22
21
23
41
23
32
44
21
11
11
22

All these columns, as I wrote above, are tab-separated. Some of the last columns have six-figure numbers as column headers, but the spacing is still one tab.

sogi

11-28-2009

The / / below tells Perl to split on a single space character. Change it to ' ' and it will split on any run of whitespace, tabs included.

Code:

Change this line:

$outline = $outline . (split / /, $line) [$key-1];

to this:

$outline = $outline . (split ' ', $line) [$key-1];

and let me know what happens, please.

That's single quote, space, single quote.

jsmithstl

11-28-2009

Quote:

radoulov;

the two perl codes don't return anything
[...]

This is what I tried, based on your original post:

Code:

% head file[12]
==> file1 <==
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22

==> file2 <==
2
3
6

And I get the following output:

Code:

% perl -le'
    open F1, "<", shift or die "$!\n";
    @cols = map { $_ - 1 } <F1>;   # file2 is 1-based, Perl indexes from 0
    warn "$!\n" unless close F1;

    $, = " ";
    open F2, "<", shift or die "$!\n";
    print +(split)[@cols] while <F2>;
    warn "$!\n" unless close F2;
  ' file2 file1
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22

Code:

% perl -lane'
  push @cols, $_ - 1 and next if @F < 2;
  print "@F[@cols]";
  ' file2 file1
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22

Code:

% awk 'END { print r, "}\47 file1" }
  { r = r ? r ", $" $1 : "awk \47{ print $" $1 }
  ' file2 | sh
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
23 21 22
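
A note on the awk version, since it is the least obvious of the three: it does not read file1 itself. It builds a second awk command from the numbers in file2 and hands that command to sh. The sketch below prints the generated command instead of running it. The wrapper name gen_cmd and the scratch directory are only for illustration; the awk program is the same as above.

```shell
# Hypothetical wrapper: same awk program as above, minus the "| sh".
gen_cmd() {
  awk 'END { print r, "}\47 file1" }
    { r = r ? r ", $" $1 : "awk \47{ print $" $1 }
    ' "$1"
}

# Recreate the three-line file2 from the listing in a scratch directory.
dir=$(mktemp -d)
printf '2\n3\n6\n' > "$dir/file2"

gen_cmd "$dir/file2"   # prints: awk '{ print $2, $3, $6 }' file1
```

Piping that line to sh is what produces the output shown above.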

Could you please explain what the output should be, given the input files I tried?

radoulov

11-28-2009

jsmithstl;

Thank you for all your help. I tried your code and it seems to work. It has been running for more than 10 hours already. However, the columns are not preserved in the output; they should look like this:

23 tab separation 12
21 21
22 31
21 21
22 32
21 31
42 21
31 32
22 21
12 22

sogi

11-28-2009

I'm glad it's working, but 10 hours is terrible. Have you considered putting this data in a database? How many rows are in the large file?

jsmithstl

11-28-2009

Yes, I have considered that. I'm trying to figure out how to load large files into SQL. I was desperate, which is why I tried your code and the other code suggested here. But yes, the performance with Perl is not good. It has been 16 hours, to be exact.

---------- Post updated at 04:09 PM ---------- Previous update was at 04:05 PM ----------

jsmithstl;

There are 1411 rows and more than 300 thousand columns. Your code hopefully will reduce it to 1411 rows and 482 columns.

Thanks again for all your help and time!

sogi

11-29-2009

I couldn't stand it. I had to improve the performance. I created a file containing 482 column numbers:

300
800
1300
...
...
239800
240300
240800

and a data file with 20 lines and 300,000 columns on each line:

1_1 ... 1_300000
2_1 ... 2_300000
...
...
19_1 ... 19_300000
20_1 ... 20_300000

Then I modified the Perl script to:

Code:

#!/usr/bin/perl

use strict;

my @a_column;
my @a_outcol;
my $outline;
my $line;
my $date_stamp;

$date_stamp = localtime time;
print "START:  $date_stamp\n";

open COLFILE, "<column.lst"
  or die "can't open file: $!";

while(<COLFILE>)
{
   chomp($_);
   push (@a_column, ($_ - 1));
   push (@a_outcol, "$_");
}

close COLFILE
  or die "can't close file: $!";

open DATFILE, "<column.dat"
  or die "can't open file: $!";

open OUTFILE, ">column.out"
  or die "can't open file: $!";

$outline = join("\t", @a_outcol);
print OUTFILE "$outline\n";

undef @a_outcol;

# Split each line once and take all wanted fields with a single list slice.
while($line = <DATFILE>)
{
   chomp($line);
   $outline = join( "\t", (split ' ', $line) [@a_column]);
   print OUTFILE "$outline\n";
}

close DATFILE
  or die "can't close file: $!";

close OUTFILE
  or die "can't close file: $!";

$date_stamp = localtime time;
print "END:  $date_stamp\n";

Now when it runs:

./cf.pl
START: Sun Nov 29 05:56:00 2009
END: Sun Nov 29 05:56:03 2009

In comparison, the old code took almost a minute just to process and write one line:
./column.pl
START: Sun Nov 29 06:22:07 2009
LINE_1: Sun Nov 29 06:23:02 2009

And the output is tab-delimited:

Code:

300     800     1300    1800...239800   240300  240800
1_300   1_800   1_1300  1_1800...1_239800       1_240300        1_240800
2_300   2_800   2_1300  2_1800...2_239800       2_240300        2_240800
3_300   3_800   3_1300  3_1800...3_239800       3_240300        3_240800
...
...
18_300  18_800  18_1300 18_1800...18_239800     18_240300       18_240800
19_300  19_800  19_1300 19_1800...19_239800     19_240300       19_240800
20_300  20_800  20_1300 20_1800...20_239800     20_240300       20_240800

This should perform way better than the original.
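
If Perl is not a requirement, the same idea (read the column list once, then split each line once and pick out the wanted fields) can be sketched in awk. This is an untimed sketch, not a benchmark. The function name select_cols is made up, it assumes tab-separated input and a non-empty column list, and unlike cf.pl it does not write the header line of column numbers.

```shell
# Hypothetical helper: select_cols LISTFILE DATAFILE
# LISTFILE holds one 1-based column number per line; DATAFILE is tab-separated.
select_cols() {
  awk -F'\t' -v OFS='\t' '
    # First file: remember the wanted column numbers in the order listed.
    NR == FNR { if ($1 != "") want[++n] = $1; next }
    # Second file: awk has already split the line once; just pick the fields.
    {
      out = $(want[1])
      for (i = 2; i <= n; i++) out = out OFS $(want[i])
      print out
    }
  ' "$1" "$2"
}
```

With the file names from the script above, that would be called as select_cols column.lst column.dat > column.out.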

Last edited by jsmithstl; 11-29-2009 at 09:25 AM.. Reason: added time comparisons

jsmithstl

UNIX for Dummies Questions & Answers
