Extracting 482/300k columns no's with respective info. listed in file2 from file1


 
# 1  
11-27-2009

Hi,

I have 2 files

File 1:


Code:
1 2 3 4 5 6 .......etc until column 300K
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
.
.
etc until row 1411

File 2:

Code:
123
955
1045
1184
2323
2328
2333
2756
3364
4377
5259
5351
5778
7632
8603
9399
9561
10469
.
.
.
.
until row 482


The numbers in file 2 are the column headers for file 1.

I need to extract the information corresponding to the columns listed in file 2.
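Since the first row of file1 holds the column labels, one way to find the wanted columns is to look each file2 value up in that header row. This avoids assuming that label n always sits at position n. Below is a minimal Python sketch of the lookup only. It assumes whitespace-separated fields (tabs or spaces), and `header_positions` is a hypothetical name, not something from this thread.

```python
def header_positions(header_line, wanted_labels):
    """Return the 0-based position of each wanted label in the header row."""
    # str.split() with no argument splits on any run of tabs or spaces
    labels = header_line.split()
    # built once per file; if a label repeats, its last position wins
    position = {label: pos for pos, label in enumerate(labels)}
    missing = [w for w in wanted_labels if w not in position]
    if missing:
        raise KeyError("not in header: " + ", ".join(missing))
    return [position[w] for w in wanted_labels]


if __name__ == "__main__":
    header = "1\t2\t3\t4\t5\t6"
    print(header_positions(header, ["3", "6"]))  # [2, 5]
```

The wanted labels would be the stripped, non-empty lines of file2. The dictionary is built once from the 300K-label header, so each of the 482 lookups after that is cheap.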


I tried this:

Code:
file="file2"
while read line
do
grep $line file1 >> file3
done < $file


and this:


Code:
grep -f file2 file1 > file3

and this:
Code:
grep -w -f file2 file1 > file3

and this:

Code:
#! /usr/bin/python

import sys

if len(sys.argv) != 3:
    print("Usage: %s <input1> <input2>" % sys.argv[0])
    sys.exit(1)
inputfile1 = sys.argv[1]
inputfile2 = sys.argv[2]
# store keys (one per line of the first file) in a list
keys = list()
for i in open(inputfile1):
    keys.append(i.strip())
# print each line of the second file whose 3rd or 4th "|"-separated field is a key
for i in open(inputfile2):
    line = i.strip()
    fields = line.split("|")
    if fields[2] in keys or fields[3] in keys:
        print(line)


NONE of the above code works. Can somebody help?
I asked for help in another thread and one person helped me halfway, but when I posted a reply nobody answered. Two days have gone by since then, so I opened this new thread.

Thank you for any help!!!

Last edited by Franklin52; 11-27-2009 at 05:25 AM.. Reason: adding code tags
# 2  
11-27-2009
Can you provide an example of your output based on the files contents you list above?

---------- Post updated at 08:07 AM ---------- Previous update was at 05:22 AM ----------

I used the following data for my testing.

Data file, column.dat (I prefixed each value with its line number and an underscore to show that each value comes from the correct line):
Code:
1_1 1_2 1_3 1_4 1_5 1_6 1_7 1_8 1_9 1_10 1_11 1_12 1_13 1_14 1_15 1_16 1_17 1_18 1_19 1_20 1_21 1_22 1_23 1_24 1_25
2_1 2_2 2_3 2_4 2_5 2_6 2_7 2_8 2_9 2_10 2_11 2_12 2_13 2_14 2_15 2_16 2_17 2_18 2_19 2_20 2_21 2_22 2_23 2_24 2_25
3_1 3_2 3_3 3_4 3_5 3_6 3_7 3_8 3_9 3_10 3_11 3_12 3_13 3_14 3_15 3_16 3_17 3_18 3_19 3_20 3_21 3_22 3_23 3_24 3_25
4_1 4_2 4_3 4_4 4_5 4_6 4_7 4_8 4_9 4_10 4_11 4_12 4_13 4_14 4_15 4_16 4_17 4_18 4_19 4_20 4_21 4_22 4_23 4_24 4_25
5_1 5_2 5_3 5_4 5_5 5_6 5_7 5_8 5_9 5_10 5_11 5_12 5_13 5_14 5_15 5_16 5_17 5_18 5_19 5_20 5_21 5_22 5_23 5_24 5_25
6_1 6_2 6_3 6_4 6_5 6_6 6_7 6_8 6_9 6_10 6_11 6_12 6_13 6_14 6_15 6_16 6_17 6_18 6_19 6_20 6_21 6_22 6_23 6_24 6_25
7_1 7_2 7_3 7_4 7_5 7_6 7_7 7_8 7_9 7_10 7_11 7_12 7_13 7_14 7_15 7_16 7_17 7_18 7_19 7_20 7_21 7_22 7_23 7_24 7_25
8_1 8_2 8_3 8_4 8_5 8_6 8_7 8_8 8_9 8_10 8_11 8_12 8_13 8_14 8_15 8_16 8_17 8_18 8_19 8_20 8_21 8_22 8_23 8_24 8_25
9_1 9_2 9_3 9_4 9_5 9_6 9_7 9_8 9_9 9_10 9_11 9_12 9_13 9_14 9_15 9_16 9_17 9_18 9_19 9_20 9_21 9_22 9_23 9_24 9_25



Column file, column.lst:
Code:
cat column.lst
3
7
11
13
17
21

Here's the Perl code I used:
Code:
#!/usr/bin/perl

use strict;

my @a_column;
my $outline;
my $line;
my $key;

open COLFILE, "<column.lst"
  or die "can't open file: $!";

while(<COLFILE>)
{
   chomp($_);
   push (@a_column, "$_");
}

close COLFILE
  or die "can't close file: $!";

print "@a_column\n";

open DATFILE, "<column.dat"
  or die "can't open file: $!";

while($line = <DATFILE>)
{
   undef $outline;
   chomp($line);
   foreach $key (@a_column)
   {
      if ( defined $outline )
      {
         $outline = $outline . " ";
      }
      $outline = $outline . (split / /, $line) [$key-1];
   }
   print "$outline\n";
}

close DATFILE
  or die "can't close file: $!";

I tried to reference all the columns in @a_column at once, so that $outline could be built with a single split, but I was unable to. Maybe someone else knows a way to do it.
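The single-split idea is to split each row once and then pull every wanted field from the resulting list. In Perl that is a list slice such as `(split)[@cols]`. Here is a Python illustration of the same idea, not Perl, using `operator.itemgetter` as the slice. It assumes 1-based column numbers as in column.lst, and `make_selector` is a hypothetical helper.

```python
from operator import itemgetter


def make_selector(column_numbers):
    """Return a function that picks the given 1-based columns from a row."""
    if not column_numbers:
        raise ValueError("need at least one column number")
    indices = [n - 1 for n in column_numbers]  # 1-based -> 0-based
    getter = itemgetter(*indices)

    def select(line):
        fields = line.split()  # split the row once, not once per column
        picked = getter(fields)
        # itemgetter returns a bare value, not a tuple, for a single index
        return picked if len(indices) > 1 else (picked,)

    return select


if __name__ == "__main__":
    select = make_selector([3, 7, 11, 13, 17, 21])
    row = " ".join("1_%d" % i for i in range(1, 26))
    print(" ".join(select(row)))  # 1_3 1_7 1_11 1_13 1_17 1_21
```

The printed line matches the first data row of the output shown below.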

Here's the output:
Code:
3 7 11 13 17 21
1_3 1_7 1_11 1_13 1_17 1_21
2_3 2_7 2_11 2_13 2_17 2_21
3_3 3_7 3_11 3_13 3_17 3_21
4_3 4_7 4_11 4_13 4_17 4_21
5_3 5_7 5_11 5_13 5_17 5_21
6_3 6_7 6_11 6_13 6_17 6_21
7_3 7_7 7_11 7_13 7_17 7_21
8_3 8_7 8_11 8_13 8_17 8_21
9_3 9_7 9_11 9_13 9_17 9_21

I don't know how this will perform with 300K columns.
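A rough count of the work helps here. It uses the sizes from the first post (1411 rows, 482 wanted columns) and treats "300K columns" as exactly 300,000, which is an assumption. The loop above calls split once per wanted column, so each row is split 482 times. The snippet only does the arithmetic: it counts fields produced by split, not bytes or seconds.

```python
ROWS = 1411
TOTAL_COLUMNS = 300_000  # assumption: "300K" taken as exactly 300,000
WANTED_COLUMNS = 482

# split inside the foreach: the whole row is split once per wanted column
per_key_split = ROWS * WANTED_COLUMNS * TOTAL_COLUMNS
# split once per row, then take a slice of the resulting list
one_split = ROWS * TOTAL_COLUMNS

print(f"{per_key_split:,}")        # 204,030,600,000
print(f"{one_split:,}")            # 423,300,000
print(per_key_split // one_split)  # 482
```

Splitting each row only once cuts the field count by a factor equal to the number of wanted columns.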
# 3  
11-27-2009
Another one:

Code:
perl -le'
    $[ = 1;
    open F1, "<", shift or die "$!\n";
    @cols = <F1>;
    warn "$!\n" unless close F1;

    $, = " ";
    open F2, "<", shift or die "$!\n";
    print +(split)[@cols] while <F2>;
    warn "$!\n" unless close F2;
  ' file2 file1

And another, less efficient one:

Code:
perl -lane'
  push@cols,$_ and next if@F<2;
  $[=1if eof;print"@F[@cols]";
  ' file2 file1

With shell + AWK:

Code:
awk 'END { print r, "}\47 file1" }
  { r = r ? r ", $" $1 : "awk \47{ print $" $1 }
  ' file2 | sh
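The shell + AWK version works in two stages. It writes a second AWK program and pipes it to `sh` (`\47` is the octal escape for a single quote). If file2 contained 3 and 7, the generated command would be `awk '{ print $3, $7 }' file1`, and AWK's default field splitting treats runs of blanks and tabs as separators. A Python sketch of just the text-generation step, with `build_awk_command` as a hypothetical name:

```python
def build_awk_command(column_numbers, data_file="file1"):
    """Return the awk command text that the awk | sh pipeline generates."""
    if not column_numbers:
        raise ValueError("need at least one column number")
    fields = ", ".join("$%d" % n for n in column_numbers)
    return "awk '{ print %s }' %s" % (fields, data_file)


if __name__ == "__main__":
    print(build_awk_command([3, 7]))  # awk '{ print $3, $7 }' file1
```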


Last edited by radoulov; 11-27-2009 at 11:02 AM..
# 4  
11-27-2009
jsmithstl;

The Perl code from your post (#2) does not work for me. I tried it and ran out of memory; my large file is 1.6 GB to start with.

---------- Post updated at 08:00 PM ---------- Previous update was at 07:55 PM ----------

radoulov;

The two Perl scripts don't return anything, and file2 is not modified at all. The programs seem to quit right away.

I also tried this code:

Code:
awk 'END { print r, "}\47 file1" }
  { r = r ? r ", $" $1 : "awk \47{ print $" $1 }
  ' file2 | sh

This gives me weird output: all the column numbers listed in file2 are repeated thousands of times horizontally, and the values under the desired columns of file1 are lost. It fetches only the column numbers, with no data.

---------- Post updated at 08:09 PM ---------- Previous update was at 08:00 PM ----------

File 1 is tab-separated:

Code:
1 2 3 4 5 6 .......etc until column 300K
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
1 23 21 24 12 22
.
.
etc until row 1411

The desired output would look something like this (tabs don't display in this message):
Code:
123 955 .........etc all column numbers from file2 (482 columns)
12   22
21   23
24   12
33   11
21   41
34   32
12   22
12   11
12   31
12   22
12   22
.
.
.
.
.
etc until row 1411


Note to administrators: Thank you for formatting my original message. I'm also clicking the # icon in this reply, but it does not work. What is the issue here? Do other users have the same problem?

Last edited by Franklin52; 11-29-2009 at 10:00 AM.. Reason: Correcting code tags
# 5  
11-28-2009
The Perl code reads file2 (the column list) into an array and then processes file1 (the large file) one row at a time. So it only holds one line in memory at a time, not the entire 1.6 GB file. How much memory is available on your system?
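To put a number on "one row at a time": 1.6 GB spread over 1411 rows is roughly 1.1 MB of text per row. A loop that reads a row, picks its fields and writes them out therefore only ever holds about one such row. Here is a Python sketch of that streaming loop, assuming whitespace-separated input, tab-separated output and 1-based column numbers read from file2. `extract_columns` is a hypothetical name. Because the first row of file1 holds the column numbers, the first output row also serves as the header the original poster asked for.

```python
import io


def extract_columns(src, dst, column_numbers):
    """Copy only the given 1-based columns from src to dst, one row at a time."""
    indices = [n - 1 for n in column_numbers]
    for line in src:              # file objects yield one line per iteration
        fields = line.split()     # any run of tabs/spaces separates fields
        if not fields:
            continue              # skip blank lines
        dst.write("\t".join(fields[i] for i in indices) + "\n")


if __name__ == "__main__":
    data = io.StringIO("1\t2\t3\t4\n9\t8\t7\t6\n")
    out = io.StringIO()
    extract_columns(data, out, [2, 4])
    print(out.getvalue(), end="")
```

With the real files you would pass `open("file1")` as the source and `open("file3", "w")` as the destination, along with the integers read from file2. A column number larger than a row's field count raises `IndexError`.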
# 6  
11-28-2009
This much memory:

31284140

But when I ran that code, it gave me this: Out of memory!
# 7  
11-28-2009
hmmmmm....

Change this line:

$outline = $outline . (split / /, $line) [$key-1];

to this:

$outline = $outline . (split ' ', $line) [$key-1];

and let me know what happens please.

that's single_quote space single_quote
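The two `split` forms differ as soon as the input is tab-separated, as file1 is. Perl's `split ' '` is the special awk-style mode, which splits on any run of whitespace (tabs included). `split / /` splits only on a single literal space, so a tab-separated row comes back as one field. Python has an analogous pair. The illustration below is Python, not Perl, and uses a tab-separated copy of a sample row from the first post:

```python
row = "1\t23\t21\t24\t12\t22"

# like Perl's split / /: only a literal single space separates fields
literal_space = row.split(" ")
# like Perl's split ' ': any run of whitespace separates fields
whitespace = row.split()

print(literal_space)  # ['1\t23\t21\t24\t12\t22']  (one field)
print(whitespace)     # ['1', '23', '21', '24', '12', '22']
```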
 