The UNIX and Linux Forums  
  #1 (permalink)  
Posted 06-29-2009 by sogi
Match col 1 of File 1 with col 1 File 2 and create a 3rd file

Hello,

I have a 1.6 GB file that I would like to filter by matching the ids in its col 1 against the ids in col 1 of file2.txt, saving the results into a 3rd file.

For example:

File 1 has 1411 rows; I don't know exactly how many columns it has (thousands)
File 2 has 311 rows, 1 column

I would like to create

File 3 with 311 rows (thousands of columns)

What is the fastest way to do this without consuming too much memory?

Thank you!
  #2 (permalink)  
Posted 06-30-2009 by rakeshawasthi
The fastest way is syncsort, but I don't know if you have that.
Otherwise try grep. Don't use awk.
  #3 (permalink)  
Posted 06-30-2009 by sogi
I used this:

grep -A1 -A1 -f file1.txt file2 > file3

but it is taking forever, and I don't know whether the result will be correct in the end.
I don't know what -A1 -A1 means (I'm assuming it stands for col 1 of File 1 and col 1 of File 2).

Help please!
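
For reference: in GNU grep, -A NUM prints NUM lines of trailing context after each matching line. It does not select columns, and giving -A1 twice is the same as giving it once. A minimal sketch with made-up sample rows (the names ids.txt and data.txt are hypothetical, not the real files):

```shell
# Hypothetical sample data, created in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' 'MXY2344' > ids.txt
printf '%s\n' 'MXY2344 1 0 2' 'MXY2455 0 1 1' 'MXY9150 2 2 0' > data.txt

# Without -A: only lines that match a pattern from ids.txt are printed.
grep -f ids.txt data.txt > plain.out

# With -A1: each matching line plus the one line that follows it.
grep -A1 -f ids.txt data.txt > context.out

wc -l < plain.out     # count of matching lines only
wc -l < context.out   # larger, because the trailing context line is included
```

This would explain an output with more rows than the id list: the extra rows are context lines, not matches.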
  #4 (permalink)  
Posted 06-30-2009 by rakeshawasthi
Give some sample input from both files,
the desired output, and
the conditions under which the two files should be joined.
PS: Use code tags
  #5 (permalink)  
Posted 06-30-2009 by sogi
Both files have no headings

Input of file 1 (has 1 column, as shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364



Input of file 2 (this file has 2,498,588 columns separated by spaces; column 1 holds the id shown below and the other columns hold single-digit numbers):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364
.
.
.
.
.
.
.
.
.
.
.
MXY9423 <--- row #1411


Desired output, file 3 (only the 364 rows whose ids match between file1 and file2, with all 2,498,588 columns):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Thank you for any help!

---------- Post updated at 11:10 PM ---------- Previous update was at 11:03 PM ----------

I just checked the results I obtained with grep -A1 -A1 -f file1.txt file2 > file3

and they are wrong. Instead of getting only 364 rows, I get 367, and some of the ids from file 1 are missing from the output file 3. I want to find the ids from file1 (my "golden" list) in file2 and write the matching rows to file 3.
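
One way to see exactly which ids are missing or unexpected is to compare column 1 of the output against the id list with comm. This is only a sketch on made-up rows; the intermediate file names (got.ids, want.ids, missing.ids, extra.ids) are hypothetical:

```shell
# Hypothetical sample data, created in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' 'MXY2344' 'MXY2455' 'MXY9150' > file1          # the "golden" id list
printf '%s\n' 'MXY2344 1 0' 'MXY9150 2 2' 'MXY9423 0 1' > file3  # a faulty output

# comm needs sorted input, so sort both id lists first.
cut -d' ' -f1 file3 | sort > got.ids
sort file1 > want.ids

comm -23 want.ids got.ids > missing.ids   # ids in file1 that never reached file3
comm -13 want.ids got.ids > extra.ids     # ids in file3 that are not in file1

cat missing.ids
cat extra.ids
```

If both missing.ids and extra.ids come out empty and file3 has as many rows as file1, the filter did what was intended.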
  #6 (permalink)  
Posted 06-30-2009 by sogi
filtering out data using grep

Both files have no headings

Input of file1.txt (has 1 column, as shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364



Input of file2.ped (this file has more than 2 million columns separated by spaces; column 1 holds the id shown below and the other columns hold single-digit numbers):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364
.
.
.
.
.
.
.
.
.
.
.
MXY9423 <--- row #1411


Desired output, file 3 (only the 364 rows whose ids match between file1 and file2, with all 2,498,588 columns):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Thank you for any help!

---------- Post updated at 11:10 PM ---------- Previous update was at 11:03 PM ----------

I used grep -A1 -A1 -f file1.txt file2 > file3 but that did not work.

I only got one reply to this thread yesterday, saying to use grep, so I'm posting this again in the hope that somebody can help.

Thank you!
  #7 (permalink)  
Posted 06-30-2009 by vidyadhar85 (Moderator)
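If you want to pull the rows of file2 whose ids are listed in file1:
Code:
# -F treats the ids as fixed strings, -w matches them as whole words only
grep -F -w -f file1 file2 > file3
or
# first file: store each id; second file: print rows whose column 1 is a stored id
awk 'NR==FNR{A[$1]; next} $1 in A' file1 file2 > file3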
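
A small worked example of the awk lookup approach, on made-up rows rather than the real data. Only the ids from file1 are held in memory; file2 is read one row at a time, so memory use should depend on the size of the id list and the longest row, not on the size of file2. The sample values are assumptions for illustration:

```shell
# Hypothetical sample data, created in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' 'MXY2344' 'MXY9150' > file1
printf '%s\n' 'MXY2344 1 0 2' 'MXY2455 0 1 1' 'MXY9150 2 2 0' 'MXY9423 1 1 1' > file2

# While reading the first file (NR == FNR), remember each id and skip to the next line.
# For the second file, print only the rows whose column 1 is a remembered id.
awk 'NR == FNR { ids[$1]; next }
     $1 in ids' file1 file2 > file3

cat file3
```

Rows come out in the order they appear in file2, each with all of its columns intact. Note that the NR == FNR test assumes file1 is not empty.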