Match col 1 of File 1 with col 1 File 2 and create a 3rd file


 
Thread Tools Search this Thread
Operating Systems Linux Ubuntu Match col 1 of File 1 with col 1 File 2 and create a 3rd file
# 1  
Old 06-30-2009
Match col 1 of File 1 with col 1 File 2 and create a 3rd file

Hello,

I have a 1.6 GB file that I would like to modify by matching some ids in col1 with the ids in col 1 of file2.txt and save the results into a 3rd file.

For example:

File 1 has 1411 rows, I ignore how many columns it has (thousands)
File 2 has 311 rows, 1 column

Would like to create

File 3 with 311 rows (thousands of columns)

What is the fastest way to do this without consuming too much memory?

Thank you!
# 2  
Old 06-30-2009
Fastest way is syncsort but i dont know if you would have that....
then try grep. dont use awk.
# 3  
Old 06-30-2009
I used this:

grep -A1 -A1 -f file1.txt file2 > file3

but it is taking forever and I don't know if it is going to be correct at the end
I don't know what -A1 -A1 mean (I'm assuming that is col1 File1 col1 File2)

Help please!
# 4  
Old 06-30-2009
give some sample input of both the files
and desired output, and
conditions how the two files will be joined.
PS:- Use code tags
# 5  
Old 06-30-2009
Both files have no headings

input of file 1 (has one 1 column, as shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364



input of file 2 (this file has 2,498,588 columns with single digit numbers, starting with column 1 as shown below, each column is separated by a space)

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364
.
.
.
.
.
.
.
.
.
.
.
MXY9423 <--- row #1411


desired output file 3 (with only #364 rows with the ids matched between file1 and file2 and 2,498,588 columns)

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Thank you for any help!

---------- Post updated at 11:10 PM ---------- Previous update was at 11:03 PM ----------

I just checked the results I obtained with grep -A1 -A1 -f file1.txt file2 > file3

and they are wrong. Instead of getting only 364 rows, I get 367 and some of the ids of file 1 are missing in the output file 3. I want to match the ids from file1 (my "golden" list) in file2 and output that in file 3
# 6  
Old 06-30-2009
filtering out data using grep

Both files have no headings

input of file1.txt (has one 1 column, as shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364



input of file2.ped (this file has more than 2 million columns with single digit numbers, starting with column 1 as shown below, each column is separated by a space)

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364
.
.
.
.
.
.
.
.
.
.
.
MXY9423 <--- row #1411


desired output file 3 (with only #364 rows with the ids matched between file1 and file2 and 2,498,588 columns)

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Thank you for any help!

---------- Post updated at 11:10 PM ---------- Previous update was at 11:03 PM ----------

I used grep -A1 -A1 -f file1.txt file2 > file3 but that did not work.

I only got one reply for this thread yesterday saying to use grep, so that's why I'm posting this again in hopes somebody would help.

Thank you!
# 7  
Old 06-30-2009
If you want to grep the data from file2 which are present in file1
Code:
grep -f file1 file2 > file3
or
awk 'FILENAME=="file1"{A[$0]=$0}
FILENAME=="file2"{if(A[$1]==$1){print}}' file1 file2 > file3

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Replace col 23 - 26 with new value, non delimited file

hello, i have a undelimited file which contains 229 byte records. i want to change column 23 - 26 with a new value and also change the sign of the data in colulmn 30 - 70. i've tried SED for the first change, but nothing happens: sed 's/\(^.\{22\}\).\{4\}\(.*\)/\0603\2/' inputfile heres an... (8 Replies)
Discussion started by: blt123
8 Replies

2. Shell Programming and Scripting

Modifying col values based on another col

Hi, Please help with this. I have several excel files (with and .xlsx format) with 10-15 columns each. They all have the same type of data but the columns are not ordered in the same way. Here is a 3 column example. What I want to do add the alphabet from column 2 to column 3, provided... (9 Replies)
Discussion started by: newbie83
9 Replies

3. Shell Programming and Scripting

Run a program-print parameters to output file-replace op file contents with max 4th col

Hi Friends, This is the only solution to my task. So, any help is highly appreciated. I have a file cat input1.bed chr1 100 200 abc chr1 120 300 def chr1 145 226 ghi chr2 567 600 unix Now, I have another file by name input2.bed (This file is a binary file not readable by the... (7 Replies)
Discussion started by: jacobs.smith
7 Replies

4. Shell Programming and Scripting

Printing from col x to end of line, except last col

Hello, I have some tab delimited data and I need to move the last col. I could hard code it, awk '{ print $1,$NF,$2,$3,$4,etc }' infile > outfile but it would be nice to know the syntax to print a range cols. I know in cut you can do, cut -f 1,4-8,11- to print fields 1,... (8 Replies)
Discussion started by: LMHmedchem
8 Replies

5. UNIX for Advanced & Expert Users

Print line based on highest value of col (B) and repetion of values in col (A)

Hello everyone, I am writing a script to process data from the ATP world tour. I have a file which contains: t=540 y=2011 r=1 p=N409 t=540 y=2011 r=2 p=N409 t=540 y=2011 r=3 p=N409 t=540 y=2011 r=4 p=N409 t=520 y=2011 r=1 p=N409 t=520 y=2011 r=2 p=N409 t=520 y=2011 r=3 p=N409 The... (4 Replies)
Discussion started by: imahmoud
4 Replies

6. Shell Programming and Scripting

how to add new col in a file

Hi, Experts, I have a requirement as following: my source file: a a a b b c c c c I need add one more colume as following: 1 a 2 a 3 a 1 b 2 b 1 c 2 c (4 Replies)
Discussion started by: ken002
4 Replies

7. Shell Programming and Scripting

Get columns from another file for match in col 2 in 1st file

Hi, My first file has 592155 9 rs16916098 1 592156 19 rs7249604 1 592157 4 rs885156 1 592158 5 rs350067 12nd file has 9 rs16916098 0 113228129 2 4 19 rs7249604 0 58709070 4 2 2 rs17042833 0 113558750 4 2... (2 Replies)
Discussion started by: genehunter
2 Replies

8. Shell Programming and Scripting

Compare - 1st col of file

Hi, I have two different files, one has two columns and other has only one column. I would like to compare the first column in the first file with the data in the second file and write a third file with the data that is not present is not common to them. First file:... (26 Replies)
Discussion started by: swame_sp
26 Replies

9. Shell Programming and Scripting

sort and split file by 2 cols (1 col after the other)

Dear All, I am a newbie to shell scripting so this one is really over my head. I have a text file with five fields as below: 76576.867188 6232.454102 2.008904 55.000000 3 76576.867188 6232.454102 3.607231 55.000000 4 76576.867188 6232.454102 1.555146 65.000000 3 76576.867188 6232.454102... (19 Replies)
Discussion started by: Ghetz
19 Replies

10. Shell Programming and Scripting

sum(col) finding from a file

Hi I have an file which looks like country address phone amount sweden |address |phone | 10 | Singapo |address |phone | 20 | Italy-N |address |phone | 30 | denmar |address |phone | 40 | Here i need to do the sum(amount), how to do this in shell scripting Thanks Babu (11 Replies)
Discussion started by: ksmbabu
11 Replies
Login or Register to Ask a Question