Two files; if cells match then copy over other columns


 
# 1  
Old 04-14-2011

I am working with two space-delimited files.
The first file has the sample IDs in column 1 and the observations in columns 2 to n. The second file has the sample IDs in column 1, the mother IDs in column 2, the father IDs in column 3, the gender in column 4, and the trait in column 5. I would like a script that reads the first column of both files and, for the rows whose IDs match, copies columns 2 to 5 from file 2 into file 1, right after the ID and before the observations.

File 1
3936 C C C C A C .....

File 2
3936 3451 3607 1 1
3937 3451 3607 1 1
3938 3451 3607 1 1
3939 3451 3607 1 1
3940 3451 3607 2 1
3941 3451 3607 2 1
3942 3451 3607 2 1
3943 3451 3607 2 1
3944 3451 3607 2 1

Final File

3936 3451 3607 1 1 C C C C A C .....

I tried running an awk command for each column, but it neither worked the way I wanted nor was it efficient:

awk 'NR==FNR{A[$1]=$1}A[$3]{sub($3,A[$3]);print}' file2 file1 > new file

Thank you for any help
# 2  
Old 04-14-2011
Are the two files in the same order, or does it have to work when, say, 3936 isn't on the first line of both files?

If they aren't necessarily in order like that, then:
Code:
awk 'BEGIN {
        # Slurp in the entire file, indexed by the key for comparison later.
        # Comparing with > 0 ends the loop at end of file and on a read error.
        while ((getline < "file2") > 0) ARR[$1]=$0
}

{
        # Match from file2
        if ($1 in ARR)
        {
                # Print the whole line from file2.  Since the keys are the same,
                # this is effectively the same as substituting after.
                printf("%s", ARR[$1]);
                # Print everything that comes after, each preceded by one space.
                for(N=2; N<=NF; N++)
                        printf(" %s", $N);
                printf("\n");
        }
        # Otherwise, print the line unchanged
        else print;
}' < file1
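
The same lookup can also be written with the common two-file `NR==FNR` idiom, which passes both files on the command line instead of naming file2 inside the script. This is only a minimal sketch, assuming both files are space-delimited with the ID in column 1. The scratch directory, the second file1 row, and the output name `merged.txt` are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory (not the poster's real files).
tmp=$(mktemp -d)
printf '%s\n' '3936 C C C C A C' '3940 A A C C A C' > "$tmp/file1"
printf '%s\n' '3936 3451 3607 1 1' '3937 3451 3607 1 1' '3940 3451 3607 2 1' > "$tmp/file2"

awk '
    # First file named on the command line (file2): store each line by its ID.
    NR == FNR { meta[$1] = $0; next }
    # Second file (file1): replace the ID with the stored file2 line when known.
    $1 in meta { $1 = meta[$1] }
    # Print every file1 line, merged or not.
    { print }
' "$tmp/file2" "$tmp/file1" > "$tmp/merged.txt"

cat "$tmp/merged.txt"
```

Assigning to `$1` makes awk rebuild the line with single spaces between fields, so runs of spaces in file1 would be collapsed. The idiom also assumes file2 is not empty, because `NR == FNR` would otherwise be true for file1 as well.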

# 3  
Old 04-14-2011
You are correct: there are more lines in file 2 than in file 1, and they are not in the same order. Every ID in file 1 is in file 2, but not vice versa.

On a side note, how do you get that shell script to output a text file? I tried '> newfile' but it did not work.
# 4  
Old 04-15-2011
I don't know why that didn't work; it should have. Maybe you put the redirection inside the awk script instead of outside?

Maybe it's clearer to move the redirection in front for a long awk script like that:

Code:
<inputfile >outputfile awk '{
...
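
To make the placement concrete, here is a small self-contained sketch with made-up file names (`in.txt`, `out_after.txt`, `out_front.txt`) and a trivial awk program standing in for the long one. It assumes a POSIX shell. The point is only that the redirections stay outside the single quotes, where the shell handles them; inside the quotes, `>` would be read by awk rather than by the shell.

```shell
# Hypothetical input in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '3936 C C' '3937 A A' > "$tmp/in.txt"

# Redirections after the quoted awk program.
awk '{ print $1 }' < "$tmp/in.txt" > "$tmp/out_after.txt"

# The same command with the redirections moved in front.
< "$tmp/in.txt" > "$tmp/out_front.txt" awk '{ print $1 }'

# Both forms should produce identical files.
cmp "$tmp/out_after.txt" "$tmp/out_front.txt" && cat "$tmp/out_front.txt"
```

Either position works because the shell strips the redirections before it runs awk, so which one to use is a matter of readability.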

 