Match on first and last columns and merge


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match on first and last columns and merge
# 1  
Old 02-10-2012
AWK-Match on first and last columns and merge

Hi Friends,

I have an input file of this kind

input.txt

Quote:
chr1 1 2 abc
chr1 3 4 abc
chr1 4 7 abc
chr1 2 4 abc
chr1 5 6 abc
chr2 10 19 abc
chr2 2 4 abc
chr2 3 6 abc
chr2 4 5 abc
chr2 1 3 def
chr2 5 11 def
chr2 7 9 def
chr3 11 13 hji
chr3 1 21 hji
output.txt

Quote:
chr1 1 7 abc
chr2 2 19 abc
chr2 1 11 def
chr3 1 21 hji
The logic is as follows, read a row, compare column1 to the next, if there is a match, compare column4. If these two match, grab the minimum of column2 and maximum of column3 and print the output with column1, min(col2), max(col3), column4.

Please note that my files are not sorted.

Thanks in advance.

Last edited by jacobs.smith; 02-10-2012 at 07:19 PM.. Reason: Output change
# 2  
Old 02-10-2012
Your sample output doesn't quite match your logic, but I think this is what you had in mind:


Code:
awk '
    BEGIN {
          getline;       # seed with first record
        split( $0, prev, " " );
        min = $2;
        max = $3;
    }

    {
        if( prev[1] == $1 && prev[4] == $4 )    # prev matches next record; capture min/max
        {
            if( min > $2 )
                min = $2;
            if( max < $3 )
                max = $3;
        }
        else      #prev does not match, put out min/max from the last set of records
        {
            print prev[1], min, max, prev[4];
            min = $2;
            max = $3;
        }
        split( $0, prev, " " )     # capture this rec for next go round
    }

    END {
          print prev[1], min, max,  prev[4];    # print out the last set
    }
' input-file


Last edited by agama; 02-10-2012 at 07:19 PM.. Reason: formatting
This User Gave Thanks to agama For This Post:
# 3  
Old 02-10-2012
Quote:
Originally Posted by agama
Your sample output doesn't quite match your logic, but I think this is what you had in mind:


Code:
awk '
    BEGIN {
          getline;       # seed with first record
        split( $0, prev, " " );
        min = $2;
        max = $3;
    }

    {
        if( prev[1] == $1 && prev[4] == $4 )    # prev matches next record; capture min/max
        {
            if( min > $2 )
                min = $2;
            if( max < $3 )
                max = $3;
        }
        else      #prev does not match, put out min/max from the last set of records
        {
            print prev[1], min, max, prev[4];
            min = $2;
            max = $3;
        }
        split( $0, prev, " " )     # capture this rec for next go round
    }

    END {
          print prev[1], min, max,  prev[4];    # print out the last set
    }
' input-file


Thanks agama. I edited my output after u mentioned the error. Your code does exactly what I needed.

Thanks once again Agama.

Thanks to this forum.

---------- Post updated at 06:24 PM ---------- Previous update was at 06:22 PM ----------

Quote:
Originally Posted by agama
Your sample output doesn't quite match your logic, but I think this is what you had in mind:


Code:
awk '
    BEGIN {
          getline;       # seed with first record
        split( $0, prev, " " );
        min = $2;
        max = $3;
    }

    {
        if( prev[1] == $1 && prev[4] == $4 )    # prev matches next record; capture min/max
        {
            if( min > $2 )
                min = $2;
            if( max < $3 )
                max = $3;
        }
        else      #prev does not match, put out min/max from the last set of records
        {
            print prev[1], min, max, prev[4];
            min = $2;
            max = $3;
        }
        split( $0, prev, " " )     # capture this rec for next go round
    }

    END {
          print prev[1], min, max,  prev[4];    # print out the last set
    }
' input-file

Thanks for a lovely code with comments too.
# 4  
Old 02-10-2012
Quote:
Originally Posted by jacobs.smith
Thanks agama. I edited my output after u mentioned the error.

I noticed the typo in the output, but I was talking more about the fact that you want min/max to collect down the records as long as the first/last fields match. As you wrote your logic, the output should be one record for every pair of lines, but your sample output suggested otherwise.


Regardless, glad it worked for you.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. UNIX for Beginners Questions & Answers

Match and Merge two file

Hi All, I have two file . I need to merge both file based on a match. File 1: Column1 column2 column3 column4 File 2: column3 column5 I need to combine the two file based on match , Which in my case is column3 and combine the file as below Combined file Column1 (10 Replies)
Discussion started by: arunkumar_mca
10 Replies

3. Shell Programming and Scripting

Merge lines based on match

I am trying to merge two lines to one based on some matching condition. The file is as follows: Matches filter: 'request ', timestamp, <HTTPFlow request=<GET: Matches filter: 'request ', timestamp, <HTTPFlow request=<GET: Matches filter: 'request ', timestamp, <HTTPFlow ... (8 Replies)
Discussion started by: jamie_123
8 Replies

4. UNIX for Dummies Questions & Answers

Merge rows into one if first 2 columns match

Hi, I wanted to merge the content and below is input and required output info. Input: /hello,a,r /hello,a,L /hello,a,X /hi,b,v /hi,b,c O/p: /hello,a,r:L:X /hi,v,:v:c Use code tags, thanks. (6 Replies)
Discussion started by: ankitas
6 Replies

5. Shell Programming and Scripting

Request: How to Parse dynamic SQL query to pad extra columns to match the fixed number of columns

Hello All, I have a requirement in which i will be given a sql query as input in a file with dynamic number of columns. For example some times i will get 5 columns, some times 8 columns etc up to 20 columns. So my requirement is to generate a output query which will have 20 columns all the... (7 Replies)
Discussion started by: vikas_trl
7 Replies

6. Shell Programming and Scripting

awk match and merge with 2 files

Dear Awk experts! I have been trying to get a one liner for a match and merge operation, but having difficulties as I'm an awk newb. I always get stuck on the match and merge with 2 files. I have 2 files as follows: File 1: field 1,field 2,field 3,field 4,field 5,field 6,field 7,field... (6 Replies)
Discussion started by: landossa
6 Replies

7. Shell Programming and Scripting

Evaluate 2 columns, add sum IF two columns match on two rows

Hi all, I know this sounds suspiciously like a homework course; but, it is not. My goal is to take a file, and match my "ID" column to the "Date" column, if those conditions are true, add the total number of minutes worked and place it in this file, while not printing the original rows that I... (6 Replies)
Discussion started by: mtucker6784
6 Replies

8. Shell Programming and Scripting

Match on columns and replace other columns

Hi Friends, I have the following input file cat input chr1 100 200 0.1 0.2 na 1 na nd chr1 105 200 0.1 0.2 1 1 na 98 chr1 110 290 nf 1 na nd na 1 chr2 130 150 12 3 na 1 na 1 chr3 450 600 nf nf na 10 na nd chr4 300 330 1 1 10 11 23 34 My requirement is 1. If $6 is na make $7 nd and... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

9. Shell Programming and Scripting

AWK to match and merge data from 2 files into 1.

Hello, hopefully this is an easy on for the AWK guru's out there. I'm having some trouble figuring out how to match+merge data in 2 files into 1 single report. I've got my 2 files filtered and delimited, just need to MATCH $3 in file1 to $1 in file2, then put $0 from File1 and $2+$3 from File2... (6 Replies)
Discussion started by: right_coaster
6 Replies

10. Shell Programming and Scripting

Merge columns

Hi all - I have a file like below: A: A1,A2,A3,A4 B: 1,2,3,4 C: z,y,x,w .... This format repeats The output should come in a single line merging the first line field with the other two rows: A1_1 A1_z A2_2 A2_y A3_3 A3_x A4_4 A4_w Could anyone help with some directions ... (4 Replies)
Discussion started by: deepakgang
4 Replies
Login or Register to Ask a Question