replace by match on fourth column


 
# 1  
Old 09-04-2012

Hi friends,

My input file looks like this:

Code:
chr1 100 200 "abc"
chr1 350 400 "abc"
chr2 450 600 "def"
chr2 612 780 "def"

How do I turn this file into the following?

Code:
chr1 100 400 "abc"
chr2 450 780 "def"

Basically, rows are grouped by the fourth column, and each group keeps the minimum of the second column and the maximum of the third column.

Thanks in advance.
# 2  
Old 09-04-2012
With 192 posts so far... what have you tried, and where exactly are you stuck?
# 3  
Old 09-04-2012
Here is a partial solution to start from:
Code:
awk 'min[$4]=="" || $2<min[$4] {min[$4]=$2;lines[$4]=$1" "$2}END{for (j in lines) print j"\t"lines[j]}' inputfile
-------------------------------------
"abc"    chr1 100
"def"    chr2 450

awk 'max[$4]=="" || $3>max[$4] {max[$4]=$3;lines[$4]=$3" "$4}END{for (j in lines) print j"\t"lines[j]}' inputfile
-------------------------------------
"abc"    400 "abc"
"def"    780 "def"

Then we can join these two outputs together to get the result.
Could someone please suggest a better one? I am still learning awk.
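
For what it's worth, the join step could be sketched roughly as follows. This is only one possible way, under a few assumptions: the second command is altered to store just the maximum so the key is not repeated, both outputs are sorted on the key before `join` sees them, and min.txt, max.txt, and the scratch directory are made-up names for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1
export LC_ALL=C              # same byte-wise ordering for sort and join
tab=$(printf '\t')

# Sample data from post #1.
printf '%s\n' 'chr1 100 200 "abc"' 'chr1 350 400 "abc"' \
              'chr2 450 600 "def"' 'chr2 612 780 "def"' > inputfile

# Key, then chromosome and smallest column 2 (same logic as the first command).
awk 'min[$4]=="" || $2<min[$4] {min[$4]=$2; lines[$4]=$1" "$2}
     END {for (j in lines) print j"\t"lines[j]}' inputfile |
    sort -t "$tab" -k1,1 > min.txt

# Key, then largest column 3 only (assumed change to the second command).
awk 'max[$4]=="" || $3>max[$4] {max[$4]=$3}
     END {for (j in max) print j"\t"max[j]}' inputfile |
    sort -t "$tab" -k1,1 > max.txt

# Join on the key, then move the key back to the last column.
merged=$(join -t "$tab" min.txt max.txt | awk -F'\t' '{print $2, $3, $1}')
printf '%s\n' "$merged"
```

On the sample data this should print `chr1 100 400 "abc"` followed by `chr2 450 780 "def"`. The sort is needed because awk does not promise any particular order for `for (j in lines)`.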
# 4  
Old 09-04-2012
Try:
Code:
awk ' { if(count[$4]++ == 0) {
                # first line seen for this column 4 value
                col1[$4] = $1
                min[$4] = $2
                max[$4] = $3
        } else {
                # later line: widen the range if needed
                if(min[$4] > $2) min[$4] = $2
                if(max[$4] < $3) max[$4] = $3
        }
}
END {   for(i in count) print col1[i], min[i], max[i], i
}' input

You can simplify this somewhat if all of the entries that match on column 4 are adjacent in the input, but your requirements didn't say that we could make that assumption. You also didn't say what is supposed to happen if column one doesn't have the same value for all entries that match on column 4. This script prints the value found in the first entry for any given value in column 4.
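
A minimal sketch of that simplification, assuming rows that share a column 4 value are always adjacent (the original poster has not confirmed this). It keeps only the current group in memory and prints groups in input order; as in the script above, column 1 is taken from the first row of each group. The sample data is piped in with printf purely for illustration.

```shell
result=$(printf '%s\n' 'chr1 100 200 "abc"' 'chr1 350 400 "abc"' \
                       'chr2 450 600 "def"' 'chr2 612 780 "def"' |
awk '
NR == 1 || $4 != key {          # first line, or column 4 changed: new group
    if (NR > 1) print col1, min, max, key    # flush the finished group
    key = $4; col1 = $1; min = $2; max = $3
    next
}
{                               # same group as the previous line
    if ($2 + 0 < min + 0) min = $2           # force numeric comparison
    if ($3 + 0 > max + 0) max = $3
}
END { if (NR > 0) print col1, min, max, key }   # flush the last group
')
printf '%s\n' "$result"
```

With a real file, drop the printf pipeline and pass the file name to awk instead. If equal column 4 values can be scattered through the file, this sketch would print one line per run of adjacent rows, so the array version is the safer choice in that case.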
# 5  
Old 09-05-2012
Will rows with the same value in column four always be contiguous in your input data, or can they appear anywhere in the file?