Combine identical lines and average the one variable field


 
# 1  
Old 06-11-2014

I have the following file:

Code:
299899 chrX_299716_300082 196  78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:1.45
299899 chrX_299716_300082 196  78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:0.283
311027 chrX_310892_311162 300  91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:0.283
311027 chrX_310892_311162 300  91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:-0.324
388608 chrX_388393_388823 562  50.619 388603 chrX_388594_388612 18.4584 Tajd:0.342217 FayWu:-0.742664 T2:-0.421
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:0.803
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:-1.233
1220600 chrX_1220404_1220797 510 -0 1220617 chrX_1220608_1220626 16.7085 Tajd:0.391032 FayWu:-0.421912 T2:1.093

Many lines are identical except for the last field (T2:#). I'm looking for a way to combine these lines so that the T2 entry is averaged. For this excerpt I would like to get something like:

Code:
299899 chrX_299716_300082 196  78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:0.8665
311027 chrX_310892_311162 300  91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:-0.0205
388608 chrX_388393_388823 562  50.619 388603 chrX_388594_388612 18.4584 Tajd:0.342217 FayWu:-0.742664 T2:-0.421
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:-0.215
1220600 chrX_1220404_1220797 510 -0 1220617 chrX_1220608_1220626 16.7085 Tajd:0.391032 FayWu:-0.421912 T2:1.093

The file is sorted, so all identical lines will be consecutive entries. The closest I have gotten is:
Code:
more input.file | awk '{split($10,a,":");avt2[$1]+=a[2];c[$1]++}END{for(i in avt2) print $0,avt2[i]/c[i]}' > output.file

but it has not produced any useful results.
Thanks a lot for any help,
Jonas
# 2  
Old 06-11-2014
Maybe:

Code:
awk -F: '{S=$1 FS $2 FS $3;a[S]++;b[S]=b[S]+$4} END {for (i in a) {print i FS b[i]/a[i]}}' file


Note: I assumed each line has four colon-separated fields.

Also, the file does not need to be sorted in this case. It works for both sorted and unsorted input.
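One caveat with the `for (i in a)` loop: awk does not guarantee any particular traversal order, so the averaged lines may come out in a different order than they appear in the input. Since the original file is sorted, a streaming variant could keep the input order and avoid holding everything in memory. The following is only a sketch; `avg_t2` is a made-up function name, and it assumes the value to average is whatever follows the last colon on each line.

```shell
# Sketch: average the value after the last colon over runs of consecutive
# lines that share the same prefix. Assumes identical lines are adjacent
# (sorted input). avg_t2 is a hypothetical helper name.
avg_t2() {
  awk '
    # Print the finished group, if any lines were collected for it.
    function emit() { if (n) printf "%s:%g\n", key, sum / n }
    {
      pos = match($0, /:[^:]*$/)     # position of the last colon
      if (!pos) next                 # skip lines without a colon
      k = substr($0, 1, pos - 1)     # everything before the last colon
      v = substr($0, pos + 1)        # the value to average
      if (k != key) { emit(); key = k; sum = 0; n = 0 }
      sum += v; n++
    }
    END { emit() }
  ' "$@"
}
```

Usage would be something like `avg_t2 input.file > output.file`. The `%g` format prints up to six significant digits, which matches awk's default number-to-string conversion.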

EDIT:
I see you used almost the same approach, except for the field separator (-F: sets FS to ":").
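For what it is worth, the original attempt most likely fails because it prints `$0` inside the `END` block. At that point `$0` holds at best the last record read, not the line that belongs to each key, so every output line repeats the same text. A possible repair that keeps the whitespace-based split is sketched below. It assumes the last whitespace-separated field has the form `T2:value`; `avg_t2_unsorted` is a hypothetical name, and the `order` array is only there to print groups in first-seen order.

```shell
# Sketch: group on the whole line minus the trailing value, so the input
# does not have to be sorted. avg_t2_unsorted is a hypothetical helper name.
avg_t2_unsorted() {
  awk '
    {
      split($NF, a, ":")             # a[1] = "T2", a[2] = the value
      prefix = $0
      sub(/[^:]*$/, "", prefix)      # keep everything up to and including "T2:"
      if (!(prefix in cnt)) order[++m] = prefix   # remember first-seen order
      sum[prefix] += a[2]
      cnt[prefix]++
    }
    END {
      for (j = 1; j <= m; j++)
        printf "%s%g\n", order[j], sum[order[j]] / cnt[order[j]]
    }
  ' "$@"
}
```

Because the key is cut from `$0` rather than rebuilt from the fields, the original spacing of each line (including the double spaces in the sample) is kept as it is.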

Last edited by clx; 06-11-2014 at 07:24 AM..
# 3  
Old 06-11-2014
Seems to have worked.

Thanks!