Duplicate values merge


 
# 1  
Old 02-09-2013

Dear Gents,

Can you please help me solve this problem?

Input file...

Code:
22057485  ,219 ,1050
22057485  ,223 ,1050
21897425  ,278 ,1050
21897425  ,279 ,1050
21897425  ,287 ,1050
20497465  ,602 ,1051
20517500  ,677 ,1051
20517500  ,681 ,1051
20577555  ,775 ,1052
20577555  ,778 ,1052
20357560  ,778 ,1052
20357560  ,780 ,1052
23717535  ,794 ,1053
23717535  ,805 ,1053
23657530  ,797 ,1053
23657530  ,798 ,1053
23657530  ,799 ,1053

I would like to get something like this:

output file

Code:
1050  22057485    219    223    
1050  21897425    278    279    287
1051  20497465    602    603    605
1051  20517500    677    681    
1052  20577555    775    778    
1052  20357560    778    780    
1053  23717535    794    805    
1053  23657530    797    798    799

Thanks in advance
# 2  
Old 02-09-2013
If you don't mind the order of the output:
Code:
awk -F' *, *' '{c[$3 OFS $1]=c[$3 OFS $1]""?c[$3 OFS $1] OFS $2:$2}
END{for(i in c) print i,c[i]}' OFS='\t' file
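
If the order of the output does matter, one possible variation is to record each key the first time it is seen and print the keys in that order. This is only a sketch and not part of the answer above: the array name order and the counter n are illustrative choices, and the sample input is just the first five lines from post #1.

```shell
# Sample input: the first five lines of the input file from post #1.
input='22057485  ,219 ,1050
22057485  ,223 ,1050
21897425  ,278 ,1050
21897425  ,279 ,1050
21897425  ,287 ,1050'

merged=$(printf '%s\n' "$input" | awk -F' *, *' -v OFS='\t' '
{
    key = $3 OFS $1                       # group on column 3 plus column 1
    if (key in c) {
        c[key] = c[key] OFS $2            # key seen before: append the value
    } else {
        order[++n] = key                  # new key: remember its position
        c[key] = $2
    }
}
END { for (i = 1; i <= n; i++) print order[i], c[order[i]] }')

printf '%s\n' "$merged"
```

Because the keys are stored in the array, this version does not need the duplicate lines to be next to each other in the input.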

This User Gave Thanks to elixir_sinari For This Post:
# 3  
Old 02-09-2013
Thanks a lot, it works perfectly.
# 4  
Old 02-09-2013
Try:
Code:
awk -F' *,' 'p!=$1{if(p)print s; s=$3 OFS $1; p=$1}{s=s OFS $2} END{print s}' OFS='\t' file


Last edited by Scrutinizer; 03-04-2013 at 06:10 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 02-09-2013
@Scrutinizer
A very nice solution. I spent nearly an hour studying this short script to find out how it works. I admire how you guys manage to find such clever, simple solutions to these problems.

I would like to explain how this script works, so I have rewritten it in a more readable form.
Code:
awk -F' *,' '		#1
p!=$1{			#2
	if(p) print s;	#3
	s=$3 OFS $1;	#4
	p=$1}		#5
{s=s OFS $2} 		#6
END {print s}		#7
' OFS='\t' file		#8

#1 Sets the field separator to zero or more spaces followed by a comma: ' *,'

Run on line one: 22057485 ,219 ,1050
$1=22057485 $2=219 $3=1050
#2 Test if p is different from $1. It is, since p is still unset (no data).
#3 Test if p contains data. It does not (p is blank), so nothing is printed.
#4 Set s=$3 OFS $1, so s="1050 22057485"
#5 p=$1=22057485
#6 s=s OFS $2, so s="1050 22057485 219"

Run on line two: 22057485 ,223 ,1050
$1=22057485 $2=223 $3=1050
#2 Test if p is different from $1. It is not: p=22057485 and $1=22057485.
Jump to #6
#6 s=s OFS $2, so s="1050 22057485 219 223"

Run on line three: 21897425 ,278 ,1050
$1=21897425 $2=278 $3=1050
#2 Test if p is different from $1. It is, since p=22057485 and $1=21897425.
#3 Test if p contains data. It does, so print s: 1050 22057485 219 223
#4 Set s=$3 OFS $1, so s="1050 21897425"
#5 p=$1=21897425
#6 s=s OFS $2, so s="1050 21897425 278"

Run on line four
.
.
.
#7 END: the last job, print the last line with print s
#8 Sets the Output Field Separator to a tab: OFS='\t'
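
The trace above can be checked by running the script on just the first three input lines. This is a small sketch: the variable names sample and trace are my own, and OFS is passed with -v instead of after the program, which should behave the same way here.

```shell
# The three lines walked through in the explanation.
sample='22057485  ,219 ,1050
22057485  ,223 ,1050
21897425  ,278 ,1050'

trace=$(printf '%s\n' "$sample" | awk -F' *,' -v OFS='\t' '
p != $1 {                 # key changed (or this is the first line)
    if (p) print s        # flush the finished group, if there is one
    s = $3 OFS $1         # start a new output line
    p = $1                # remember the current key
}
{ s = s OFS $2 }          # append the value on every line
END { print s }           # flush the last group
')

printf '%s\n' "$trace"
```

This prints two tab-separated lines: 1050 22057485 219 223 (printed when line three arrives) and 1050 21897425 278 (printed by END). Note that this approach relies on lines with the same first column being next to each other in the input.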

Last edited by Jotne; 02-10-2013 at 03:48 AM..
These 2 Users Gave Thanks to Jotne For This Post:
# 6  
Old 02-10-2013
Thanks, everybody, for the great work.

---------- Post updated at 02:27 PM ---------- Previous update was at 02:37 AM ----------

Gents,

One more thing, please.

Code:
1050  22057485    219    223
1050  21897425    278    279    287 
1051  20497465    602    603    605 
1051  20517500    677    681     
1052  20577555    775    778     
1052  20357560    778    780     
1053  23717535    794    805     
1053  23657530    797    798    799

How can I count the total number of values from the 4th column to the end only?

In this case the total will be 11.

How can I get this value?

Thanks for your help
# 7  
Old 02-10-2013
Code:
awk '{i+=NF-3} END {print i}' infile
11

For me this gives 11 values, not 15.
Code:
1050  22057485    219    223
1050  21897425    278    279    287 
1051  20497465    602    603    605 
1051  20517500    677    681     
1052  20577555    775    778     
1052  20357560    778    780     
1053  23717535    794    805     
1053  23657530    797    798    799

Or did I misunderstand the question?
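
To show where the 11 comes from, here is a slightly more defensive sketch of the same count. The NF > 3 guard is my own addition (it keeps blank or short lines from subtracting from the total), and the variable names data and total are only for illustration.

```shell
# The merged file from the question, whitespace-separated.
data='1050  22057485    219    223
1050  21897425    278    279    287
1051  20497465    602    603    605
1051  20517500    677    681
1052  20577555    775    778
1052  20357560    778    780
1053  23717535    794    805
1053  23657530    797    798    799'

# Each line contributes NF-3 values: everything from field 4 onward.
total=$(printf '%s\n' "$data" | awk 'NF > 3 { n += NF - 3 } END { print n + 0 }')

printf '%s\n' "$total"
```

The eight lines contribute 1+2+2+1+1+1+1+2 values from column 4 onward, so this prints 11.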
This User Gave Thanks to Jotne For This Post: