Delete duplicate data and retain the latest month data.


 
# 1  
Old 04-01-2011

Hi, I have a file with the following records.
It contains three months of data. Some records are duplicated, and I need to keep the latest one from each set of duplicates.
For example, I have the following data:
Code:
"200","0","","11722","-63","","","","11722","JUL","09"
"200","0","","11722","-63","","","","11722","JUL","09"
"200","0","","11722","-63","","","","11722","JUL","09"
"200","0","","11722","-63","","","","11722","JUN","09"
"200","0","","11722","-63","","","","11722","JUN","09"

As can be seen, the records are the same apart from the month. I want to delete the duplicate records and keep the record with the latest month value.
E.g., the 3rd and 5th records are the same in terms of data, but I need the latest one to persist in the file, which in this case is JUL 09.
The problem is that if I sort the data I get the JUN 09 record, because JUN comes before JUL alphabetically, whereas I need the JUL 09 record. Sorting in descending order has the same problem for other months. The uniq command is also not giving me the right output.
My idea was to convert the month name to a month number, concatenate it with the year column, then sort and delete the duplicate lines, but it's not working.
Could you please suggest a shell script for this scenario?
I have data up to 2011.
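
The plan described above (month name to number, joined with the year, then sort and drop duplicates) could be sketched roughly as follows. This is only an illustration under some assumptions: every field is double-quoted, fields 1-9 identify a record, field 10 is an upper-case month abbreviation, field 11 is a two-digit year in the 2000s, and the data contains no tab characters. The function name `keep_latest` is made up for this example.

```shell
# Hypothetical helper: keep_latest FILE
# Assumes quoted CSV, month abbreviation in field 10, two-digit 20xx year in field 11.
keep_latest() {
    awk -F, '
        BEGIN {
            # Map month abbreviations to two-digit numbers: JAN -> 01 ... DEC -> 12
            n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", m, " ")
            for (i = 1; i <= n; i++) num[m[i]] = sprintf("%02d", i)
        }
        {
            mon = $10; yr = $11
            gsub(/"/, "", mon); gsub(/"/, "", yr)
            # Prefix each line with a sortable YYMM key and a tab
            print yr num[mon] "\t" $0
        }
    ' "$1" |
    sort -t "$(printf '\t')" -k1,1nr |
    awk -F'\t' '
        {
            key = $2
            sub(/,"[A-Z]*","[0-9]*"$/, "", key)   # record without month and year
            # Newest lines arrive first, so the first line seen per key is the latest
            if (!(key in seen)) { seen[key] = 1; print $2 }
        }
    '
}
```

Called as `keep_latest inputfile`, it prints one line per distinct record, newest month first. Putting the year before the month in the key matters once the data spans more than one year, otherwise DEC 09 would sort after JAN 10.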

Last edited by Franklin52; 04-01-2011 at 06:21 AM.. Reason: Please use code tags
# 2  
Old 04-01-2011
Does the command below help? If not, please provide a few more sample records.
Code:
awk -F, 'NR==FNR{a[$10","$11]=$1","$2","$3","$4","$5","$6","$7","$8","$9",";next}END{for(i in a)print a[i]i}' inputfile inputfile

# 3  
Old 04-01-2011
Hi vee_789,

With sort:

Code:
sort -uk1.52,1.53nr -k1.46,1.49Mr Inputfile | sort -uk1.52,1.53nr
"200","0","","11722","-63","","","","11722","OCT","11"
"200","0","","11722","-63","","","","11722","DEC","10"
"200","0","","11722","-63","","","","11722","JUL","09"

Hope it helps,

Regards
# 4  
Old 04-01-2011
Hi, thanks, but your code is not giving me the desired output.
What I want is this.
Consider, for example, these records:
Code:
"200","0","","13011","-264","","","","13011","JUL","09"
"200","0","","13011","-264","","","","13011","JUL","09"
"200","0","","13011","-264","","","","13011","JUN","09"
"200","0","","13011","-263","","","","13011","JUL","09"
"200","0","","13011","-263","","","","13011","JUL","09"
"200","0","","13011","-263","","","","13011","AUG","09"

This should give me the following output:
Code:
"200","0","","13011","-264","","","","13011","JUL","09"
"200","0","","13011","-263","","","","13011","AUG","09"

The output contains only the latest record from each group: the duplicates are deleted, and among the duplicate records the one with the latest month value is kept.

Last edited by Franklin52; 04-01-2011 at 06:22 AM.. Reason: Please use code tags
# 5  
Old 04-01-2011
a bit clumsy but...
Code:
sort -r file | awk -F, '{if(date==$10$11){next}else{date=$10$11;print}}' OFS=,

Or did you not want the JUN record?
# 6  
Old 04-01-2011
No, I don't need the June record, as it is a duplicate of July's record, and since July is the later month, I need the July record. I understand it's a bit confusing.
# 7  
Old 04-01-2011
Try this,
Code:
 awk -F"," 'BEGIN{d["JAN"] = 1
        d["FEB"] = 2
        d["MAR"] = 3
        d["APR"] = 4
        d["MAY"] = 5
        d["JUN"] = 6
        d["JUL"] = 7
        d["AUG"] = 8
        d["SEP"] = 9
        d["OCT"] = 10
        d["NOV"] = 11
        d["DEC"] = 12}
{y=$0;gsub(/"/,"",$10);gsub(/"/,"",$11);k=$11*100+d[$10];if(NR>1&&b==$5){if(k>c){a=y;c=k}}else{if(NR>1)print a;a=y;b=$5;c=k}}END{if(NR)print a}' OFS="," inputfile
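
The script above compares each record with the previous one, so it depends on equal records being adjacent in the input. If that cannot be guaranteed, a single-pass variant with associative arrays may be worth trying. The following is only a sketch under assumptions: fields 1-9 together identify a record, field 10 is an upper-case month abbreviation, field 11 is a two-digit year in the 2000s, and the file is small enough to hold one line per distinct record in memory. The function name `latest_by_key` is hypothetical.

```shell
# Hypothetical helper: latest_by_key FILE
# Keeps, for each distinct record (fields 1-9), the line with the latest year/month.
latest_by_key() {
    awk -F, '
        BEGIN {
            # Month abbreviation -> month number (JAN = 1 ... DEC = 12)
            n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", m, " ")
            for (i = 1; i <= n; i++) num[m[i]] = i
        }
        {
            mon = $10; yr = $11
            gsub(/"/, "", mon); gsub(/"/, "", yr)
            stamp = yr * 100 + num[mon]           # e.g. JUL 09 -> 907
            key = $0
            sub(/,"[A-Z]*","[0-9]*"$/, "", key)   # record without month and year
            if (!(key in best)) order[++count] = key   # remember first-seen order
            if (!(key in best) || stamp > best[key]) {
                best[key] = stamp
                line[key] = $0
            }
        }
        END { for (i = 1; i <= count; i++) print line[order[i]] }
    ' "$1"
}
```

Run as `latest_by_key inputfile`. On the six sample records from post #4 it prints the `-264` JUL 09 line followed by the `-263` AUG 09 line, in the order the records first appear in the file.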
