How to delete duplicate entries without using awk command?
# 1  
Old 12-12-2012
[Solved] How to delete duplicate entries without using awk command?

Hello..
I am trying to remove duplicate entries from a log file, and I used the shell script below to do so.
Code:
awk '!x[$0]++' <filename>

Can I do this without using the awk command and the regex? I do not want the comparison to start at the beginning of each line, because every line begins with a date and time that of course differ from line to line. I have not been able to write an effective script for this. Can anyone help me, please?

Last edited by Scott; 12-12-2012 at 04:55 AM. Reason: Code tags
# 2  
Old 12-12-2012
Can you post an (anonymized) sample of the log file?
# 3  
Old 12-12-2012
Yeah, sure. It's something like this:
Code:
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:56 AM LoggingEx main Info1
Jan 13 2010 11:44:57 AM LoggingEx main Info1
Jan 13 2010 11:44:58 AM LoggingEx main Info1
Jan 13 2010 11:44:59 AM LoggingEx main Info1

Here, not all the lines are identical, but they are still duplicates. I want to remove these duplicate lines by starting the comparison from 'LoggingEx...', so that I can skip the leading date part. If I use the awk command above, it treats all of the lines as different and just prints them back unchanged, which is not what I want. Hope this helps...
Thanks

Last edited by Scrutinizer; 12-12-2012 at 05:16 AM. Reason: code tags
# 4  
Old 12-12-2012
Code:
$ awk -F"AM" '!a[$2]++' a.txt
Jan 13 2010 11:44:55 AM LoggingEx main Info1

If you are using Solaris, use nawk instead.

The command above splits each line using "AM" as the delimiter, so $2 is everything after the timestamp. If you want to handle "PM" as well, then:

Code:
 
$ nawk -F"AM|PM" '!a[$2]++' a.txt
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST1

$ cat a.txt 
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:56 AM LoggingEx main Info1
Jan 13 2010 11:44:57 AM LoggingEx main Info1
Jan 13 2010 11:44:58 AM LoggingEx main Info1
Jan 13 2010 11:44:59 AM LoggingEx main Info1
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST1

# 5  
Old 12-12-2012
For PM entries also:

Code:
awk -F "AM|PM" '!a[$2]++' a.txt

# 6  
Old 12-12-2012
Isn't there a way that doesn't depend on the 'AM'/'PM' pattern? Also, I wanted to know how to do it without using the awk command.
# 7  
Old 12-12-2012
sort on some systems can do this, de-duplicating on the key from field 6 to the end of the line:
Code:
sort -uk6 file

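If your sort doesn't support -u with a key, here is a rough pure-shell sketch with no awk and no dependence on the "AM"/"PM" pattern. It assumes bash 4+ (for associative arrays) and that the message part always starts at field 6, as in the sample above:

```shell
# De-duplicate log lines keyed on everything after the timestamp
# (fields 6 onward), without awk.
dedupe_on_message() {
    declare -A seen            # message keys already printed
    local line key
    set -f                     # don't glob-expand words from the line
    while IFS= read -r line; do
        set -- $line           # split the line into fields
        key=${*:6}             # fields 6..end: the message part
        if [[ -z ${seen[$key]+x} ]]; then
            seen[$key]=1
            printf '%s\n' "$line"
        fi
    done
    set +f
}

# usage: dedupe_on_message < a.txt
```

Like the awk one-liner, this keeps the first line seen for each distinct message and drops later repeats, so the original order is preserved (sort -u does not preserve it).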