The UNIX and Linux Forums  
#1  09-16-2002  opusforum (Registered User)
deleting double entries in a log file

Hi Folks,

I have an Apache log file that has double entries (although not all lines appear twice).

How can I automatically delete the first line of each double entry?

Your help is greatly appreciated.

Thanks,

Klaus

Here is what the log file looks like:

217.81.190.164 - - [28/Aug/2002:00:16:33 +0200] "GET /rmg/w4w/1000689.htm HTTP/1.1" 200 2409
217.81.190.164 - - [28/Aug/2002:00:16:33 +0200] "GET /rmg/w4w/1000689.htm HTTP/1.1" 200 2409 "http://www.opusforum.org/rmg/w4w/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
217.81.190.164 - - [28/Aug/2002:00:17:01 +0200] "GET /rmg/vec/ HTTP/1.1" 200 2631
217.81.190.164 - - [28/Aug/2002:00:17:01 +0200] "GET /rmg/vec/ HTTP/1.1" 200 2631 "http://www.opusforum.org/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
217.81.190.164 - - [28/Aug/2002:00:17:03 +0200] "GET /rmg/vec/1000868.htm HTTP/1.1" 200 2386
217.81.190.164 - - [28/Aug/2002:00:17:03 +0200] "GET /rmg/vec/1000868.htm HTTP/1.1" 200 2386 "http://www.opusforum.org/rmg/vec/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
213.23.52.237 - - [28/Aug/2002:00:17:10 +0200] "GET / HTTP/1.0" 200 16327
#2  09-16-2002  Perderabo (Forum Staff, Unix Daemon)
How about:
uniq <inputfile >outputfile
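One caveat worth illustrating: uniq only collapses runs of adjacent identical lines, so it suits a log like the one above, where each double entry sits directly next to its twin. A minimal sketch with made-up input (demo_input is a hypothetical helper, not part of the thread):

```shell
# Made-up demo input: "a" appears twice in a row and once more later.
demo_input() {
  printf '%s\n' a a b a
}

# uniq only collapses runs of adjacent identical lines,
# so the trailing "a" survives here.
demo_input | uniq

# Sorting first makes all duplicates adjacent, at the cost of
# losing the original line order.
demo_input | sort | uniq
```

For a log file the first form is usually the one to try, because sorting would scramble anything that is not already in sortable order.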
#3  09-16-2002  auswipe (Forum Advisor)
Re: deleting double entries in a log file

Quote:
Originally posted by opusforum
I have an Apache log file that has double entries (although not all lines appear twice).
In this situation, I like to use a Perl hash for doing the dirty work for me.

Something like this:

Code:

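#!/usr/bin/perl
use strict;
use warnings;

# "myLogFile" is a placeholder; substitute the real log name.
open(my $log, '<', 'myLogFile') or die "Cannot open myLogFile: $!";

my %logHash;    # lines that have already been printed

while (my $inputLine = <$log>) {
  # Print a line only the first time it is seen.
  if (!exists($logHash{$inputLine})) {
    $logHash{$inputLine} = 1;
    print $inputLine;
  }
}

close($log);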
That should remove the dupe entries. Just redirect the output to a new log.
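The same seen-before test can also be written as an awk one-liner, which may be handy where Perl is not installed. This is only a sketch of the equivalent idea, not auswipe's script, and dedupe_lines is a hypothetical name. Like the Perl version it compares whole lines, so two lines that differ anywhere count as distinct.

```shell
# dedupe_lines is a hypothetical helper name, not from the thread.
# seen[$0]++ evaluates to 0 (false) the first time a line is read,
# so the negation is true only then and awk's default action
# prints the line. Later copies are skipped, adjacent or not.
dedupe_lines() {
  awk '!seen[$0]++' "$@"
}

# Made-up input: reads from stdin when no file name is given.
printf '%s\n' x y x z y | dedupe_lines
```

Unlike plain uniq, this also drops duplicates that are not adjacent, at the price of keeping every distinct line in memory.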
#4  09-16-2002  auswipe (Forum Advisor)
Quote:
Originally posted by Perderabo
How about:
uniq <inputfile >outputfile
Oh sure, do it the eeaaasssy way!
#5  09-16-2002  opusforum (Registered User)
unfortunately doesn't work

Hi Folks,

Thanks a lot for your suggestions. Unfortunately, neither of them works.

The "uniq" solution needs "-w 50" to detect the double entries. However, it keeps the first line of each pair, and I need the second (the line with the additional information).

The Perl script doesn't give me the result I need because it compares whole lines, and the lines are not exact duplicates (only the first 50 characters or so match).

Any refinements so the solution works? I am sure we are close.

Thanks

Klaus
#6  09-17-2002  opusforum (Registered User)
the answer

I got it working.

Here is what worked for me:

perl -e 'print reverse <>' logfile | uniq -w 50 | perl -e 'print reverse <>' > logfile.done

First the logfile is reversed line by line, then the dupes are removed, and finally the result is reversed again.

The reversal is needed so that uniq, which keeps the first line of each duplicate pair it sees, ends up removing the line that comes first in the original file.

Thanks for your contributions, folks. They pointed me in the right direction.

Klaus
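The reverse / uniq / reverse idea can also be approximated in a single awk pass that holds each line back until the next line shows a different prefix. This is a sketch under assumptions, not the command used in the thread: keep_last_of_run is a hypothetical helper name, the prefix width is passed as the first argument, and the input is assumed to be plain ASCII. With GNU coreutils, tac logfile | uniq -w 50 | tac should be a shorter spelling of the original pipeline.

```shell
# keep_last_of_run WIDTH [FILE...]
# Hypothetical helper: for each run of adjacent lines that share
# the same first WIDTH characters, print only the last line.
keep_last_of_run() {
  width=$1
  shift
  awk -v width="$width" '
    {
      key = substr($0, 1, width)
      # A different prefix ends the previous run: emit its last line.
      if (NR > 1 && key != prev_key) print prev_line
      prev_key = key
      prev_line = $0
    }
    # Emit the last line of the final run, if there was any input.
    END { if (NR > 0) print prev_line }
  ' "$@"
}

# Made-up input with a 3-character prefix; the longer line of the
# "abc" pair is the one that is kept.
printf '%s\n' 'abc' 'abc def' 'xyz' | keep_last_of_run 3
```

For the log in this thread the call would be keep_last_of_run 50 logfile > logfile.done, which avoids reading the file into memory twice for the two reversals.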
#7  03-13-2009  uniesh (Banned)
uniq -c <file1 >file2 would write each unique line to file2, prefixed with the number of times it occurred consecutively in file1.


Regards,
uniesh
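A small sketch of what uniq -c reports, using made-up input. count_runs is a hypothetical wrapper name. The counts are per run of adjacent lines, and the padding in front of each count varies between uniq implementations, so no exact output is shown.

```shell
# count_runs is a hypothetical wrapper around uniq -c.
# Each output line is a count followed by the line it belongs to;
# a line that reappears later starts a new run with its own count.
count_runs() {
  uniq -c "$@"
}

# Made-up input: "a" twice in a row, then "b", then "a" again.
printf '%s\n' a a b a | count_runs
```

Note that this only reports the duplicates; it does not choose between the short and the long line of a pair the way the -w 50 pipeline above does.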