Remove duplicate text

06-07-2008

Hello,

I have a log file which is generated by a script which looks like this:

Code:

userid: 7
starttime: Sat May 24 23:24:13 CEST 2008
endtime: Sat May 24 23:26:57 CEST 2008
total time spent: 2.73072 minutes / 163.843 seconds
date: Sat Jun 7 16:09:03 CEST 2008

userid: 8
starttime: Sun May 25 00:14:30 CEST 2008
endtime: Sun May 25 00:14:32 CEST 2008
total time spent: 0.0304667 minutes / 1.828 seconds
date: Sat Jun 7 16:10:02 CEST 2008

userid: 9
starttime: Sun May 25 00:14:30 CEST 2008
endtime: Sun May 25 00:14:32 CEST 2008
total time spent: 0.0304667 minutes / 1.828 seconds
date: Sat Jun 7 16:11:01 CEST 2008

Every time I run the script, it increases the userid by one and adds the information (start time, end time, etc.).
Does anyone know of an efficient way to remove the whole last block of text when its starttime, its endtime, or both duplicate those of the previous block?

Thanks in advance.

dejavu88

06-07-2008

Try this:

Code:

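awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: each block is a record, each line a field
$2 $3 != t { t = $2 $3; print $0 "\n" }   # print a block only if its start+end times differ from the previous one
' "file"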

Regards

Franklin52

06-07-2008

It doesn't work; it looks like there's no change when I execute the script.
Can you also explain what the code does? I'm new at using awk.

dejavu88

06-07-2008

Does your file have a different format from the one you posted? Is every block separated by an empty line?
I copied and pasted your example and it works fine for me.

This changes the default record and field separators so that every block is treated as a record and every line as a field. The 2nd field is the start time and the 3rd is the end time:

Code:

awk 'BEGIN{RS="";FS="\n"}

The variable t remembers the start and end times of the previous block; a record is printed only if its start and end times differ from those:

Code:

t!=$2$3{t=$2$3;print $0,"\n"}
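
Note that comparing the concatenation $2$3 only drops a block when both times match the previous block. The original question also asked about the case where just one of them matches. A minimal sketch of that variant follows, assuming the plain-text format from the first post, with the start time on line 2 and the end time on line 3 of each block. The file names sample.log and deduped.log and the variable names prev_start and prev_end are made up for the illustration.

```shell
# Sample data copied from the first post (three blocks separated by empty lines).
cat > sample.log <<'EOF'
userid: 7
starttime: Sat May 24 23:24:13 CEST 2008
endtime: Sat May 24 23:26:57 CEST 2008
total time spent: 2.73072 minutes / 163.843 seconds
date: Sat Jun 7 16:09:03 CEST 2008

userid: 8
starttime: Sun May 25 00:14:30 CEST 2008
endtime: Sun May 25 00:14:32 CEST 2008
total time spent: 0.0304667 minutes / 1.828 seconds
date: Sat Jun 7 16:10:02 CEST 2008

userid: 9
starttime: Sun May 25 00:14:30 CEST 2008
endtime: Sun May 25 00:14:32 CEST 2008
total time spent: 0.0304667 minutes / 1.828 seconds
date: Sat Jun 7 16:11:01 CEST 2008
EOF

# Paragraph mode again; ORS puts one empty line after every printed block.
awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
# Skip a block if its start line OR its end line equals that of the last kept block.
NR > 1 && ($2 == prev_start || $3 == prev_end) { next }
# Otherwise remember its times and print it.
{ prev_start = $2; prev_end = $3; print }
' sample.log > deduped.log

cat deduped.log
```

With the sample above, the blocks for userid 7 and 8 are kept and the block for userid 9 is dropped. Because a skipped block does not update prev_start and prev_end, a run of several duplicates in a row is compared against the last block that was kept.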

Regards

Franklin52

06-07-2008

The original file format looks like this:

Code:

UserID: 7
Start Time: Sat May 24 23:24:13 CEST 2008
End Time: Sat May 24 23:26:57 CEST 2008
Total time spent: 2.73072 minutes / 163.843 seconds
Date: Sat Jun 7 16:09:03 CEST 2008

UserID: 8
Start Time: Sun May 25 00:14:30 CEST 2008
End Time: Sun May 25 00:14:32 CEST 2008
Total time spent: 0.0304667 minutes / 1.828 seconds
Date: Sat Jun 7 16:10:02 CEST 2008

UserID: 9
Start Time: Sun May 25 00:14:30 CEST 2008
End Time: Sun May 25 00:14:32 CEST 2008
Total time spent: 0.0304667 minutes / 1.828 seconds
Date: Sat Jun 7 16:11:01 CEST 2008

I forgot to mention that there's an extra line at the beginning of the file, followed by an empty line, which says something about my log file. I thought it would be possible to get awk to read from the end of the file instead of from the top.
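
awk always reads from the top, but it can hold the blocks in memory and decide about the last one once the whole file has been read. The following is only a sketch of that idea, assuming a plain-text file in the format shown above. The header text, the file names original.log and trimmed.log, and the array names are placeholders invented for the example. The header line simply becomes the first record, so it passes through unchanged.

```shell
# Sample data: a placeholder header line plus the three blocks shown above.
cat > original.log <<'EOF'
My log file (placeholder header line)

UserID: 7
Start Time: Sat May 24 23:24:13 CEST 2008
End Time: Sat May 24 23:26:57 CEST 2008
Total time spent: 2.73072 minutes / 163.843 seconds
Date: Sat Jun 7 16:09:03 CEST 2008

UserID: 8
Start Time: Sun May 25 00:14:30 CEST 2008
End Time: Sun May 25 00:14:32 CEST 2008
Total time spent: 0.0304667 minutes / 1.828 seconds
Date: Sat Jun 7 16:10:02 CEST 2008

UserID: 9
Start Time: Sun May 25 00:14:30 CEST 2008
End Time: Sun May 25 00:14:32 CEST 2008
Total time spent: 0.0304667 minutes / 1.828 seconds
Date: Sat Jun 7 16:11:01 CEST 2008
EOF

awk 'BEGIN { RS = ""; FS = "\n" }
# Remember every block together with its 2nd line (start) and 3rd line (end).
{ block[NR] = $0; start[NR] = $2; stop[NR] = $3 }
END {
    last = NR
    # Drop only the final block, and only if its start OR end line repeats the previous block.
    if (NR > 1 && (start[NR] == start[NR-1] || stop[NR] == stop[NR-1]))
        last = NR - 1
    for (i = 1; i <= last; i++) printf "%s\n\n", block[i]
}' original.log > trimmed.log

cat trimmed.log
```

Here the block for UserID 9 is removed because it repeats the times of UserID 8, while earlier blocks are never touched. Holding the whole file in memory is fine for a small log; for a very large one a different approach would be needed.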

[edit2]

My fault: it's also in HTML format, not plain text, so when awk reads the file it sees the multiple lines of markup in the .html file rather than what you see when the log is displayed in a browser. There are a lot of lines in the file; should I just paste them here, or can I send you the file?

Last edited by dejavu88; 06-07-2008 at 07:24 PM..

dejavu88

06-08-2008

You should strip the html tags before processing the log file. Different solutions here:

- PHP function strip_tags()
- lynx text browser with option -dump
- html2text utility.
- using sed to get rid of the tags

Can you post an extract of your HTML file, or attach it to your reply?
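
As a rough illustration of the sed option, here is a minimal sketch. The real markup of the log was not posted in the thread, so the two HTML lines below are invented, as are the file names sample.html and sample.txt.

```shell
# Hypothetical HTML lines standing in for the real log markup.
printf '%s\n' \
    '<p>UserID: 9<br />' \
    '<b>Start Time:</b> Sun May 25 00:14:30 CEST 2008</p>' > sample.html

# Delete everything from a "<" up to the next ">" on the same line.
sed 's/<[^>]*>//g' sample.html > sample.txt

cat sample.txt
```

This simple expression does not handle tags that span several lines, and it leaves entities such as &amp;nbsp; in place, so a dedicated tool like html2text or lynx -dump is likely more robust for real pages.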

ripat

06-08-2008

I only stripped the last two HTML tags, with a bit of code Franklin52 wrote for me in another post. That way my script can add new log information and then put the last two HTML tags back after it, over and over again. I'd rather not strip the HTML tags before processing the log, because it would be a hassle to put them back in exactly the place where they were.

I've put the log up on pastebin:
http://pastebin.com/m2ba76b50

dejavu88
