Awk and duplicate lines - little complicated

03-11-2012

So I've got a problem that continues on from my previous one (from a few months ago:
unix.com/shell-programming-scripting/171764-delete-duplicate-lines-twist.html ).

Good, proven, working solutions for that old problem are these:

Code:

awk '{cur=$0; gsub(/[^[:alnum:]]/, "", cur); if (!a[tolower(cur)]++) print}'

and

Code:

awk '{s=tolower($0);gsub("[^[:alnum:]]","",s);x[s]=$0} END {for(i in x) print x[i]}'

These two approaches yield the same results (but with a different final order of lines, which is really unimportant to me).
These lines (either of them) are also what I now need modified to work a little differently, and that is the purpose of this new topic:

I no longer need awk (in its search for duplicate lines in the file) to consider and compare whole lines, only the first part of each line, up to the first '*' (asterisk). The asterisk is the separator in my file, and awk should ignore everything that comes after it (as if it had reached the end of the line). An asterisk occurs in every line of the file, but sometimes there is more than one per line; this should not confuse awk, which should still take into account only the first part of the line, up to the first asterisk.
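To make the requirement concrete, here is a minimal sketch that prints the comparison key for each line: the text before the first asterisk, lowercased, with non-alphanumeric characters stripped (the same normalization the older one-liners apply to whole lines). The sample data and the variable names `sample` and `keys` are made up for illustration and are not from my real file.

```shell
# Hypothetical sample data, invented for illustration only.
sample='Foo Bar*first*extra
foo-bar*second
Baz*third
BAZ!*fourth*more
qux*fifth'

# With "*" as the field separator, $1 is everything before the FIRST asterisk,
# no matter how many asterisks follow on the line.
# The key is $1 lowercased with all non-alphanumeric characters removed.
keys=$(printf '%s\n' "$sample" |
  awk -F'*' '{k = tolower($1); gsub(/[^[:alnum:]]/, "", k); print k}')

printf '%s\n' "$keys"
```

Lines 1 and 2 share a key, as do lines 3 and 4, so under this requirement each of those pairs should count as duplicates.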

If someone can come up with a good solution for this, it would save me a week of work... and earn my eternal gratitude.

shadowww

03-11-2012

Try:

Code:

awk -F\* '{cur=$1; gsub(/[^[:alnum:]]/, "", cur); if (!a[tolower(cur)]++) print}'

or the same, a bit shorter:

Code:

awk -F\* '{s=$1; gsub(/[^[:alnum:]]/,x,s)} !a[tolower(s)]++'
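As a quick check of how this behaves, here is a small sketch run against made-up sample data (the data and the variable names `sample` and `deduped` are assumptions for illustration). It uses the same logic as the shorter version, except that the replacement string is written as an explicit `""` rather than the uninitialized variable `x`, which awk treats as an empty string anyway.

```shell
# Hypothetical sample data, invented for illustration only.
sample='Foo Bar*first*extra
foo-bar*second
Baz*third
BAZ!*fourth*more
qux*fifth'

# -F'*' makes $1 the text before the first asterisk.
# s is $1 with non-alphanumerics removed; the pattern !a[tolower(s)]++ is true
# only the first time a given key is seen, so only that line is printed.
deduped=$(printf '%s\n' "$sample" |
  awk -F'*' '{s = $1; gsub(/[^[:alnum:]]/, "", s)} !a[tolower(s)]++')

printf '%s\n' "$deduped"
# Foo Bar*first*extra
# Baz*third
# qux*fifth
```

The first line seen for each key is kept whole, including everything after the asterisk, and the input order is preserved.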

Scrutinizer

03-11-2012

Yep, seems like that is exactly what I wanted, only kinda surprised it cut my file almost in half size o_o
Need to test a little bit more...

edit: well, this is it 100%
tested and retested, thanks Scrutinizer, love you!
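For anyone wanting to sanity-check a big reduction like this, one possible approach is to invert the test and print only the lines that get dropped. This is just a sketch on made-up sample data (the data and the variable names `sample` and `dropped` are hypothetical), not the exact check I ran.

```shell
# Hypothetical sample data, invented for illustration only.
sample='Foo Bar*first*extra
foo-bar*second
Baz*third
BAZ!*fourth*more
qux*fifth'

# Same key as in the solution, but the condition is inverted:
# a[key]++ is non-zero from the second occurrence on, so this prints
# exactly the lines the dedup one-liner would discard.
dropped=$(printf '%s\n' "$sample" |
  awk -F'*' '{s = $1; gsub(/[^[:alnum:]]/, "", s)} a[tolower(s)]++')

printf '%s\n' "$dropped"
# foo-bar*second
# BAZ!*fourth*more
```

Piping the same command into `wc -l` gives the number of removed lines, which can be compared against the drop in line count of the real file.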

Last edited by shadowww; 03-12-2012 at 12:19 PM..

shadowww
