Awk and duplicate lines - little complicated


 
# 1  
Old 03-11-2012

I've got a problem that follows on from my previous one, from a few months ago:
unix.com/shell-programming-scripting/171764-delete-duplicate-lines-twist.html

The good, proven, working solutions for that old problem are these:
Code:
awk '{cur=$0; gsub(/[^[:alnum:]]/, "", cur); if (!a[tolower(cur)]++) print}'

and
Code:
awk '{s=tolower($0);gsub("[^[:alnum:]]","",s);x[s]=$0} END {for(i in x) print x[i]}'

Both approaches leave one line per duplicate group. The first keeps the first occurrence and the second keeps the last, and the final order of lines differs, but neither difference matters to me.
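
A small sketch of that first-versus-last difference, in case it matters to someone else. The function names first_wins and last_wins and the two sample lines are made up for illustration; the awk programs are the two quoted above, unchanged.

```shell
#!/bin/sh
# Hypothetical wrappers around the two one-liners from the old thread.
first_wins() {
    # Prints a line only the first time its normalized key is seen.
    awk '{cur=$0; gsub(/[^[:alnum:]]/, "", cur); if (!a[tolower(cur)]++) print}' "$@"
}
last_wins() {
    # Overwrites x[key] on every line, so the last line per key survives.
    # Output order of "for (i in x)" is unspecified.
    awk '{s=tolower($0);gsub("[^[:alnum:]]","",s);x[s]=$0} END {for(i in x) print x[i]}' "$@"
}

# Both sample lines normalize to the same key, "helloworld".
printf '%s\n' 'Hello, World' 'hello world!' | first_wins   # prints: Hello, World
printf '%s\n' 'Hello, World' 'hello world!' | last_wins    # prints: hello world!
```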
Either of these one-liners is what I now need modified to work a little differently, and that is the purpose of this new topic:

I no longer need awk to compare whole lines when it searches the file for duplicates, only the first part of each line, up to the first '*' (asterisk). The asterisk is the separator in my file, and awk should ignore everything that comes after it, as if it had reached the end of the line. An asterisk occurs in every line of the file, but sometimes there is more than one per line. This should not confuse awk: it should still take into account only the part of the line before the first asterisk.

If someone can come up with a good solution for this, it would save me a week of work... and earn my eternal gratitude :)
# 2  
Old 03-11-2012
Try:
Code:
awk -F\* '{cur=$1; gsub(/[^[:alnum:]]/, "", cur); if (!a[tolower(cur)]++) print}'

or the same, a bit shorter:
Code:
awk -F\* '{s=$1; gsub(/[^[:alnum:]]/,x,s)} !a[tolower(s)]++'
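
To check the behaviour on a small sample, something like the sketch below should do. The function name dedupe_by_first_field and the five sample lines are invented for illustration; the awk logic is the same idea as above (split on '*', normalize the first field, print a line only the first time its key appears).

```shell
#!/bin/sh
# Hypothetical helper: keep the first line for each normalized first field.
dedupe_by_first_field() {
    # -F'*' makes a literal asterisk the field separator, so $1 is
    # everything before the first '*', however many follow it.
    awk -F'*' '{
        key = tolower($1)                 # compare case-insensitively
        gsub(/[^[:alnum:]]/, "", key)     # ignore spaces and punctuation
        if (!seen[key]++) print           # print the whole original line once per key
    }' "$@"
}

# Made-up sample data: lines 2 and 4 share the key "foobar" with line 1,
# and line 5 shares the key "baz" with line 3.
printf '%s\n' \
    'Foo Bar*first*extra' \
    'foo-bar*second' \
    'Baz*third' \
    'FOOBAR*fourth*more' \
    'baz!*fifth' |
dedupe_by_first_field
# prints:
# Foo Bar*first*extra
# Baz*third
```

Note that what follows the first asterisk never takes part in the comparison, so 'foo-bar*second' is dropped even though its tail differs from the line that was kept.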

# 3  
Old 03-11-2012
Yep, that seems to be exactly what I wanted. I'm only kind of surprised it cut my file almost in half o_o
I need to test a little bit more...

edit: well, this is it, 100%.
Tested and retested. Thanks Scrutinizer, love you!
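
For anyone else surprised by how much gets removed, one way to test is to list every dropped line next to the line that was kept in its place. This is only a sketch: show_dropped and the sample lines are made up, and it assumes the same key rule as the commands above (first field before '*', lowercased, non-alphanumerics stripped).

```shell
#!/bin/sh
# Hypothetical helper: report each line the dedupe would drop, with its keeper.
show_dropped() {
    awk -F'*' '{
        key = tolower($1)
        gsub(/[^[:alnum:]]/, "", key)
        if (key in kept)
            printf "dropped: %s (kept: %s)\n", $0, kept[key]
        else
            kept[key] = $0                # first line seen for this key
    }' "$@"
}

printf '%s\n' 'Foo Bar*1' 'foo-bar*2' 'Baz*3' | show_dropped
# prints:
# dropped: foo-bar*2 (kept: Foo Bar*1)
```

Piping the output through wc -l gives the number of removed lines, which should equal the difference in line count between the input and the deduplicated file.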

Last edited by shadowww; 03-12-2012 at 12:19 PM..