Find and eliminate duplicate tokens
I have a file like this:
Code:
[token1]=value1
[token2]=value2
.
.
.
[token n]=valuen
The issue is that if a line such as [token17] gets duplicated, it may cause errors in our application. I tried to find the repeated lines with something like:
Code:
uniq -cd prueba1.txt
But it only found repeated lines that immediately follow one another, e.g.:
Code:
$ cat prueba1.txt
uno
dos
tres
tres
cuatro
cinco
seis
cuatro
siete
ocho
$ uniq -cd prueba1.txt
2 tres
It only finds "tres" when it should also find "cuatro". Any idea how to fix that?
Last edited by vgersh99; 09-22-2009 at 06:08 PM. Reason: code tags, PLEASE!
Why don't you sort before running uniq? Also, as a starting point, look at array usage in awk:
Code:
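awk '
{
    key = $0                # use your key field here
    if (key in regarr) {
        duparr[key] = key   # seen before: remember it as a duplicate
    }
    else {
        regarr[key] = key   # first occurrence
    }
}
END {
    # print each duplicated key once (order is unspecified)
    for (idx in duparr) {
        print idx
    }
}
' prueba1.txt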
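The first suggestion above, sorting before running uniq, could look roughly like this. It is only a sketch: it recreates the sample file from the first post in a temporary directory (the `tmpdir` and `dups` names are just for illustration) and uses `uniq -d` without `-c` so the output is only the duplicated lines.

```shell
# Recreate the sample file from the first post in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' uno dos tres tres cuatro cinco seis cuatro siete ocho > "$tmpdir/prueba1.txt"

# Sorting first makes equal lines adjacent, so uniq -d can see all of them.
dups=$(sort "$tmpdir/prueba1.txt" | uniq -d)
printf '%s\n' "$dups"   # prints "cuatro" and "tres", one per line

rm -r "$tmpdir"
```

Adding `-c` back (`uniq -cd`) would prefix each line with its count, as in the original attempt.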
Thanks guys, I appreciate your quick responses; this worked better than I expected. My final version is this command:
Code:
tr ' ] ' ' ' < prueba.txt | sort | awk '{print $1} ' | uniq -d | wc -l
That way it should return 0; if it returns anything higher, we have problems! Let me know what you think. Perhaps there is a way to make it shorter; I think it's really long...
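On making it shorter: one possible approach, offered as a sketch rather than a tested drop-in, is to let awk do the splitting and counting in a single pass, borrowing the array idea from the earlier reply. The sample data below is made up in the [token]=value format from the first post, and the `seen` and `n` names are arbitrary.

```shell
# Hypothetical sample data in the [token]=value format from the first post.
tmpdir=$(mktemp -d)
cat > "$tmpdir/prueba.txt" <<'EOF'
[token1]=value1
[token2]=value2
[token17]=value17
[token3]=value3
[token17]=other
EOF

# Split on "]" so $1 is "[tokenN"; count a key the second time it is seen.
count=$(awk -F']' 'seen[$1]++ == 1 { n++ } END { print n + 0 }' "$tmpdir/prueba.txt")
echo "$count"   # prints 1: only [token17] is duplicated

rm -r "$tmpdir"
```

Like the `uniq -d | wc -l` pipeline, this reports the number of distinct duplicated tokens, so 0 still means the file is clean. It does not need sort, because awk remembers every key it has seen.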