Find and eliminate duplicate tokens


 
# 1  
09-22-2009

I have a file like this:
Code:
[token1]=value1
[token2]=value2
.
.
.
[token n]=valuen

The issue is that if a line such as [token17] gets duplicated, it may cause errors in our application.
I tried to find those repeated lines with something like
Code:
uniq -cd prueba1.txt

But it only found repeated lines that immediately follow each other, e.g.:
Code:
$ cat prueba1.txt
uno
dos
tres
tres
cuatro
cinco
seis
cuatro
siete
ocho
$ uniq -cd prueba1.txt
      2 tres

It only finds "tres", when it should also find "cuatro".
Any idea how to fix that?

Last edited by vgersh99; 09-22-2009 at 06:08 PM.. Reason: code tags, PLEASE!
# 2  
09-22-2009
Sort your file first - 'man sort'
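For the sample in the question, sorting first groups equal lines together, so uniq -cd then sees every repeat. A minimal sketch; the scratch file name dup_sample.txt is hypothetical and just holds the question's sample data:

```shell
#!/bin/sh
# Recreate the question's sample in a scratch file (hypothetical name).
printf '%s\n' uno dos tres tres cuatro cinco seis cuatro siete ocho > dup_sample.txt

# uniq only compares adjacent lines, so sort first to make equal lines adjacent;
# -c prefixes each line with its count, -d keeps only lines seen more than once.
sort dup_sample.txt | uniq -cd
```

This should list both cuatro and tres, each with a count of 2 (the width of the count column differs between GNU and BSD uniq).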
# 3  
09-22-2009
Why don't you sort before running uniq?

Also, as a starting point, look at using associative arrays in awk:

Code:
 
awk '
{
        key = $0; # use your key field here
        if (key in regarr) {
                duparr[key] = key
        }
        else {
                regarr[key] = key
        }
}
END {
        # for-in order is unspecified, so duplicates may print in any order
        for (idx in duparr) {
                print idx;
        }
}
' prueba1.txt
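The same array idea is often condensed into a one-line awk idiom that needs no sort and keeps the input order. This is a sketch, not the poster's code; dup_sample.txt is a hypothetical scratch file holding the question's sample:

```shell
#!/bin/sh
# Question's sample data in a scratch file (hypothetical name).
printf '%s\n' uno dos tres tres cuatro cinco seis cuatro siete ocho > dup_sample.txt

# seen[$0]++ yields the count *before* incrementing, so it equals 1 exactly
# on a line's second occurrence: each duplicated line is printed once,
# in the order its first repeat appears.
awk 'seen[$0]++ == 1' dup_sample.txt
```

On this sample it prints tres, then cuatro. To key on the token instead of the whole line, split on "]" with -F']' and use $1 in place of $0.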

# 4  
09-22-2009
Thanks, guys.
I appreciate your quick responses; this worked better than I expected.
My final version is this command:
Code:
tr ']' ' ' < prueba.txt | awk '{print $1}' | sort | uniq -d | wc -l

That way it should print 0; anything higher means we have problems!
Let me know what you think. Perhaps there is a way to make it shorter; I think it's really long...
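Counting with wc -l is fine for eyeballing, but in a script the exit status is often handier, and printing the offending keys saves a second search. A minimal sketch, assuming one [key]=value entry per line as in the question; the function name check_tokens is hypothetical. Extracting the key before sorting means identical keys are always adjacent by the time they reach uniq, whatever collation rules the current locale uses.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if every [key] in file $1 is unique;
# otherwise prints the repeated keys to stderr and returns 1.
check_tokens() {
    # Key = text before the first "]" (so "[token17]=x" gives "[token17").
    dups=$(cut -d']' -f1 "$1" | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf 'duplicate tokens in %s:\n%s\n' "$1" "$dups" >&2
        return 1
    fi
    return 0
}
```

Called as check_tokens prueba.txt || exit 1, it stops a wrapper script before the application ever reads a file with repeated tokens.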
# 5  
09-22-2009
A little bit smaller:
Code:
 cut -d']' -f1 prueba2.txt|sort|uniq -d|wc -l
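Keying on the token rather than the whole line matters once the same token appears with different values: the lines are no longer identical, so a plain sort | uniq -d over whole lines misses them. A small sketch with a hypothetical scratch file tokens_sample.txt:

```shell
#!/bin/sh
# Hypothetical sample: [token1] appears twice, with different values.
printf '%s\n' '[token1]=value1' '[token2]=value2' '[token1]=other' > tokens_sample.txt

# Whole-line comparison finds nothing, because the values differ.
sort tokens_sample.txt | uniq -d | wc -l

# Comparing only the key (text before the first "]") catches the repeat.
cut -d']' -f1 tokens_sample.txt | sort | uniq -d | wc -l
```

The first pipeline counts 0 duplicates and the second counts 1.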

# 6  
09-22-2009
Quote:
Originally Posted by Scrutinizer
A little bit smaller:
Code:
 cut -d']' -f1 prueba2.txt|sort|uniq -d|wc -l

Nice one, at least it's shorter than mine, hehe.