Detect duplicated words in file


 
# 1  
08-09-2010

I have a big spooled file that contains thousands of user account IDs, and I want to find the duplicate IDs in it. The IDs are words made of letters and digits, with no special characters. I want to detect the duplicated words and write them to another file.
TIA


---------- Post updated at 05:00 PM ---------- Previous update was at 04:56 PM ----------

Example of the file's content:

u2erheh u4yuiopk y1dfssdf h1dffdd h1dff67
u2qwert a9fdsfs y1dfd56 u4yuiopk h1dffd2
...
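A one-pass alternative that needs no sort, sketched under the assumption that IDs are separated by spaces or tabs as in the sample above: awk counts every word it reads and prints a word the moment it is seen for the second time, so each duplicated ID is written exactly once, in the order the repeats are found. The function name `find_dups` and the file names `ids.txt` and `dups.txt` are hypothetical.

```shell
# find_dups (hypothetical name): print every word that occurs more than once
# in the given files, once each, in the order its second occurrence appears.
# awk's default field splitting treats runs of spaces and tabs as separators.
find_dups() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (++seen[$i] == 2)   # count the word; report it on its 2nd sighting
                print $i
    }' "$@"
}

# Example (hypothetical file names): spool the duplicated IDs to a new file.
# find_dups ids.txt > dups.txt
```

For the two sample lines above this prints `u4yuiopk`. The `seen` array holds every distinct ID, so memory grows with the number of unique IDs; for a few thousand IDs that should not be a concern.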
# 2  
08-09-2010
Do you mean print the duplicates to another file, or
print them to another file and remove them from the old file?
# 3  
08-09-2010
No, don't remove anything from the old file; just find the dups and spool them to a new file.
Thanks.
# 4  
08-09-2010
I assume you have a space-separated file; if not, change the sed command.

Code:
sed 's/ /\n/g' file_name | sort| awk '{if(pre == $0){print;}pre=$0}'|uniq > output_file

This works, but I'm not an expert, so there is probably a smarter way than this.
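The pipeline above relies on GNU sed, which turns `\n` in the replacement into a newline; other seds (for example on BSD or macOS) may not. A portable sketch of the same idea, assuming words are separated only by spaces: `tr -s` turns runs of spaces into single newlines, `sort` makes equal words adjacent, and `uniq -d` prints one copy of each repeated line, which is what the awk step followed by `uniq` achieves. The function name `dups_by_space` is hypothetical.

```shell
# dups_by_space (hypothetical name): split a space-separated file into one
# word per line, sort it, and print each word that appears more than once.
dups_by_space() {
    # -s squeezes runs of newlines, so double spaces do not create empty lines
    tr -s ' ' '\n' < "$1" | sort | uniq -d
}

# Example: dups_by_space file_name > output_file
```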

---------- Post updated 08-10-10 at 12:03 AM ---------- Previous update was 08-09-10 at 11:59 PM ----------

To remove duplicate lines from a file

This is not the solution to your question, but something very similar and common.
# 5  
08-09-2010
Code:
tr -sc '[:alnum:]' \\n < data | sort | uniq -d
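How the `tr` command above works: `-c` complements the first set, so every character that is not alphanumeric (spaces, tabs, punctuation, even a stray carriage return) is translated to a newline, and `-s` squeezes the resulting runs of newlines into one, leaving exactly one word per line. A small wrapper that also spools the result to a new file, as the original question asks, might look like this; the function name `spool_dups` and its two-argument interface are hypothetical.

```shell
# spool_dups (hypothetical name): usage  spool_dups INPUT OUTPUT
# Writes each duplicated alphanumeric word in INPUT to OUTPUT, once each.
spool_dups() {
    # Turn every run of non-alphanumeric characters into a single newline,
    # sort so identical words are adjacent, and keep one copy of each repeat.
    tr -sc '[:alnum:]' '\n' < "$1" | sort | uniq -d > "$2"
}

# Example (hypothetical file names): spool_dups ids.txt dups.txt
```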

# 6  
08-09-2010
Code:
tr -s "[ \t]" "\n" <file| sort |uniq -d
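If it helps to know how often each duplicated ID occurs, `uniq -c` can be used in place of `uniq -d`: it prefixes every line with its repeat count, and a short awk filter keeps only counts above one. This is a sketch assuming the same whitespace-separated input as above; the function name `count_dups` is hypothetical, and printing `word count` is just one choice of output layout.

```shell
# count_dups (hypothetical name): print "word count" for every word that
# occurs more than once in the file, splitting on spaces and tabs.
count_dups() {
    # Equal-length sets ('\n\n' for ' \t') keep this tr call portable.
    tr -s ' \t' '\n\n' < "$1" | sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}

# Example: count_dups file > dup_counts.txt
```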

# 7  
08-10-2010
Quote:
Originally Posted by gvj
Code:
sed 's/ /\n/g' file_name | sort| awk '{if(pre == $0){print;}pre=$0}'|uniq > output_file

I replaced "file_name" with the source file and ran it, but nothing was written to "output_file".

---------- Post updated at 09:57 PM ---------- Previous update was at 09:51 PM ----------

Quote:
Originally Posted by alister
Code:
tr -sc '[:alnum:]' \\n < data | sort | uniq -d

Should I test this with "data" replaced by my file name?
I did that and nothing happened.

My file:
etst test1 fdsfsdfds
test1

---------- Post updated at 09:59 PM ---------- Previous update was at 09:57 PM ----------

Quote:
Originally Posted by kurumi
Code:
tr -s "[ \t]" "\n" <file| sort |uniq -d

I ran
Code:
tr -s "[ \t]" "\n" < try1.txt | sort | uniq -d

and got nothing.

---------- Post updated 08-10-10 at 01:06 AM ---------- Previous update was 08-09-10 at 09:59 PM ----------

I am very new to scripting. Guys, did I miss something when testing your code?
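One thing worth checking when these pipelines print nothing for input that visibly contains a repeat (an assumption; the thread does not say where the file came from): a file saved on Windows ends each line with a carriage return, so `test1` at the end of a line is really `test1\r` and never matches the `test1` in the middle of the other line. `od -c` shows such hidden characters as `\r`. The sketch below strips them before splitting; the function name `dups_crlf_safe` is hypothetical.

```shell
# Look for hidden characters: a Windows (CRLF) file shows "\r  \n" at line ends.
# od -c try1.txt | head

# dups_crlf_safe (hypothetical name): split on spaces and tabs, but delete
# carriage returns first so "test1\r" and "test1" compare equal.
dups_crlf_safe() {
    tr -d '\r' < "$1" | tr -s ' \t' '\n\n' | sort | uniq -d
}

# Example: dups_crlf_safe try1.txt > output_file
```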