remove duplicate words in a line


 
# 1  
03-18-2009
remove duplicate words in a line

Hi,

Please help!
I have a file in which some lines contain duplicate words, and I want to remove the duplicates from each line.
The order of the words in the output file doesn't matter.

INPUT_FILE
pink_kite red_pen ball pink_kite ball
yellow_flower white no white no
cloud nine_pen pink cloud pink nine_pen
brown_ball white
red_bear green red_bear
white no

OUTPUT_FILE
pink_kite red_pen ball
yellow_flower white no
cloud nine_pen pink
brown_ball white
red_bear green
white no

Your help is highly appreciated.
Thanks in advance.

Last edited by sam_2921; 03-18-2009 at 07:05 AM. Reason: formatting
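The question says word order doesn't matter, but the sample output keeps the first occurrence of each word in its original position. A minimal Python 3 sketch of that rule, assuming words are separated by whitespace (the helper name `dedupe_line` is hypothetical, not from the thread), relies on dicts preserving insertion order, which Python guarantees since 3.7:

```python
def dedupe_line(line):
    """Return `line` with repeated words dropped, keeping first occurrences in order.

    Assumes whitespace-separated words. dict keys keep insertion order
    (guaranteed since Python 3.7), so dict.fromkeys acts as an ordered set.
    """
    return " ".join(dict.fromkeys(line.split()))


if __name__ == "__main__":
    print(dedupe_line("pink_kite red_pen ball pink_kite ball"))
    # prints: pink_kite red_pen ball
```

If order truly did not matter, `set(line.split())` would also work, but its iteration order is arbitrary, so the output would no longer match the sample line for line.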
# 2  
03-18-2009
Code:
awk '{ while(++i<=NF) if (!a[$i]++) printf "%s ", $i; i=split("",a); print "" }' file
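The test `!a[$i]++` is true only the first time a word appears on the line: `a[$i]++` yields the old count (0, which is false, for a new word) and then increments it, and `!` negates that. `i=split("",a)` empties the array and resets `i` to 0 before the next line is read. A rough Python rendering of the same per-line logic, for illustration only (`awk_style_dedupe` is a hypothetical name):

```python
from collections import defaultdict


def awk_style_dedupe(fields):
    """Mimic awk's `!a[$i]++` test: keep a field only while its count is still 0."""
    counts = defaultdict(int)  # plays the role of the awk array a[], fresh for each line
    kept = []
    for field in fields:
        previous = counts[field]  # value of a[$i] before the post-increment
        counts[field] += 1
        if not previous:  # !a[$i]++ is true only when the old count was 0
            kept.append(field)
    return kept


print(awk_style_dedupe("yellow_flower white no white no".split()))
# prints: ['yellow_flower', 'white', 'no']
```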

# 3  
03-18-2009
Code:
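#!/usr/bin/env python3

# Print each line of temp.txt with repeated words removed,
# keeping the first occurrence of each word in its original order.
with open('temp.txt') as f:
    for line in f:
        seen = set()
        unique = []
        for word in line.split():
            if word not in seen:
                seen.add(word)
                unique.append(word)
        print(' '.join(unique))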

Output:

Code:
# cat temp.txt
pink_kite red_pen ball pink_kite ball
yellow_flower white no white no
cloud nine_pen pink cloud pink nine_pen
brown_ball white
red_bear green red_bear
white no

# python temp.py
pink_kite red_pen ball
yellow_flower white no
cloud nine_pen pink
brown_ball white
red_bear green
white no
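The script above hard-codes temp.txt. A small variant that accepts any iterable of lines (an open file, `sys.stdin`, or `fileinput.input()`) and writes to any text stream makes it easier to use as a filter like the awk versions. This is a sketch; the function name `dedupe_stream` is hypothetical:

```python
import io
import sys


def dedupe_stream(lines, out):
    """Write each input line to `out` with repeated words removed.

    `lines` may be any iterable of strings, e.g. an open file, sys.stdin,
    or fileinput.input(). First occurrences are kept in their original order.
    """
    for line in lines:
        # dict.fromkeys keeps insertion order (Python 3.7+), acting as an ordered set
        out.write(" ".join(dict.fromkeys(line.split())) + "\n")


# Example with in-memory data standing in for temp.txt
sample = io.StringIO("red_bear green red_bear\nwhite no\n")
dedupe_stream(sample, sys.stdout)
# prints:
# red_bear green
# white no
```

In a script, calling `dedupe_stream(fileinput.input(), sys.stdout)` reads the files named on the command line, or standard input if none are given, which matches how `awk '...' file` is used.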

# 4  
03-19-2009
Hi, Perl should make this easy.

But you may try the awk script below:

Code:
nawk '
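# re_dup: blank out every later repeat of a word in arr[1..n].
# The extra parameters i and j are the awk idiom for local variables.
function re_dup(arr, n,    i, j)
{
	for(i=1;i<n;i++){
		for(j=i+1;j<=n;j++){
			if (arr[i]==arr[j])
				arr[j]=""
		}
	}
}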
{
	num=split($0,arr," ")
	re_dup(arr,num)
	for(i=1;i<=num;i++){
		if(arr[i]!="")
			printf("%s ",arr[i])
	}
	printf "\n"
}' filename
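The nawk script compares every pair of words and blanks the later copy, so its work grows with the square of the number of words on a line; the `!a[$i]++` one-liner instead does one associative-array lookup per word. For short lines like these the difference is negligible. A Python rendering of the pairwise approach, for comparison only (`pairwise_dedupe` is a hypothetical name):

```python
def pairwise_dedupe(words):
    """Blank out later repeats by comparing every pair, like the nawk re_dup() function.

    Uses O(n^2) comparisons for n words; kept only to mirror the awk logic.
    """
    arr = list(words)  # copy so the caller's list is not modified
    n = len(arr)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if arr[i] == arr[j]:
                arr[j] = ""  # mark the later duplicate as removed
    # like the awk print loop, skip the blanked entries
    return [w for w in arr if w != ""]


print(" ".join(pairwise_dedupe("red_bear green red_bear".split())))
# prints: red_bear green
```

Using `""` as the "removed" marker is safe here because splitting on whitespace never produces empty words.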

# 5  
03-19-2009
Thanks summer_cherry, ShawnMilo and Rubin.

The nawk and Python code runs perfectly,

but Rubin, the awk one-liner gives the error "a[: Event not found." Can you please explain why this error occurs?

Thanks again.
Sam
# 7  
03-19-2009
Quote:
Originally Posted by sam_2921
...but Rubin the awk one liner is giving the error " a[: Event not found. " can u please guide why this error is coming?...
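I cannot reproduce that error. On Solaris, use nawk or /usr/xpg4/bin/awk; the code runs without error messages on both Solaris and Linux. "Event not found" is the message csh and tcsh print when the `!` triggers history expansion, which those shells perform even inside single quotes; running the command from sh, bash or ksh avoids it.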
HTH.