finding and removing 2 identical consecutive words in a text


 
# 1  
Old 05-01-2011

I want to write a shell script that corrects a text file. For example, if I have the input file:
Code:
"john has has 2 apples
anne has 3 oranges oranges"

I want the output file to look like this:
Code:
"john has 2 apples
anne has 3 oranges"

I've tried reading the input file line by line into an array, so that I don't lose the "\n" when I write the result to the output file, but I got stuck.
Can anyone help me, please?
Thanks

Last edited by Franklin52; 05-02-2011 at 03:55 AM.. Reason: Please use code tags
# 2  
Old 05-01-2011
Welcome to the forum!

Try sed:
Code:
sed 's/\b\(.*\)\b\1/\1/g' infile

# 3  
Old 05-01-2011
Code:
awk '{for (i=1;i<=NF;i++) while($(i+1)==$i){sub($i FS $(i+1),$i,$0)}}1' infile

# 4  
Old 05-01-2011
Thanks for those ideas, they work... but a problem appears when the last word of a line is identical to the first word of the next line. Any suggestions?
# 5  
Old 05-01-2011
You can still use sed, but this is not that efficient if your file is large, because it reads the whole file into the pattern space before substituting:
Code:
sed ':f;N;$!bf; s/\b\(.*\)\n\1\b/\1\n/g; s/\b\(.*\)\b\1/\1/g' infile
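
If holding the whole file in memory is a concern, a streaming alternative might look like the sketch below. It is not from the posts above: the function name dedupe_words and the variable names are made up for illustration. It assumes that words are separated by whitespace, that runs of blanks may be collapsed to single spaces, and that a duplicate spanning a line break should be dropped from the start of the second line, as the sed command above does. Only the previous word is remembered, so memory use does not grow with the file.

```shell
# Hypothetical helper (not from the thread): drop a word when it equals the
# word immediately before it, even if that word ended the previous line.
dedupe_words() {
  awk '
  {
    out = ""
    for (i = 1; i <= NF; i++) {
      word = $i ""                 # force string comparison ("2" vs "2.0" stay distinct)
      if (word == prev) continue   # whole-word match only, so "apple" != "apples"
      out = (out == "" ? word : out " " word)
      prev = word                  # carried over to the next line as well
    }
    print out                      # blank lines are kept; they do not reset prev
  }' "$@"
}
```

Usage would be something like `dedupe_words infile > outfile`. Punctuation attached to a word counts as part of the word here, so "oranges oranges." is left alone.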

# 6  
Old 05-01-2011
Yes, I know... but sed isn't that effective here, because if we have "apple apples" the first word will be removed even though the two words are not equal.
# 7  
Old 05-01-2011
Quote:
Originally Posted by cocostaec
Yes, I know... but sed isn't that effective here, because if we have "apple apples" the first word will be removed even though the two words are not equal.
The code I provided does not have this problem; please try it out.
When I said it is not that efficient if your file is large, I did not mean that it is necessarily so inefficient as to be unacceptable. :)
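
One way to test the "apple apples" case is to require a word boundary after the repeated word as well as before the first one. The sketch below is a per-line variant under that assumption, not a replacement for the commands posted above. It assumes GNU sed (`\b`, `\w` and `\s` are GNU extensions, like the `\b` already used in this thread), and the function name dedupe_line_words is only illustrative. It does not handle a duplicate that spans a line break.

```shell
# Hypothetical helper: collapse runs of the same whole word within one line.
# GNU sed assumed; -E selects extended regular expressions.
dedupe_line_words() {
  # \b(\w+)       a complete word, captured as \1
  # (\s+\1\b)+    one or more repeats of that same word, each ending at a word boundary
  sed -E 's/\b(\w+)(\s+\1\b)+/\1/g' "$@"
}

printf 'an apple apples\n' | dedupe_line_words        # prints: an apple apples
printf 'john has has 2 apples\n' | dedupe_line_words  # prints: john has 2 apples
```

Because the trailing `\b` must fall at the end of the repeated word, "apple" followed by "apples" is not treated as a repeat.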