finding and removing 2 identical consecutive words in a text


 
# 8  
05-01-2011
I've tried this code, but the same problem remains. My input text file is:
Code:
"ana are are mere
mere
ion are prune prunea."

And the output file is:
Code:
"ana are mere

ion are prunea.
"

Code:
sed ':f;N;$!bf; s/\b\(.*\)\n\1\b/\1\n/g;
 s/\b\(.*\)\b\1/\1/g' infile


# 9  
05-01-2011
Fixed. I forgot to add the word boundary \b after the back-reference \1, so the repeated text could also match the start of a longer word (that is how "prune prunea" became "prunea"):

Code:
sed ':f;N;$!bf; s/\b\(.*\)\n\1\b/\1\n/g; s/\b\(.*\)\b\1\b/\1/g' infile
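To see what that final \b changes, here is a small sketch that runs only the single-line substitution on lines from the sample input, once without and once with the trailing word boundary. It assumes GNU sed (\b is a GNU extension), and the function names without_b and with_b are hypothetical helpers made up for this comparison.

```shell
# Hypothetical helpers for comparison; both assume GNU sed (\b is a GNU extension).

# Original substitution: no word boundary after the back-reference,
# so " prune" followed by the start of " prunea" counts as a repeat.
without_b() { sed 's/\b\(.*\)\b\1/\1/g'; }

# Fixed substitution: the trailing \b forces the repeat to end on a word boundary.
with_b() { sed 's/\b\(.*\)\b\1\b/\1/g'; }

echo 'ion are prune prunea.' | without_b   # prints: ion are prunea.
echo 'ion are prune prunea.' | with_b      # prints: ion are prune prunea.
echo 'ana are are mere' | with_b           # prints: ana are mere
```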

# 10  
05-01-2011
It works fine, thanks, but there is still a problem: if the text contains 3 identical consecutive words, sed only removes the first duplicate and the last 2 words remain intact. Do you have any ideas?
# 11  
05-01-2011
Quote:
Originally Posted by kevintse
Welcome to the forum!

Try sed:
Code:
sed 's/\b\(.*\)\b\1/\1/g' infile

Kevin, if you don't mind, can you please explain this?

regards,
Ahamed
# 12  
05-01-2011
Quote:
Originally Posted by cocostaec
It works fine, thanks, but there is still a problem: if the text contains 3 identical consecutive words, sed only removes the first duplicate and the last 2 words remain intact. Do you have any ideas?
OK, I guess you have more than 2 identical consecutive words spanning multiple lines, and you need to preserve all the newline characters. This can be a challenge, at least with sed.
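For what it's worth, here is one possible approach, sketched under a few assumptions: GNU sed (\b, \+ and \n in the replacement are GNU extensions), words made of [[:alnum:]] characters, and repeats separated either by a single space or by one or more newlines. The idea is to read the whole file into the pattern space and then loop back with t after every successful substitution, so a run of 3 or more identical words keeps shrinking until only one is left. dedupe_words is a hypothetical name.

```shell
# Hypothetical helper; assumes GNU sed and space/newline separators only.
dedupe_words() {
  sed '
    # Read the whole input into the pattern space, so a repeat that is
    # split across a line break is visible to one regex.
    :f
    $!N
    $!bf
    :a
    # Drop a word repeated after a single space: "are are" -> "are".
    s/\b\([[:alnum:]]\+\) \1\b/\1/
    ta
    # Drop a word repeated after one or more newlines but keep the
    # newlines, so the number of lines stays the same.
    s/\b\([[:alnum:]]\+\)\(\n\+\)\1\b/\1\2/
    ta
  ' "$@"
}
```

Usage would be something like dedupe_words infile. On the sample input above it should print "ana are mere", an empty line and "ion are prune prunea.", keeping the original line count. Repeats separated by a space and a newline together are not handled by this sketch.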

---------- Post updated at 09:53 PM ---------- Previous update was at 09:44 PM ----------

Quote:
Originally Posted by ahamed101
Kevin, if you don't mind, can you please explain this?

regards,
Ahamed
Code:
sed 's/\b\(.*\)\b\1/\1/g' infile

Hi, Ahamed,
Well, the regex I used is quite simple: the \b sequences are word boundaries, \(.*\) is a capture group, and the two \1 sequences are back-references.

The first \1 refers back to whatever the capture group matched, so the pattern only matches when the same text appears twice in a row, i.e. 2 identical consecutive words.
The second \1, in the replacement part, substitutes the captured text for the whole match, so only one of the 2 consecutive words is kept.
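One consequence worth noting: because the capture is \(.*\) rather than a single-word pattern, it can hold several words, so the same kind of command also collapses a repeated phrase. A quick check, assuming GNU sed and using the version with the trailing \b:

```shell
# \(.*\) may capture several words, so a repeated phrase collapses as well.
echo 'to be to be or not' | sed 's/\b\(.*\)\b\1\b/\1/g'   # prints: to be or not
```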