Egrep find word that occurs twice in a row


 
Post #8, 10-19-2017
Quote:
Originally Posted by Scrutinizer
In addition to Don's suggestion:
Standard grep does not recognize + as a repetition operator in a basic regular expression (BRE), so you would need to use \{1,\} instead.
In the example, a closing square bracket and the repetition operators appear to be missing, so I think it would need to be modified like so:
Code:
grep -e '^\([a-z]\{1,\}\) \1$' -e '^\([a-z]\{1,\}\) \1 ' -e ' \([a-z]\{1,\}\) \1 ' -e ' \([a-z]\{1,\}\) \1$'

Here both the sub-pattern and its back reference sit on word boundaries: at the start of the line followed by a space, at the end of the line preceded by a space, or between space characters.
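A quick way to try these patterns, as a sketch: the function name adjacent_dups is hypothetical and the sample lines are made up. The four -e patterns are the ones from the post, unchanged.

```shell
# Hypothetical helper wrapping the four space-delimited BRE alternatives:
# the whole line is the two words, or the repeated pair is at the start,
# in the middle, or at the end of the line.
adjacent_dups() {
  grep -e '^\([a-z]\{1,\}\) \1$' \
       -e '^\([a-z]\{1,\}\) \1 ' \
       -e ' \([a-z]\{1,\}\) \1 ' \
       -e ' \([a-z]\{1,\}\) \1$'
}

# Made-up sample input, one sentence per line.
printf '%s\n' 'the the cat' 'a cat sat on on the mat' 'is this this' \
  'the theory' 'no repeats here' | adjacent_dups
```

This should print the first three lines. "the theory" is not matched, because the copy of "the" has to be followed by a space or the end of the line.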

But without word boundary operators, it gets more complicated when the words do not have to be adjacent:
Code:
grep -e '^\([a-z]\{1,\}\) \([^ ]* \)*\1$' -e '^\([a-z]\{1,\}\) \([^ ]* \)*\1 ' -e ' \([a-z]\{1,\}\) \([^ ]* \)*\1 ' -e ' \([a-z]\{1,\}\) \([^ ]* \)*\1$'
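The same kind of check for the non-adjacent patterns, again a sketch with a hypothetical function name (repeated_words) and made-up input. The group \([^ ]* \)* absorbs any number of space-terminated words between the two copies.

```shell
# Hypothetical helper: a word may be repeated anywhere later on the line,
# as long as both copies are delimited by spaces or the line boundaries.
repeated_words() {
  grep -e '^\([a-z]\{1,\}\) \([^ ]* \)*\1$' \
       -e '^\([a-z]\{1,\}\) \([^ ]* \)*\1 ' \
       -e ' \([a-z]\{1,\}\) \([^ ]* \)*\1 ' \
       -e ' \([a-z]\{1,\}\) \([^ ]* \)*\1$'
}

printf '%s\n' 'the cat saw the dog' 'to be or not to be' 'one two three' |
  repeated_words
```

This should print the first two lines only.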

Another thing to note is that this only covers the case where words are delimited by spaces. Real text can also contain commas, semicolons, other punctuation, etcetera.
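As a sketch of one way around that (an assumption, not part of the original post), the single space can be replaced by [^[:alpha:]]\{1,\}, i.e. any run of non-letters, so that commas and other punctuation also count as separators. This stays within POSIX BRE. The function name adjacent_dups_punct is hypothetical. With the space-only patterns, a line such as "it is, is it" is not matched because of the comma.

```shell
# Assumption (not from the post): a word separator is any run of one or more
# non-letters, written [^[:alpha:]]\{1,\} in place of the single space.
# Note that digits therefore also count as separators.
adjacent_dups_punct() {
  grep -e '^\([[:alpha:]]\{1,\}\)[^[:alpha:]]\{1,\}\1$' \
       -e '^\([[:alpha:]]\{1,\}\)[^[:alpha:]]\{1,\}\1[^[:alpha:]]' \
       -e '[^[:alpha:]]\([[:alpha:]]\{1,\}\)[^[:alpha:]]\{1,\}\1$' \
       -e '[^[:alpha:]]\([[:alpha:]]\{1,\}\)[^[:alpha:]]\{1,\}\1[^[:alpha:]]'
}

printf '%s\n' 'it is, is it' 'well, well, well' 'the theory' |
  adjacent_dups_punct
```

This should print "it is, is it" and "well, well, well" but not "the theory".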


--
If you have GNU or BSD grep (as opposed to standard grep), then you can use the word-boundary operators \< and \> as an extension to the regex syntax, so it can be simplified into something like this:
Code:
grep '\<\([a-z]\{1,\}\)\>.*\<\1\>'
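A small check of this form, assuming GNU grep (the post says BSD grep offers the same extension). The sample lines are made up. Because \< and \> only care about letters versus non-letters, punctuation next to a word is handled as well.

```shell
# Assumes GNU grep: \< and \> match the empty string at the start and end
# of a word, so the back reference must be a whole word too.
printf '%s\n' 'the cat saw the dog' 'the theory' 'hello, hello world' |
  grep '\<\([a-z]\{1,\}\)\>.*\<\1\>'
```

This should print the first and third lines. "the theory" is rejected because \> after the back reference requires the repeated word to end there.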

They also support back references in extended regular expressions, so you can do this:
Code:
grep -E '\<([a-z]+)\>.*\<\1\>'
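To confirm that the BRE and ERE spellings select the same lines, a sketch assuming GNU grep; the sample text is made up.

```shell
# Run both forms from the post over the same input and compare the output.
sample='the cat saw the dog
the theory
hello, hello world'
bre=$(printf '%s\n' "$sample" | grep '\<\([a-z]\{1,\}\)\>.*\<\1\>')
ere=$(printf '%s\n' "$sample" | grep -E '\<([a-z]+)\>.*\<\1\>')
if [ "$bre" = "$ere" ]; then
  printf 'BRE and ERE forms agree:\n%s\n' "$ere"
fi
```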

Note that, in general, instead of [a-z] it is preferable to use [[:lower:]] for lowercase letters, or [[:alpha:]], which matches both upper- and lowercase letters in all compliant code sets.
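To illustrate, a sketch assuming GNU grep, with the made-up line "Go Go team" repeating a capitalized word. LC_ALL=C is set so that [a-z] means exactly the 26 ASCII lowercase letters; outside the POSIX locale, what a range expression matches can vary, which is one reason to prefer the classes.

```shell
# With [a-z] the capitalized repeat is not found (C locale).
printf '%s\n' 'Go Go team' | LC_ALL=C grep -E '\<([a-z]+)\>.*\<\1\>' ||
  echo 'no match with [a-z]'
# With [[:alpha:]] it is.
printf '%s\n' 'Go Go team' | LC_ALL=C grep -E '\<([[:alpha:]]+)\>.*\<\1\>'
```

This should print "no match with [a-z]" followed by "Go Go team". Note that the back reference is still case-sensitive, so "Go go" would not match either way.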
Thank you for the corrections, Scrutinizer.

The grep command:
Code:
grep '\([a-z]+\)\1'

does contain a valid BRE, but what it looks for is a single character that, in the codeset of the current locale, is greater than or equal to a and less than or equal to z, followed by a literal plus sign (+), followed by another copy of those same two characters.
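A quick way to see this, assuming GNU grep (where an unescaped + in a BRE is an ordinary character). The made-up lines show that the pattern matches a letter-plus pair repeated twice, not a repeated word.

```shell
# The group captures one letter followed by a literal "+", and \1 requires
# that same two-character pair again immediately after it.
printf '%s\n' 'the the' 'x+x+' 'ab+b+' | grep '\([a-z]+\)\1'
```

This should print "x+x+" and "ab+b+"; "the the" is not matched.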

In addition to using [a-z]\{1,\} to match one or more "a" through "z" characters, one can also use [a-z][a-z]* to match the same strings.
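A sketch comparing the two spellings on a few made-up lines, each consisting of two space-separated words; both anchored patterns should select exactly the same lines.

```shell
sample='the the
on on
the theory
a a'
# Interval form and "one letter then zero or more letters" form.
interval=$(printf '%s\n' "$sample" | grep '^\([a-z]\{1,\}\) \1$')
star=$(printf '%s\n' "$sample" | grep '^\([a-z][a-z]*\) \1$')
# Print the selected lines only if the two forms agree.
[ "$interval" = "$star" ] && printf '%s\n' "$star"
```

Both should select "the the", "on on" and "a a".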

I had used a single space between words intentionally, because I thought that was what was wanted (i.e., adjacent duplicate words). The BREs you used look for duplicated words on a single line whether or not they are adjacent.
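A sketch of that difference, assuming GNU grep for \< \> and ERE back references. The adjacent-only pattern is an ERE rewrite of the single-space idea, not a command from the thread; the second pattern is the ERE from Scrutinizer's post. The sample lines are made up.

```shell
sample='the the cat
the cat saw the dog'
# A single space between the two copies: adjacent duplicates only.
echo 'adjacent only:'
printf '%s\n' "$sample" | grep -E '\<([a-z]+) \1\>'
# Anything between the two copies: duplicates anywhere on the line.
echo 'anywhere on the line:'
printf '%s\n' "$sample" | grep -E '\<([a-z]+)\>.*\<\1\>'
```

The first grep should print only "the the cat"; the second should print both lines.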

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find a word and increment the number in the word & save into new files

Hi All, I am looking for a perl/awk/sed command to auto-increment the numbers line in file, P1.tcl: run_build_model sparc_ifu_dec run_drc set_faults -model path_delay -atpg_effectiveness -fault_coverage add_delay_paths P1 set_atpg -abort_limit 1000 run_atpg -ndetects 1000 I would like... (6 Replies)
Discussion started by: jypark22

2. Shell Programming and Scripting

Find word in a line and output in which line the word occurs / no. of times it occurred

I have a file: file.txt, which contains the following data in it. This is a file, my name is Karl, what is this process, karl is karl junior, file is a test file, file's name is file.txt My name is not Karl, my name is Karl Joey What is your name? Do you know your name and... (3 Replies)
Discussion started by: anuragpgtgerman

3. Shell Programming and Scripting

How to find a phrase and pull all lines that follow until the phrase occurs again?

I want to burst a report by using the page number value in the report header. Each section starts with *PAGE NO:* 1 Each section might have several pages, but the next section always starts back at 1. So I want to find the "*PAGE NO:* 1" value and pull all lines that follow until "*PAGE NO:* 1"... (4 Replies)
Discussion started by: Scottie1954

4. Shell Programming and Scripting

perl lwp find word and print next word :)

hi all, I'm new there, I'm just playing with perl and lwp and I just successfully created a script for log in to a web site with post. I have a response but I would like to have something like this: I have in my response lines like: <div class="sender">mimi020</div> <some html code.....>... (3 Replies)
Discussion started by: vogueestylee

5. UNIX for Dummies Questions & Answers

Find EXACT word in files, just the word: no prefix, no suffix, no 'similar', just the word

I have a file that has the words I want to find in other files (but lets say I just want to find my words in a single file). Those words are IDs, so if my word is ZZZ4, outputs like aaZZZ4, ZZZ4bb, aaZZZ4bb, ZZ4, ZZZ, ZyZ4, ZZZ4.8 (or anything like that) WON'T BE USEFUL. I need the whole word... (6 Replies)
Discussion started by: chicchan

6. Shell Programming and Scripting

Find and replace a word in all the files (that contain the word) under a directory

Hi Everyone, I am looking for a simple way for replacing all the files under a directory that use the server "xsgd1234dap" with "xsdr3423pap". For Example: In the Directory, $pwd /home/nick $ grep -l "xsgd1234dap" *.sh | wc -l 119 I have "119" files that are still using... (5 Replies)
Discussion started by: filter

7. Shell Programming and Scripting

Need to replace the first word of a line if it occurs again in the next line(shell)

Hi folks, have a look into the attachment, i am not familiar with unix, can you please help me in this regard. thanks in advance, :) regards, Geeko (4 Replies)
Discussion started by: geeko

8. Shell Programming and Scripting

Looking for a single line to count how many times one character occurs in a word...

I've been looking on the internet, and haven't found anything simple enough to use in my code. All I want to do is count how many times "-" occurs in a string of characters (as a package name). It seems it should be very simple, and shouldn't require more than one line to accomplish. And this is... (2 Replies)
Discussion started by: Shingoshi

9. Shell Programming and Scripting

find a word in a file, and change a word beneath it ??

Hi all, I have a file with lines written somewhat like this. aaaa ccc aa linux browse = no xssxw cdcedc dcsdcd csdw police dwed dwd browse = no cdecec (2 Replies)
Discussion started by: vikas027

10. Shell Programming and Scripting

TO find the word which occurs maximum number of times

Hi Folks !!!!!!!!!!!!!!!!!!! My Requirement is............. i have a input file: 501,501.chan 502,502.anand 503,503.biji 504,504.raja 505,505.chan 506,506.anand 507,507.chan and my o/p should be chan->3 i.e. the word which occurs maximum number of times in a file should be... (5 Replies)
Discussion started by: aajan