How to remove words that contain 3+ of the same character in a row?


 
# 1  
Old 08-09-2012

Hello,

I am looking for a way to remove words from a list that contain 3 or more of the same character.

For example, let's say the full list is as follows:
Code:
ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS

AAAHJD & KKPPPP should be removed from this list as obviously they contain AAA and PPPP respectively.

My first attempt at this was to use
Code:
grep -v '\([[:alpha:]]\)\1' filename

but this removes any word with 2 or more of the same character in a row. Similarly,
Code:
grep -v '\([[:alpha:]][[:alpha:]]\)\1' filename

will remove words with 4+.

My knowledge of Awk/Sed is quite weak. Can anyone offer some advice on where I should look from here?

Regards,
Colin

# 2  
Old 08-09-2012
You almost had it the first time. Try:
Code:
grep -v '\([[:alpha:]]\)\1\1' filename
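
As a quick check, here is a small sketch that feeds the sample list from the first post through this pattern on standard input instead of a file. The variable names are only for illustration. It also runs the two-letter pattern from the first post against a couple of made-up words (ABABCD, AAAAXY), since that pattern matches any pair of letters that repeats immediately, not just four identical letters.

```shell
# Sample list from post #1, piped in rather than read from a file.
# \1\1 requires the captured letter twice more, i.e. three in a row.
three_in_row=$(printf '%s\n' ABCDEF ABBHJK AAAHJD KKPPPP NAUJKS |
    grep -v '\([[:alpha:]]\)\1\1')
printf '%s\n' "$three_in_row"

# The pair pattern from post #1 also matches a repeated pair such as ABAB,
# so ABABCD is dropped along with AAAAXY (both words are made-up test data).
pair_repeat=$(printf '%s\n' ABABCD AAAAXY ABCDEF |
    grep -v '\([[:alpha:]][[:alpha:]]\)\1')
printf '%s\n' "$pair_repeat"
```

The first command should leave ABCDEF, ABBHJK and NAUJKS; the second should leave only ABCDEF.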

# 3  
Old 08-09-2012
If you need to remove lines with 3 or more occurrences of a character NOT in succession, try
Code:
awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p' file

This will also remove lines with 3 or more occurrences of a character in succession.
# 4  
Old 08-09-2012
Ambiguous request. The reply I posted assumed you want to delete lines with three adjacent occurrences of a character. The reply elixir_sinari posted assumed you want to delete any line with three occurrences of a character whether or not they are adjacent. The input you gave will give the same results for either interpretation. What was it that you wanted?
# 5  
Old 08-09-2012
Quote:
Originally Posted by elixir_sinari
If you need to remove lines with 3 or more occurrences of a character NOT in succession, try
Code:
awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p' file

This will also remove lines with 3 or more occurrences of a character in succession.
That approach isn't very robust. The first argument to gsub is an extended regular expression. If the line contains a ., it will match every character. If there's a ?, +, *, or some other metacharacter, there may be a runtime regular expression compilation failure.

What you're attempting can be done easily with grep and a single regular expression:
Code:
grep -v '\(.\).*\1.*\1' file

Regards,
Alister

Last edited by alister; 08-09-2012 at 12:38 PM..
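
To make the difference between the two interpretations visible, here is a sketch using made-up input in which one word (ABACAD) has three A's that are not adjacent. The variable names are assumptions for illustration only.

```shell
# Made-up test data: ABACAD has three non-adjacent A's, AAABCD has three
# adjacent A's, ABBHJK has no character three times.
sample=$(printf '%s\n' ABACAD AAABCD ABBHJK)

# Interpretation 1: three of the same letter in a row.
adjacent=$(printf '%s\n' "$sample" | grep -v '\([[:alpha:]]\)\1\1')

# Interpretation 2: three of the same character anywhere on the line.
anywhere=$(printf '%s\n' "$sample" | grep -v '\(.\).*\1.*\1')

printf 'adjacent:\n%s\n' "$adjacent"
printf 'anywhere:\n%s\n' "$anywhere"
```

The first pattern keeps ABACAD and ABBHJK, while the second keeps only ABBHJK.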
# 6  
Old 08-09-2012
Quote:
Originally Posted by alister
That approach isn't very robust. The first argument to gsub is an extended regular expression. If the line contains a ., it will match every character. If there's a ?, +, *, or some other metacharacter, there may be a runtime regular expression compilation failure.

What you're attempting can be done easily with grep and a single regular expression:
Code:
grep -v '\(.\).*\1.*\1' file

Regards,
Alister
I did foresee that possibility while writing the solution, but I assumed that the file would contain only alphabetic characters.
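
If that assumption does not hold, one possible awk variant (a sketch, not the code posted above) counts characters in an array instead of passing each one to gsub as a regular expression. The function name count_filter and the test lines are made up for illustration.

```shell
# Hypothetical helper: drop lines in which any character occurs 3+ times.
# Characters are used as array keys, so ., + and * are treated literally.
count_filter() {
    awk '{
        split("", seen)              # clear the counts for each new line
        keep = 1
        n = length($0)
        for (i = 1; i <= n; i++)
            if (++seen[substr($0, i, 1)] >= 3) { keep = 0; break }
    } keep'
}

# A.BCDE has no repeated character; A+B+C+ has three plus signs.
robust=$(printf '%s\n' 'A.BCDE' 'A+B+C+' 'ABCDEF' | count_filter)
printf '%s\n' "$robust"

# For comparison, the gsub version drops A.BCDE because "." matches
# every character on the line.
original=$(printf '%s\n' 'A.BCDE' |
    awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p')
printf 'gsub version output: [%s]\n' "$original"
```

With this input the array version should keep A.BCDE and ABCDEF, while the gsub version prints nothing for A.BCDE.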
# 7  
Old 08-09-2012
Quote:
Originally Posted by elixir_sinari
I did foresee that possibility while writing the solution. But, I assumed that only alphabets will be in the file.
Looking at the first post, that seems a reasonable assumption given the sample data and the use of the [:alpha:] class.

I'll leave my post as is just in case it's of any use (as I'm sure you know, sometimes the sample data isn't representative).

Regards,
Alister