How to remove words that contain 3+ of the same character in a row?

08-09-2012

Hello,

I am looking for a way to remove words from a list that contain 3 or more of the same character.

For example, let's say the full list is as follows:

Code:

ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS

AAAHJD and KKPPPP should be removed from this list, as they contain AAA and PPPP respectively.

My first attempt at this was to use

Code:

grep -v '\([[:alpha:]]\)\1' filename

but this removes words with 2 or more of the same character in a row.

Code:

grep -v '\([[:alpha:]][[:alpha:]]\)\1' filename

will remove 4+ in a row (and, since the group is any two letters, also words with a repeated pair such as ABAB).

My knowledge of Awk/Sed is quite weak. Can anyone lend some advice as to where I should look from here?

Regards,
Colin

colinireland

08-09-2012

You almost had it the first time. Try:

Code:

grep -v '\([[:alpha:]]\)\1\1' filename
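As a quick check of the suggestion above, the sketch below runs the three patterns from this thread against the sample list from the first post. The extra word ABABCD is an assumption added purely for illustration; it shows that the two-letter group matches any repeated pair, not only four identical letters. The variable names are likewise made up for this example.

```shell
# Sample list from the first post, plus ABABCD (added here for illustration only).
words='ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS
ABABCD'

# One back-reference: drops words with 2+ identical adjacent letters.
two_plus=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]]\)\1')

# Two back-references: drops words with 3+ identical adjacent letters.
three_plus=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]]\)\1\1')

# Two-letter group repeated: drops PPPP-style runs, but also pairs such as ABAB,
# and it does not catch a run of exactly three (AAAHJD survives).
pair_twice=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]][[:alpha:]]\)\1')

printf 'one back-reference keeps:\n%s\n\n' "$two_plus"
printf 'two back-references keep:\n%s\n\n' "$three_plus"
printf 'repeated pair keeps:\n%s\n' "$pair_twice"
```

Only the middle pattern leaves exactly the words without a run of three or more identical letters.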

Don Cragun

08-09-2012

If you need to remove lines with 3 or more occurrences of a character NOT in succession, try

Code:

awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p' file

This will also remove lines with 3 or more occurrences of a character in succession.

elixir_sinari

08-09-2012

Ambiguous request. The reply I posted assumed you want to delete lines with three adjacent occurrences of a character. The reply elixir_sinari posted assumed you want to delete any line with three occurrences of a character whether or not they are adjacent. The input you gave will give the same results for either interpretation. What was it that you wanted?
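To see where the two readings diverge, here is a small sketch using a hypothetical word, ABACAD, that is not in the original list: it contains three A's, but never next to each other. The grep and awk commands are the ones posted earlier in this thread (the awk is only spaced out, with length written as length($0)); the variable names are assumptions for this example.

```shell
# ABACAD is a made-up word: three A's, none of them adjacent.
words='ABCDEF
AAAHJD
ABACAD'

# Reading 1: three identical letters in a row.
adjacent=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]]\)\1\1')

# Reading 2: the same letter three or more times anywhere in the line.
# gsub with "&" replaces each match with itself and returns the match count.
anywhere=$(printf '%s\n' "$words" | awk '{
    p = 1
    for (i = 1; i <= length($0); i++)
        if (gsub(substr($0, i, 1), "&") >= 3) { p = 0; break }
} p')

printf 'adjacent reading keeps:\n%s\n\n' "$adjacent"
printf 'anywhere reading keeps:\n%s\n' "$anywhere"
```

ABACAD survives the first reading and is dropped by the second, which is exactly the case the sample data in the first post does not cover.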

Don Cragun

08-09-2012

Quote:

Originally Posted by elixir_sinari

If you need to remove lines with 3 or more occurrences of a character NOT in succession, try

Code:

awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p' file

This will also remove lines with 3 or more occurrences of a character in succession.

That approach isn't very robust. The first argument to gsub is an extended regular expression. If the line contains a ., it will match every character. If there's a ?, +, *, or some other metacharacter, there may be a runtime regular expression compilation failure.
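If an awk solution is still preferred, one way to sidestep the regular-expression issue described above is to count characters in an array instead of handing each character to gsub. This is a sketch of a different technique, not the code posted earlier; the function name drop_three_anywhere and the variables seen and keep are hypothetical.

```shell
# Count each character with an array rather than using it as a gsub regex,
# so ., *, +, ? and similar characters are treated as ordinary data.
drop_three_anywhere() {
    awk '{
        split("", seen)                 # portable way to clear the counts per line
        keep = 1
        for (i = 1; i <= length($0); i++) {
            # Stop as soon as any single character has been seen three times.
            if (++seen[substr($0, i, 1)] >= 3) { keep = 0; break }
        }
        if (keep) print
    }'
}

# A.B.C has only two dots and is kept; the lines with three dots or three
# plus signs are dropped.
printf '%s\n' 'A.B.C' 'A.B.C.' 'A+B+C+' 'ABCDEF' | drop_three_anywhere
```

With the gsub version, A.B.C would be dropped as well, because the dot is compiled as a regular expression and matches all five characters.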

What you're attempting can be done easily with grep and a single regular expression:

Code:

grep -v '\(.\).*\1.*\1' file
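A minimal sketch of how this pattern behaves, assuming a few made-up input lines (none of them are from the original list). Because the back-reference compares against the captured text literally, a dot or other metacharacter in the data is handled like any other character.

```shell
# Made-up test lines: two of them repeat a character three times, non-adjacently.
words='ABCDEF
ABACAD
A.B.C.
A.B.C'

# \(.\) captures any one character; .*\1.*\1 requires two more copies of it
# later in the line, with anything (or nothing) in between.
kept=$(printf '%s\n' "$words" | grep -v '\(.\).*\1.*\1')

printf '%s\n' "$kept"
```

Since .* can match the empty string, the same pattern also removes lines where the three occurrences are adjacent.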

Regards,
Alister

Last edited by alister; 08-09-2012 at 12:38 PM..

alister

08-09-2012

Quote:

Originally Posted by alister

That approach isn't very robust. The first argument to gsub is an extended regular expression. If the line contains a ., it will match every character. If there's a ?, +, *, or some other metacharacter, there may be a runtime regular expression compilation failure.

What you're attempting can be done easily with grep and a single regular expression:

Code:

grep -v '\(.\).*\1.*\1' file

Regards,
Alister

I did foresee that possibility while writing the solution, but I assumed that the file would contain only alphabetic characters.

elixir_sinari

08-09-2012

Quote:

Originally Posted by elixir_sinari

I did foresee that possibility while writing the solution, but I assumed that the file would contain only alphabetic characters.

Looking at the first post, that seems a reasonable assumption given the sample data and the use of the [:alpha:] class.

I'll leave my post as is just in case it's of any use (as I'm sure you know, sometimes the sample data isn't representative).

Regards,
Alister

alister
