Help | Unix | grep | regular expression | backreference

10-12-2009


Hello,

I'm learning regular expressions and what I can do with them. I'm using Unix and its programs to experiment and find out what my limitations are with them.

I'm working on duplicating the regular expression:

Code:

 
^(.*)(\r?\n\1)+$

This is supposed to delete duplicate lines from a file.

The full command line that I'm using is:

Code:

 
ps -e | cut -c 25- | sort | grep '^\(.*\)\(\n\1\)+$'

I added a \ in front of each ( and ), and a ' in front of the ^ and after the $.
I also removed the \r?, which was for Windows support.

ps -e: Outputs a list of most of the running processes. This is simply to generate output to work with.

Quote:

29513 ? 00:00:00 bash
31212 ? 00:00:00 man
31215 ? 00:00:00 sh
31216 ? 00:00:00 sh
31221 ? 00:00:00 less
32464 ? 00:00:00 cat

cut -c 25-: Prints only the characters of each line starting with the 25th character. This is to get only the process names printed.

Quote:

man
sh
sh
less
cat

sort: Will sort the list alphabetically. This is because I think the regular expression requires the list to be sorted.

Quote:

cat
less
man
sh
sh

grep '^\(.*\)\(\n\1\)+$': Print lines matching the following pattern.

': Strong quotation, allows the contained characters to be passed "as is" to the grep program.

^: Match all lines that start with "\(.*\)"

\(.*\): Back reference "\(\)" all lines of any length that contain zero or more characters ".*". Basically, store each entire line. Each line back referenced is replaced by the next line.

This is where I get kinda lost. Of course I could already be lost without knowing it.

\(\n\1\): Back reference "\(\)" a new line "\n", then call the previously stored back reference "\1" -> "\(.*\)". Make a new line exactly the same as the first back reference.

+: Is there one or more of the preceding line? Does \(.*\) contain \(\n\1\)?

$: Match the end of line position to the +

': Closing strong quotation.

Even with the step by step from the below link, I still have a hard time understanding the replacement and repetition.

My output is: Empty.

Quote:

$ ps -e | cut -c 25- | sort | grep '^\(.*\)\(\n\1\)+$'
$

My desired output is:

Quote:

cat
less
man
sh

On a side note, I guess this is the basis of the code that runs the app "uniq".

Code found at http://www.regular-expressions.info/duplicatelines.html
Used for general reference: Regular expression - Wikipedia, the free encyclopedia
Used to look at grep: grep - Linux Command - Unix Command
Additional Unix-specific regular expression info: Regular Expressions

Last edited by MykC; 10-12-2009 at 02:49 PM..

MykC


10-13-2009


grep does not span line boundaries; it tests each input line on its own.

P.S. Just to be sure you know: what you need is something like this:

Code:

ps -eoargs|sort -u
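
For readers who want to try this without depending on the local process list, here is a minimal sketch of the same idea. The sample names are made up for illustration, and the variable names are hypothetical. It compares sort -u with the two-step sort | uniq.

```shell
# Hypothetical sample input standing in for a list of process names.
names=$(printf 'sh\nman\ncat\nsh\nless\n')

# sort -u sorts the lines and drops repeated ones in a single step.
with_sort_u=$(printf '%s\n' "$names" | sort -u)

# sort | uniq gives the same result in two steps; uniq only collapses
# repeats that are adjacent, which is why the sort comes first.
with_uniq=$(printf '%s\n' "$names" | sort | uniq)

printf '%s\n' "$with_sort_u"
```

Both variables should hold the same four lines, which is the desired output from the first post.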

radoulov


10-13-2009


Ok, so if I'm trying to use grep to execute the regular expression:

Code:

^(.*)(\r?\n\1)+$

To remove duplicate lines that are in sequence, then it's simply a regular expression that goes beyond what grep is able to execute/interpret, because grep can't deal with more than one line at a time. Ok, I read a bit about the concept of line boundaries, and it seems this is one of the things sed is used for.
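
As a rough illustration of how sed can look at two lines at once, here is a sketch of a widely circulated sed one-liner that imitates uniq. The sample input and variable names are assumptions made for the example; they are not from the thread. The pattern has no repetition operator because sed compares only one pair of adjacent lines per pass.

```shell
# Hypothetical sorted input standing in for `ps -e | cut -c 25- | sort`.
sorted=$(printf 'cat\nless\nman\nsh\nsh\n')

# $!N               append the next line to the pattern space,
#                   except when already on the last line
# /^\(.*\)\n\1$/!P  print the first line unless the two lines are identical
# D                 delete the first line and restart with what is left
deduped=$(printf '%s\n' "$sorted" | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Here the \n does match, because N has joined two input lines into one pattern space with a newline between them.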

Code:

ps -e | grep -o "[^ ]*$" | sort -u

This is what I used to get the output I was trying to achieve, but this was more of an experiment with grep than a way of getting the output. I guess I'm going to have to familiarize myself with the concept of line boundaries and any tricks I can use to work with them in odd ways.

Finally, I'm going to play around with this a bit, but if grep can't remove duplicate lines like uniq, it might be able to remove duplicate characters or patterns.
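
A small sketch of that direction, with made-up sample words and hypothetical variable names: grep cannot rewrite text, but a back-reference does work inside a single line, so it can at least find lines where a character is immediately repeated.

```shell
# Hypothetical sample input, one word per line.
words=$(printf 'cat\nless\nman\nsh\n')

# \(.\) captures any one character and \1 requires the same character
# to follow it, so only lines with a doubled character are printed.
doubled=$(printf '%s\n' "$words" | grep '\(.\)\1')

printf '%s\n' "$doubled"
```

With this input only "less" is selected, because of its "ss".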

Last edited by MykC; 10-13-2009 at 10:56 AM..

MykC

