Help | Unix | grep | regular expression | backreference | Syntax/Logic


 
# 1  
Old 10-12-2009

Hello,

I'm working on learning regular expressions and what I can do with them. I'm using Unix and its programs to experiment and learn what my limitations are with them.

I'm working on duplicating the regular expression:
Code:
 
^(.*)(\r?\n\1)+$

This is supposed to delete duplicate lines from a file.

The full command line that I'm using is:
Code:
 
ps -e | cut -c 25- | sort | grep '^\(.*\)\(\n\1\)+$'

I added \ in front of the ( and ), and put ' in front of the ^ and after the $.
I also removed the \r?, which was for Windows support.
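
A small sketch of the escaping rules, assuming a POSIX-style grep; the helper names and sample strings are made up for illustration. In basic syntax (plain grep) a group is written \( \) and an unescaped + is just a literal plus sign, while in extended syntax (grep -E) a group is written ( ) and + means "one or more". That difference alone may change what the pattern above matches.

```shell
#!/bin/sh
# Hypothetical sample data: three lines to test against.
sample() { printf 'ab\nabab\nab+\n'; }

# Extended syntax: ( ) groups, + repeats the group one or more times.
match_ere() { sample | grep -E '^(ab)+$'; }

# Basic syntax: \( \) groups, \{1,\} is the portable "one or more".
match_bre() { sample | grep '^\(ab\)\{1,\}$'; }

# Basic syntax with a bare +: here the + is an ordinary character,
# so only the line that literally ends in "+" is selected.
match_bre_literal_plus() { sample | grep '^\(ab\)+$'; }

match_ere
match_bre
match_bre_literal_plus
```

The first two functions select the lines "ab" and "abab"; the third selects only "ab+".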

ps -e: Outputs a list of most of the processes running. This is simply to generate output to work with.

Quote:
29513 ? 00:00:00 bash
31212 ? 00:00:00 man
31215 ? 00:00:00 sh
31216 ? 00:00:00 sh
31221 ? 00:00:00 less
32464 ? 00:00:00 cat

cut -c 25-: Prints only the characters of each line starting with the 25th character. This is to get only the process names printed.

Quote:
man
sh
sh
less
cat

sort: Sorts the list alphabetically. This is because I think the regular expression requires the list to be sorted.

Quote:
cat
less
man
sh
sh

grep '^\(.*\)\(\n\1\)+$': Print lines matching the following pattern.

': Strong quotation, allows the contained characters to be passed "as is" to the grep program.

^: Match all lines that start with "\(.*\)"

\(.*\): Back reference "()" all lines of any length that contain zero or more characters ".*". Basically, store each entire line. Each back-referenced line is replaced by the next line.

This is where I get kinda lost. Of course I could already be lost without knowing it.

\(\n\1\): Back reference "()" a new line "\n", then call the previously stored backreference "\1" -> "\(.*\)". Make a new line exactly the same as the first one back referenced.

+: Is there one or more of the preceding line? Does \(.*\) contain \(\n\1\)?

$: Match the end of line position to the +

': Closing strong quotation.

Even with the step-by-step explanation from the link below, I still have a hard time understanding the replacement and repetition.

My output is: Empty.

Quote:
$ ps -e | cut -c 25- | sort | grep '^\(.*\)\(\n\1\)+$'
$

My desired output is:
Quote:
cat
less
man
sh

On a side note, I guess this is the basis of the code that runs the app "uniq".
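
For comparison, a sketch of what uniq does with the sorted sample list from above (the names are the ones from the example output, not live ps data, and the function names are made up): it drops a line when it is identical to the line right before it.

```shell
#!/bin/sh
# Sample process names from the example above, already sorted.
sorted_names() { printf 'cat\nless\nman\nsh\nsh\n'; }

# uniq compares each line with the previous one and prints it only
# when the two differ, which is why the input has to be sorted first.
unique_names() { sorted_names | uniq; }

unique_names
```

This prints cat, less, man and sh, one per line, which is the desired output listed above.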

Code found at http://www.regular-expressions.info/duplicatelines.html
Used for general reference: Regular expression - Wikipedia, the free encyclopedia
Used to look at grep: grep - Linux Command - Unix Command
Additional Unix-specific regular expression info: Regular Expressions

Last edited by MykC; 10-12-2009 at 03:49 PM..
# 2  
Old 10-13-2009
grep does not span line boundaries. It checks one line at a time, so a pattern cannot match a line together with the one after it.

P.S. Just to be sure you know, what you need is something like this:

Code:
ps -eoargs|sort -u
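
To illustrate the point about line boundaries, here is a sketch with made-up helper names and canned input instead of live ps output. grep tests every line separately, so a pattern never sees "sh" on one line and "sh" on the next at the same time, while sort -u compares whole lines with each other.

```shell
#!/bin/sh
# Canned input: the duplicate "sh" lines sit next to each other.
names() { printf 'man\nsh\nsh\n'; }

# Each "sh" line matches on its own: grep -c counts 2 matching lines.
count_sh_lines() { names | grep -c '^sh$'; }

# A pattern that would need both lines at once never matches, because
# "." cannot stand in for the newline between them. grep -c prints 0
# (and grep exits non-zero, hence the "|| true").
count_two_line_match() { names | grep -c 'sh.sh' || true; }

# sort -u works on whole lines and drops the repeat.
dedupe() { names | sort -u; }

count_sh_lines
count_two_line_match
dedupe
```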

# 3  
Old 10-13-2009
Ok, so if I'm trying to use grep to execute the regular expression:

Code:
^(.*)(\r?\n\1)+$

To remove duplicate lines that are in sequence, then it's simply a regular expression that goes beyond what grep is able to execute/interpret, because grep can't deal with more than one line at a time. Ok, I read a bit about the concept of line boundaries, and it seems this is one of the things sed is used for.
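
One way this could look in sed is sketched below. It is a widely circulated one-liner rather than something from this thread, and the function name is made up. sed's N command appends the next input line to the pattern space, so \n and \1 finally get to see two lines at once.

```shell
#!/bin/sh
# Drop a line when it is identical to the one after it (like uniq).
#   $!N                 unless on the last line, append the next line
#   /^\(.*\)\n\1$/!P    unless both lines are equal, print the first one
#   D                   delete the first line, restart with what is left
dedupe_adjacent() { sed '$!N; /^\(.*\)\n\1$/!P; D'; }

printf 'cat\nless\nman\nsh\nsh\n' | dedupe_adjacent
```

Like uniq, this only catches duplicates that are adjacent, so the input still has to be sorted first.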

Code:
ps -e | grep -o "[^ ]*$" | sort -u

That is what I used to get the output I was trying to achieve, but this is more of an experiment with grep than a way of getting the output. I guess I'm going to have to familiarize myself with the concept of line boundaries and any tricks I can use to work with them in odd ways.
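
A sketch of how that pipeline behaves on canned lines shaped like the ps -e output quoted in the first post. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), and the function names are made up.

```shell
#!/bin/sh
# Canned lines in the shape of the ps -e output quoted earlier.
ps_sample() {
  printf '%s\n' \
    '31212 ? 00:00:00 man' \
    '31215 ? 00:00:00 sh' \
    '31216 ? 00:00:00 sh' \
    '32464 ? 00:00:00 cat'
}

# [^ ]*$ is the run of non-space characters at the end of the line,
# -o prints only that matched part, sort -u sorts and drops repeats.
process_names() { ps_sample | grep -o '[^ ]*$' | sort -u; }

process_names
```

Note that the duplicate removal is done entirely by sort -u here; grep only cuts each line down to its last field.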

Finally, I'm going to play around with this a bit, but if grep can't remove duplicate lines like uniq, it might be able to remove duplicate characters or patterns.
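
A possible starting point for that experiment, as a sketch with made-up names and sample words. Within a single line a backreference works fine, so grep can select lines that contain a doubled character. grep only selects lines, though; actually collapsing the repeats needs something like sed's substitute command.

```shell
#!/bin/sh
# Hypothetical sample words, one per line.
words() { printf 'less\nman\nbash\ntee\n'; }

# Select lines that contain the same character twice in a row:
# \(.\) captures one character, \1 requires that same character again.
lines_with_doubled_char() { words | grep '\(.\)\1'; }

# Collapse every run of a repeated character down to one character.
squeeze_repeats() { sed 's/\(.\)\1*/\1/g'; }

lines_with_doubled_char
printf 'aaabbc\n' | squeeze_repeats
```

The grep call selects "less" and "tee"; the sed call turns "aaabbc" into "abc".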

Last edited by MykC; 10-13-2009 at 11:56 AM..
 