grep fixed string with regex


 
# 1  
Old 11-08-2010

Hello, all! Maybe the title is badly worded; feel free to help me with that!

I'm using GNU grep, and I need to make sure that grep extracts only what I tell it to.

I have the following regular expression: [a-z_][a-z0-9_-]*[$]?

Well, I need to make sure I grep only a word which may start with a lowercase letter or underline, the following characters may contain numbers and dashes as well, and the last character can be a dollar sign.

Would you help me with this command?

Code:
$ echo "u_s9Ae-u" | grep "[a-z_][a-z0-9_-]*[$]\?" # should not grep "A"
u_s9Ae-u

$ echo "u_s9Ae-u" | grep -x "[a-z_][a-z0-9_-]*[$]\?"
u_s9Ae-u

$ echo "u_s9Ae-u" | grep -o "[a-z_][a-z0-9_-]*[$]\?"
u_s9
e-u

Tried many different commands with no success...

Best regards!
Teresa and Junior
# 2  
Old 11-08-2010
Quote:
Originally Posted by teresaejunior
Well, I need to make sure I grep only a word which may start with a lowercase letter or underline, the following characters may contain numbers and dashes as well, and the last character can be a dollar sign.
It usually pays to simply re-read the definition you start with when constructing regexps. In this case:

1.) only a word
If we use the old IBM definition of a "word" (a word is a sequence of non-blank characters, separated by blanks), we end up with something like:

Code:
grep '[<b><tab>]*[^<b><tab>][^<b><tab>]*[<b><tab>]*'

We search for optional blanks/tabs, followed by one or more non-blanks/non-tabs, followed by optional blanks/tabs.

2.) which may start with a lowercase letter or underline,
OK, we fine-tune our definition of the word:

Code:
grep '[<b><tab>]*[_a-z][^<b><tab>]*[<b><tab>]*'

We search for optional blanks/tabs, followed by one underline or lowercase letter, then any further non-blanks/non-tabs, followed by optional blanks/tabs.

3.) the following characters may contain numbers and dashes as well,
More fine-tuning of what we mean by "word" here; I think it is self-explanatory now:

Code:
grep '[<b><tab>]*[_a-z][-_a-z0-9]*[<b><tab>]*'

4.) and the last character can be a dollar sign.
Still more fine-tuning:

Code:
grep '[<b><tab>]*[_a-z][-_a-z0-9]*\$\?[<b><tab>]*'

We could probably drop the trailing "[<b><tab>]*" now, because it may be superfluous; you will have to decide that by running the regexp against your data. Replace "<b>" and "<tab>" with literal blanks/tabs when you enter the code; I only used them to make the characters visible.
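
As an alternative to typing literal blanks and tabs, here is a minimal sketch, assuming a POSIX-style grep. Writing `\t` inside a bracket expression does not help, because there the backslash is an ordinary character. The class `[[:blank:]]` (space or tab) avoids the problem. The function name `has_blank` is made up for the illustration.

```shell
# Inside a bracket expression, '\' and 't' are two ordinary characters,
# so [\t] does NOT match a tab. [[:blank:]] matches a space or a tab.
has_blank() {
    # Print the number of input lines that contain a space or a tab.
    grep -c '[[:blank:]]' || true
}

printf 'a\tb\n' | has_blank                # line with a real tab: prints 1
printf 'a\tb\n' | grep -c '[\t]' || true   # looks for a backslash or a "t": prints 0
```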

I hope this helps.

bakunin
# 3  
Old 11-08-2010
Thank you, bakunin! But it still matches the "A", or I'm doing something wrong... The idea is: we prompt the user for a string and then check whether it matches the criteria. So the echo is really what we use, with the pipe after it:

Code:
$ echo " u_sA9e-u " | grep '[ ]*[^_a-z][-_a-z0-9]*\$*[ ]*'
 u_sA9e-u 

$ echo "u_sA9e-u" | grep '[ \t]*[^_a-z][-_a-z0-9]*\$*[ \t]*'
u_sA9e-u

$ echo "u_sA9e-u" | grep -w '[ \t]*[^_a-z][-_a-z0-9]*\$*[ \t]*'

$ echo "u_sA9e-u" | grep -x '[ \t]*[^_a-z][-_a-z0-9]*\$*[ \t]*'

$ echo "u_sA9e-u" | grep -o '[ \t]*[^_a-z][-_a-z0-9]*\$*[ \t]*'
A9e-u

And I tried a bunch of different commands again with no luck... Would you try it there?
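
A minimal sketch of why the `-o` run above starts at the "A": a caret right after the opening bracket negates the set, so `[^_a-z]` means "any character that is not an underscore or a lowercase letter". `LC_ALL=C` is an assumption added here to keep the range to plain ASCII.

```shell
# [^_a-z] is the complement of [_a-z]: the caret right after '[' negates the set.
input='u_sA9e-u'

# Characters that are NOT an underscore or lowercase letter, one per line.
negated=$(printf '%s\n' "$input" | LC_ALL=C grep -o '[^_a-z]')

# Characters that ARE an underscore or lowercase letter, one per line.
plain=$(printf '%s\n' "$input" | LC_ALL=C grep -o '[_a-z]')

# Join each list with spaces for display.
printf 'negated: %s\n' "$(printf '%s' "$negated" | tr '\n' ' ')"   # negated: A 9 -
printf 'plain:   %s\n' "$(printf '%s' "$plain" | tr '\n' ' ')"     # plain:   u _ s e u
```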

Best regards!
Teresa e Junior
# 4  
Old 11-08-2010
@Bakunin, we should also handle words at the start (^) or the end ($) of the line. Using [<b><tab>]* with no further anchors means that the pattern may match part of a word too, since zero occurrences are allowed on both sides.

So I think we need something like this:
Code:
grep -E '([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)'

We cannot use GNU word boundaries (\b) here, since dashes are part of the allowed character set.
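
As a rough illustration of the `\b` remark (a sketch, assuming GNU grep; `LC_ALL=C`, the variable name `delimited` and the sample strings are additions for the demonstration): `\b` treats only letters, digits and underscores as word characters, so it sees a boundary next to a dash, whereas the explicit delimiters do not.

```shell
delimited='([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)'

# "X-abc" is a single blank-delimited word that starts with an uppercase letter,
# so it should be rejected. \b sees a boundary between "-" and "a" and matches "abc".
printf '%s\n' 'X-abc' | LC_ALL=C grep -o '\b[a-z_][a-z0-9_-]*' || true      # prints abc

# With explicit delimiters the same line does not match (count 0).
printf '%s\n' 'X-abc' | LC_ALL=C grep -cE "$delimited" || true              # prints 0

# A valid word in the middle of a line is still found (count 1).
printf '%s\n' 'X-abc u_s9e-u$ end' | LC_ALL=C grep -cE "$delimited" || true # prints 1
```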
# 5  
Old 11-08-2010
Quote:
Originally Posted by Scrutinizer
@Bakunin, we should also handle words at the start (^) or the end ($) of the line. Using [<b><tab>]* with no further anchors means that the pattern may match part of a word too, since zero occurrences are allowed on both sides.

So I think we need something like this:
Code:
grep -E '([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)'

We cannot use GNU word boundaries (\b) here, since dashes are part of the allowed character set.
Code:
$ echo "u_sA9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
u_sA9e-u

Any ideas?
# 6  
Old 11-08-2010
Strange, when I do this I get:
Code:
$ echo "u_sA9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
$ echo "u_sx9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
u_sx9e-u

Are you sure you are using GNU grep?
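
One way to check, as a sketch: ask the shell what `grep` resolves to, print the version line of the binary itself, and look at the locale variables. An alias, a different grep implementation, or the locale could each explain differing results. The helper name `grep_version` is hypothetical.

```shell
# Show what the name "grep" resolves to in this shell (alias, function or file).
type grep

# "command" skips aliases and shell functions and runs the binary directly.
grep_version() {
    command grep --version 2>/dev/null | head -n 1
}
grep_version

# Locale variables that influence how bracket ranges such as [a-z] are read.
printf 'LC_ALL=%s LC_COLLATE=%s LANG=%s\n' "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}"
```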
# 7  
Old 11-08-2010
Hello!

I have the following alias: alias grep='grep --color=auto'. The only difference between the two commands below is that the output for the string with the 'x' is colored, while the other is plain black and white... To bypass the alias I tried "\grep", but the result doesn't change.

Code:
$ echo "u_sA9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
u_sA9e-u
$ echo "u_sx9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
u_sx9e-u

Code:
$ apt-cache show grep
Description: GNU grep, egrep and fgrep
 'grep' is a utility to search for text in files; it can be used from the
 command line or in scripts.  Even if you don't want to use it, other packages
 on your system probably will.
 .
 The GNU family of grep utilities may be the "fastest grep in the west".
 GNU grep is based on a fast lazy-state deterministic matcher (about
 twice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
 search for a fixed string that eliminates impossible text from being
 considered by the full regexp matcher without necessarily having to
 look at every character. The result is typically many times faster
 than Unix grep or egrep. (Regular expressions containing backreferencing
 will run more slowly, however.)


Though I noticed the following behavior:
Code:
$ var=Aa90
$ echo ${var//[a-z]/}
90
$ echo ${var//['a-z']/}
a90

---------- Post updated at 08:30 AM ---------- Previous update was at 08:26 AM ----------

Code:
$ bash --posix
bash-4.1$ echo "u_sA9e-u" | grep -E "([[:space:]]|^)[a-z_][a-z0-9_-]*[$]?([[:space:]]|$)"
u_sA9e-u
bash-4.1$
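
A possible explanation, offered as a sketch and not a confirmed diagnosis: the `${var//[a-z]/}` result above shows that in this shell's locale the range `[a-z]` also covers "A", and a range in grep's bracket expressions can depend on the locale in the same way. Forcing the C locale for the one command restricts `a-z` to the 26 ASCII lowercase letters. Since the goal is to validate a whole string typed by the user, `-x` (match the entire line) is used here in place of the delimiters. The function name `is_valid_name` is made up for this example.

```shell
# Return 0 if the whole string is: a lowercase letter or underscore,
# then lowercase letters, digits, underscores or dashes, then an optional "$".
is_valid_name() {
    # LC_ALL=C makes [a-z] mean exactly the ASCII letters a..z.
    # -x requires the pattern to match the entire line; -q suppresses output.
    printf '%s\n' "$1" | LC_ALL=C grep -qx '[a-z_][a-z0-9_-]*[$]\{0,1\}'
}

for candidate in 'u_sx9e-u' 'u_sA9e-u' 'name$' '9abc'; do
    if is_valid_name "$candidate"; then
        printf '%s: accepted\n' "$candidate"
    else
        printf '%s: rejected\n' "$candidate"
    fi
done
```

With the C locale in force, "u_sx9e-u" and "name$" are accepted, while "u_sA9e-u" and "9abc" are rejected.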

Last edited by teresaejunior; 11-08-2010 at 06:52 AM..