Grep regex to ignore sequence only if surrounded by fwd-slashes
Post 302879255 by gencon on Wednesday 11th of December 2013 12:42:36 PM
...and just because this might be of interest:

Quote:
Originally Posted by Don Cragun
Why did you use [0-9][0-9]* at the end of digitSequenceTooLongNotIP instead of using [0-9]+?
Quote:
Originally Posted by gencon
I'm far from being a regex pro, my thought process went like this: I need to match a minimum of 4 digits in a row, so [0-9][0-9][0-9][0-9], then optionally a 5th or more digit so I need [0-9][0-9][0-9][0-9][0-9]*.

[0-9][0-9][0-9][0-9]+ is more concise, is it any more efficient?

I've had a look into the efficiency question quoted above...

I created a dataset of 5 million numbers each with a random number of digits (between 1 and 10 digits). 10 numbers per line, each separated by a space.
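
For anyone who wants to build a similar file without the C program, here is a rough awk sketch. It is not the pastebin code: the function name gen_nums, the seed argument, the uniform choice of length, and the rule that numbers never start with 0 are all my own assumptions.

```shell
# Hypothetical stand-in for the C generator (NOT the pastebin code).
# gen_nums TOTAL PER_LINE [SEED]: print TOTAL numbers, PER_LINE per line,
# each 1 to 10 digits long, separated by single spaces.
gen_nums() {
    awk -v total="$1" -v per_line="$2" -v seed="${3:-1}" 'BEGIN {
        srand(seed)
        for (i = 1; i <= total; i++) {
            len = int(rand() * 10) + 1              # length: 1..10 digits
            num = int(rand() * 9) + 1               # first digit: 1..9 (assumption)
            for (d = 2; d <= len; d++)
                num = num "" int(rand() * 10)       # append digits 0..9
            # newline after every PER_LINE numbers and after the last one
            printf "%s%s", num, ((i % per_line == 0 || i == total) ? "\n" : " ")
        }
    }'
}

# Example: gen_nums 5000000 10 > NumsData5MillionNums
```

awk's rand() is not a high-quality generator, but for a timing test like this that should not matter much.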

Then I used the time command to measure 10 runs of an awk program which used [0-9][0-9][0-9][0-9]+ and then 10 runs with [0-9][0-9][0-9][0-9][0-9]*.

Since it was being run on my Linux desktop PC, I used chrt to set the scheduling policy to SCHED_FIFO with a priority of 99, which as far as I know gives the process the highest priority possible. The commands were:

Code:
# Run with [0-9][0-9][0-9][0-9]+ (three digits, then one or more)
chrt -f 99 time -f "\n***\nSecs: %e \nCPU: %P \nContext Switches: %c \nWaits: %w" \
    awk 'BEGIN { regex = "[0-9][0-9][0-9][0-9]+"; } { line = $0; gsub(regex, "x", line); }' \
    < NumsData5MillionNums >> ResRegex4 2>&1

# Run with [0-9][0-9][0-9][0-9][0-9]* (four digits, then zero or more)
chrt -f 99 time -f "\n***\nSecs: %e \nCPU: %P \nContext Switches: %c \nWaits: %w" \
    awk 'BEGIN { regex = "[0-9][0-9][0-9][0-9][0-9]*"; } { line = $0; gsub(regex, "x", line); }' \
    < NumsData5MillionNums >> ResRegex5 2>&1

I don't think the results can be considered particularly scientific, but they were fairly consistent. BTW, as expected, each run had 0 context switches and 1 wait.

In fact, the results were so close that I think the awk interpreter was probably running the same code in both cases. After all, the 2 regexes [0-9][0-9][0-9][0-9]+ and [0-9][0-9][0-9][0-9][0-9]* are logically interchangeable: both match a run of 4 or more digits.
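
A quick way to see that the two regexes match the same things is to run the same gsub over a small sample with each of them. This is only a spot check on one line, not a proof; the helper name subst and the sample string are made up for the illustration.

```shell
# subst REGEX: replace every match of REGEX on stdin with "x" and print each line.
subst() {
    awk -v regex="$1" '{ line = $0; gsub(regex, "x", line); print line }'
}

sample='1 12 123 1234 12345 1234567890 /1234/ a999b'

# Runs of 1 to 3 digits are left alone; runs of 4 or more become "x".
printf '%s\n' "$sample" | subst '[0-9][0-9][0-9][0-9]+'
printf '%s\n' "$sample" | subst '[0-9][0-9][0-9][0-9][0-9]*'
# both print: 1 12 123 x x x /x/ a999b
```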

I sorted the times and discarded the 3 fastest and 3 slowest times of the 10 runs, leaving me with:

Code:
ResRegex4 - regex = "[0-9][0-9][0-9][0-9]+" :

Secs: 7.16
Secs: 7.21
Secs: 7.27
Secs: 7.28

Mean: 7.23
Median: 7.24

ResRegex5 - regex = "[0-9][0-9][0-9][0-9][0-9]*" :

Secs: 7.14
Secs: 7.17
Secs: 7.18
Secs: 7.25

Mean: 7.185
Median: 7.175
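
The trimming and averaging can also be scripted instead of done by hand. Below is a sketch under a few assumptions: the helper name trimmed_stats is made up, it expects one time per line on stdin, and it prints 3 decimal places rather than matching the rounding in the figures above.

```shell
# trimmed_stats TRIM: read one number per line on stdin, sort numerically,
# drop the TRIM smallest and TRIM largest, print mean and median of the rest.
trimmed_stats() {
    LC_ALL=C sort -n | awk -v trim="$1" '
        { t[NR] = $1 }
        END {
            lo = trim + 1; hi = NR - trim; n = hi - lo + 1
            if (n < 1) { print "not enough samples"; exit 1 }
            for (i = lo; i <= hi; i++) sum += t[i]
            mid = lo + int((n - 1) / 2)          # lower middle element
            median = (n % 2) ? t[mid] : (t[mid] + t[mid + 1]) / 2
            printf "Mean: %.3f\nMedian: %.3f\n", sum / n, median
        }'
}

# The 4 remaining times for [0-9][0-9][0-9][0-9]+, nothing further to trim:
printf '%s\n' 7.16 7.21 7.27 7.28 | trimmed_stats 0
# prints:
# Mean: 7.230
# Median: 7.240
```

Against a full results file, something like grep '^Secs:' ResRegex4 | awk '{ print $2 }' | trimmed_stats 3 should do the sort, drop the 3 fastest and 3 slowest, and average the rest in one go.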

Full output of "[0-9][0-9][0-9][0-9]+" is here: http://pastebin.com/VqC5dbna

Full output of "[0-9][0-9][0-9][0-9][0-9]*" is here: http://pastebin.com/U6rpULd6

The C code to create the data file of 5 million numbers, each 1-10 digits in length, and with 10 numbers on each line is here: http://pastebin.com/6vG9WQwj
 
