Converting grep to awk


 
# 8  
Old 05-22-2012
I see. No wonder you wanted to convert it.

Fortunately, you can convert {2,3} to plainer regex syntax: write the bracket expression three times and make the third copy optional with a ? after it. Likewise, repeat the {6} one six times. Not elegant, but at least efficient.

This could replace the whole loop I think:
Code:
awk -F"|" '$9 ~ /[[:alnum:]][[:alnum:]][[:alnum:]]?\/\/[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]/' infile > outfile

# 9  
Old 05-22-2012
Quote:
Originally Posted by dagamier
Here is a full code snippet of what I'm trying to convert. Note that I am currently using grep in a while loop, and reading through a file with millions of records makes this take quite a long time to complete:

Code:
while read LINE
do
   REC13=`echo $LINE |cut -d"|" -f9 |grep -ih '[[:alnum:]]\{2,3\}"//"[[:alnum:]]\{6\}'`
   if [ -n "$REC13" ]
   then
        echo $LINE >> ./$PRVYR/$MONTH/mislabeled/$MONTH-mislabeled.csv
   fi
done < INFILE

This particular record looks for these strings: CCC//SSSSSS or CC//SSSSSS

My goal is to try and convert this into an awk command.

Moderator's Comments:
Code tags for code, please.
Yes, reading and processing a line at a time on a file with millions of records is a huge resource waste, as the entire while loop can be replaced by the awk one-liner I posted, so give it a try.
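Roughly, the idea is something like this (just a sketch, assuming the field-9 pattern above and the same output file as your loop; adjust to the exact one-liner from earlier):

Code:
awk -F"|" '$9 ~ /[[:alnum:]][[:alnum:]][[:alnum:]]?\/\/[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]/' INFILE >> ./$PRVYR/$MONTH/mislabeled/$MONTH-mislabeled.csv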
# 10  
Old 05-24-2012
OK, I made a little progress, but am still stuck. I was able to make my awk work if I use nawk rather than traditional awk. However, it is not matching all my conditions. First, a data set example:

Code:
Record1|some text (c-1234, US)|some more stuff
Record2| more text (c-1234, 897)|more stuff
Record3| new stuff (abc234, 897)| extra stuff

When I run this:
Code:
while read LINE
do
echo $LINE | nawk -F'|' 'BEGIN {search_regex = "\\([[:alnum:]]\{1\}.[[:alnum:]]\{4\},[[:blank:]][[:alnum:]]\{2,3\}\\)"} tolower($9) ~ search_regex {print $9}'
done < xx

When I run it I get this:
Code:
new stuff (abc234, 897)

but I don't get the other two records. How can I get awk to allow any character in that second position, even if it is a dash? As you can see, I have even tried the . notation with no success. Any help in resolving this will help me fix about twenty other things I'm trying to work through one at a time. As always, any help is greatly appreciated.

Just as a note, I also tried this outside of the while loop and I get the same results; not sure why I expected something different in the loop.

---------- Post updated at 11:09 AM ---------- Previous update was at 10:54 AM ----------

Quote:
Originally Posted by shamrock
Yes reading and processing a line at a time on a file with millions of records is a huge resource waste...as the entire while loop can be replaced by the awk one liner I posted...so give it a try.
Shamrock, I did try this and it worked for the one condition, but not the ones where there was a - in the second field (can be other special characters too). I am working on making each awk a separate statement (like grepping a file but much faster) without the while loop.

Last edited by dagamier; 05-24-2012 at 02:55 PM.. Reason: adding more info I forgot
# 11  
Old 05-24-2012
You'll probably have better success if you don't store the regex in a string first; just use tolower($9) ~ /regex/. In fact, the tolower isn't needed, since you're not matching anything case-sensitive.
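For example (a minimal sketch with a made-up pattern, not the exact one you need):

Code:
# regex held in a string: every backslash has to be doubled
awk --posix -F'|' 'BEGIN { re = "\\([[:alnum:]].[[:alnum:]]{4}," } $2 ~ re { print $2 }' input2
# regex constant: written once, no extra layer of escaping
awk --posix -F'|' '$2 ~ /\([[:alnum:]].[[:alnum:]]{4},/ { print $2 }' input2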

I'm confused about what you're doing now, though. First it was about 897/C/123456/LNAME FNAME in field 9; now it's something else in field 2, but your code says $9?

In this latest example:

Code:
$ awk --posix -F\| '$2 ~ /\([[:alnum:]].[[:alnum:]]{4},[[:blank:]]*[[:alnum:]]{2,3}\)/ {print $2}' input2
some text (c-1234, US)
 more text (c-1234, 897)
 new stuff (abc234, 897)

And for the name thing, where you switched between using / and //, I think it'd be: $9 ~ /^[[:alnum:]]{2,3}\/.?\/[[:alnum:]]{6}\// to check those first 2/3 subfields (897, c?, 123456).
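Dropped into a full command, that might look like this (untested, assuming field 9 and an awk that understands the intervals, e.g. gawk --posix):

Code:
awk --posix -F'|' '$9 ~ /^[[:alnum:]]{2,3}\/.?\/[[:alnum:]]{6}\// { print $9 }' infile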

Last edited by neutronscott; 05-24-2012 at 04:00 PM..
# 12  
Old 05-24-2012
Apologies, Scott. I stripped out a bunch of the other fields (it will still be field 9), but my test sample is just a couple of fields. I'm trying to simplify things without copying in tons of data.
# 13  
Old 05-24-2012
...And that's the problem. If you gave us some actual data the first time, we could make something that works right with it the first time.

Go ahead, stretch the browser.
# 14  
Old 05-24-2012
OK, let's stretch the browser's capability. Here are the actual data sets I need to match. Each is its own "sample" record:

Code:
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|US/C/// US/C///|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|US/C///, US/C///,|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///(abc123 US)|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  (abc123 897) LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  (abc123 US) LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  (abc123, 897) LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  (abc123, US) LBG|Field10|Field11|Field12
Field1|Field2|Field3|2012-01-25||Field6|Field7|Field8|000/C///run (c-ab123, US) jobs.|Field10|Field11|Field12
Field1|Field2|Field3|2012-01-25||Field6|Field7|Field8|000/C///run (c-ab123, 897) jobs.|Field10|Field11|Field12
Field1|Field2|Field3|2012-01-25||Field6|Field7|Field8|000/C///run (c-ab123 US) jobs.|Field10|Field11|Field12
Field1|Field2|Field3|2012-01-25||Field6|Field7|Field8|000/C///run (c-ab123 897) jobs.|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  SN:abc123 897 LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  897//abc123 LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  US//abc123 LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  897//a-1234 LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  US//a-cdef LBG|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///pour  ;abcdef;897 LBG|Field10|Field11|Field12
Field1|Field2|Field3|2012-01-15||Field6|Field7|Field8|649/C//GTAA/ - :abcdef:897 Conver|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///,abcdef,897|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C/// /abcdef897|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C/// /abcdefUS|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|abcdef 897 000/C///|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///SN:abcdef 897|Field10|Field11|Field12
Field1|Field2|Field3|3-13-12||Field6|Field7|Field8|000/C///me@some.domain.com|Field10|Field11|Field12

As you can see, there are quite a few. The ones giving me a hard time are the ones that have a - in the field. The ones without a dash I have mostly been able to figure out with the help of this forum, and a few others I'm close on; I'm getting better at this. I have also been testing the earlier suggestion of dropping the stored regex to simplify the code.
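For the dash cases, one thing I still plan to test is putting the - inside the bracket expression (a literal - is allowed there as the first or last character); a rough sketch, with the exact pattern left to adjust:

Code:
awk --posix -F'|' '$9 ~ /\([[:alnum:]-]+,?[[:blank:]]+[[:alnum:]]{2,3}\)/' infile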

Last edited by dagamier; 05-25-2012 at 01:40 PM.. Reason: added bold to the problem strings per later post suggestion. the rest have been resolved with forum help