Text Processing with sort, uniq, awk

12-21-2010

Code:

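awk -F, 'BEGIN { c = 0 }
# Act only on the first record of each date ($3); input must be sorted by date, then time.
s != $3 {
  if ($4 > time) {   # string comparison, fine for zero-padded HH:MM:SS
    print $2 " was late on " $3 " by coming at " $4
    c++
  }
  s = $3             # remember the date so that its later records are skipped
}
END { print c " Days Late" }' time="09:01:00" file.log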
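
As a quick check of the command above, here is a minimal sketch that runs the same logic on a few records in the layout used later in this thread (field 2 = ID, field 3 = date, field 4 = time). The wrapper name late_days and the inline sample are only for illustration, and the input is assumed to be sorted by date and time already.

```shell
# Hypothetical wrapper around the awk program above; reads records on stdin.
# $1 is the cut-off time, e.g. "09:01:00".
late_days() {
  awk -F, -v time="$1" 'BEGIN { c = 0 }
    s != $3 {                      # first record of a new date
      if ($4 > time) {             # string compare on zero-padded HH:MM:SS
        print $2 " was late on " $3 " by coming at " $4
        c++
      }
      s = $3
    }
    END { print c " Days Late" }'
}

late_days "09:01:00" <<'EOF'
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
EOF
# 01368 was late on 2010-12-02 by coming at 09:07:00
# 01368 was late on 2010-12-03 by coming at 09:02:00
# 2 Days Late
```

Only the first record of each date is tested, so the 10:54:00 and 13:53:00 lines never reach the time comparison.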

Franklin52

12-21-2010

Thank you, I used this:

Quote:

sort -t ',' -k 3,3 -k 4,4 file.log | awk -F, '{if($1 != "X" && !a[$3]) {a[$3]++;if($4 < "09:10:00") v="before 09:10"; else v="after 09:10"; print "ID number " $2 " came in on",substr($3,6,5)," at",substr($4,1,5),v;}}' | grep after

I didn't understand exactly how it works, so I couldn't get rid of the "before" lines myself.

Would it be too much to ask you to explain how it's done?
Here are the parts I didn't get:

Quote:

What do the following stand for, please?

!a[$3]
a[$3]++ # why am I incrementing?
v="before 09:10" ## what is v?
substr # I didn't get this, though I understand what it does.

Quote:

Originally Posted by anurag.singh

Code:

 
sort -t ',' -k 3,3 -k 4,4 file.log | awk -F, '{if($1 != "X" && !a[$3]) {a[$3]++;if($4 < "09:10:00") v="before 09:10"; else v="after 09:10"; print "on",substr($3,6,5),"here came in at",substr($4,1,5),v;}}'

rollyah

12-21-2010

1. !a[$3] ==>> $3 is the date. This test is true only for the first line seen for a given date, so only that line is processed (a[$3] is already set when the other records for the same date arrive).
2. a[$3]++ ==>> This just sets the array element with index $3, marking the date as seen. Incrementing an unset element makes it 1, which counts as true.
3. v="before 09:10" ==>> v is an ordinary variable. It is used in the print statement (as the last argument to print).
4. substr ==>> substr(string, start_index, length_of_substring)
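
To make points 1 and 2 concrete, here is a small sketch that prints what a[$3] holds just before each record is handled. The helper name show_seen and the three sample records are made up for illustration; the array logic is the same as in the one-liner.

```shell
# Hypothetical helper: shows the state of a[$3] before each record is handled.
show_seen() {
  awk -F, '{
    before = a[$3] + 0                 # 0 the first time a date appears
    if (!a[$3]) { a[$3]++; verdict = "processed" }
    else        { verdict = "skipped" }
    print $3, $4, "a[date] was " before, verdict
  }'
}

show_seen <<'EOF'
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-03,09:02:00,Pass
EOF
# 2010-12-02 09:07:00 a[date] was 0 processed
# 2010-12-02 10:54:00 a[date] was 1 skipped
# 2010-12-03 09:02:00 a[date] was 0 processed
```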

Hope this helps!
If you don't want the "before" lines, just modify the if condition where the time value is checked.

Code:

sort -t ',' -k 3,3 -k 4,4 file.log | awk -F, '{if($1 != "X" && !a[$3]) {a[$3]++;if($4 > "09:10:00") print "ID number " $2 " came in on",substr($3,6,5)," at",substr($4,1,5),"after 09:10";}}'

The command I gave earlier (in post #7) does the following:
1. Processes only the first line for a given date (after sorting the file) and ignores all other lines for the same date
2. Compares the time value with "09:10:00" and displays the before/after message accordingly

If you are still stuck, please post a proper sample input and the expected output.
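
Step 1 depends on the sort stage putting the earliest record of each date first. A minimal sketch of just that stage, using a made-up wrapper name (sort_by_date_time) and three deliberately shuffled sample records:

```shell
# Hypothetical wrapper around the sort stage of the pipeline.
sort_by_date_time() {
  # -t ',' sets the field separator; -k 3,3 sorts on the date field only,
  # and -k 4,4 breaks ties on the time field.
  sort -t ',' -k 3,3 -k 4,4
}

sort_by_date_time <<'EOF'
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,09:07:00,Pass
EOF
# 01,01368,2010-12-02,09:07:00,Pass
# 01,01368,2010-12-02,10:54:00,Pass
# 01,01368,2010-12-03,09:02:00,Pass
```

Plain text ordering is enough here because both the dates (YYYY-MM-DD) and the times (HH:MM:SS) are zero-padded.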

Last edited by anurag.singh; 12-22-2010 at 04:25 AM.. Reason: typo

anurag.singh

12-21-2010

Thank you for your tremendous help. Everything works great, though I'm still trying to get my head around this, as I would really like to learn awk.
I got a tutorial for advanced scripting and fast-forwarded to the awk part, but it had only about one minute of relevant material and nothing this advanced.
So if you have time and it's not too much trouble, could you help me with the questions below?

Quote:

awk -F, '{if($1 != "X" && !a[$3]) {a[$3]++;if($4 > "09:10:00") print "ID number " $2 " came in on",substr($3,6,5)," at",substr($4,1,5),"after 09:10";}}'

if($1 != "X" # can you please explain the logic?
&& !a[$3] # I get that $3 is the 3rd field, which is the date, but what does !a[$3] stand for? What does it represent, and why is it in the same if statement?
substr($3,6,5) # how did you manage to remove the year?

PS: Can you suggest a good place to start learning awk? A tutorial, or something you gained your experience from (aside from daily practice, of course)?

rollyah

12-22-2010

1. $1 != "X" ==>> $1 is the first field of every record. This check makes sure we do not process the first record in the input file, the header record (X , ID , Date, Time, Y). In the post #10 command the check can usually be dropped, because the second if condition (the check on $4) also has to be true before anything is printed. Be aware that $4 is compared as a string: a header field of " Time" (with a leading space) typically sorts before "09:10:00" and is not printed, but a bare "Time" would compare greater and be printed, so keep the check if you are unsure.
2. What does !a[$3] stand for? ==>> It tests whether a[$3] is still unset. awk has no NULL keyword; an unset element behaves like "" and 0, so if(!a[$3]) is equivalent here to if(a[$3]==0).
3. substr($3,6,5) ==>> $3 holds a date with the year, like 2010-12-03. String indices in awk start at 1. To get 12-03, the start index is 6 (12-03 starts at the 6th character) and it runs to the end, i.e. the length is 5.
substr($3,6) gives the same result: the length is not needed when the substring runs to the last character.
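
A small sketch of point 3, using one sample record in the thread's layout (the helper name date_time_parts is made up). It also uses a start index of 1 for the time field: a start index of 0, as in substr($4,0,5), is not handled the same way by every awk, and some implementations may return only four characters for it.

```shell
# Hypothetical helper: prints the substrings discussed above for each record.
date_time_parts() {
  awk -F, '{
    print substr($3, 6, 5)   # month-day, counted from the 6th character
    print substr($3, 6)      # same result: runs to the end of the field
    print substr($4, 1, 5)   # hours:minutes, counted from the 1st character
  }'
}

echo '01,01368,2010-12-03,09:02:00,Pass' | date_time_parts
# 12-03
# 12-03
# 09:02
```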

I learned most of this right here (after picking up a few basics) by following the posts of experts such as Scrutinizer, vgersh99, Franklin52, fpmurphy, scottn and radoulov, to name a few.

A few awk articles (many more can be found on the internet):
opengroup_awk
grymoire
Utrecht_University_docs
thegeekstuff_awk_with_examples (link removed)

Last edited by anurag.singh; 12-22-2010 at 09:01 AM.. Reason: typo

anurag.singh

02-15-2011

I'd like to start by thanking you again and again.
However, your help is needed once more.
I've almost understood the awk you gave me, but I'm having trouble with the following records:

Quote:

01,02530,2011-01-26,08:11:00,IN
01,02530,2011-01-26,18:40:00,OUT
01,02801,2011-01-26,09:07:00,IN
01,02801,2011-01-26,09:47:00,OUT
01,02801,2011-01-26,09:53:00,IN
01,02801,2011-01-26,18:06:00,OUT
01,02877,2011-01-26,08:29:00,IN
01,02877,2011-01-26,17:11:00,OUT
01,05713,2011-01-26,08:11:00,IN
01,05713,2011-01-26,13:47:00,OUT
01,05713,2011-01-26,14:47:00,IN
01,05713,2011-01-26,17:08:00,OUT

What you helped me with is the following:

Quote:

awk -F, '{if($1 != "X" && !a[$3]) {a[$3]++;if($4 > "09:10:00") print "ID number " $2 " came in on",substr($3,6,5)," at",substr($4,1,5),"after 09:10";}}'

Now I got rid of

Quote:

$1 != "X" &&

since I've managed to remove the header line from the file.
With the sample records above, though, there is no output at all.

It's something related to the first condition, but I can't pinpoint it.
Is it due to the way the records are sorted?

Keep in mind that it works perfectly with the following initial data:

01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass

rollyah
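
One possible explanation, sketched under the assumption that the 2011-01-26 records above are the whole input: after sorting, the first record for that date is an 08:11:00 entry, so the $4 > "09:10:00" test fails, and a[$3] then hides every other record for the date, whatever its ID. Keying the array on ID and date together, a[$2,$3], looks at each person's first entry instead. Even then, none of the first entries in this sample is later than 09:10:00, so the sketch takes the cut-off as a parameter. The function name first_late and the limit parameter are made up for illustration.

```shell
# Hypothetical variant: one "first entry" per ID and date instead of per date.
# $1 is the cut-off time, e.g. "09:10:00"; records are read from stdin.
first_late() {
  sort -t ',' -k 3,3 -k 4,4 |
  awk -F, -v limit="$1" '
    # a[$2,$3]++ is 0 (false) only for the first record of an ID/date pair
    !a[$2, $3]++ && $4 > limit {
      print "ID number " $2 " came in on", substr($3, 6, 5), "at", substr($4, 1, 5), "after " substr(limit, 1, 5)
    }'
}

first_late "09:01:00" <<'EOF'
01,02530,2011-01-26,08:11:00,IN
01,02530,2011-01-26,18:40:00,OUT
01,02801,2011-01-26,09:07:00,IN
01,02801,2011-01-26,09:47:00,OUT
01,02801,2011-01-26,09:53:00,IN
01,02801,2011-01-26,18:06:00,OUT
01,02877,2011-01-26,08:29:00,IN
01,02877,2011-01-26,17:11:00,OUT
EOF
# ID number 02801 came in on 01-26 at 09:07 after 09:01
```

With a cut-off of "09:10:00" the same input prints nothing, because every first entry (08:11, 09:07, 08:29) is earlier than that.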
