Text Processing with sort, uniq, awk


 
# 1  
Old 12-21-2010

Hello,

I have a log file with the following contents:
Code:
X , ID , Date, Time, Y
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass

My goal is to get the number of times an ID has a Time later than 09:00:00 on each Date.
That would give me two outputs: first, the number of days the ID has been late, and second, the day and time at which the ID was late.

I've started as such:
Code:
sort -t ','  -k 3,3 -k 4,4  file.log  # sort the file by the Date field, then by the Time field

I've been stuck for the last 30 minutes trying to find a way to get the first line of each day (logically it will be the earliest, since I've already sorted by date and time). Once I know how to do this, I'll be able to compare the time and proceed.

Can anyone help?
I looked into sort -u and uniq -f3, though I didn't get far with them.
I'm reading a few tutorials on awk, though as I'm usually a sed person (it has usually met all my needs), I'm finding it a bit more complex.


Moderator's Comments:
Mod Comment Please use code tags when posting data and code samples, thank you.
# 2  
Old 12-21-2010
Assuming you want the number of times for each ID per date:
Code:
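awk -F, '{gsub(":","",$4)}     # 09:07:00 becomes 090700
int($4)>90000{a[$2,$3]++}      # count entries after 09:00:00 per ID and date
END{for(i in a){split(i,k,SUBSEP); print k[1], k[2], a[i]}}
' file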

Use nawk or /usr/xpg4/bin/awk on Solaris.
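
A possible variation on the command above, offered only as a sketch: because the times are zero-padded HH:MM:SS, they can be compared as plain strings, so the gsub/int conversion is not strictly needed. The variable name data and the function name count_after are made up for this illustration, and the header line is assumed to be the first line of input.

```shell
# Sample data from post #1 (header line included).
data='X , ID , Date, Time, Y
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass'

# count_after: read the log on stdin and count, per ID and date, the entries
# later than 09:00:00. (count_after is an illustrative name, not an existing tool.)
count_after() {
    awk -F, '
        # NR > 1 skips the header; zero-padded times compare correctly as strings
        NR > 1 && $4 > "09:00:00" { n[$2 " " $3]++ }
        END { for (k in n) print k, n[k] }
    ' | sort    # for-in order is unspecified in awk, so sort the result
}

printf '%s\n' "$data" | count_after
```

For the sample data this should print one line per date: 4 entries for 2010-12-02 and 3 for 2010-12-03.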
# 3  
Old 12-21-2010
Code:
 
awk -F'(,|:)' '{if($4+0 >= 9) a[$2 " on date " $3]++}END{for (i in a)print "count of ID",i,"is",a[i]}' inputFile

For input in post #1, output is:
Code:
 
count of ID 01368 on date 2010-12-02 is 4
count of ID 01368 on date 2010-12-03 is 3

# 4  
Old 12-21-2010
Thanks for your help. I built on your advice and came up with something that works for what I want to accomplish:

Code:
 wc -l test;  awk -F , '{if ($4 > "09:10:00") print $2 " was late on", $3 " by coming at ",$4}' test

Output:
Code:
       7 test
01368 was late on 2010-12-02 by coming at  10:54:00
01368 was late on 2010-12-02 by coming at  13:07:04
01368 was late on 2010-12-02 by coming at  18:54:01
01368 was late on 2010-12-03 by coming at  13:53:00
01368 was late on 2010-12-03 by coming at  16:07:00

The problem with the above output, though, is that it checks every line for a given date (e.g. 2010-12-03) instead of just the first (earliest) one.

Is there a way to take the earliest time for each date, or to take the first line of each date once the file is sorted by the date field?
# 5  
Old 12-21-2010
What should the output be for the data given in your first post?
# 6  
Old 12-21-2010
Thanks to your help, I've reached this step:

Code:
 
original data:
 
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass

Code:
 
awk -F , '{if ($4 > "09:10:00") print $2 " was late on", $3 " by coming at ",$4}' test | tee  DaysLate ; wc -l DaysLate

OUTPUT:

Code:
 
01368 was late on 2010-12-02 by coming at  10:54:00
01368 was late on 2010-12-02 by coming at  13:07:04
01368 was late on 2010-12-02 by coming at  18:54:01
01368 was late on 2010-12-03 by coming at  13:53:00
01368 was late on 2010-12-03 by coming at  16:07:00
       5 DaysLate

The only thing missing is a way to take just the earliest time of each day.

In other words, the above output should be:


Code:
   0 DaysLate # on 12-02 he came in at 09:07, which is before 09:10, and on 12-03 he came in at 09:02, which is also before the set time

# 7  
Old 12-21-2010
Code:
 
sort -t ','  -k 3,3 -k 4,4  file.log | awk -F, '{if($1 !~ /^X/ && !a[$3]) {a[$3]++;if($4 < "09:10:00") v="before 09:10"; else v="after 09:10"; print "on",substr($3,6,5),"he came in at",substr($4,1,5),v;}}'


Last edited by anurag.singh; 12-21-2010 at 05:05 PM..
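
Putting the pieces of this thread together, here is a minimal sketch (not any poster's exact method) that produces both outputs asked for in post #1: the days on which the first entry of the day was late, and the number of such days per ID. It assumes the comma-separated layout from post #1; the names data, late_days and cutoff are invented for the example, and the key uses ID and date together so that a log with several IDs is handled as well.

```shell
# Sample data from post #1 (header line included).
data='X , ID , Date, Time, Y
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass'

# late_days [CUTOFF]: read the log on stdin, keep the earliest entry per ID and
# date, and report the days on which that entry is later than CUTOFF
# (default 09:00:00). late_days is an illustrative name, not an existing tool.
late_days() {
    sort -t ',' -k 2,2 -k 3,3 -k 4,4 |
    awk -F, -v cutoff="${1:-09:00:00}" '
        $1 !~ /^[0-9]+$/ { next }      # skip the header line
        seen[$2, $3]++   { next }      # keep only the first (earliest) line per ID and date
        $4 > cutoff {                  # zero-padded times compare correctly as strings
            print $2 " was late on " $3 " by coming at " $4
            late[$2]++
        }
        END { for (id in late) print id " was late on " late[id] " day(s)" }
    '
}

printf '%s\n' "$data" | late_days            # cutoff 09:00:00, as in post #1
printf '%s\n' "$data" | late_days 09:10:00   # cutoff used in posts #4 and #6
```

With the 09:00:00 cutoff both days count as late (first entries at 09:07:00 and 09:02:00); with 09:10:00 nothing is printed, which corresponds to the "0 DaysLate" result requested in post #6.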