#1
awk error in sorting text file
Hi,

I have a file, file.txt, as below. Code:
error Server Network Name Dept Date Time
===========================================================================================================================
0 ServerA LAN1 AAA IT01 04/30/2008 09:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 09:16:26
0 ServerA LAN1 AAA IT01 04/30/2008 11:11:26
0 ServerB LAN1 AAA IT02 04/30/2008 11:11:26
0 ServerA LAN1 AAA IT01 04/29/2008 12:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 12:16:26

I get an error with the script below, and the syntax is not very clear to me. Can anyone help? Code:
nawk 'END { for (k in r) print r[k] }
/^[0-9]/ { split($6, d, "/")
if (d[3]d[1]d[2]OFS$2 > m[$NF] ) {
m[$NF] = d[3]d[1]d[2]OFS$2; r[$NF] = $0
}
next }1' FS=" *" file.txt
#2
If the code is not correct, then can you describe what it's supposed to do, in some detail?
#3
I need to remove duplicate lines that satisfy the condition below:
if error, server, network, dept and date are all the same, then keep the line with the latest time and remove the older duplicate lines.
#4
Your code collects a unique line per time stamp ($NF is the last field on the line, the time stamp), not per the criteria you listed.
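To make that concrete, here is a small sketch that counts how many data lines share each value of $NF in the sample data from the first post. It uses plain awk rather than nawk, which is an assumption about what is installed; the variable names sample and per_timestamp are made up for the illustration.

```shell
# Sample rows copied from file.txt in the first post (header lines left out).
sample='0 ServerA LAN1 AAA IT01 04/30/2008 09:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 09:16:26
0 ServerA LAN1 AAA IT01 04/30/2008 11:11:26
0 ServerB LAN1 AAA IT02 04/30/2008 11:11:26
0 ServerA LAN1 AAA IT01 04/29/2008 12:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 12:16:26'

# Group on $NF (the last field, i.e. the time) and count lines per group.
# The for-in order is unspecified in awk, so sort the result.
per_timestamp=$(printf '%s\n' "$sample" |
  awk '/^[0-9]/ { seen[$NF]++ } END { for (k in seen) print k, seen[k] }' |
  sort)

printf '%s\n' "$per_timestamp"
```

Each of the three time stamps is shared by two lines from different servers, so keying on $NF can keep at most three lines and mixes up ServerA and ServerB.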
I don't know what the FS=" *" part is supposed to do. The default whitespace separation that awk uses should work, and that FS looks like more or less the same thing anyway (not sure whether you have tabs in there or not).

Going by the header line in file.txt, the keys you want to use are $1 (error), $2 (server), $3 (network), $5 (dept), and $6 (date); $4 is the name and $7 is the time. You probably want to do the normalization on the time field, not on the date. Code:
nawk '/^[0-9]/ { split($7, z, ":")
  k = $1 OFS $2 OFS $3 OFS $5 OFS $6
  t = z[1] z[2] z[3]
  if (t > m[k]) {
    m[k] = t; r[k] = $0
  }
  next }1
END { for (k in r) print r[k] }' file.txt

So t contains the time stamp from $7 with the colons removed, and k is the combination of the fields you want to compare time stamps for (error, server, network, dept, date). If t is bigger than the old t you have for this k in m[k] (or m[k] doesn't exist yet, meaning it's effectively empty or zero), replace it, and remember the whole line in r[k]. Finally, print all the lines in r; note that for (k in r) returns them in no particular order.

Oh, the single number one after the closing brace is significant, too: it causes the header lines to be printed. If you don't want to print them, take it out. (It's a shorthand. The pattern 1 is always true, so it applies to every remaining line, that is, every line not already handled by the earlier block and its next, and with no action given awk does the default action, which is to print the line.)

Last edited by era; 05-06-2008 at 01:26 AM. Reason: m[k] is effectively zero if it's not defined; single 1 prints header
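For anyone who wants to try this without creating file.txt, below is a minimal self-contained sketch of the same approach. It rests on a few assumptions: plain awk is used in place of nawk, the sample data is fed on standard input with a shortened separator line, the function name dedupe_latest is made up, and the key is built from error, server, network, dept and date, which in the sample header are fields 1, 2, 3, 5 and 6 (field 4 is Name, field 7 is Time).

```shell
# Sample data from the first post, with a shortened separator line.
sample='error Server Network Name Dept Date Time
====================
0 ServerA LAN1 AAA IT01 04/30/2008 09:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 09:16:26
0 ServerA LAN1 AAA IT01 04/30/2008 11:11:26
0 ServerB LAN1 AAA IT02 04/30/2008 11:11:26
0 ServerA LAN1 AAA IT01 04/29/2008 12:16:26
0 ServerB LAN1 AAA IT02 04/30/2008 12:16:26'

# Hypothetical helper: keep, per key, only the line with the latest time.
dedupe_latest() {
  awk '/^[0-9]/ {
      split($7, z, ":")                    # HH:MM:SS -> z[1], z[2], z[3]
      k = $1 OFS $2 OFS $3 OFS $5 OFS $6   # error, server, network, dept, date
      t = z[1] z[2] z[3]                   # fixed-width HHMMSS string
      if (t > m[k]) { m[k] = t; r[k] = $0 }
      next                                 # data lines are printed only in END
  }
  1                                        # header lines: print as they come
  END { for (k in r) print r[k] }'
}

result=$(printf '%s\n' "$sample" | dedupe_latest)
printf '%s\n' "$result"
```

With this key, the 04/29/2008 line for ServerA is kept as its own entry, because its date differs from the 04/30/2008 lines. The header comes out first; the order of the three surviving data lines depends on the awk implementation, so pipe the data rows through sort if the order matters.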