awk script (complex)

03-12-2012

Registered User

32, 0

Join Date: Jan 2012

Last Activity: 17 April 2013, 8:13 PM EDT

Posts: 32

Thanks Given: 10

Thanked 0 Times in 0 Posts

yes and no.

8 of the 10 were >20

however

2 out of the 10 were <20 but were genuinely new

so it looks like it's printing everything off as new.

Also, comparing against the initial code in my first post on this thread, it has missed one genuinely new line. I would have put this down to the -vLOOK=50 variable,

but having amended LOOK to 6000 it still misses the line.

---------- Post updated at 05:34 AM ---------- Previous update was at 04:26 AM ----------

Quote:

Originally Posted by Chubler_XL

Are the numbers you're getting on the NEW: lines bigger than 20? I'm still a bit confused about what NEW lines should be. If it's just records that only appear in the most recent file, then this might work better:

Code:

cd /path/to/cisco/logs
files=`ls ciscostats_* | sort -t_ -k2.5 -k2.3,2.4 -k2.1,2.2`
first=`echo "$files" | tail -1`
 
awk -F, -vLOOK=50 -vMATCH=20 '
  FNR==1{F++}F==1{a[$1","$2","$3]++;next}
  a[$1","$2","$3]&&F<LOOK{b[$1","$2","$3]++}
  a[$1","$2","$3]{c[$1","$2","$3]++}
  END{for(i in c)if(b[i]>MATCH)print i";\t\t"b[i];else if(c[i]==a[i])print "NEW:"i";\t\t"c[i]}' $first $files

Interestingly, the "missing" new line appears with your new code, but now it is the only line that appears.

By a new line, I mean what this test from my first script decides:

if(b[i]-1&&a[i]!=b[i])
b[i]-1
## true when the count for key i in array b is anything other than 1
## so the key has been seen more than once
a[i]!=b[i]
## true when the count for key i in array a differs from its count in array b
## this checks whether the key has a record in the other files
## if the counts are not equal, there is a record in the other files
## so it is an OLD line
## otherwise it is a NEW line

Last edited by slashbash; 03-12-2012 at 01:31 AM..

slashbash

03-12-2012

Moderator

3,791, 1,452

Join Date: Oct 2010

Last Activity: 1 August 2020, 1:38 AM EDT

Posts: 3,791

Thanks Given: 183

Thanked 1,452 Times in 1,302 Posts

OK think I have it now:

Code:

cd /path/to/cisco/logs
files=`ls ciscostats_* | sort -t_ -k2.5r -k2.3,2.4r -k2.1,2.2r`
awk -F, -vLOOK=50 -vMATCH=20 '
   FNR==1{F++}F==1{a[$1","$2","$3]++;next}
   {i=$1","$2","$3;if(!(i in a))next}
   F<=LOOK{b[i]++}
   {c[$1","$2","$3]++}
   END{for(i in a)if(b[i]>0&&a[i]+b[i]>=MATCH){print i";\t\t"a[i]+b[i]}else if(c[i]+0==0)print "NEW:"i";\t\t"a[i]}' $files

Chubler_XL

03-12-2012

no.

All the new lines are printed with a count of 1, but some of these could have a count of 2 or more and still be new. Also, the code does not compare against all records when deciding which lines are new, only against 50 files (I know this is variable, but could we not incorporate a check against everything?).

The first script has it to a T, i.e. it compares the current file against everything and then prints the new lines correctly. The only problem is that I also need it to check the current file against 3 months' worth of files and print anything >20.

slashbash

03-12-2012

Perhaps I'm misunderstanding your requirement.

I used LOOK=2 and MATCH=3 for these files:

Code:

*** ciscostats_08032012 ***
B,2,1
C,1,1
D,5,5
*** ciscostats_09032012 ***
B,2,1
B,2,1
*** ciscostats_10032012 ***
A,1,1
A,1,1
B,2,1
D,5,5

and this is the output I get/expect:

Code:

B,2,1;          3
NEW:A,1,1;              2

If this is wrong, perhaps you could supply a sample file set with low MATCH/LOOK counts that demonstrates what you want.
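For anyone who wants to try this test rather than read it, here is a minimal sketch that rebuilds the three sample files in a scratch directory and runs the script from the earlier post with LOOK=2 and MATCH=3. The scratch directory from mktemp and the variable names workdir and out are assumptions for illustration; awk is used in place of nawk, and -v is written with a space for portability.

```shell
#!/bin/sh
# Rebuild the three sample files in a throwaway directory (assumed location).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'B,2,1\nC,1,1\nD,5,5\n'        > ciscostats_08032012
printf 'B,2,1\nB,2,1\n'               > ciscostats_09032012
printf 'A,1,1\nA,1,1\nB,2,1\nD,5,5\n' > ciscostats_10032012

# Newest file first: sort on year, then month, then day, all descending.
files=$(ls ciscostats_* | sort -t_ -k2.5r -k2.3,2.4r -k2.1,2.2r)

# $files is deliberately unquoted so it splits into one argument per file.
out=$(awk -F, -v LOOK=2 -v MATCH=3 '
  FNR==1  { F++ }                                  # F is the index of the file being read
  F==1    { a[$1","$2","$3]++; next }              # counts in the newest file
          { i=$1","$2","$3; if (!(i in a)) next }  # skip keys not in the newest file
  F<=LOOK { b[i]++ }                               # counts in files 2 to LOOK
          { c[i]++ }                               # counts in every older file
  END {
    for (i in a) {
      if (b[i]>0 && a[i]+b[i]>=MATCH) { print i ";\t\t" (a[i]+b[i]) }
      else if (c[i]+0==0)             { print "NEW:" i ";\t\t" a[i] }
    }
  }' $files)

printf '%s\n' "$out"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

This should print the same two records as the output above (B,2,1 with a count of 3 and NEW:A,1,1 with a count of 2), though the order of the lines can differ because awk does not guarantee the order of for (i in a).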

Chubler_XL

03-12-2012

Code:

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
END{for(i in b){if(b[i]-1&&a[i]!=b[i]){print i";\t\t"b[i]}else{print "NEW:"i";\t\t"b[i]} } }' OFS=, ciscostats_10032012 *.csv | sort -r

The code above reads the current file first (the NR==FNR block) and counts each line's key in array a.

As the other files are read, any line whose key is already in a is counted in array b, so old repeat lines build up an incremented count there.

At the end, array a is compared with array b: a key that turns up nowhere but the current file must be new, and is printed with the NEW: prefix.

I think we can modify both these scripts to serve the purpose. My only question is whether we can run both checks at the same time, which is what I want.

i.e. the script above can be modified to produce only the new lines (we can remove some of the unnecessary bits, such as printing the repeat counts from array b, but we probably still need to keep that array for the new-line comparison with array a, if you follow the logic).

We can then use your script with the LOOK and MATCH variables to compare against the last 3 months of records and print anything >20.

I think this is possible.

Last edited by slashbash; 03-12-2012 at 08:14 PM..

slashbash

03-12-2012

The code you supplied produces the following output for my 3 test files:

Code:

D,5,5;          2
B,2,1;          4
NEW:A,1,1;              2
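A sketch of how that result can be reproduced, assuming the same three sample files and assuming the glob ciscostats_* stands in for the *.csv in the posted command (so the current file is read a second time as part of the glob). The scratch directory and the names workdir and out are illustrative only, and awk is used in place of nawk.

```shell
#!/bin/sh
# Rebuild the three sample files in a throwaway directory (assumed location).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'B,2,1\nC,1,1\nD,5,5\n'        > ciscostats_08032012
printf 'B,2,1\nB,2,1\n'               > ciscostats_09032012
printf 'A,1,1\nA,1,1\nB,2,1\nD,5,5\n' > ciscostats_10032012

# Assumption: ciscostats_* replaces the *.csv glob from the posted command.
out=$(awk -F, '
  NR==FNR { a[$1 OFS $2 OFS $3]++; next }            # first argument: the current file
  a[$1 OFS $2 OFS $3] { b[$1 OFS $2 OFS $3]++ }      # keys from the current file, counted over all files
  END {
    for (i in b) {
      if (b[i]-1 && a[i]!=b[i]) { print i ";\t\t" b[i] }        # also seen in another file
      else                      { print "NEW:" i ";\t\t" b[i] } # seen only in the current file
    }
  }' OFS=, ciscostats_10032012 ciscostats_* | sort -r)

printf '%s\n' "$out"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Because the current file is counted again in the second pass, B,2,1 reaches 4 (1 + 2 + 1) and D,5,5 reaches 2, while A,1,1 has equal counts in a and b and is therefore reported as NEW.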

We have simplified your requirement (1.) to "only look at the first 2 files" (ie with a LOOK value of 2) and this will change the output to:

Code:

NEW:D,5,5;              1
B,2,1;          3
NEW:A,1,1;              2

Requirement (2.) that NEW should check all available files (i.e. ciscostats_08032012 is checked as well) will produce:

Code:

B,2,1;          3
NEW:A,1,1;              2

This is because "D,5,5" is in ciscostats_08032012, so it's not new.

This output matches the output of the script I supplied in post #16. You have said that #16 is wrong, but I still can't see what it's doing that you don't like.

---------- Post updated at 11:50 AM ---------- Previous update was at 09:27 AM ----------

Looking back over this thread, I suspect you are reading the code I have supplied and deciding it's not doing what you want, rather than trying it out with actual data, so it's probably time for me to explain what it does:

$files is populated with a list of data files with the most recent first eg:
ciscostats_02012012
ciscostats_01012012
ciscostats_31122011

a[] contains a count of how many times each ID appears in the first (most recent) file.

b[] contains a count of how many times an ID from a[] appears in files 2 thru LOOK

c[] contains a count of how many times an ID from a[] appears in any other file

At the end we print any ID that appears in both a[] and b[], and has a[]+b[] count >= MATCH
otherwise, a "NEW" record is output if value appears in a[] and not in c[]
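Since a[] is built from whichever file comes first, the ordering of $files matters. A small sketch of the sort keys applied to the three example names above, with no files needed on disk; the variable name sorted is an assumption for illustration.

```shell
#!/bin/sh
# Names are DDMMYYYY after the underscore, so sort on year (chars 5-8),
# then month (chars 3-4), then day (chars 1-2), each in reverse order.
sorted=$(printf '%s\n' ciscostats_31122011 ciscostats_02012012 ciscostats_01012012 |
  sort -t_ -k2.5r -k2.3,2.4r -k2.1,2.2r)

printf '%s\n' "$sorted"
```

This should list ciscostats_02012012 first, then ciscostats_01012012, then ciscostats_31122011, which is the most-recent-first order described above. A plain sort on the name would not work here, because the day comes before the month and year in the file name.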

Chubler_XL

03-12-2012

Code:

[V490]#files=`ls ciscostats* | sort -t_ -k2.5r -k2.3,2.4r -k2.1,2.2r`
[V490]#nawk -F, -vLOOK=60 -vMATCH=20 '
>    FNR==1{F++}F==1{a[$1","$2","$3]++;next}
>    {i=$1","$2","$3;if(!(i in a))next}
>    F<=LOOK{b[i]++}
>    {c[$1","$2","$3]++}
>    END{for(i in a)if(b[i]>0&&a[i]+b[i]>=MATCH){print i";\t\t"a[i]+b[i]}else if(c[i]+0==0)print "NEW:"i";\t\t"a[i]}' $files
NEW:NREE_CISCO3750,10,2          1

I know for sure there has been more than 1 new line. I have even looked back as far as -vLOOK=600.

slashbash
