Compare a file with all others then print off data

12-30-2011

ahamed, I am grateful, excellent work!

Can you explain the modified code so that I understand it better?

Specifically:

Code:

{if(b[i]-1&&a[i]!=b[i]){print i";\t\t"b[i]}else{print "NEW:"i";\t\t"b[i]}

llcooljatt

12-30-2011

Quote:

Originally Posted by llcooljatt

your script

Code:

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next}{b[$1OFS$2OFS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=a[i]+b[j]}else{cn++}
if(cn==bx){print "NEW ENTRY FOUND -->",i,a[i];w=0}};if(w!=0){print i,cc};w=1;cn=0}}
' failed_lcss_reboots_20111229.csv *.csv

Your script seems to add an extra 1 to the figures, whereas my old one is pretty much there. The only thing I need it to do is not look for -1 for new resets when comparing 20111229.csv with *.csv, but give an accurate reflection, i.e. instead of 1 this could be 10.

i.e

NEW:NE:883948,SHELF:10,SLOT:2; 10

highlight as NEW on left

I don't understand exactly what you want, but I guess the output should not include the counts from the new file for records that already exist in the old files.

For example:

Code:

# cat old*
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011

Code:

# cat new
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:111111,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011

I removed the new file's record counts from the totals and added a condition for recurring records.

Code:

# nawk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=b[j]}else{cn++}
if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0};if(w=!0&&cn<bx){;print i";\t\t",cc};w=1;cn=0}}
' new old*
NE:801048,SHELF:3,SLOT:1;                5
NE:593848,SHELF:3,SLOT:1;                3
NEW-> NE:888888,SHELF:3,SLOT:1;          2
NE:801048,SHELF:6,SLOT:2;                6
NEW-> NE:111111,SHELF:3,SLOT:1;          1
NE:841068,SHELF:8,SLOT:4;                2
NE:221726,SHELF:8,SLOT:1;                5

@ahamed101's code gives the same result except for the new records. You can use this code when you want just the old records.

Code:

# nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
> END{for(i in b){if(b[i]-1&&a[i]!=b[i]){print i";\t\t"b[i]}else{print "NEW:"i";\t\t"b[i]} } }' OFS=, new old*
NE:801048,SHELF:3,SLOT:1;               5
NE:593848,SHELF:3,SLOT:1;               3
NE:801048,SHELF:6,SLOT:2;               6
NE:841068,SHELF:8,SLOT:4;               2
NE:221726,SHELF:8,SLOT:1;               5

regards
ygemici

12-30-2011

Can you explain or write comments for your code, please? I am new to this, and any help from you guys is appreciated.

llcooljatt

12-30-2011

nawk -F, ## sets the field separator (FS) to a comma

Code:

'NR==FNR{a[$1FS$2FS$3]++;next} ## run this block while NR equals FNR
NR is the total number of input records read so far; it keeps increasing across all input files.
FNR is the record number within the current file, so it is counted separately for each file.
When awk starts reading its input (files, standard input or a pipe), NR and FNR are both 1 at the first record,
and they increase together while the first file is being read.
When awk moves on to a new file, NR keeps increasing but FNR restarts at 1 for every new file.
So NR and FNR are equal only while the first file is being processed.
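The NR/FNR behavior described above can be checked with a minimal sketch. The two files and their contents are made up for illustration, and plain `awk` is assumed to be available (the thread uses `nawk`).

```shell
# Hypothetical demo files; names and contents are invented for this sketch.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"

# Print NR, FNR and the line for every record of both files.
nr_fnr=$(awk '{print NR, FNR, $0}' "$tmp/first" "$tmp/second")
printf '%s\n' "$nr_fnr"

# Only the records of the first file satisfy NR==FNR.
first_only=$(awk 'NR==FNR{print $0}' "$tmp/first" "$tmp/second")
printf '%s\n' "$first_only"

rm -rf "$tmp"
```

NR runs from 1 to 5 while FNR goes 1, 2 and then 1, 2, 3. One caveat: if the first file is empty, NR==FNR is also true for the second file, so the idiom assumes a non-empty first file.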

in that case, ...

Code:

NR==FNR{a[$1FS$2FS$3]++;next} ## runs while NR==FNR (that is, for every line of the first file that is read)
For example, our line is "NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011"
a[$1FS$2FS$3]++  ## $1 is "NE:888888", FS is ",", $2 is "SHELF:3" and $3 is "SLOT:1"; joined together they form the index into array a
a["NE:888888,SHELF:3,SLOT:1"] --> our first index (a string); array a holds our indexes (it is an associative array)
Reading continues with the remaining records of the first file.
For example, suppose the next line is the same record: "NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011"
a[$1FS$2FS$3]++  ## increments the count for our index (NE:888888,SHELF:3,SLOT:1),
so "print a[$1FS$2FS$3]" gives the count for these lines.
In the end, all lines are accumulated as array indexes with their counts.
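A small sketch of the counting step described above, assuming plain `awk` is available. The three input lines are taken from the sample "new" file; the output is sorted only because the order of `for (k in a)` is unspecified.

```shell
# Count how often each NE/SHELF/SLOT key occurs in three sample lines.
counts=$(printf '%s\n' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' \
  'NE:111111,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' |
  awk -F, '{a[$1 FS $2 FS $3]++}          # key = first three fields joined by commas
           END{for(k in a) print k, a[k]}' | LC_ALL=C sort)
printf '%s\n' "$counts"
```

The key NE:888888,SHELF:3,SLOT:1 ends up with a count of 2 and NE:111111,SHELF:3,SLOT:1 with a count of 1.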

.....

Code:

next ## go to the next record (line); awk skips the remaining rules and reads the next record
NR==FNR{....;next} ## so this handles every line of the first file
When awk reaches the end of the first file, it goes on to process the next file(s).

Code:

{b[$1FS$2FS$3]++} ## in the same way, accumulates the lines from the other file(s) in array b

Code:

END ## after all the files have been read, run the code inside END{...}
{for(j in b){bx++}; ## find the length of array b and assign it to "bx" (bx increases by one for every index in array b)

Code:

for(i in a){  ## loop over the indexes of array a (i --> index, a --> array name)
for(j in b){  ## the same for array b
if(i==j){cc=b[j]} ## compare the two indexes, for example "NE:888888,SHELF:3,SLOT:1" against "NE:221726,SHELF:8,SLOT:1"; if they are equal then "cc=b[j]" [the count for that index]

Code:

else{cn++} ## otherwise increase "cn", which counts how many indexes in b are not equal to the current index (for example "NE:888888,SHELF:3,SLOT:1")
## this comparison continues for every index of array b in the for-loop [remember, the indexes of array b come from the old files, not the first file]

Code:

if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0} ## if cn equals bx, no match was found between this new-file index and the indexes from the other files,
## so write it as a NEW value together with a[i] (the count for this index)
## w is set to zero (intended so that the record is not also written as an OLD value)

Code:

if(w=!0&&cn<bx){;print i";\t\t",cc};w=1;cn=0}} ## intended test: w is non-zero and cn is smaller than bx
## w being non-zero is meant to say that no NEW value was written above
## (as written, "w=!0" is an assignment rather than a comparison, so in practice only cn<bx decides)
## cn smaller than bx (the length of array b) means there is a matching record
## (remember, the i values are the indexes from the new file)
## the test could actually have been "cn==bx-1"; I modified my code that way below
## so write it as an OLD value together with cc (the count for this index (j) from the other files, that is, array b)
' new old*  ## the new file and the old files
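A note on the `w=!0` in that condition, with a minimal check. In awk, `=` is assignment and binds more loosely than `&&`, so `w=!0&&cn<bx` stores the result of `(!0) && (cn<bx)` in w and then tests it. The values of bx and cn below are made up for illustration.

```shell
res=$(awk 'BEGIN{
  bx=3
  cn=2; w=0; if(w=!0&&cn<bx) r1="true"; else r1="false"; w1=w   # a match was found (cn<bx)
  cn=3; w=0; if(w=!0&&cn<bx) r2="true"; else r2="false"; w2=w   # no match (cn==bx)
  print r1, w1, r2, w2
}')
printf '%s\n' "$res"
```

The earlier value of w never influences the test; the branch is taken exactly when cn<bx, which is why the script still prints the expected lines.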

Code:

# nawk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=b[j]}else{cn++}
if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0};if(w=!0&&cn==bx-1){;print i";\t\t",cc};w=1;cn=0}}
' new old*
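The nested loops above only answer one question: does index i from the new file also exist in array b? A shorter sketch of the same idea uses awk's `in` operator. This is an alternative under stated assumptions, not the code posted in the thread: the file names `new` and `old1` and their contents are hypothetical, plain `awk` is assumed, and the count is glued directly after the tabs.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' > "$tmp/new"
printf '%s\n' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' \
  'NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' > "$tmp/old1"

report=$(awk -F, '
  NR==FNR {a[$1 FS $2 FS $3]++; next}   # first file: counts per key in the new file
          {b[$1 FS $2 FS $3]++}         # remaining files: counts per key in the old files
  END {
    for (i in a) {
      if (i in b) { print i ";\t\t" b[i] }            # seen before: print the old-file count
      else        { print "NEW-> " i ";\t\t" a[i] }   # not in any old file: print the new-file count
    }
  }' "$tmp/new" "$tmp/old1" | LC_ALL=C sort)
printf '%s\n' "$report"
rm -rf "$tmp"
```

The membership test `(i in b)` does not create an element in b, and it removes the need for the bx, cn and w bookkeeping.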

regards
ygemici

12-30-2011

Quote:

Originally Posted by ygemici

@ahamed101's code gives the same result except for the new records. You can use this code when you want just the old records.

@ygemici: note the file arguments that are fed to awk: 20111229.csv *.csv

20111229.csv is fed in twice (once by name and once through *.csv), and hence the logic takes care of both new and old records.
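A quick way to see the double read, as a sketch: the file names day1.csv and day2.csv are hypothetical stand-ins for the real CSV files, and plain `awk` is assumed.

```shell
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/day2.csv"   # hypothetical "new" file
printf 'y\n' > "$tmp/day1.csv"   # hypothetical "old" file

# The explicit name comes first; the glob then expands to every .csv file,
# which includes day2.csv a second time.
order=$(cd "$tmp" && awk 'FNR==1{print FILENAME}' day2.csv *.csv)
printf '%s\n' "$order"
rm -rf "$tmp"
```

awk reads day2.csv, then day1.csv, then day2.csv again, so the counts collected after the first file include the new file's own records.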

--ahamed

12-31-2011

@ahamed, can I say thanks for this code? It will save me hours of manual work. One thing, can you explain:

Code:

if(b[i]-1&&a[i]!=b[i])

OK, I think I've worked it out: if my b array index is equal to -1 && (boolean) the array a index is not equal to the b index, i.e. no lines match in array b, then {print "NEW:"i";\t\t"b[i]}. Is that a fair assessment?

llcooljatt

12-31-2011

Quote:

Originally Posted by ahamed101

@ygemici: note the file arguments that are fed to awk: 20111229.csv *.csv

20111229.csv is fed in twice (once by name and once through *.csv), and hence the logic takes care of both new and old records.

--ahamed

Hmm, OK, I missed it.

Then the final version (the new file is added again after the old files):

Code:

# awk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(i in b){if(a[i]==b[i]){print "NEW->",i";\t\t",a[i]}else{print i";\t\t",b[i]}}}' new old* new

Code:

## if the count of index i in array b (the other files plus the new records)
## and the count of index i in array a (only the new records)
## are equal, the record exists only in the new file and not in the old files,
## because if the same record were in the old files
## the count in b would be larger than the count in a[i]
## so it is a NEW line
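A minimal run of that final version on made-up data, to show the a[i]==b[i] test in action. The file names `new` and `old1` and their contents are hypothetical, and plain `awk` is assumed; the new file is passed a second time at the end, as in the command above.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' \
  'NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011' > "$tmp/new"
printf '%s\n' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' \
  'NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011' \
  'NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011' > "$tmp/old1"

# a = counts from the first read of "new"; b = counts from old1 plus the second read of "new".
final=$(awk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(i in b){if(a[i]==b[i]){print "NEW->",i";\t\t",a[i]}else{print i";\t\t",b[i]}}}' \
  "$tmp/new" "$tmp/old1" "$tmp/new" | LC_ALL=C sort)
printf '%s\n' "$final"
rm -rf "$tmp"
```

Here NE:888888 appears only in the new file, so both counts are 2 and it is flagged as NEW. NE:221726 has a count of 1 in a and 3 in b, so it is printed as an old record; note that its total of 3 includes the occurrence in the new file.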

And @ahamed's code:

Code:

if(b[i]-1&&a[i]!=b[i]) 
if(b[i]-1)
## true when the count of index i in array b, minus 1, is non-zero,
## that is, when the count is anything other than exactly 1
a[i]!=b[i]
## true when the count of index i in array b and the count of index i in array a are not equal
## this checks whether the record also exists in the other files
## if the counts are not equal, there is a record in the other files
## so it is an OLD line
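To make the condition concrete, the sketch below evaluates `b[i]-1 && a[i]!=b[i]` for a few made-up count pairs, assuming plain `awk` is available. `b-1` is a subtraction, so it is false (zero) only when b is exactly 1; "OLD" and "NEW" stand for the two print branches.

```shell
table=$(awk 'BEGIN{
  # each pair is a:b, where a = count in the new file and b = count in array b (values are invented)
  n = split("1:1 2:2 1:5 2:6 0:1", pairs, " ")
  for (k = 1; k <= n; k++) {
    split(pairs[k], v, ":")
    a = v[1] + 0; b = v[2] + 0
    verdict = (b - 1 && a != b) ? "OLD" : "NEW"
    print a, b, verdict
  }
}')
printf '%s\n' "$table"
```

Only the pairs where b is not 1 and differs from a (1:5 and 2:6) come out as OLD; every pair with b equal to 1, or with a equal to b, falls into the NEW branch.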

regards
ygemici
