Compare a file with all others then print off data


 
# 15  
Old 12-30-2011
ahamed, I am grateful. Excellent work!

Can you explain the modified code so that I understand it better?

Specifically:
Code:
{if(b[i]-1&&a[i]!=b[i]){print i";\t\t"b[i]}else{print "NEW:"i";\t\t"b[i]}


Last edited by llcooljatt; 12-30-2011 at 08:45 AM..
# 16  
Old 12-30-2011
Quote:
Originally Posted by llcooljatt
your script
Code:
nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next}{b[$1OFS$2OFS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=a[i]+b[j]}else{cn++}
if(cn==bx){print "NEW ENTRY FOUND -->",i,a[i];w=0}};if(w!=0){print i,cc};w=1;cn=0}}
' failed_lcss_reboots_20111229.csv *.csv

Your script seems to add an extra 1 to the figures, whereas my old one is pretty much there. The only thing I need it to do is not look for -1 for new resets when comparing 20111229.csv with *.csv, but give an accurate reflection, i.e. instead of 1 this could be 10.

i.e.

NEW:NE:883948,SHELF:10,SLOT:2; 10

highlight as NEW on the left

I don't understand exactly what you want, but I guess the counts in the output for old records should not include the occurrences in the new file.

For example:
Code:
# cat old*
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011

Code:
# cat new
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:111111,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:801048,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
NE:593848,SHELF:3,SLOT:1,02:30:09,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011
NE:801048,SHELF:6,SLOT:2,04:00:01,Wed Dec 28 2011

I removed the new file's record counts from the totals and added a condition for recurring records.
Code:
# nawk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=b[j]}else{cn++}
if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0};if(w=!0&&cn<bx){;print i";\t\t",cc};w=1;cn=0}}
' new old*
NE:801048,SHELF:3,SLOT:1;                5
NE:593848,SHELF:3,SLOT:1;                3
NEW-> NE:888888,SHELF:3,SLOT:1;          2
NE:801048,SHELF:6,SLOT:2;                6
NEW-> NE:111111,SHELF:3,SLOT:1;          1
NE:841068,SHELF:8,SLOT:4;                2
NE:221726,SHELF:8,SLOT:1;                5

@ahamed101's code gives the same result except for the new records. You can use it when you want just the old records.
Code:
# nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
> END{for(i in b){if(b[i]-1&&a[i]!=b[i]){print i";\t\t"b[i]}else{print "NEW:"i";\t\t"b[i]} } }' OFS=, new old*
NE:801048,SHELF:3,SLOT:1;               5
NE:593848,SHELF:3,SLOT:1;               3
NE:801048,SHELF:6,SLOT:2;               6
NE:841068,SHELF:8,SLOT:4;               2
NE:221726,SHELF:8,SLOT:1;               5


regards
ygemici

Last edited by ygemici; 12-30-2011 at 05:12 PM..
# 17  
Old 12-30-2011

Quote:
Originally Posted by ygemici
I don't understand exactly what you want, but I guess the counts in the output for old records should not include the occurrences in the new file.


Can you explain or write comments for your code, please? I am new to this, and any help from you guys is appreciated.
# 18  
Old 12-30-2011
Quote:
Originally Posted by llcooljatt
can you explain or write comments for your code please, I am new to this and any help from you guys is appreciated.
nawk -F, ## set the field separator (FS) to a comma
Code:
'NR==FNR{a[$1FS$2FS$3]++;next} ## run this block only while NR equals FNR
NR is the total number of input records read so far; it keeps increasing across all input files.
FNR is the record number within the current file, so it is counted separately for each file.
When awk starts reading its input (files, standard input or a pipe), NR and FNR are both 1 on the first record,
and they increase together for as long as the first file is being read.
When awk moves on to a new file, NR keeps increasing but FNR starts again from 1.
So NR and FNR are equal only while the first file is being processed.
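
To see the two counters side by side, here is a small sketch. The file names first.txt and second.txt are made up for the demo, and plain awk is used (nawk should behave the same way for this).

```shell
# Throwaway demo files; the names are made up for this sketch.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first.txt"
printf 'c\nd\n' > "$tmp/second.txt"

# Print both counters for every record of both files.
out=$(cd "$tmp" && awk '{ print FILENAME ": NR=" NR " FNR=" FNR }' first.txt second.txt)
printf '%s\n' "$out"
rm -rf "$tmp"
```

NR runs from 1 to 4, while FNR goes back to 1 on the first line of second.txt, so NR==FNR holds only for the lines of first.txt.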

in that case, ...
Code:
NR==FNR{a[$1FS$2FS$3]++;next} ## runs while NR==FNR (so only for the first file that is read)
for example, our line is "NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011"
a[$1FS$2FS$3]++  ## $1 is "NE:888888", FS is ",", $2 is "SHELF:3", FS is "," again and $3 is "SLOT:1"; joined together they form the index of array a
a["NE:888888,SHELF:3,SLOT:1"] --> our first index (a string); array a holds our indexes (it is an associative array)
reading goes on with the remaining records of the first file..
for example, suppose the same record is read again, so the new line is "NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011" (same as above)
a[$1FS$2FS$3]++  ## 
increments the count for our index (NE:888888,SHELF:3,SLOT:1), so "print a[$1FS$2FS$3]" gives the count for these lines
finally, all the keys of the first file are stored as array indexes together with their counts
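
The counting step can be tried on its own. This is only a sketch: the three sample lines are picked from the thread's data format, and the " -> " in the output is just for readability.

```shell
# Three sample lines in the thread's format; two of them share the same key.
data='NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:111111,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011'

# Build the key from fields 1-3 and count how often each key occurs.
# The order of "for (k in a)" is unspecified in awk, so the result is sorted.
counts=$(printf '%s\n' "$data" |
  awk -F, '{ a[$1 FS $2 FS $3]++ } END { for (k in a) print k " -> " a[k] }' |
  LC_ALL=C sort)
printf '%s\n' "$counts"
```

The key that appears twice ends up with a count of 2, the other one with a count of 1.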

.....
Code:
next ## go to the next record (line): awk skips the remaining rules and reads the next record
NR==FNR{....;next} ## so every line of the first file is handled by this block only.
When awk reaches the end of the first file, it goes on to process the next file(s)

Code:
{b[$1FS$2FS$3]++} ## in the same way, count the keys from the other file(s) in array b
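
Here is a reduced sketch of how the two blocks split the input between the arrays. The file names new.txt and old.txt and their one-word records are invented for the demo; the whole line is used as the key to keep it short.

```shell
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/new.txt"
printf 'y\nz\nz\n' > "$tmp/old.txt"

# First block: runs only for the first file, then "next" skips the second block.
# Second block: reached only by the records of the later file(s).
split=$(awk 'NR==FNR { a[$0]++; next } { b[$0]++ }
  END { for (k in a) print "a " k " " a[k]; for (k in b) print "b " k " " b[k] }' \
  "$tmp/new.txt" "$tmp/old.txt" | LC_ALL=C sort)
printf '%s\n' "$split"
rm -rf "$tmp"
```

One thing to keep in mind with this idiom: if the first file is empty, NR==FNR is still true while the second file is read, so the second file would land in array a.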

Code:
END ## after all the files have been read, run the code inside END{...}
{for(j in b){bx++}; ## count the indexes of array b and store the result in bx (bx is incremented once per index)
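
The counting loop can be checked in isolation. This is a minimal sketch with made-up keys k1, k2 and k3; the loop counts distinct indexes, not input lines.

```shell
# Four input lines, but only three distinct keys (k1 occurs twice).
n=$(printf 'k1\nk2\nk1\nk3\n' | awk '{ b[$0]++ } END { for (j in b) bx++; print bx }')
printf '%s\n' "$n"
```

Some awk implementations (gawk, for example) also accept length(b) for this, but the loop works in older awks as well.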

Code:
for(i in a){  ## loop over the indexes of array a (i is the index, a is the array name)
for(j in b){  ## the same for array b
if(i==j){cc=b[j]} ## if the index from a (e.g. "NE:221726,SHELF:8,SLOT:1") equals the index from b, store its count from the old files in cc

Code:
else{cn++} ## otherwise increment cn, which counts how many indexes of b are not equal to the current index of a
## this comparison is repeated for every index of b in the inner for loop [remember that the b indexes come from the old files, not from the first file]

Code:
if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0} ## if cn equals bx, no index of b matched i, so there is no match between the new file and the other files
## in that case print the key as a NEW value together with a[i] (the count for this index in the new file)
## w is set to zero on every pass (it was meant to stop the key from being printed as an OLD value as well)

Code:
if(w=!0&&cn<bx){;print i";\t\t",cc};w=1;cn=0}} ## print the key as an OLD value when cn is smaller than bx
## note that w=!0 is an assignment, not the comparison w!=0, so it does not really test w; the condition that decides is cn<bx
## cn<bx means that at least one index of b matched i, so the record also exists in the old files
## (remember that the i values are the indexes from the new file)
## (the test could also be written as cn==bx-1, i.e. exactly one match; I changed my code that way below)
## so print the key with cc, the count for the matching index j in array b (the other files)
## w=1;cn=0 reset both variables before the next index of a is checked
' new old*  ## the new file first, then the old files

Code:
# nawk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(j in b){bx++};for(i in a){for(j in b){if(i==j){cc=b[j]}else{cn++}
if(cn==bx){print "NEW->",i";\t\t",a[i];}w=0};if(w=!0&&cn==bx-1){;print i";\t\t",cc};w=1;cn=0}}
' new old*
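
For comparison, the same report can be sketched without the inner loop and without the cn, bx and w counters, by using awk's "in" operator to test whether an index exists. This is not the code from the post, only an alternative under the same assumptions (new file first, old files after it); the files new and old1 are cut-down samples in the thread's format.

```shell
tmp=$(mktemp -d)
cat > "$tmp/new" <<'EOF'
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
EOF
cat > "$tmp/old1" <<'EOF'
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
EOF

report=$(awk -F, '
  NR==FNR { a[$1 FS $2 FS $3]++; next }   # keys of the new file
          { b[$1 FS $2 FS $3]++ }         # keys of the old file(s)
  END {
    for (i in a) {
      if (i in b) {
        print i ";\t\t" b[i]              # seen before: count from the old files
      } else {
        print "NEW-> " i ";\t\t" a[i]     # not seen before: count from the new file
      }
    }
  }' "$tmp/new" "$tmp/old1" | LC_ALL=C sort)
printf '%s\n' "$report"
rm -rf "$tmp"
```

As in the post, only keys that occur in the new file are reported, so NE:841068 (old file only) does not show up.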

regards
ygemici
# 19  
Old 12-30-2011
Quote:
Originally Posted by ygemici
@ahamed101 code gives same result but new records..you can use this code when you want just old records..
@ygemici: note the file arguments that are fed to awk: 20111229.csv *.csv

20111229.csv is fed in twice (it also matches *.csv), and hence the logic takes care of both new and old records.

--ahamed
# 20  
Old 12-31-2011
Quote:
Originally Posted by ahamed101
@ygemici : the file arguments which is fed to awk is to be noted. 20111229.csv *.csv

20111229.csv is fed in twice and hence it takes care of both new and old with the logic.

--ahamed
@ahamed, can I say thanks for this code, it will save me hours of manual work. One thing, can you explain
Code:
if(b[i]-1&&a[i]!=b[i])

OK, I think I've worked it out: if my b array index is equal to -1 && (boolean) the array a index is not equal to the b index, i.e. no lines match in array b, then {print "NEW:"i";\t\t"b[i]}. Is that a fair assessment?

Last edited by llcooljatt; 12-31-2011 at 04:00 AM..
# 21  
Old 12-31-2011
Quote:
Originally Posted by ahamed101
@ygemici : the file arguments which is fed to awk is to be noted. 20111229.csv *.csv

20111229.csv is fed in twice and hence it takes care of both new and old with the logic.

--ahamed
Hmm, OK, I missed it.

Quote:
Originally Posted by llcooljatt
@ahamed can I say thanks for this code it will save me hours of manual work, 1 thing can you explain
Code:
if(b[i]-1&&a[i]!=b[i])


ok think iv worked it out, if my b array index is equal to -1 && (boolean) array a index is not = b index i.e no lines match in array b then {print "NEW:"i";\t\t"b[i]} is that a fair assessment ?
Then the final version (the new file is given again after the old files, so it is counted together with them):
Code:
# awk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(i in b){if(a[i]==b[i]){print "NEW->",i";\t\t",a[i]}else{print i";\t\t",b[i]}}}' new old* new

Code:
## if the count under index i in array b (the other files plus the new file's records)
## and the count under index i in array a (the new file's records only)
## are equal, the record exists only in the new file and not in the old files,
## because if the same record were also in the old files,
## the count in b would be larger than the count in a[i]
## so it is a NEW line
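
The final version can be tried on a cut-down data set. This is a sketch only: the files new and old1 below are small samples in the thread's format, not the real CSV files, and the output is sorted because the order of "for (i in b)" is unspecified.

```shell
tmp=$(mktemp -d)
cat > "$tmp/new" <<'EOF'
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:888888,SHELF:3,SLOT:1,01:30:02,Wed Dec 28 2011
EOF
cat > "$tmp/old1" <<'EOF'
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:221726,SHELF:8,SLOT:1,01:00:02,Wed Dec 28 2011
NE:841068,SHELF:8,SLOT:4,02:00:03,Wed Dec 28 2011
EOF

# Same program as the final version above; "new" is listed first and last.
final=$(cd "$tmp" && awk -F, 'NR==FNR{a[$1FS$2FS$3]++;next}{b[$1FS$2FS$3]++}
END{for(i in b){if(a[i]==b[i]){print "NEW->",i";\t\t",a[i]}else{print i";\t\t",b[i]}}}' new old* new | LC_ALL=C sort)
printf '%s\n' "$final"
rm -rf "$tmp"
```

Two things show up with this sample: the count for NE:221726 is 3 (two old records plus the one in the new file), and NE:841068, which exists only in the old file, is listed too because the loop runs over array b.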


and @ahamed's code
Code:
if(b[i]-1&&a[i]!=b[i]) 
if(b[i]-1)
## b[i]-1 is the count stored under index i in array b, minus 1
## awk treats any non-zero number as true, so this part is true whenever that count is not 1 (it does not test for the value -1)
a[i]!=b[i]
## true if the count under index i in array b differs from the count under index i in array a
## this checks whether the record also occurs in the other files:
## if the counts are not equal, the other files contain the record as well
## so it is an OLD line
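
To make the truth test concrete, the condition can be evaluated for a few count pairs. The pairs below are hypothetical values chosen for the sketch (a is the count from the new file, b the count over all files including the new one), not results from the real data.

```shell
verdicts=$(awk 'BEGIN {
  # Hypothetical "a:b" count pairs.
  n = split("1:1 2:2 1:5 2:6", pair, " ")
  for (p = 1; p <= n; p++) {
    split(pair[p], c, ":")
    a = c[1] + 0; b = c[2] + 0          # force numeric values
    if (b - 1 && a != b) {
      print "a=" a " b=" b " -> old"
    } else {
      print "a=" a " b=" b " -> NEW"
    }
  }
}')
printf '%s\n' "$verdicts"
```

For the pair 1:1 the first part (b-1) is already zero, so the record counts as NEW; for 2:2 the first part is true but the counts are equal, so it is NEW as well. Whenever b is larger than a, the record is old.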

regards
ygemici