Script to compare 1 file with all others


 
# 22  
12-24-2011
Paste the input and a sample output!
For more spacing add a "\t"...

--ahamed
# 23  
12-24-2011
Sorry, no further spacing is required; I was reading it wrong.

The last script prints everything shown above, but then it shows a tally for only some of the lines in the file: there are 46 tallied lines,

whereas the actual file contains:

#cat 20111223.csv | wc -l
54

I just need the tally of repeats, plus "NEW" for lines with no match, when comparing the 20111223.csv file against all the others, i.e. no output for lines from the other files, but the count is incremented whenever a match is found against 20111223.csv.

---------- Post updated at 04:22 AM ---------- Previous update was at 04:20 AM ----------

There were loads more "NEW" lines; I had to cut them out.


Code:
NEW: NE:508714,SHELF:3,SLOT:2
NEW: NE:222774,SHELF:3,SLOT:3
NEW: NE:851608,SHELF:6,SLOT:5
NEW: NE:221005,SHELF:5,SLOT:3
NEW: NE:503164,SHELF:5,SLOT:6
NEW: NE:593848,SHELF:7,SLOT:5
NEW: NE:222272,SHELF:10,SLOT:6
NEW: NE:509860,SHELF:15,SLOT:2
NEW: NE:542045,SHELF:17,SLOT:3
NEW: NE:567927,SHELF:6,SLOT:2
NEW: NE:220151,SHELF:6,SLOT:4
NEW: NE:895435,SHELF:7,SLOT:6
NEW: NE:221505,SHELF:10,SLOT:1
NEW: NE:222783,SHELF:12,SLOT:2
NEW: NE:877351,SHELF:2,SLOT:2
NEW: NE:222774,SHELF:3,SLOT:5
NEW: NE:542045,SHELF:17,SLOT:2
NEW: NE:851076,SHELF:8,SLOT:6
NEW: NE:852991,SHELF:10,SLOT:5
NEW: NE:220036,SHELF:1,SLOT:4
NEW: NE:223334,SHELF:9,SLOT:5
NEW: NE:877351,SHELF:2,SLOT:6
NEW: NE:852068,SHELF:1,SLOT:5
NEW: NE:564127,SHELF:9,SLOT:3
NEW: NE:508714,SHELF:2,SLOT:6
NE:219953,SHELF:5,SLOT:3        1
NE:230028,SHELF:12,SLOT:3       1
NE:801048,SHELF:2,SLOT:6        1
NE:852991,SHELF:11,SLOT:4       1
NE:877072,SHELF:5,SLOT:2        1
NE:221548,SHELF:4,SLOT:5        1
NE:236313,SHELF:16,SLOT:3       1
NE:219992,SHELF:1,SLOT:4        1
NE:236260,SHELF:11,SLOT:6       1
NE:225827,SHELF:8,SLOT:5        1
NE:847950,SHELF:10,SLOT:2       1
NE:851249,SHELF:2,SLOT:1        1
NE:571208,SHELF:7,SLOT:4        1
NE:230478,SHELF:14,SLOT:1       1
NE:852049,SHELF:10,SLOT:1       1
NE:862480,SHELF:5,SLOT:3        1
NE:854588,SHELF:9,SLOT:3        1
NE:847950,SHELF:12,SLOT:4       4
NE:509033,SHELF:2,SLOT:1        3
NE:593848,SHELF:10,SLOT:1       1
NE:848649,SHELF:9,SLOT:4        1
NE:851608,SHELF:6,SLOT:6        2
NE:222930,SHELF:12,SLOT:5       1
NE:230028,SHELF:16,SLOT:5       1
NE:221539,SHELF:13,SLOT:4       1
NE:903793,SHELF:5,SLOT:4        1
NE:222314,SHELF:10,SLOT:4       1
NE:225827,SHELF:10,SLOT:6       1
NE:541469,SHELF:11,SLOT:6       2
NE:594728,SHELF:3,SLOT:5        1
NE:222783,SHELF:13,SLOT:5       1
NE:852049,SHELF:13,SLOT:6       1
NE:862480,SHELF:8,SLOT:1        1
NE:222318,SHELF:2,SLOT:1        1
NE:852068,SHELF:7,SLOT:4        1
NE:565538,SHELF:7,SLOT:4        1
NE:219853,SHELF:4,SLOT:6        1
NE:571456,SHELF:14,SLOT:6       1
NE:220251,SHELF:3,SLOT:5        1
NE:222544,SHELF:2,SLOT:1        1
NE:828273,SHELF:2,SLOT:2        1
NE:230028,SHELF:5,SLOT:3        1
NE:542045,SHELF:6,SLOT:1        1
NE:220760,SHELF:6,SLOT:3        1
NE:571508,SHELF:14,SLOT:4       2
NE:225721,SHELF:11,SLOT:6       1

# 24  
12-24-2011
Try this...
Code:
awk -F, 'NR==FNR{file=FILENAME;a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{if(FILENAME~file)next;b[$1OFS$2OFS$3]++;}
END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111222.csv *.csv
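To see what the one-liner prints without the real data, here is a small sketch that runs it on three made-up files in a scratch directory (assuming `mktemp -d` is available). The sample lines and the wrapper name run_tally are hypothetical; the awk program is the one above, with spaces around OFS for readability.

```shell
#!/bin/sh
# Scratch directory with made-up sample data, removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:2,SHELF:1,SLOT:2' 'NE:3,SHELF:2,SLOT:1' > 20111222.csv
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:9,SHELF:9,SLOT:9' > 20111220.csv
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:2,SHELF:1,SLOT:2' > 20111221.csv

# Hypothetical wrapper around the posted one-liner (20111222.csv is the reference).
run_tally() {
    awk -F, 'NR==FNR{file=FILENAME;a[$1 OFS $2 OFS $3]++;next} a[$1 OFS $2 OFS $3]{if(FILENAME~file)next;b[$1 OFS $2 OFS $3]++;}
END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111222.csv *.csv
}

run_tally
```

Here NE:3 appears only in the reference and is reported as NEW, NE:1 is counted in both other files (2) and NE:2 in one (1), while NE:9, which appears only in 20111220.csv, produces no output. The NEW lines always come first because the END block prints them in an earlier loop; the order of the tally lines is not guaranteed, since `for (i in b)` visits keys in an unspecified order.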

--ahamed
# 25  
12-24-2011
Great work, friend! I'm impressed with your knowledge. It works; I'll just check a few things.

I need to know what each piece of syntax means and does. Can you explain, please? I'm new to scripting.

:)
# 26  
12-24-2011
I am not good at explaining... but here is my best shot!

For more info, check this link...
Awk - A Tutorial and Introduction - by Bruce Barnett

Code:
#-F sets the input field separator.
#By default fields are separated by whitespace; I am setting it to "," since your records are comma-separated
awk -F, '
#NR, FNR - built-in variables
#NR  = record/line number counted across all the files, starting from 1
#FNR = record/line number within the current file, restarting at 1 for each file
#NR==FNR is true only while reading the first file, i.e. 20111222.csv, so the block below runs only for that file
#NR=1, FNR=1 for the first line of the first file
#NR=2, FNR=2 for the second line of the first file
#NR=10, FNR=1 for the first line of the second file (if the first file has 9 lines), got it?

NR==FNR{
        #FILENAME is a built-in variable holding the name of the file currently being processed
        #I am storing the filename in a variable "file"
        file=FILENAME;

        #Create an associative array whose key is the first three fields joined by OFS; the value counts how often that key was seen
        #It will look something like this: a[NE:223334,SHELF:9,SLOT:5]=1, a[NE:22554,SHELF:9,SLOT:5]=1 etc.
        #Each key appears only once in the array. If a duplicate line comes, its value is incremented to 2, 3, etc.
        #i.e. let's say NE:223334,SHELF:9,SLOT:5 comes twice, then a[NE:223334,SHELF:9,SLOT:5]=2
        #OFS - built-in variable, Output Field Separator. I have set it to "," towards the end of the script (OFS=,).
        #By default FS is whitespace and OFS is a single space
        a[$1 OFS $2 OFS $3]++;

        #next: stop processing this line and read the next one (like continue in a loop), so the code below never runs for the first file
        next
} 
#From the second file onwards, control comes here. By now all the data from the first file is stored
#in the array "a", with the fields as the key.
#For each line of the other files, the first three fields are extracted and checked against the array "a" we populated;
#if the key is present (with a non-zero count), the code block is entered.
a[$1 OFS $2 OFS $3]{
        #Since we pass *.csv, 20111222.csv is read a second time, so we need to skip it; hence this check.
        if(FILENAME~file){
                next
        }

        #If the entry is found in the array, the line is old (already seen), and now we need to keep a count of it.
        #So I create another, similar associative array to hold the old data and its count.
        #Each time the entry is seen again, the count is incremented.
        b[$1 OFS $2 OFS $3]++
}
#END - built-in keyword. This block runs once, after all the input files have been read
END{ 
        #Now we print the results.
        #Array "a" has all the data from the first file; we need to find the data that is new.
        #So we loop over every key in array "a"; the variable i takes each key in turn.
        #A key is new if it has a value in array "a" and is not present in array "b", since array "b" holds the old data.
        #Thus we get the new data.
        for(i in a){
                if(a[i] && !b[i]){
                        print "NEW: "i
                }
        } 

        #Now print the old data: take the keys from array "b" and print them.
        #i will have the value "NE:223334,SHELF:9,SLOT:5" and b[i] will have the count
        for(i in b){
                if(b[i]){
                        print i"\t\t"b[i]
                }
        }
}' OFS=,  20111222.csv *.csv
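A tiny sketch, assuming any POSIX awk, makes the NR/FNR, next and END comments above concrete: two made-up files (first.txt with 2 lines, second.txt with 1), a rule guarded by NR == FNR that ends with next, and an END block. The file names and the function name nr_demo are illustrative only.

```shell
#!/bin/sh
# Scratch directory with two made-up files, removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'x\ny\n' > first.txt   # 2 lines
printf 'z\n' > second.txt     # 1 line

nr_demo() {
    awk '
        # Only true while reading the first file; "next" skips the rule below.
        NR == FNR { print FILENAME, NR, FNR, "-> first-file block"; next }
        # Reached for every line of the later files.
                  { print FILENAME, NR, FNR, "-> later-file block" }
        # Runs once, after the last line of the last file.
        END       { print "END runs once, after", NR, "records" }
    ' first.txt second.txt
}

nr_demo
```

It should print `first.txt 1 1` and `first.txt 2 2` from the first rule, `second.txt 3 1` from the second, and the END line last, with NR still 3. One caveat with the NR == FNR idiom: if the first file is empty, NR == FNR is still true while reading the second file, so that file would be treated as the reference instead.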

HTH
--ahamed
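Two details of the script are easy to trip over. Referencing a[...] in a pattern creates an empty element whenever the key is missing, which is why the END block needs the a[i] && guard (and why !b[i] is later followed by if(b[i])). And FILENAME~file is a regular-expression match, so the dots in 20111222.csv match any character, and any longer name that contains that text is skipped too. Below is a minimal variant, a sketch assuming POSIX awk rather than a drop-in replacement: it tests membership with (k in a), which does not create elements, compares file names with ==, and takes the reference file as an argument, so it can be 20111223.csv as in the question. The function name tally_vs and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical variant. Usage: tally_vs REFERENCE.csv FILE...
tally_vs() {
    ref=$1
    shift
    awk -F, -v OFS=, -v ref="$ref" '
        # Reference file (also when the glob lists it again): remember its keys.
        FILENAME == ref { a[$1 OFS $2 OFS $3]++; next }
        # Any other file: build the key once per line.
        { k = $1 OFS $2 OFS $3 }
        # "k in a" tests membership without creating a[k].
        (k in a) { b[k]++ }
        END {
            for (k in a) if (!(k in b)) print "NEW: " k
            for (k in b) print k "\t\t" b[k]
        }' "$ref" "$@"
}

# Made-up sample data in a scratch directory, removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:2,SHELF:1,SLOT:2' 'NE:3,SHELF:2,SLOT:1' > 20111223.csv
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:9,SHELF:9,SLOT:9' > 20111222.csv
printf '%s\n' 'NE:1,SHELF:1,SLOT:1' 'NE:2,SHELF:1,SLOT:2' > 20111221.csv

tally_vs 20111223.csv *.csv
```

The reference must be spelled the way the glob produces it (20111223.csv, not ./20111223.csv); otherwise its own lines are counted as repeats. As with the original, the order of the lines printed by each loop is unspecified.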

---------- Post updated at 06:15 AM ---------- Previous update was at 05:59 AM ----------

Off topic...
This is why we developers hate documenting our code! :)
2 lines of code and 100 lines of documentation for them; we could spend that time testing and optimizing our code... hee hee...


--ahamed
# 27  
12-24-2011
Quote:
Originally Posted by ahamed101
Off topic...
This is why we developers hate documenting our code! :)
2 lines of code and 100 lines of documentation for them; we could spend that time testing and optimizing our code... hee hee...

Haha, yes, I can understand, and then you have to explain it to dummies like me :)

A few questions, if I may...


assigning value 1 as the key?

;a ?

++;next} ?

END? So, for example, is the first line processed last, i.e. all of nawk -F, 'NR==FNR{file=FILENAME;a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{if(FILENAME~file)next;b[$1OFS$2OFS$3]++;}
END? If so, why would you do this?

if(FILENAME~file){ when we skip, what are we skipping? Does 20111222.csv come twice, or is it compared to *.csv?

{} what do these do?

[] what do these do?

() what do these do?

variable i? What does it mean?

(a[i] && !b[i]) I do not understand this at all; can I have a deeper explanation? Array a with variable i, and array b with variable i, something??

i}} why use this?

It seems that when we come to print the results we have for and if statements; is there a bible on this? I will read the awk tutorial later.


Can I just say, learning from gurus like you makes doing this so satisfying. I'm indebted to you.