awk to compare files and validate order of headers


 
04-21-2017

The awk below verifies the field count and header order of each text file in the directory. The script executes and produces output, but the order of the headers is not compared to key. The portion in bold is supposed to do that: if the headers in each text file are in the same order as in key, the file is good; otherwise the out-of-order header is printed. I am not sure what I am doing wrong. Thank you :).

file.txt tab-delimited in /home/cmccabe/Desktop/validate/*.txt --- each text file (usually 3) is, or should be, in the same format ----
Code:
Index	Chr	Start	End	Ref	Alt 	Inheritence	Score	Quality	HGMD	Classification
1	1	10	100	A	-	.	2	GOOD	.	VUS
2	1	100	1000	-	C	.	5	STRAND BIAS	.	Benign
3	5	50	500	AA	T	.	1	GOOD	.	Benign

key tab-delimited --- order of each header (always the same) ----
Code:
Index	Chr	Start	End	Ref	Alt	Inheritence	Score	Quality	HGMD	Classification
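One detail worth checking before any order comparison: in the sample header as posted, Alt appears to be followed by a space before the tab, while key has Alt alone. That may only be an artifact of posting, but if it is in the real files, an exact string comparison would report that column as different. A minimal sketch of trimming each header field first (normalize_header and the scratch files are hypothetical names used only for illustration):

```shell
# normalize_header: print the first line of a tab-delimited file with
# surrounding spaces (and a trailing carriage return) removed from each field.
normalize_header() {   # hypothetical helper, usage: normalize_header FILE
    awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
        sub(/\r$/, "")                      # drop a DOS line ending, if any
        for (i = 1; i <= NF; i++)
            gsub(/^[ ]+|[ ]+$/, "", $i)     # trim spaces around each header
        print
        exit                                # only the header line is needed
    }' "$1"
}

# Scratch files standing in for one data file and the key (assumed names).
tmp=$(mktemp -d)
printf 'Ref\tAlt \tScore\n' > "$tmp/file.txt"   # note the space after Alt
printf 'Ref\tAlt\tScore\n'  > "$tmp/key"

if [ "$(normalize_header "$tmp/file.txt")" = "$(normalize_header "$tmp/key")" ]; then
    echo "headers match after trimming"
else
    echo "headers differ"
fi
```

Whether trimming is appropriate depends on how strict the validation should be; if a stray space should count as an error, compare the raw fields instead.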

awk
Code:
logfile=/home/cmccabe/Desktop/validate/process.log    # define log
for f in /home/cmccabe/Desktop/validate/*.txt ; do        # start loop
     echo "Start header validation creation: $(date) - File: $f"     # start entry for file in log
     bname=`basename $f`     # strip off path from filename
     awk '   # call awk
FNR==NR {  # first file (key): FNR equals NR only while reading it
    for(n=1;n<=NF;n++)  # iterate through headers from key
        a[$n]    # define array a, indexed by header name
    nextfile     # next
}
NF==(n-1) {   # check header count matches key file
    print FILENAME " file has expected number of 11 fields"   # good message
    nextfile    # next
}
{
    for(i=1;i<=NF;i++)  # iterate through headers from file
        b[$i]   # define array b
    print FILENAME " is missing header for: "   # bad message
    for(i in a)    # compare each header to array a
    if(i in b==0)  # if header not found
        print i  # print missing header
    nextfile    # next
}
{
    for(n=1;n<=NF;n++)   # iterate through headers from file
        a[$n]   # define array a
    nextfile   # next
}
NF==(/home/cmccabe/Desktop/validate/key) {   # check order of headers in file to key
    print FILENAME " has expected header order"    # good message
    nextfile   # next
}
{
    for(i=1;i<=NF;i++)   # iterate through headers from file
        b[$i]   # define array b
    print FILENAME " header is out of order for: "  # bad message
    for(i in a)    # compare each header to array a
    if(i in b==0)   # if header out of order as compared to key
        print i   # print header
    nextfile   # next
}' /home/cmccabe/Desktop/validate/key $f    # define compare file (key) and each text file ($f)
done >> "$logfile"    # append output to log and close loop
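Reading the script as posted, two things seem to keep the order check from ever happening. First, for each data file either the NF==(n-1) block or the unconditional block right after it runs, and both end in nextfile, so the later blocks are never reached. Second, a[$n] stores header names as array indexes with no position attached, and for (i in a) visits them in an unspecified order, so order cannot be recovered from that array. A minimal sketch of one alternative, assuming the key headers are stored by column number and compared position by position (check_order and the scratch files are hypothetical, and the field lists are shortened for illustration):

```shell
# Hypothetical scratch files standing in for the key and two data files.
tmp=$(mktemp -d)
printf 'Index\tChr\tStart\tEnd\n' > "$tmp/key"
printf 'Index\tChr\tStart\tEnd\n1\t1\t10\t100\n' > "$tmp/good.txt"
printf 'Index\tStart\tChr\tEnd\n1\t10\t1\t100\n' > "$tmp/swapped.txt"

check_order() {   # hypothetical helper, usage: check_order KEYFILE DATAFILE
    awk -F'\t' '
    NR == FNR {                       # first file: the key
        for (n = 1; n <= NF; n++)
            key[n] = $n               # remember each header BY POSITION
        nkey = NF
        next
    }
    FNR == 1 {                        # header line of the data file
        bad = 0
        if (NF != nkey) {
            print FILENAME " has " NF " fields, expected " nkey
            bad = 1
        }
        for (i = 1; i <= nkey; i++)
            if ($i != key[i]) {       # same column, different name
                print FILENAME " header is out of order at column " i ": found \"" $i "\", expected \"" key[i] "\""
                bad = 1
            }
        if (!bad)
            print FILENAME " has expected header order"
        exit                          # only the header line matters
    }' "$1" "$2"
}

check_order "$tmp/key" "$tmp/good.txt"
check_order "$tmp/key" "$tmp/swapped.txt"
```

The count check and the order check are reported from one block here, so neither can skip the other. Inside the existing loop the call would be something like check_order /home/cmccabe/Desktop/validate/key "$f", with the loop output appended to the log.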

current output
Code:
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file1.txt
/home/cmccabe/Desktop/validate/file1.txt file has expected number of 11 fields
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file1.txt
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file2.txt
/home/cmccabe/Desktop/validate/file2.txt file has expected number of 11 fields
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file2.txt
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file3.txt
/home/cmccabe/Desktop/validate/file3.txt file has expected number of 11 fields
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file3.txt

desired output
Code:
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file1.txt
/home/cmccabe/Desktop/validate/file1.txt file has expected number of 11 fields
/home/cmccabe/Desktop/validate/file1.txt file has expected header order
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file1.txt
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file2.txt
/home/cmccabe/Desktop/validate/file2.txt file has expected number of 11 fields
/home/cmccabe/Desktop/validate/file2.txt file has expected header order
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file2.txt
Start header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file3.txt
/home/cmccabe/Desktop/validate/file3.txt file has expected number of 11 fields
/home/cmccabe/Desktop/validate/file3.txt file has expected header order
End header validation creation: Fri Apr 21 07:39:09 CDT 2017 - File: /home/cmccabe/Desktop/validate/file3.txt


Last edited by cmccabe; 04-21-2017 at 04:32 PM..