Sponsored Content
Top Forums Shell Programming and Scripting Bash to verify and validate file header and data type Post 302995977 by cmccabe on Sunday 16th of April 2017 09:54:25 AM
Old 04-16-2017
Bash to verify and validate file header and data type

The below bash is a file validation check executed that will verify the correct header count of 10 and the correct data type in each field of the tab-delimited file. The key has the data type of each field in it. My real data has 58 headers in it but only the header and next row need to be checked. The below files are examples that have all possible data types in them. That is the data type of each line after the header is the same as the line above it. All lines will have some sort of data in it, either a numeric, alpha charter or a . (dot) for a null value. If the file is validate a message is written to the output indicated this, else the missing header or bad data type is written to output.
I'm not sure if the below is the best way to do this, but hopefully it is close. Each line is commented as to what I think is happening. Thank you Smilie.

There are 3 example files represent each of the only possibilities.
Code:
file1  --- is a good file, validated for both header and data type in all fields in file1
file2  --- is a bad file, not validated though the header line is good, the data type expected in QUAL is alpha and it is a .(dot) in red in file2
file3  --- is a bad file, not validated though the header line is not good (10 columns are expected), though the data type expected in file3

key
Code:
Index    Chr    Start    End    Ref    Alt    Freq    Qual    Score    Input    ---- defined 10 column headers ----
Integar     Integar    Integar    Integar    Alpha    Alpha    Integar    Alpha    Integar    Integar   --- data type of each line after header  ----

file1
Code:
Index    Chr    Start    End    Ref    Alt    Freq    Qual    Score    Input
1    1    1    100    C    -    1    GOOD    10    .
2    2    20    200    A    C    .002    STRAND BIAS    2    .
3    2    270    400    -    GG    .036    GOOD    6    .

file2
Code:
Index    Chr    Start    End    Ref    Alt    Freq    Qual    Score    Input
1    1    1    100    C    -    1    .    10    .
2    2    20    200    A    C    .002    STRAND BIAS    2    .
3    2    270    400    -    GG    .036    GOOD    6    .

file3
Code:
Index    Chr    Start    End    Ref    Alt    Freq    Qual    Input
1    1    1    100    C    -    1    GOOD    10    .
2    2    20    200    A    C    .002    STRAND BIAS    2    .
3    2    270    400    -    GG    .036    GOOD    6    .

Code:
#!/bin/bash# call bash script
awk -F'\t' '{print NF, "fields detected in file and they are:" ORS $0; exit}' file >> output  # detect header row in file and store in output
   if [[ $NF -eq 1 ]]; then   # display results
      echo "file has expected number of fields"   # file is validated for headers
    else
      echo "file is missing header for:"  # missing header field ...in file not-validated
      echo "$NF"
    fi  # close if.... else    
    
isnumeric()   # numeric function
{   # start block
    result=$(echo "$1" | tr -d '[[:digit:]]')  # check each field in file for numeric and store result
    echo ${#result}   # display result
}  # end block

isalpha()   # charcter function
{  # start block
    result=$(echo "$1" | tr -d '[[:alpha:]]')  # check each field in file for character and store result
    echo ${#result}   # display result
}  # end block
col1=""   # define col to search
col2=""   # define col to search
col3=""   # define col to search
col4=""   # define col to search
col5=""   # define col to search
col6=""   # define col to search
col7=""   # define col to search
col8=""   # define col to search
col9=""    # define col to search
col10=""  # define col to search
let retval=1  # data to check in this row

while read record  # start loop to read each column in file
do
    echo "$record" | awk -F'\t' '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }' | read col1 col2 col3 col4 col5 col6 col7 col8 col col10  # store in col name in record
    
    # check  if numeric in col
    if [[ $(isnumeric "$col1") -eq 1 && $(isnumeric "$col2") -eq 1 && $(isnumeric "$col3") -eq 1 && $(isnumeric "$col4") -eq 1 && $(isnumeric "$col7") -eq 1 && $(isnumeric "$col9") -eq 1 && $(isnumeric "$col10") -eq 1 ]]; then
         retval=1  # check data in this row
    else
         retval=0  # go back to header row
         break
    fi  # close if.... else
    
    # check if alpha in col
    if [[ $(isalpha "$col5") -eq 1 && $(isalpha "$col6") -eq 1 && $(isalpha "$col8") -eq 1 ]]; then
         retval=1  # check data in this row
    else
         retval=0  # go back to header row
         break
    fi  # close if....else
    
    if [[ $retval -eq 1 ]]; then   # display results
      echo "file is correct data type in each field"   # file isvalidated
    else
      echo "file is  not the correct data type for:"  # colums ...in file not-validated
      echo "$col1 $col2 $col3 $col4 $col5 $col6 $col7 $col8 $col9 $col10"
    fi  # close if.... else    
    
    if [[ NF == 10 && $retval -eq 1 ]]; then   # execute and display file validated
      echo "file is validated"
    else
      echo "file is not validated"
    fi
done  < file >> output  # end loop and define file to check and add to output


Last edited by cmccabe; 04-17-2017 at 09:19 AM.. Reason: added details added red color to file2, corrected syntax errors detected by shell check
 

10 More Discussions You Might Find Interesting

1. Programming

FILE data type

Hi all, Can anyone tell me a little about the datatype FILE, which represents stream. What does its structure look like, and in which header file is it defined and so on... Ex : FILE *fp ; fp = fopen("filename", "w") ; (6 Replies)
Discussion started by: milhan
6 Replies

2. Shell Programming and Scripting

Better way to Validate column data in file.

I am trying to validate the third column in a pipe delimited file. The column must be 10 char long and all digits 0-9. I am writing out two new files from the existing file, if it would be quicker, I could leave the bad rows in the file and ignore them in the next process. What I have is... (12 Replies)
Discussion started by: barry1
12 Replies

3. UNIX for Dummies Questions & Answers

Verify the data type in a file with UNIX function

I am seeking help on this UNIX function, please help. Thanks in advance. I have a large file, named as 'MyFile'. It was tab-delmited, I am told that each record in column 1 is unique. How would I verify this with UNIX function or command? (1 Reply)
Discussion started by: duke0001
1 Replies

4. UNIX for Advanced & Expert Users

Verify file was sftp'd via bash script

Hello Experts, I have a script that that transfers a file (via sftp) and it works fine but we ran into a snag where the target server asked for the ssh key and the script didn't know what to do. I want to add some logic to this script that at least sends an email that it didn't complete as... (4 Replies)
Discussion started by: Tiberius777
4 Replies

5. Shell Programming and Scripting

Script to validate file header and trailer

Hi, I need a script that validates a file header/detail/trailer. File layout is: Header - Rec_Type|File_name|File_Date Detail - Rec_Type|field1|field2|field3... Trailder - Rec_Type|File_name|File_Date|Record_count Sample Data: HDR|customer_data.dat|20120709... (7 Replies)
Discussion started by: ash_sh
7 Replies

6. Shell Programming and Scripting

Exclude the header row in the file to validate

Hi All, File contains header row.. we need to exclude the header row...no need to validate the first row in the file. Data in the file should take valid data(two columns)..we need to exclude the more than two columns in the file except the first line. email|firstname a|123|100 b|345... (4 Replies)
Discussion started by: bmk
4 Replies

7. Shell Programming and Scripting

Verify the header and trailer in file

please see my requirement, I hope I am clear. (9 Replies)
Discussion started by: mirwasim
9 Replies

8. Shell Programming and Scripting

Extract header data from one file and combine it with data from another file

Hi, Great minds, I have some files, in fact header files, of CTD profiler, I tried a lot C programming, could not get output as I was expected, because my programming skills are very poor, finally, joined unix forum with the hope that, I may get what I want, from you people, Here I have attached... (17 Replies)
Discussion started by: nex_asp
17 Replies

9. Shell Programming and Scripting

Need a ready Shell script to validate a high volume data file

Hi, I am looking for a ready shell script that can help in loading and validating a high volume (around 4 GB) .Dat file . The data in the file has to be validated at each of its column, like the data constraint on each of the data type on each of its 60 columns and also a few other constraints... (2 Replies)
Discussion started by: Guruprasad
2 Replies

10. Shell Programming and Scripting

Script to validate header in a csv file

Hi All; I am struggling to write a script that validates file header. Header file would be like below with TAB separated TRX # TYPE REF # Source Piece Code Destination Piece Code every time I need to check the txt file if the header was same as above fields if validation success... (6 Replies)
Discussion started by: heye18
6 Replies
All times are GMT -4. The time now is 05:14 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy