Bash to verify and validate file header and data type
Post 302996028 by cmccabe on Monday 17th of April 2017 08:32:51 PM
The first portion of the bash script below (the for loop around awk) verifies the headers of each text file in dir and creates 2 out files, one for each file. That seems to be working perfectly.

The second portion of the script is meant to test and verify the data type of each field. The script executes, but the data types are never verified; only the headers are.

The key, which is also tab-delimited, defines the headers and the data type of each field.

Only the header line and the line under it need to be verified, as all files in the dir have the same format. Thank you :).

file1 tab-delimited
Code:
Index   Chr Start   End Ref Alt Freq    Qual    Score   Input   ---- this file is verified with 10 headers and the data type in each field is good
1    1    1    100    C    -    1    GOOD    10    .
2    2    20    200    A    C    .002    STRAND BIAS    2    .
3    2    270    400    -    GG    .036    GOOD    6    .

file2 tab-delimited
Code:
Index   Chr Start   End Ref Alt Freq    Qual    Score    Input --- this file has the 10 expected headers but should fail data type validation, as the . in Qual on the first data row should be "GOOD" or alpha
1    1    1    100    C    -    1    .   10    .
2    2    20    200    A    C    .002    STRAND BIAS    2    .
3    2    270    400    -    GG    .036    GOOD    6    .

key
Code:
Index    Chr    Start    End    Ref    Alt    Freq    Qual    Score    Input    ---- defined 10 column headers ----
Integer     Integer    Integer    Integer    Alpha    Alpha    Integer    Alpha    Integer    Integer   --- data type of each line after header  ----

The ---- annotations are not part of each file; they are only there to help in the description.
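
One way to read the key is as a two-line lookup: line 1 gives the expected headers and line 2 the expected type of each column. The following is only a minimal sketch of that idea, not a tested solution for these files. The function name check_first_row is hypothetical, the type words are assumed to start with "Int" or "Alpha", and "Integer" is read strictly as digits only while "Alpha" is read as letters and spaces only. It compares the first row under the header against line 2 of the key and prints the header of every column that does not match.

```shell
#!/bin/bash
# Hypothetical helper (name and patterns are assumptions, not from the post):
# check the first data row of a tab-delimited file against the type row
# (line 2) of the key and print the header of each column that fails.
check_first_row() {
    local key=$1 file=$2
    awk -F'\t' '
        FNR == NR {                      # still reading the first file (the key)
            if (FNR == 1) for (i = 1; i <= NF; i++) name[i] = $i
            if (FNR == 2) for (i = 1; i <= NF; i++) type[i] = $i
            next
        }
        FNR == 2 {                       # first row under the header of the data file
            for (i = 1; i <= NF; i++) {
                if (type[i] ~ /^Int/   && $i !~ /^[0-9]+$/)     print name[i]
                if (type[i] ~ /^Alpha/ && $i !~ /^[A-Za-z ]+$/) print name[i]
            }
            exit                         # only this one row is checked
        }
    ' "$key" "$file"
}
```

Called as check_first_row /home/cmccabe/bash/key "$f" inside the for loop, it prints nothing when the row matches the key. Under these strict patterns a - in Alt or a . in Input is reported as well, so either the key or the patterns would need to allow such placeholders.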


Bash
Code:
#!/bin/bash

dir="/home/cmccabe/bash"   # directory to search for files
for f in "$dir"/*.txt; do   # start for loop
bname=`basename $f`    # strip off the path
pref=${bname%%.txt}    # strip the .txt extension to build the output name
awk '
FNR==NR {  # first file (the key): read its header line
    for(n=1;n<=NF;n++)   # iterate through each header field
        a[$n]  # store the header name as an index of array a
    nextfile   # skip the rest of the key
}
NF==(n-1) {  # header line has as many fields as the key header
    print FILENAME " file has expected number of fields"   # Good file
    nextfile   # next file
}
{
    for(i=1;i<=NF;i++)  # iterate through the headers of this file
        b[$i]   # store each header that is present
    print FILENAME " is missing header for: "   # Bad file
    for(i in a)   # loop over the headers defined in the key
    if(i in b==0)  # if the key header is not found in this file
        print i    # print missing header
    nextfile  
}' /home/cmccabe/bash/key $f > /home/cmccabe/bash/${pref}_out # use key as headers to look for in files and create out for each
done

isnumeric()   # numeric function
{   # start block
    result=$(echo "$1" | tr -d '[[:digit:]]')  # strip the digits from the field and store what is left
    echo ${#result}   # print the number of leftover characters (0 when the field is all digits)
}  # end block

isalpha()   # character function
{  # start block
    result=$(echo "$1" | tr -d '[[:alpha:]]')  # strip the letters from the field and store what is left
    echo ${#result}   # print the number of leftover characters (0 when the field is all letters)
}  # end block
col1=""   # define col to search
col2=""   # define col to search
col3=""   # define col to search
col4=""   # define col to search
col5=""   # define col to search
col6=""   # define col to search
col7=""   # define col to search
col8=""   # define col to search
col9=""    # define col to search
col10=""  # define col to search
let retval=1  # assume the row is valid until a check fails

while read record  # read the file one line at a time
do
    echo "$record" | awk -F'\t' '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }' | read col1 col2 col3 col4 col5 col6 col7 col8 col9 col10  # split the line into its 10 fields
    
    # check  if numeric in col
    if [[ $(isnumeric "$col1") -eq 1 && $(isnumeric "$col2") -eq 1 && $(isnumeric "$col3") -eq 1 && $(isnumeric "$col4") -eq 1 && $(isnumeric "$col7") -eq 1 && $(isnumeric "$col9") -eq 1 && $(isnumeric "$col10") -eq 1 ]]; then
         retval=1  # numeric columns pass
    else
         retval=0  # a numeric column failed; stop reading the file
         break
    fi  # close if.... else
    
    # check if alpha in col
    if [[ $(isalpha "$col5") -eq 1 && $(isalpha "$col6") -eq 1 && $(isalpha "$col8") -eq 1 ]]; then
         retval=1  # alpha columns pass
    else
         retval=0  # an alpha column failed; stop reading the file
         break
    fi  # close if....else
    
    if [[ $retval -eq 1 ]]; then   # display results
      echo "file is correct data type in each field"   # file is validated
    else
      echo "file is  not the correct data type for:"  # columns in file not validated
      echo "$col1 $col2 $col3 $col4 $col5 $col6 $col7 $col8 $col9 $col10"
    fi  # close if.... else    
    
    if [[ NF == 10 && $retval -eq 1 ]]; then   # report whether the file as a whole is validated
      echo "$f is validated"
    else
      echo "$f is not validated"
    fi
done  < $f >> /home/cmccabe/bash/${pref}_out  # end loop and define file to check and add to output
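
Three details in the second portion above may explain why no data type is ever checked. In bash each command of a pipeline normally runs in a subshell, so the variables set by "... | read col1 ..." are gone once the pipeline ends. isnumeric and isalpha print the number of leftover characters, so a clean field yields 0, not 1. And the while loop sits after the for loop's done, so $f and $pref only hold the last file. A minimal sketch that avoids all three follows. The helper names is_integer, is_alpha and validate_row are hypothetical, the strict patterns are an assumption, and only the first row under the header is tested.

```shell
#!/bin/bash
# Hypothetical helpers; the names and the strict patterns are assumptions.
is_integer() { local re='^[0-9]+$';        [[ $1 =~ $re ]]; }  # digits only
is_alpha()   { local re='^[[:alpha:] ]+$'; [[ $1 =~ $re ]]; }  # letters and spaces only

# Check the first row under the header of one tab-delimited file.
# Prints a one-line verdict; returns 1 if any column fails.
validate_row() {
    local file=$1 bad=() i
    local -a h c
    {
        IFS=$'\t' read -r -a h    # header line, split on tabs
        IFS=$'\t' read -r -a c    # first data row, split on tabs
    } < "$file"                   # a redirect, not a pipe, so h and c survive

    # 0-based columns: Index Chr Start End Freq Score Input are numeric
    for i in 0 1 2 3 6 8 9; do is_integer "${c[i]}" || bad+=("${h[i]}"); done
    # Ref Alt Qual are alpha
    for i in 4 5 7; do is_alpha "${c[i]}" || bad+=("${h[i]}"); done

    if (( ${#bad[@]} )); then
        echo "$file is not the correct data type for: ${bad[*]}"
        return 1
    fi
    echo "$file is correct data type in each field"
}
```

Calling validate_row "$f" >> "$dir/${pref}_out" inside the for loop, after the awk command, would append the verdict to each file's own out file. With these strict patterns a - in Alt or a . in Input also fails, so the patterns would need loosening if those placeholders are meant to be accepted.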

desired output, one for each file
Code:
/home/cmccabe/bash/file1.txt file has expected number of fields
/home/cmccabe/bash/file1.txt is validated
/home/cmccabe/bash/file1.txt is correct data type in each field

Code:
/home/cmccabe/bash/file2.txt file has expected number of fields
/home/cmccabe/bash/file2.txt is not the correct data type for: QUAL
/home/cmccabe/bash/file2.txt is not validated


Last edited by cmccabe; 04-18-2017 at 06:03 AM..
 
