Bash to verify each line in input for specific pattern

    #1  
05-17-2017
cmccabe
Registered User
 
Join Date: Nov 2013
Last Activity: 21 July 2017, 10:53 AM EDT
Location: Chicago
Posts: 1,156
Thanks: 694
Thanked 15 Times in 14 Posts

In the bash below, the output of a process is written to input. What I am trying to do is read each line in input and check it for specific text (there are always 6 lines for each file, and the expected text for each line is in the description). There will always be 6 lines for each file in input, but the number of files can vary. In this example there are 3 files (each block of 6 lines that share a path is one file), but the next time there may be only two. If every line for a file matches the description, the file is verified/good; if any line does not match, the file is not verified.

I hope the below is a start; I have commented each line. Thank you.

input

Code:
Start import validation creation: Wed May 17 06:55:34 CDT 2017
/home/cmccabe/Desktop/validate/file1.txt found expected header
/home/cmccabe/Desktop/validate/file1.txt found expected order of fields
/home/cmccabe/Desktop/validate/file1.txt R_Index is a number
/home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file1.txt Quality is a character
/home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file2.txt found expected header
/home/cmccabe/Desktop/validate/file2.txt found expected order of fields
/home/cmccabe/Desktop/validate/file2.txt R_Index is a number
/home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file2.txt Quality is a character
/home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file3.txt found expected header
/home/cmccabe/Desktop/validate/file3.txt found expected order of fields
/home/cmccabe/Desktop/validate/file3.txt R_Index is a number
/home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file3.txt Quality is a character
/home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017


Code:
#!/bin/bash
file="/home/cmccabe/Desktop/validate/input"   # define path to input
# expected text for lines 1-6 of each file (see description)
patterns=(
    "found expected header"
    "found expected order of fields"
    "R_Index is a number"
    "PopFreqMax is valid"
    "Quality is a character"
    "HGMD and Sanger are valid"
)
n=0       # lines read so far for the current file
count=0   # lines that matched the description ("LINE IS GOOD")
while read -r name text; do                     # read each line in input
    [[ $name == /* ]] || continue               # skip the Start/End lines
    if [[ $text == "${patterns[n]}" ]]; then    # line n+1 must match description n+1
        count=$((count + 1))                    # LINE IS GOOD
    fi
    n=$((n + 1))
    if [[ $n -eq 6 ]]; then                     # all 6 lines of this file were read
        if [[ $count -eq 6 ]]; then             # if count = 6
            echo "$name is verified"            # specific file is verified or good
        else
            echo "$name not verified"           # specific file not verified
        fi
        n=0; count=0                            # start over for the next file
    fi
done < "$file"

Description

Code:
1="found expected header"
2="found expected order of fields"
3="R_Index is a number"
4="PopFreqMax is valid"
5="Quality is a character"
6="HGMD and Sanger are valid"


Last edited by cmccabe; 05-17-2017 at 09:04 AM.. Reason: added description
    #2  
05-17-2017
RavinderSingh13 (Forum Advisor)
Registered User
 
Join Date: May 2013
Last Activity: 22 July 2017, 10:48 PM EDT
Location: Chennai
Posts: 2,555
Thanks: 563
Thanked 1,208 Times in 1,087 Posts
Hello cmccabe,

It is not clear; could you please add more information to your post? Also, please always show us the expected sample output too.

Thanks,
R. Singh
    #3  
05-17-2017
cmccabe
File 1 has 6 lines in it:

Code:
Line 1 is /home/cmccabe/Desktop/validate/file1.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file1.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file1.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file1.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File 1 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.

File 2 has 6 lines in it:

Code:
Line 1 is /home/cmccabe/Desktop/validate/file2.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file2.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file2.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file2.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File 2 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.

File 3 has 6 lines in it:

Code:
Line 1 is /home/cmccabe/Desktop/validate/file3.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file3.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file3.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file3.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid

Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File 3 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.

desired output

Code:
/home/cmccabe/Desktop/validate/file1.txt is verified
/home/cmccabe/Desktop/validate/file2.txt is verified
/home/cmccabe/Desktop/validate/file3.txt is verified

It is also possible that there are only 1 or 2 files, but there will always be 6 lines for each file. I use FILENAME to represent each file instead of hardcoding it.

Does this help? Thank you.
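
The rule described in this post (six lines per file, each matching the description in order, six matches means verified) can be sketched by grouping lines on the path in the first field rather than on their position in input. This is only one possible reading of the requirement: the function name check_blocks and the paths /data/a.txt and /data/b.txt are made up for the example, and paths are assumed to contain no spaces.

```shell
#!/bin/bash
# check_blocks is a hypothetical helper; it reads the combined log on stdin
# and prints one verdict per path, in the order the paths first appear.
check_blocks() {
    awk '
    BEGIN {
        # Expected message for lines 1..6 of every file, taken from the description.
        want[1] = "found expected header"
        want[2] = "found expected order of fields"
        want[3] = "R_Index is a number"
        want[4] = "PopFreqMax is valid"
        want[5] = "Quality is a character"
        want[6] = "HGMD and Sanger are valid"
    }
    $1 !~ /^\// { next }                          # skip Start/End lines (no path)
    {
        path = $1
        msg = $0
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", msg)      # drop the path, keep the message
        sub(/[ \t]+$/, "", msg)                   # ignore trailing blanks
        if (!(path in seen)) order[++files] = path
        n = ++seen[path]                          # position of this line in its file
        if (msg == want[n]) good[path]++          # this is a LINE IS GOOD
    }
    END {
        for (i = 1; i <= files; i++) {
            p = order[i]
            print p, (seen[p] == 6 && good[p] == 6 ? "is verified" : "is not verified")
        }
    }'
}

# Usage with made-up data: a.txt is in the expected order, b.txt has its
# first two lines swapped.
out=$(printf '%s\n' \
    "Start import validation creation" \
    "/data/a.txt found expected header" \
    "/data/a.txt found expected order of fields" \
    "/data/a.txt R_Index is a number" \
    "/data/a.txt PopFreqMax is valid" \
    "/data/a.txt Quality is a character" \
    "/data/a.txt HGMD and Sanger are valid" \
    "/data/b.txt found expected order of fields" \
    "/data/b.txt found expected header" \
    "/data/b.txt R_Index is a number" \
    "/data/b.txt PopFreqMax is valid" \
    "/data/b.txt Quality is a character" \
    "/data/b.txt HGMD and Sanger are valid" \
    "End import validation creation" \
    | check_blocks)
printf '%s\n' "$out"
```

Keying on the path means the Start/End lines need no special line-number arithmetic, at the cost of keeping a small counter per file in memory.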
    #4  
05-17-2017
RavinderSingh13
Hello cmccabe,

I am still not 100% sure what you need; could you please try the following and let me know if it helps?

Code:
awk 'FNR==1{VAL=0} FNR==1 && /found expected header/{VAL++} FNR==2 && /found expected order of fields/{VAL++} FNR==3 && /R_Index is a number/{VAL++} FNR==4 && /PopFreqMax is valid/{VAL++} FNR==5 && /Quality is a character/{VAL++} FNR==6 && /HGMD and Sanger are valid/{VAL++} FNR==6{if(VAL==6){print FILENAME " is verified."}}'  Input_file*

You can pass file* as the file argument above if your files are all named file followed by digits; otherwise change the regex (the shell pattern) to file[0-9] etc., depending upon your file names.

EDIT: Adding a non-one-liner form of the solution too.

Code:
awk 'FNR==1 { VAL=0 }                               # new file: reset the match counter
     FNR==1 && /found expected header/          { VAL++ }
     FNR==2 && /found expected order of fields/ { VAL++ }
     FNR==3 && /R_Index is a number/            { VAL++ }
     FNR==4 && /PopFreqMax is valid/            { VAL++ }
     FNR==5 && /Quality is a character/         { VAL++ }
     FNR==6 && /HGMD and Sanger are valid/      { VAL++ }
     FNR==6 {                                       # after line 6, report on this file
         if (VAL==6) {
             print FILENAME " is verified."
         }
     }
    ' Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 05-17-2017 at 11:33 AM..
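
The tests in the script above use FNR, the line number within the current input file, rather than NR, which keeps counting across all input files. A minimal sketch of the difference, using two throwaway files whose names (one.txt, two.txt) are made up for the example:

```shell
#!/bin/bash
# Create two hypothetical two-line files in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\n' > "$tmp/two.txt"

# For every line, print the file name, the global line number (NR)
# and the per-file line number (FNR).
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
printf '%s\n' "$counters"

rm -rf "$tmp"
```

FNR restarts at 1 when awk opens two.txt, while NR continues with 3 and 4. That is why FNR-based tests suit one log per file, and why a single combined input.txt needs a different way to find where each 6-line block starts.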
    #5  
05-17-2017
cmccabe
I am not sure what you mean by changing the regex, but each file is a block of 6 lines within input.

input.txt (file that has the output of the process)

Code:
 Start import validation creation: Wed May 17 06:55:34 CDT 2017 -header-
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0000-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0000-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0000-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0000-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0001-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0001-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0001-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0002-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0002-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0002-l,f.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017 -footer-

individual files within input.txt (in this example there are 3, but it is possible to have only 1 or 2)


Code:
 /home/cmccabe/Desktop/validate/00-0000-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt


Code:
awk 'NR==2 && /found expected header/          { VAL++ }
     NR==3 && /found expected order of fields/ { VAL++ }
     NR==4 && /R_Index is a number/            { VAL++ }
     NR==5 && /PopFreqMax is valid/            { VAL++ }
     NR==6 && /Quality is a character/         { VAL++ }
     NR==7 && /HGMD and Sanger are valid/      { VAL++ }
     END {
         if (VAL==6) {
             print FILENAME " is verified."
         }
     }
' input.txt > verify.txt

verify.txt then contains "input.txt is verified." That is the result for the input file as a whole, not for the individual files (is this what you mean by changing the regex?).

I changed the line numbers in the NR== tests to skip the header (not sure if that is the best way). Also, would adding print FILENAME " is not verified." capture any negative results where a file did not contain the expected lines (had different values)?


desired result

Code:
 /home/cmccabe/Desktop/validate/00-0000-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt is verified

Thank you very much.
    #6  
05-18-2017
Don Cragun (Forum Staff)
Administrator
 
Join Date: Jul 2012
Last Activity: 22 July 2017, 1:54 PM EDT
Location: San Jose, CA, USA
Posts: 10,412
Thanks: 527
Thanked 3,638 Times in 3,104 Posts
Your description is rather confusing; I think the following may come closer to doing what you want:

Code:
awk '
NR == 1 {
	next
}
NR % 6 == 2 && /found expected header/ ||
NR % 6 == 3 && /found expected order of fields/ ||
NR % 6 == 4 && /R_Index is a number/ ||
NR % 6 == 5 && /PopFreqMax is valid/ ||
NR % 6 == 0 && /Quality is a character/ ||
NR % 6 == 1 && /HGMD and Sanger are valid/ {
	VAL++
}
NR % 6 == 1 {
	print " " $1, (VAL == 6) ? "is verified" : "is not verified"
	VAL = 0
}
' input.txt > verify.txt

If input.txt contains the sample input you provided in post #5 in this thread, the text produced by the above script in verify.txt exactly matches the output you said you wanted.
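
To see the "is not verified" branch as well, the same awk program can be run against a small generated input in which one line is deliberately wrong. This is only a sketch: the names good.txt and bad.txt under /tmp/validate are made up, they appear only as text in the generated log, and the scratch directory comes from mktemp.

```shell
#!/bin/bash
# Build a hypothetical input.txt: header, two 6-line blocks, footer.
tmp=$(mktemp -d)
{
    echo "Start import validation creation: Wed May 17 06:55:34 CDT 2017"
    for f in good.txt bad.txt; do
        echo "/tmp/validate/$f found expected header"
        echo "/tmp/validate/$f found expected order of fields"
        if [ "$f" = bad.txt ]; then
            echo "/tmp/validate/$f R_Index is not a number"   # deliberately wrong
        else
            echo "/tmp/validate/$f R_Index is a number"
        fi
        echo "/tmp/validate/$f PopFreqMax is valid"
        echo "/tmp/validate/$f Quality is a character"
        echo "/tmp/validate/$f HGMD and Sanger are valid"
    done
    echo "End import validation creation: Wed May 17 06:55:34 CDT 2017"
} > "$tmp/input.txt"

# Same program as in the post: line 1 is the header, so the blocks cover
# lines 2-7, 8-13, ... and NR % 6 identifies the position inside a block.
result=$(awk '
NR == 1 {
    next
}
NR % 6 == 2 && /found expected header/ ||
NR % 6 == 3 && /found expected order of fields/ ||
NR % 6 == 4 && /R_Index is a number/ ||
NR % 6 == 5 && /PopFreqMax is valid/ ||
NR % 6 == 0 && /Quality is a character/ ||
NR % 6 == 1 && /HGMD and Sanger are valid/ {
    VAL++
}
NR % 6 == 1 {
    print " " $1, (VAL == 6) ? "is verified" : "is not verified"
    VAL = 0
}
' "$tmp/input.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The first block reaches VAL == 6 and is reported as verified; the second stops at 5 because its third line does not match, so it is reported as not verified. The leading space in each output line comes from the `" " $1` in the print statement.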