Bash to verify each line in input for specific pattern


 
# 1  
Old 05-17-2017

In the bash below, the output of a process is written to input. What I am trying to do is read each line of input and check it for specific text (the expected text for each line is in the description). Each file always contributes 6 lines to input, but the number of files can vary. In this example there are 3 files (each block of 6 lines is one file), but next time there may be only two. If every line for a file matches the description, the file is verified/good; otherwise it is not.

I hope the below is a start; I have commented each line. Thank you :).

input
Code:
Start import validation creation: Wed May 17 06:55:34 CDT 2017
/home/cmccabe/Desktop/validate/file1.txt found expected header
/home/cmccabe/Desktop/validate/file1.txt found expected order of fields
/home/cmccabe/Desktop/validate/file1.txt R_Index is a number
/home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file1.txt Quality is a character
/home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file2.txt found expected header
/home/cmccabe/Desktop/validate/file2.txt found expected order of fields
/home/cmccabe/Desktop/validate/file2.txt R_Index is a number
/home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file2.txt Quality is a character
/home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/file3.txt found expected header
/home/cmccabe/Desktop/validate/file3.txt found expected order of fields
/home/cmccabe/Desktop/validate/file3.txt R_Index is a number
/home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/file3.txt Quality is a character
/home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017

Code:
#!/bin/bash
file="/home/cmccabe/Desktop/validate/input"   # define path to input (set before the loop)
seen=0                                        # lines read in the current block of 6
count=0                                       # lines in the block that matched
while read -r name text; do                   # read each line: path, then message
    [[ $name == /* ]] || continue             # skip the Start/End lines
    case "$text" in                           # messages spelled as they appear in input
        "found expected header" | "found expected order of fields" | \
        "R_Index is a number" | "PopFreqMax is valid" | \
        "Quality is a character" | "HGMD and Sanger are valid")
            count=$((count + 1)) ;;           # LINE IS GOOD
    esac
    seen=$((seen + 1))
    if [[ $seen -eq 6 ]]; then                # a block of 6 lines is complete
        if [[ $count -eq 6 ]]; then           # all 6 lines matched
            echo "$name is verified"          # specific file is verified or good
        else
            echo "$name not verified"         # specific file not verified
        fi
        seen=0
        count=0
    fi
done < "$file"

Description
Code:
1="Found expected header"
2="Found expected order of fields"
3="R_Index is a number"
4="PopFreqMax is valid"
5="Quality is a character"
6="HGMD and Sanger are valid"


Last edited by cmccabe; 05-17-2017 at 10:04 AM.. Reason: added description
# 2  
Old 05-17-2017
Hello cmccabe,

It is not clear; could you please add more information to your post? Also, please always show the expected sample output too.

Thanks,
R. Singh
# 3  
Old 05-17-2017
File 1 has 6 lines in it:
Code:
Line 1 is /home/cmccabe/Desktop/validate/file1.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file1.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file1.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file1.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file1.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file1.txt HGMD and Sanger are valid
 Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File1 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.
File 2 has 6 lines in it:
Code:
Line 1 is /home/cmccabe/Desktop/validate/file2.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file2.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file2.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file2.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file2.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file2.txt HGMD and Sanger are valid
 Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File2 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.
File 3 has 6 lines in it:
Code:
Line 1 is /home/cmccabe/Desktop/validate/file3.txt found expected header
Line 2 is /home/cmccabe/Desktop/validate/file3.txt found expected order of fields
Line 3 is /home/cmccabe/Desktop/validate/file3.txt R_Index is a number
Line 4 is /home/cmccabe/Desktop/validate/file3.txt PopFreqMax is valid
Line 5 is /home/cmccabe/Desktop/validate/file3.txt Quality is a character
Line 6 is /home/cmccabe/Desktop/validate/file3.txt HGMD and Sanger are valid
 Line 1 matches the expected pattern in description so "LINE IS GOOD"
Line 2 matches the expected pattern in description so "LINE IS GOOD"
Line 3 matches the expected pattern in description so "LINE IS GOOD"
Line 4 matches the expected pattern in description so "LINE IS GOOD"
Line 5 matches the expected pattern in description so "LINE IS GOOD"
Line 6 matches the expected pattern in description so "LINE IS GOOD"

Since "LINE IS GOOD" = 6, File3 is verified (desired output). If any line has a different pattern, the "LINE IS GOOD" count will be less than 6, so the file is not verified.
desired output
Code:
/home/cmccabe/Desktop/validate/file1.txt is verified
/home/cmccabe/Desktop/validate/file2.txt is verified
/home/cmccabe/Desktop/validate/file3.txt is verified

It is also possible that there could only be 1 or two files, but there will always be 6 lines in each file. I use FILENAME to represent each file instead of hardcoding it in.
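
The counting idea described above (6 good lines means the file is verified) could be sketched with grep -c, roughly as follows. This is only a sketch under assumptions: count_per_path is a hypothetical helper name, the messages are spelled as they appear in input (lowercase "found"), and no path is a substring of another path. It does not check the order of the lines, only that 6 lines for a path end in one of the expected messages.

```shell
#!/bin/bash
# Sketch only: count_per_path is a hypothetical helper, not part of the thread.
count_per_path() {
    local input=$1 path n
    # Expected messages, joined as alternatives for grep -E.
    local msgs='found expected header|found expected order of fields|R_Index is a number|PopFreqMax is valid|Quality is a character|HGMD and Sanger are valid'
    # Unique paths = first field of every line that starts with "/".
    awk '$1 ~ /^\// { print $1 }' "$input" | sort -u |
    while read -r path; do
        # Take the lines for this path, then count those ending in an expected message.
        n=$(grep -F -- "$path " "$input" | grep -c -E " ($msgs)\$")
        if [[ $n -eq 6 ]]; then
            echo "$path is verified"
        else
            echo "$path is not verified"
        fi
    done
}
```

It would be called as count_per_path input, and because of sort -u the paths are reported in sorted order rather than in the order they appear.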

Does this help? Thank you :).
# 4  
Old 05-17-2017
Hello cmccabe,

I am still not 100% sure, but could you please try the following and let me know if it helps.
Code:
awk 'FNR==1 && /found expected header/{VAL++} FNR==2 && /found expected order of fields/{VAL++} FNR==3 && /R_Index is a number/{VAL++} FNR==4 && /PopFreqMax is valid/{VAL++} FNR==5 && /Quality is a character/{VAL++} FNR==6 && /HGMD and Sanger are valid/{VAL++} END{if(VAL==6){print FILENAME " is verified."}}'  Input_file*

In the above you could use file* if your file names differ only by digits; otherwise you could change the regex to file[0-9] etc., depending on your file names.
EDIT: Adding a non-one-liner form of the solution too.
Code:
awk 'FNR==1 && /found expected header/{
                                        VAL++
                                     }
     FNR==2 && /found expected order of fields/{
                                                VAL++
                                              }
     FNR==3 && /R_Index is a number/{
                                        VAL++
                                   }
     FNR==4 && /PopFreqMax is valid/{
                                        VAL++
                                   }
     FNR==5 && /Quality is a character/{
                                        VAL++
                                      }
     FNR==6 && /HGMD and Sanger are valid/{
                                                VAL++
                                         }
     END{
                if(VAL==6){
                                print FILENAME " is verified."
                          }
        }
    '   Input_file
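
One thing worth keeping in mind if this is run over several separate files at once: VAL keeps accumulating from one file to the next, and in END the FILENAME variable holds only the last file read. A possible per-file variant is sketched below. It assumes each 6-line block really is stored in its own file, and verify_files is a hypothetical wrapper name.

```shell
#!/bin/bash
# Sketch only: verify_files is a hypothetical wrapper, not part of the thread.
verify_files() {
    awk '
    FNR == 1 { val = 0 }                      # new file: reset the counter
    FNR == 1 && /found expected header/ ||
    FNR == 2 && /found expected order of fields/ ||
    FNR == 3 && /R_Index is a number/ ||
    FNR == 4 && /PopFreqMax is valid/ ||
    FNR == 5 && /Quality is a character/ ||
    FNR == 6 && /HGMD and Sanger are valid/ { val++ }
    FNR == 6 {                                # sixth line: report this file
        print FILENAME, (val == 6 ? "is verified." : "is not verified.")
    }
    ' "$@"
}
```

A file with fewer than 6 lines produces no output in this sketch, since the report is tied to the sixth line.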

Thanks,
R. Singh

Last edited by RavinderSingh13; 05-17-2017 at 12:33 PM..
# 5  
Old 05-17-2017
I am not sure what you mean by changing the regex, but each file is a block of 6 lines within input.

input.txt (file that has the output of the process)
Code:
 Start import validation creation: Wed May 17 06:55:34 CDT 2017 -header-
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0000-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0000-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0000-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0000-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0000-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0001-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0001-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0001-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0001-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0001-l,f.txt HGMD and Sanger are valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected header
/home/cmccabe/Desktop/validate/00-0002-l,f.txt found expected order of fields
/home/cmccabe/Desktop/validate/00-0002-l,f.txt R_Index is a number
/home/cmccabe/Desktop/validate/00-0002-l,f.txt PopFreqMax is valid
/home/cmccabe/Desktop/validate/00-0002-l,f.txt Quality is a character
/home/cmccabe/Desktop/validate/00-0002-l,f.txt HGMD and Sanger are valid
End import validation creation: Wed May 17 06:55:34 CDT 2017 -footer-

individual files within input.txt (in this example there are 3, but it is possible to have only 1 or 2)

Code:
 /home/cmccabe/Desktop/validate/00-0000-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt 
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt

Code:
 awk 'NR==2 && /found expected header/{
                                        VAL++
                                     }
     NR==3 && /found expected order of fields/{
                                                VAL++
                                              }
     NR==4 && /R_Index is a number/{
                                        VAL++
                                   }
     NR==5 && /PopFreqMax is valid/{
                                        VAL++
                                   }
     NR==6 && /Quality is a character/{
                                        VAL++
                                      }
     NR==7 && /HGMD and Sanger are valid/{
                                                VAL++
                                         }
     END{
                if(VAL==6){
                                print FILENAME " is verified."
                          }
        }
' input.txt > verify.txt
 input.txt is verified   --- this is the output for the input file as a whole, not for the individual files (is this what you mean by changing the regex?)

I changed the NR== values to skip the header (not sure if that is the best way). Also, would adding print FILENAME " is not verified." capture any negative results, where a file did not contain the expected lines (had different values)?
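
Since FILENAME is the name of the file awk is reading (input.txt), the per-file path has to come from the first field of each line instead. A rough sketch of that idea, with both a positive and a negative result, might look like this. verify_by_path is a hypothetical wrapper name, and the sketch assumes the Start/End lines are the only ones whose first field does not begin with "/".

```shell
#!/bin/bash
# Sketch only: verify_by_path is a hypothetical wrapper, not part of the thread.
verify_by_path() {
    awk '
    $1 !~ /^\// { next }                      # skip the Start/End lines
    !($1 in total) { order[++n] = $1 }        # remember paths in first-seen order
    { total[$1]++ }                           # lines seen for this path
    /found expected header/ ||
    /found expected order of fields/ ||
    /R_Index is a number/ ||
    /PopFreqMax is valid/ ||
    /Quality is a character/ ||
    /HGMD and Sanger are valid/ { good[$1]++ }   # lines with an expected message
    END {
        for (i = 1; i <= n; i++) {
            p = order[i]
            print p, (total[p] == 6 && good[p] == 6 ? "is verified" : "is not verified")
        }
    }
    ' "$@"
}
```

Called as verify_by_path input.txt > verify.txt, this reports every path found in the input, so a block with a wrong or missing line shows up as "is not verified" rather than being silently dropped.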


desired result
Code:
 /home/cmccabe/Desktop/validate/00-0000-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0001-l,f.txt is verified
 /home/cmccabe/Desktop/validate/00-0002-l,f.txt is verified

Thank you very much :).
# 6  
Old 05-18-2017
Your description is rather confusing, but I think the following may come closer to doing what you want:
Code:
awk '
NR == 1 {
	next
}
NR % 6 == 2 && /found expected header/ ||
NR % 6 == 3 && /found expected order of fields/ ||
NR % 6 == 4 && /R_Index is a number/ ||
NR % 6 == 5 && /PopFreqMax is valid/ ||
NR % 6 == 0 && /Quality is a character/ ||
NR % 6 == 1 && /HGMD and Sanger are valid/ {
	VAL++
}
NR % 6 == 1 {
	print " " $1, (VAL == 6) ? "is verified" : "is not verified"
	VAL = 0
}
' input.txt > verify.txt

If input.txt contains the sample input you provided in post #5 in this thread, the text produced by the above script in verify.txt exactly matches the output you said you wanted.
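
For readers following the arithmetic: with one header line, lines 2 to 7 of input.txt give NR % 6 values of 2, 3, 4, 5, 0 and 1, which is why the last check in each block uses remainder 1. The same mapping can be written by subtracting the header offset first, so that each block is numbered 1 to 6. The following is only a sketch of that restatement; verify_by_position and the msg table are illustrative names, and it prints the path without the leading space used above.

```shell
#!/bin/bash
# Sketch only: verify_by_position is a hypothetical wrapper, not part of the thread.
verify_by_position() {
    awk '
    BEGIN {                                   # expected text for positions 1..6
        msg[1] = "found expected header"
        msg[2] = "found expected order of fields"
        msg[3] = "R_Index is a number"
        msg[4] = "PopFreqMax is valid"
        msg[5] = "Quality is a character"
        msg[6] = "HGMD and Sanger are valid"
    }
    NR == 1 { next }                          # header line
    {
        pos = (NR - 2) % 6 + 1                # 1..6 within the current block
        if (index($0, msg[pos]) > 0) val++    # line holds the text expected here
        if (pos == 6) {                       # last line of the block: report
            print $1, (val == 6 ? "is verified" : "is not verified")
            val = 0
        }
    }
    ' "$@"
}
```

The footer line lands on position 1 of a new block, matches nothing, and never reaches position 6, so it produces no output.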