Data validation engine


 
# 1  
03-18-2013

Generic Data validator

Data file:
Code:
Name,Sal,Dept
ABC,1234,D1
AYX,12356,D2
DHF,345,ED3
123,4565,FGJG

Config File:
Code:
Delimiter-","
Rule1-Name-[:upper:]
Rule2-Sal-[:digit:]
Rule3-Dept-[D]*

The rules can use any regex, including ones for dates in different formats and for numbers.

The file is split into columns using the Delimiter given in the config.
Each column is checked against the regex specified for it.
If a value in any column doesn't match, the row should be flagged with the number of the rule that failed.

E.g.:

DHF,345,ED3,Rule3   (ED3 doesn't begin with D)


If there are multiple failures:
123,4565,FGJG,Rule2;Rule3

Which would be the best language for this, awk or perl?
Examples would be appreciated.
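A minimal sketch of the validator described above, in sh + awk. It assumes the post's config layout (a `Delimiter-"x"` line plus `RuleN-Column-ERE` lines), that every pattern is already a valid awk ERE, and that the delimiter defaults to "," when no Delimiter line is present (the post does not say). The function name `validate` and the file names are hypothetical. Columns are looked up by name in the header line, and each Rule line is split on its first two "-" only, so an ERE may itself contain "-".

```sh
# validate CONFIG DATA  (hypothetical helper name)
# Prints DATA unchanged, appending DELIM + failed rule names (joined by ";")
# to any row where a column does not match its rule's ERE.
validate() {
    awk '
    BEGIN { d = "," }                 # assumed default delimiter
    # Pass 1: the config file.
    FNR == NR {
        if ($0 ~ /^Delimiter-/) {
            d = substr($0, 11)        # text after "Delimiter-"
            gsub(/"/, "", d)          # drop the surrounding quotes
        } else if ($0 ~ /^Rule[0-9]+-/) {
            # Split on the first two "-" only; the ERE may contain "-".
            i = index($0, "-")
            rest = substr($0, i + 1)
            j = index(rest, "-")
            n_rules++
            rule[n_rules] = substr($0, 1, i - 1)
            col[n_rules]  = substr(rest, 1, j - 1)
            ere[n_rules]  = substr(rest, j + 1)
        }
        next
    }
    # Pass 2, header line: map column names to field positions.
    FNR == 1 {
        n = split($0, hdr, d)
        for (k = 1; k <= n; k++) pos[hdr[k]] = k
        print
        next
    }
    # Pass 2, data rows: test each rule against its column.
    {
        split($0, f, d)
        fails = ""
        for (r = 1; r <= n_rules; r++)
            if (f[pos[col[r]]] !~ ere[r])
                fails = fails (fails == "" ? "" : ";") rule[r]
        print (fails == "" ? $0 : $0 d fails)
    }
    ' "$1" "$2"
}
```

With the data file above and a hypothetical rules.cfg whose patterns are written as anchored EREs (`^[[:upper:]]+$`, `^[[:digit:]]+$`, `^D[[:digit:]]+$`), `validate rules.cfg data.csv` should leave the header and the first two rows alone and print `DHF,345,ED3,Rule3` and `123,4565,FGJG,Rule1;Rule3`. The POSIX classes such as `[[:upper:]]` need an awk that supports them (gawk does; some older mawk releases may not).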

---------- Post updated at 02:42 PM ---------- Previous update was at 02:32 PM ----------

After validation, we might also need to convert columns to a format specified in the config.

Config File:
Code:
Delimiter-","
Rule1-Name-[:upper:]
Rule2-Sal-[:digit:]-0.00
Rule3-Dept-[D]*

Data File Output:
Code:
Name,Sal,Dept
ABC,1234.00,D1
AYX,12356.00,D2
DHF,345,ED3,Rule3
123,4565,FGJG,Rule2;Rule3

Notice that the Sal value has changed from 1234 to 1234.00.
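A hedged sketch of the formatting step. The post does not define the "0.00" syntax, so this assumes it means "a number with as many decimals as there are digits after the dot" and turns it into an awk printf format such as %.2f. Judging from the sample output, only rows with no appended rule list are reformatted (DHF's 345 passes Rule2 but is left alone); here that is detected by comparing each row's field count with the header's. The helper name `format_column` and its argument order are hypothetical.

```sh
# format_column FILE DELIM COLUMN PATTERN  (hypothetical helper)
# Reformats COLUMN with a printf format derived from a "0.00"-style
# PATTERN, on rows whose field count equals the header's (rows that
# carry an extra failed-rules field are printed unchanged).
format_column() {
    awk -v d="$2" -v col="$3" -v pat="$4" '
    BEGIN {
        # Assumption: "0.00" means two decimals, "0.000" three, "0" none.
        dot = index(pat, ".")
        decimals = dot ? length(pat) - dot : 0
        fmt = "%." decimals "f"
    }
    FNR == 1 {
        nh = split($0, hdr, d)
        for (k = 1; k <= nh; k++)
            if (hdr[k] == col) c = k
        print
        next
    }
    {
        n = split($0, f, d)
        if (n == nh && c) {
            f[c] = sprintf(fmt, f[c])
            line = f[1]
            for (k = 2; k <= n; k++) line = line d f[k]
            print line
        } else print
    }' "$1"
}
```

For example, `format_column flagged.csv , Sal 0.00` on a flagged file like the one above turns 1234 and 12356 into 1234.00 and 12356.00 and passes the flagged DHF and 123 rows through untouched.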

Regards,
Dikesh Shah.
# 2  
03-26-2013
Shouldn't that be '123,4565,FGJG,Rule1;Rule3'?
# 3  
04-13-2013
Quote:
Originally Posted by DGPickett
Shouldn't that be '123,4565,FGJG,Rule1;Rule3'?
Yes. My typo.
# 4  
04-13-2013
I don't get it.

You say the 3rd field in the Rule* lines of your config file is a regular expression, but none of your data values match the regular expressions in Rule1 and Rule2, and every value matches the one in Rule3. You also don't say what type of regular expression you mean, but since you mentioned awk, I'll assume you want extended regular expressions (EREs).

Rule1's [:upper:] matches a single character from the set ":", "e", "p", "r", and "u"; it does not match three uppercase characters. To match "ABC", "AYX", and "DHF", you would need an ERE like ^[[:upper:]]{3}$.

Similarly, Rule2's [:digit:] matches a single character from the set ":", "d", "g", "i", and "t". To match the values you have in field 2, you would need an ERE like ^[[:digit:]]+$.

And Rule3's [D]* matches every string that contains zero or more copies of the letter "D"; in other words, it matches EVERY input string. If you're looking for a "D" followed by a single digit, or by one or more digits, you would need EREs like ^D[[:digit:]]$ or ^D[[:digit:]]+$, respectively.

You could simplify this somewhat by specifying that all EREs are anchored at both ends (i.e., the "^" at the beginning and the "$" at the end of each ERE are assumed and should not be written explicitly).
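The regex points above are easy to check directly in awk. This small sketch (the helper name `ere_match` is hypothetical) prints 1 if a string matches an ERE and 0 otherwise; the values are passed through ENVIRON rather than -v so that backslashes in a pattern are not reinterpreted as string escapes. It uses + instead of {3}, since some older awks do not support interval expressions. gawk may also print a warning about [:upper:], but it still treats it as a plain bracket expression.

```sh
# ere_match STRING ERE  (hypothetical helper)
# Prints 1 if STRING matches ERE under awk's regex rules, else 0.
# ENVIRON is used instead of -v so backslashes in ERE reach awk unchanged.
ere_match() {
    S="$1" RE="$2" awk 'BEGIN { print (ENVIRON["S"] ~ ENVIRON["RE"]) }'
}

# The cases discussed above:
ere_match ABC   '[:upper:]'          # 0: set is ":", "u", "p", "e", "r"
ere_match super '[:upper:]'          # 1: "super" contains u, p, e, r
ere_match ABC   '^[[:upper:]]+$'     # 1
ere_match 1234  '[:digit:]'          # 0
ere_match 1234  '^[[:digit:]]+$'     # 1
ere_match FGJG  '[D]*'               # 1: zero D's is still a match
ere_match FGJG  '^D[[:digit:]]+$'    # 0
ere_match D1    '^D[[:digit:]]+$'    # 1
```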

Then there is the question of how your format string works. In what formatting language does the format string "0.00" transform "1234" into "1234.00"? I could understand "%d.00", "%s.00", or "%.2f", but it seems to me that "0.00" should change any input string to the string "0.00".
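To see the difference concretely, here is a small sketch (the helper name `awk_format` is hypothetical) that runs a format string through awk's printf. "%d.00", "%s.00" and "%.2f" all give 1234.00 for 1234, but they differ for a non-integer value, and a literal "0.00" ignores its argument entirely.

```sh
# awk_format FORMAT VALUE  (hypothetical helper)
# Formats VALUE with awk's printf. Unlike the shell's printf, awk does not
# reuse the format for extra arguments, so "0.00" just prints itself.
awk_format() {
    F="$1" V="$2" awk 'BEGIN { f = ENVIRON["F"] "\n"; printf f, ENVIRON["V"] }'
}

awk_format '%.2f'  1234    # 1234.00
awk_format '%d.00' 1234    # 1234.00
awk_format '%s.00' 1234    # 1234.00
awk_format '0.00'  1234    # 0.00
awk_format '%.2f'  12.5    # 12.50
awk_format '%d.00' 12.5    # 12.00  (fraction truncated)
awk_format '%s.00' 12.5    # 12.5.00
```

Of the three, "%.2f" is the only one that behaves like a "two decimal places" pattern for every numeric input.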

Also note that using "-" as the field delimiter in your Config file severely restricts the EREs and format strings users can easily specify. Do you have the ability to change the format of the Config file? If you use the same EREs and format strings that awk uses, it would be much better to use <tab> as your config file field delimiter (in awk, "\t" can be easily used in both EREs and format strings anywhere a <tab> character is needed). This is especially a problem if you ever want to match dates of the form YYYY-MM-DD.
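As a sketch of the <tab>-delimited layout suggested above (the layout itself is hypothetical: `RuleN<TAB>Column<TAB>ERE`, with an optional fourth printf-format field), the fields can be pulled apart with `awk -F '\t'`, and a YYYY-MM-DD pattern keeps its "-" characters intact. The helper name `list_rules` is made up; it just prints each rule's fields separated by "|" for inspection.

```sh
# list_rules CONFIG  (hypothetical helper)
# Reads a tab-separated config and prints name|column|ERE|format for each
# RuleN line. Because fields are split on <tab> only, "-" inside an ERE or
# a format string needs no escaping.
list_rules() {
    awk -F '\t' '
    $1 ~ /^Rule[0-9]+$/ {
        printf "%s|%s|%s|%s\n", $1, $2, $3, (NF >= 4 ? $4 : "")
    }' "$1"
}
```

For a config line `Rule4<TAB>Hired<TAB>^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$`, this prints `Rule4|Hired|^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$|`, with the date ERE unchanged; a line ending in `<TAB>%.2f` carries its format string through the same way.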

Are there any other commands (besides "delimiter" and "Rule*") allowed in a Config file? Is there a default delimiter if a "delimiter" command is not specified in a Config file?