Unix File Validation! Help


 
# 1  
Old 10-23-2008

Hi All,

I have a file with 3 fields delimited by a hyphen "-". I have to validate and cleanse the data before I begin processing.

Requirements

1. No record should contain more than 2 delimiters
2. No record should contain fewer than 2 delimiters
3. Any record that violates rule 1 or 2 should be captured in bad_records.dat
4. Records that violate rule 1 or 2 should be deleted from the original incoming data file


Sample Format

input.dat
COL1-COL2-COL3
scott-2000-10
tiger-1000-20
c-bill-1000-30
mike20-1000

So after the validation and cleansing process, the data should look like this:

input.dat
scott-2000-10
tiger-1000-20

bad_records.dat
c-bill-1000-30
mike20-1000

Please note that I can't use Perl progressive scanning. I need to achieve this with a Korn shell script.


Thanks guys
# 2  
Old 10-23-2008

Here is a process to think about

Code:
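# Read the file one record at a time
while IFS= read -r line
do
  # Turn each "-" into a newline, so every field lands on its own row
  count=$(printf '%s\n' "$line" | tr '-' '\n' | wc -l)
  # Exactly 3 rows means exactly 2 delimiters
  if [ $count -eq 3 ]
  then
    printf '%s\n' "$line" >> good_file
  else
    printf '%s\n' "$line" >> bad_file
  fi
done < input.dat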

See where you can go with that
# 3  
Old 10-23-2008
Thanks Joey

That's a really fast response. In principle this should work. Any other ideas to make this logic fast enough to process 5 million records with 30 columns in 1-3 minutes?
# 4  
Old 10-23-2008
How about using grep and grep -v? Something like the following might work (inputfile stands for your data file):
grep "^[a-z0-9]*-[a-z0-9]*-[0-9]*$" inputfile > goodFile
grep -v "^[a-z0-9]*-[a-z0-9]*-[0-9]*$" inputfile > badFile
# 5  
Old 10-23-2008
Thanks Avis,

I will try that too tomorrow. Now, taking this one step further:

My actual delimiter is a non-ASCII character from the extended ASCII set.

My input file is delimited by "Ç", i.e. CHR(199).

Will that matter to grep or sed as suggested above?
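
For a single-byte delimiter such as CHR(199), one option is to run the tool in the C locale so that every byte counts as one character. The following is only a sketch. It assumes the file is encoded in Latin-1 (so that Ç is the single byte with octal value 307) and it uses awk instead of grep. The file names sample.dat, good.dat and bad.dat are made up for illustration.

```shell
# Assumption: Latin-1 input, where the delimiter is the single byte 0xC7 (octal 307).
delim=$(printf '\307')

# Hypothetical sample file: one record with 3 fields, one with only 2.
printf 'scott\3072000\30710\nmike20\3071000\n' > sample.dat

# LC_ALL=C makes awk work on bytes rather than multibyte characters.
LC_ALL=C awk -F "$delim" '
    NF == 3 { print > "good.dat"; next }   # exactly 2 delimiters
            { print > "bad.dat" }          # any other delimiter count
' sample.dat
```

If the file were UTF-8 instead, the delimiter would be two bytes long and this sketch would need to be adapted.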
# 6  
Old 10-24-2008
Code:
nawk -F"-" '{
if(NF!=3)
	print $0 >> "bad.txt"
else
	print
}' file
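
To cover all four requirements in one pass, the same NF test can send good records to a temporary file and bad ones to bad_records.dat, then move the temporary file over the original. This is a rough sketch, assuming a POSIX awk and a single-character delimiter. The function name cleanse_file and its argument order are invented for this example.

```shell
# cleanse_file FILE DELIM NFIELDS BADFILE   (hypothetical helper)
# Keeps records with exactly NFIELDS fields in FILE and moves the rest to BADFILE.
cleanse_file() {
    file=$1 delim=$2 nfields=$3 badfile=$4
    tmp="$file.tmp.$$"
    : > "$badfile"                       # start with an empty bad-record file
    awk -F "$delim" -v n="$nfields" -v bad="$badfile" '
        NF == n { print; next }          # good record goes to stdout
                { print > bad }          # wrong delimiter count
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Sample records from the first post.
printf '%s\n' scott-2000-10 tiger-1000-20 c-bill-1000-30 mike20-1000 > input.dat
cleanse_file input.dat - 3 bad_records.dat
```

Because the delimiter and field count are arguments, the same function should work for files with a different number of columns.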

# 7  
Old 10-24-2008
Joey,

I have tried your logic. It works, but it involves a huge amount of processing and time when handling millions of records.

#!/bin/ksh
date
delimiter_char=","
last_field_num=6
file_name=2.txt
col_count=$last_field_num
while IFS= read -r line; do
    # Turn delimiters into newlines, then count the resulting lines (= fields)
    act_count=$(print -r -- "$line" | tr "${delimiter_char}" "\n" | wc -l | awk '{print $1}')
    if [ "$col_count" -ne "$act_count" ]; then
        print -r -- "$line" >> "${file_name%.txt}.bad"
    else
        print -r -- "$line" >> "${file_name%.txt}.good"
    fi
done < "${file_name}"
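
Most of the time in a loop like this goes into starting several processes (tr, wc, awk) for every record. If the loop has to stay in the shell, the field count can come from the shell's own word splitting instead. This is a sketch only, assuming a POSIX-style shell (ksh included) and a single-character, non-whitespace delimiter. count_fields is a made-up helper name, and awk or grep will most likely still be much faster on millions of records.

```shell
# count_fields LINE DELIM  (hypothetical helper): stores the field count in nf.
count_fields() {
    set -f                        # no filename expansion while splitting
    old_ifs=$IFS
    IFS=$2
    set -- ${1}x                  # sentinel "x" preserves a trailing empty field
    IFS=$old_ifs
    set +f
    nf=$#
}

# Sample data: 6 fields, 3 fields, and 6 fields with an empty last field.
printf '%s\n' 'a,b,c,d,e,f' 'a,b,c' 'a,b,c,d,e,' > 2.txt

# Output files are opened once for the whole loop (descriptors 3 and 4).
while IFS= read -r line; do
    count_fields "$line" ","
    if [ "$nf" -eq 6 ]; then
        printf '%s\n' "$line" >&3
    else
        printf '%s\n' "$line" >&4
    fi
done < 2.txt 3> 2.good 4> 2.bad
```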
---------------------------------------------------------------

Avis, your logic works fine, but I couldn't use it either, because I couldn't make a UDF (user-defined function) out of it: I have many files, and the number of columns varies.

Yours performs well:
grep -e "^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*$" 2.txt > 3.good &
pid_good=$!
grep -v "^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*$" 2.txt > 3.bad &
pid_bad=$!
wait $pid_good
wait $pid_bad
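
The pattern itself can be generated from the delimiter and the expected field count, which would make the grep approach reusable across files. This is a sketch, assuming a grep that supports -E with interval expressions and a single-character delimiter other than ^, ] or \. The names field_pattern and split_by_pattern are invented here.

```shell
# field_pattern DELIM NFIELDS  (hypothetical helper)
# Prints an ERE matching lines with exactly NFIELDS DELIM-separated fields.
field_pattern() {
    printf '^[^%s]*([%s][^%s]*){%d}$\n' "$1" "$1" "$1" "$(( $2 - 1 ))"
}

# split_by_pattern FILE DELIM NFIELDS GOODFILE BADFILE  (hypothetical helper)
split_by_pattern() {
    pat=$(field_pattern "$2" "$3")
    grep -E -- "$pat" "$1" > "$4"
    grep -E -v -- "$pat" "$1" > "$5"
    return 0    # grep exits non-zero when nothing matches; not an error here
}

# Sample data: one record with 6 fields, one with 3.
printf '%s\n' 'a,b,c,d,e,f' 'a,b,c' > 2.txt
split_by_pattern 2.txt , 6 3.good 3.bad
```

The two grep calls could also be run in the background and waited on, as in the commands above.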

Note: I tested this on a CSV file. Any Unicode character may appear in my data except the delimiter. ;-)


Any other ideas?