What is a better way to validate records in a file?


 
# 1  
Old 10-15-2012

Hi all,

We need to validate the records of a delimited file.

The delimited file has data like this:
Code:
Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg|
Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg|
Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg|
Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg|

So we are checking whether each record in the file we receive has valid field lengths.

The structure of the file/table is configured in Teradata, and we fetch the column lengths from that definition.
For example:
Code:
col1 varchar(5),
col2 varchar(5),
col3 varchar(5),
col4 varchar(5)

We have to check that no column has a field longer than 5 characters; if one does, we write the whole offending record to a bad file.

The script reads col_nm, col_order_num and col_len, where:
col_nm = column name
col_order_num = position of the column in that table (1, 2, 3, and so on)
col_len = length of the column
Code:
#------------------------------------------------------------
#  Read through the file and check the column lengths
#------------------------------------------------------------
logNote "Reading through the temp file and checking the column lengths"

while read col_nm col_order_num col_len
do
    typeset -i col_len

    logNote "col_nm : $col_nm"
    logNote "col_order_num : $col_order_num"
    logNote "col_len : $col_len"

    # Append records whose field is too long to the bad file.
    # FS is set with -F so that it also applies to the first record.
    awk -F'|' -v col_ord="$col_order_num" -v col_l="$col_len" 'length($col_ord) > col_l' "$Src_File" >> "$Src_File.bad"

    # Keep only the records whose field fits.
    awk -F'|' -v col_ord="$col_order_num" -v col_l="$col_len" 'length($col_ord) <= col_l' "$Src_File" > "$Src_File.temp"

    mv -f "$Src_File.temp" "$Src_File"

done < "$RPT_FILE"

================================
We are using this script, but it is very slow at validating the file.
Can anyone come up with a better way, please?

Last edited by Scrutinizer; 10-15-2012 at 01:40 AM.. Reason: code tags
# 2  
Old 10-15-2012
You could try doing it all in awk, which would speed things up. For example:
Code:
awk '
  NR==FNR{
    W[$2]=$3
    next
  }
  {
    for(i in W)
      if(length($i)>W[i]){
        print > "file.bad"
        next
      }
  }
  1
' FS='[^0-9]*' colwidthfile FS=\| file
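
To see why the FS setting for the first file works, here is a small sketch of how one definition line is split. The sample line is made up in the style of the table layout from the first post, and the regex uses `+` instead of `*`, on the assumption that a separator which cannot match the empty string is the safer choice across awk implementations.

```shell
# Hypothetical one-line definition in the style of the table layout above.
line='col1 varchar(5),'

# With FS set to runs of non-digits, the leading "col" is itself a separator,
# so $1 is empty, $2 is the column position and $3 is the width.
split_result=$(printf '%s\n' "$line" | awk -F'[^0-9]+' '{ print "pos=" $2, "width=" $3 }')

echo "$split_result"   # prints: pos=1 width=5
```

This only holds while the digits on a line are exactly the column number and the width.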

# 3  
Old 10-15-2012
Try something like:
Code:
awk -F\| 'length($1)<=4 && length($2)<=4 && length($3)<=4 && length($4)<=4' test.txt
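
If every column shares the same limit, the hard-coded conditions can be replaced by a loop over all fields. A minimal sketch, assuming a single maximum of 5 (taken from the example definition) and made-up sample records:

```shell
# Print a record only if no field is longer than max.
# The second sample record has a 6-character third field and is dropped.
good=$(printf '%s\n' 'Aaaa|bbbbb|cc|ddd|' 'Aaaa|bbbbb|cccccc|ddd|' |
  awk -F'|' -v max=5 '
    {
      for (i = 1; i <= NF; i++)
        if (length($i) > max) next   # too long: skip this record
      print
    }')

echo "$good"   # prints: Aaaa|bbbbb|cc|ddd|
```

The limit is passed with -v, so it can come from a shell variable instead of being typed into the awk program.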

# 4  
Old 10-16-2012
@bmk, I get the table definition (column sizes) from the database, which I have to fetch. I am supplying the values from there.

---------- Post updated at 02:51 PM ---------- Previous update was at 02:45 PM ----------

@Scrutinizer, can you please explain the code?
# 5  
Old 10-16-2012
Sure:

Code:
awk '
  NR==FNR{                                 # When the first file is being read (only then are FNR and NR equal)
    W[$2]=$3                               # create an (associative) array element for the column widths with the second 
                                           # field as the index using the Field separator (FS) (see below)
    next                                   # Proceed to the next record
  }
  {
    for(i in W)                            # for every line in the second file, for every column in array W
      if(length($i)>W[i]){                 # if the length of the corresponding field is more than the max column width then
        print > "file.bad"                 # print that record of the second file to "file.bad"
        next                               # Proceed to the next record
      }
  }
  1                                        # If there are no fields with more characters than the max column width then print the record..
' FS='[^0-9]*' colwidthfile FS=\| file     # Set FS to any sequence of non-digits for the first file. Set it to "|" for the second file.
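
If the widths come in the three-column layout that the original loop reads (col_nm col_order_num col_len) rather than as a table definition, the same one-pass idea can be sketched as follows. The directory, file names and sample records are hypothetical, and default whitespace splitting is assumed for the metadata file.

```shell
# Hypothetical working directory and file names, for illustration only.
work=$(mktemp -d)

# Metadata in the "col_nm col_order_num col_len" layout read by the original loop.
printf '%s\n' 'col1 1 5' 'col2 2 5' 'col3 3 5' 'col4 4 5' > "$work/cols.txt"

# Sample data: the second record has a 6-character third field.
printf '%s\n' 'Aaaa|bbbbb|cc|ddd|' 'Aaaa|bbbbb|cccccc|ddd|' 'Zz|y|x|w|' > "$work/data.txt"

awk -v bad="$work/data.bad" '
  NR == FNR { width[$2] = $3; next }       # first file: position -> max length
  {
    for (pos in width)
      if (length($pos) > width[pos]) {     # is any field too long?
        print > bad                        # whole record goes to the bad file
        next
      }
    print                                  # otherwise keep the record
  }
' "$work/cols.txt" FS='|' "$work/data.txt" > "$work/data.good"

cat "$work/data.good"   # the first and third records
cat "$work/data.bad"    # the second record
```

The data file is read once here instead of once per column, so the run time no longer grows with the number of columns.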
