12-18-2006
Better way to validate column data in a file
I am trying to validate the third column in a pipe-delimited file.
The column must be exactly 10 characters long and contain only the digits 0-9.
I am writing out two new files from the existing file. If it would be quicker, I could leave the bad rows in the file and ignore them in the next process.
What I have works, but it takes a long time to run.
There are over 1,000,000 rows in the file, and the current code has taken 1 hr 40 min to process 230,000 rows.
Main part of the program:
while read line
do
    # Extract the third pipe-delimited field (ksh runs the trailing read in the current shell).
    echo "${line}" | awk -F"|" '{print $3 }' | read emplid
    badrow="N"
    # remove echo commands
    if [ ${badrow} = "N" ]
    then
        if [ ${#emplid} -ne 10 ]
        then
            badrow="Y"
            # echo " Bad emplid length = ${emplid}"
        fi
    fi
    if [ ${badrow} = "N" ]
    then
        nonums="$(echo ${emplid} | sed 's/[0-9]//g')"
        if [ ! -z "$nonums" ]
        then
            badrow="Y"
            # echo " Bad emplid numeric = ${emplid} "
        else
            badrow="N"
            # echo " Good emplid = ${emplid} "
        fi
    fi
    #
    # If badrow = N, write the row to the good file; otherwise write it to the bad file.
    #
    # increment new counters
    if [ ${badrow} = "N" ]
    then
        # Write good row to file.
        echo "${line}" >> ${good_record}
        let good_recno=${good_recno}+1
    else
        # Write bad row to file.
        echo "${line}" >> ${bad_record}
        let bad_recno=${bad_recno}+1
    fi
    let recno=${recno}+1
done < ${incomingpathfile}
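Most of the run time in a loop like the one above probably goes to starting awk and sed for every row and reopening the output files on every write. A minimal sketch of the same loop using only shell builtins follows. It assumes a POSIX-style shell (ksh, bash, dash); the temporary directory and the two sample rows are illustrative stand-ins for the real incomingpathfile, good_record, and bad_record, and rows with fewer than three fields are treated as bad, which is an assumption rather than something the original states.

```sh
#!/bin/sh
# Sketch only: sample rows and temp paths are illustrative stand-ins for the
# real incomingpathfile, good_record, and bad_record.
workdir=$(mktemp -d)
incomingpathfile="$workdir/incoming.dat"
good_record="$workdir/good.dat"
bad_record="$workdir/bad.dat"
printf '%s\n' \
    '0001010101|TT10101|0000011111|More data and delimiters' \
    '2001010101|QQ10101|O000033333|More data and delimiters' \
    > "$incomingpathfile"

good_recno=0 bad_recno=0 recno=0
while IFS= read -r line
do
    # Peel off the first two fields, then cut everything after the third.
    case $line in
        *'|'*'|'*) rest=${line#*|}; rest=${rest#*|}; emplid=${rest%%|*} ;;
        *)         emplid= ;;   # fewer than three fields: treated as bad (assumption)
    esac

    badrow=N
    [ "${#emplid}" -eq 10 ] || badrow=Y   # wrong length
    case $emplid in
        *[!0-9]*) badrow=Y ;;             # contains a non-digit character
    esac

    if [ "$badrow" = N ]
    then
        printf '%s\n' "$line" >&3
        good_recno=$((good_recno + 1))
    else
        printf '%s\n' "$line" >&4
        bad_recno=$((bad_recno + 1))
    fi
    recno=$((recno + 1))
# Open each output file once for the whole loop instead of once per row.
done < "$incomingpathfile" 3>> "$good_record" 4>> "$bad_record"

echo "good=$good_recno bad=$bad_recno total=$recno"
```

With the two sample rows this prints good=1 bad=1 total=2. The loop still processes one row at a time in the shell, so it is unlikely to match a single awk pass, but it removes the per-row process creation.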
***
Example data
0001010101|TT10101|0000011111|More data and delimiters
1001010101|SS10101|0000022222|More data and delimiters
2001010101|RR10101| 00022222|More data and delimiters
2001010101|QQ10101|O000033333|More data and delimiters
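One way to avoid the per-row cost is to let a single awk process read the file once and write both output files itself. The sketch below is an illustration under assumptions, not a tested replacement for the full script: the temporary directory is hypothetical, the sample rows are the four example rows as shown above, and the three counters are handed back to the shell through awk's END block.

```sh
#!/bin/sh
# Sketch only: the temp directory is illustrative; in the real script
# incomingpathfile, good_record, and bad_record are already set.
workdir=$(mktemp -d)
incomingpathfile="$workdir/incoming.dat"
good_record="$workdir/good.dat"
bad_record="$workdir/bad.dat"

cat > "$incomingpathfile" <<'EOF'
0001010101|TT10101|0000011111|More data and delimiters
1001010101|SS10101|0000022222|More data and delimiters
2001010101|RR10101| 00022222|More data and delimiters
2001010101|QQ10101|O000033333|More data and delimiters
EOF

# One awk process handles every row; no per-line pipelines or subshells.
counts=$(awk -F'|' -v good="$good_record" -v bad="$bad_record" '
    # Good row: third field is exactly 10 characters with no non-digit in it.
    length($3) == 10 && $3 !~ /[^0-9]/ { print > good; g++; next }
    # Anything else is a bad row.
    { print > bad; b++ }
    END { printf "%d %d %d\n", g, b, NR }
' "$incomingpathfile")

# Split the three numbers back into the counters the original loop maintained.
set -- $counts
good_recno=$1 bad_recno=$2 recno=$3
echo "good=$good_recno bad=$bad_recno total=$recno"
```

With the four example rows this prints good=2 bad=2 total=4. Two differences from the loop are worth knowing: awk's `>` truncates each output file the first time it is opened, where the loop appended with `>>`, and an output file is only created if at least one row is written to it.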