The UNIX and Linux Forums  


#1 Mandab, 04-25-2007
Check whether a given file is in ASCII format and data is tab-delimited

Hi All,

Please help me out with a script that checks whether a given file, say abc.txt, is in ASCII format and its data is tab-delimited. If not, it should generate error code "100" if the file is not in ASCII format and "105" if it is not tab-delimited.
If both conditions are satisfied, it should check whether field 1 matches its data type and length (numeric(9)); if not, error "101". It should also check whether fields 2, 3 and 5 (which are of date data type) contain data in date format (yyyymmdd). If field 2 is not in date format or is null, it should generate error code 112; if field 3 is not in date format or is null, error code 113; and so on. If a field is null, it should generate an error code, say 150.

The data starts on the 2nd line, as the first line contains the filename, file size and record count.

sample file: abc.txt
row 1 : abc.txt0824673850572854
row 2 : 545689512<tab>20070424<tab>20070414<tab>456.25<tab>20061121<tab>pqr
row 3 : 602584561<tab>20060726<tab>20060524<tab>800.12<tab><tab>abc
row 4 : 24<tab><tab>05242006<tab>22.15<tab>20050815<tab>xyz
.
.
.
row n : 57<tab>20040425<tab>20041214<tab>486.75<tab>20040628<tab>stv
#2 srikanthus2002, 04-25-2007
There is a Unix command that reports a file's type:

Code:
file <file_name>

# prints the file type, e.g. whether it is ASCII text
#3 cfajohnson (Forum Advisor), 04-25-2007
Quote:
Originally Posted by Mandab
Hi All,

Please help me out with a script which checks whether a given file say abc.txt is in ASCII format and data is tab-delimited. If the condition doesn't satisfy then it should generate error code "100" for file not in ASCII format

What do you mean by "ASCII format"? Do you mean a file in which no byte has the top bit set (i.e., every byte value is less than 128)?

Or do you mean it only contains printable characters?
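Both readings can be tested from the shell. The following is a minimal sketch, not code from the thread: is_7bit_ascii and is_printable_ascii are hypothetical helper names, and which definition the task actually needs is still an open question at this point.

```sh
# Reading 1: no byte has the top bit set (every byte value is below 128).
# tr deletes every byte in the octal range 000-177 (0-127); if anything
# is left over, the file contains non-ASCII bytes.
is_7bit_ascii() {
    [ -r "$1" ] || return 2
    [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]
}

# Reading 2: only printable characters, plus tab and newline.
# In the C locale [:print:] covers bytes 32-126. grep -q succeeds as soon
# as it finds any other byte on a line, so its result is negated.
is_printable_ascii() {
    [ -r "$1" ] || return 2
    tab=$(printf '\t')
    ! LC_ALL=C grep -q "[^[:print:]$tab]" "$1"
}
```

With either helper, the first requirement becomes something like: is_7bit_ascii abc.txt || echo 'error 100: not ASCII'. The printable check also rejects carriage returns, so a file with DOS line endings fails it while still passing the 7-bit check.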

Quote:
and "105" if it is not in tab-delimited format.
If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not. If not error "101" and field 2, field 3 and field 5 (which are of date data type) have data in date format or not. If the data is not in date format(yyyymmdd)

That is not a date format; that is an integer, and if it happens to contain a date, how are you supposed to tell? You should use the standard date format, YYYY-MM-DD.
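If the data does arrive as yyyymmdd, converting it to the YYYY-MM-DD form suggested here is a one-pass awk job. A minimal sketch, assuming (as in the first post) that fields 2, 3 and 5 are the date fields and line 1 is a header; to_iso_dates is a hypothetical name.

```sh
# Rewrite 8-digit values in fields 2, 3 and 5 as YYYY-MM-DD.
# Anything that is not exactly eight digits (including empty fields)
# is passed through unchanged; the header line is printed as-is.
to_iso_dates() {
    awk 'BEGIN { FS = OFS = "\t" }
        NR == 1 { print; next }
        {
            for (n = 2; n <= 5; n++) {
                if (n == 4) continue        # field 4 is an amount, not a date
                if (length($n) == 8 && $n !~ /[^0-9]/)
                    $n = substr($n, 1, 4) "-" substr($n, 5, 2) "-" substr($n, 7, 2)
            }
            print
        }' "$1"
}
```

Because OFS is also set to a tab, every rewritten line keeps its tab-delimited layout.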

Quote:
or null, then it should generate an error code 112 if field 2 is not in date format or null and 113 if field 3 is not in date format or null etc., If the field is null then it should generate an error code say 150.



Code:
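awk 'BEGIN { FS = "\t" }  ## awk splits fields on FS (IFS is a shell variable)
 NR == 1 { next }  ## ignore first line
 !/\t/ { exit 105 }  ## line does not contain a tab
 length($1) != 9 || $1 ~ /[^0-9]/ { exit 101 }
  {
     n = 2
     while ( n <= NF ) {
        if ( n == 2 || n == 3 || n == 5 )  ## only fields 2, 3 and 5 hold dates
           if ( length($n) == 0 || $n ~ /[^0-9]/ ) exit 110 + n
        ## add other tests if desired
        n++  ## move to the next field; without this the loop never ends
     }
  }' abc.txt
echo $?  ## 0 if every line passed, otherwise the error code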
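The code above returns 110 + n both for an empty date field and for a malformed one, whereas the first post asks for a separate code (say 150) when a field is null. A minimal sketch of that distinction, assuming fields 2, 3 and 5 hold dates and line 1 is the header; check_dates is a hypothetical name, and it covers only the date tests, not the tab or field 1 tests.

```sh
# Exit 150 for an empty date field, 110 + n for a field n that is not
# eight digits, 0 if every data line passes. Stops at the first failure.
check_dates() {
    awk 'BEGIN { FS = "\t"; split("2 3 5", datefields, " ") }
        NR == 1 { next }                                   # header line
        {
            for (i = 1; i <= 3; i++) {
                n = datefields[i]
                if (length($n) == 0) exit 150              # null field
                if (length($n) != 8 || $n ~ /[^0-9]/) exit 110 + n
            }
        }' "$1"
}
```

With the sample rows from the first post (without the "row n :" prefixes), check_dates abc.txt; echo $? would print 150, because row 3 has an empty fifth field.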

#4 Mandab, 04-25-2007
Thank you, cfajohnson, for your quick response. I'll confirm what "ASCII format" means; for now I only know that my script should check whether a file is in ASCII format or not. Regarding the date format, my requirement is to check the data type and the length. The value I'll be getting is 20070425 and the data type is date, so how do I check it? Is it not possible to check for the date data type if the data comes as yyyymmdd?
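Checking the digits alone cannot tell 20070425 from 20071345. One way to answer the question is to check that the value is eight digits and that the month and day form a real calendar date. This is a minimal sketch, assuming Gregorian leap-year rules; is_yyyymmdd is a hypothetical name.

```sh
# Succeed only if $1 is eight digits forming a real date read as yyyymmdd.
is_yyyymmdd() {
    printf '%s\n' "$1" | awk '{
        if (length($0) != 8 || $0 ~ /[^0-9]/) exit 1          # not 8 digits
        y = substr($0, 1, 4) + 0
        m = substr($0, 5, 2) + 0
        d = substr($0, 7, 2) + 0
        split("31 28 31 30 31 30 31 31 30 31 30 31", mdays, " ")
        if ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0) mdays[2] = 29
        if (m < 1 || m > 12 || d < 1 || d > mdays[m]) exit 1
    }'
}
```

For example, is_yyyymmdd 20070425 succeeds, while is_yyyymmdd 05242006 (row 4 of the sample, which looks like mmddyyyy) fails because 20 is not a valid month. An empty value fails too, so the same test also catches null fields.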
#5 Mandab, 05-01-2007
Hi,
I am trying to execute the following script but I am getting an error.
My requirement is to check whether the data in the file is tab-delimited; if not, set var1 to an error code, say "105", and var2 to the description "not tab delimited". The data to check starts on the 3rd line of the file. If the file is tab-delimited, the script should check that field 1 matches its data type and length (numeric(9)) and is not null; if not, var1 = "101" and var2 = "Missing/wrong field1". Then it should check that field 2 matches its data type and length (char(9)) and is not null; if not, var1 = "102" and var2 = "Missing/wrong field2", and so on. Any help would be appreciated.


Here is the code:
#!/bin/ksh
eval $(awk 'BEGIN { IFS = "\t" }
NR>=3 {print $1}
!/\t/ ## check whether lines contain tab else var1="105" and var2="No Tabs"
{
if ( length($1) == 0 || $1 !~ /[^0-9]/ ) ## check for null and numeric value and length(9)
then
var1="101"
var2="Missing or wrong First Field"
elif ( length($2) == 0 || $2 !~ /[a-zA-Z]/ ) ## check for null and char value and length(9)
then
var1="102"
var2="Missing or Wrong Second Field"
fi
}
}' $1)

echo "$var1"
echo "$var2"
#6 cfajohnson (Forum Advisor), 05-01-2007
Quote:
Originally Posted by Mandab
Hi,
I am trying to execute the following script but I am getting error:

What is the error?
Quote:
My requirement is to check whether the data in the file is tab delimited and pass error as say "105" to var1 and desc as "not tab delimited" to var2 and also check for the data which starts from 3rd line of the file. If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not and also whether it is null. If not var1 = "101" and var2 desc "Missing/wrong field1", field 2 datatype and length(char(9)) are same or not also for null. if not then var1 ="102" var2 "Missing/wrong field2" and so on. Any help would be appreciated.


Here is the code:

If it's code, please put it inside [CODE] tags so that it is properly formatted.
Quote:
#!/bin/ksh
eval $(awk 'BEGIN { IFS = "\t" }

What is the output of the awk script that you expect to eval?

In order to use eval, you need to output valid shell code.
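To illustrate: a minimal sketch in which awk's only output is assignment text such as var1=105 var2='not tab delimited', which eval then runs in the current shell. check_tabs is a hypothetical name; the single quotes around the description keep its spaces intact and assume the description itself contains no single quote.

```sh
# Sets var1/var2 in the calling shell. Like the script above, it only
# looks at lines 3 onwards.
check_tabs() {
    var1=0 var2='OK'                             # defaults when every line has a tab
    eval "$(awk 'BEGIN { q = "\047" }            # q holds a single quote
        NR >= 3 && !/\t/ { code = 105; msg = "not tab delimited"; exit }
        END { if (code) printf "var1=%s var2=%s%s%s\n", code, q, msg, q }' "$1")"
}
```

Calling exit in a main rule still runs the END block, which is where the assignments are printed; when every line passes, awk prints nothing and the defaults stay in place.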
Quote:
NR>=3 {print $1}
!/\t/ ## check whether lines contain tab else var1="105" and var2="No Tabs"
{
if ( length($1) == 0 || $1 !~ /[^0-9]/ ) ## check for null and numeric value and length(9)

You haven't checked that the length is 9. As written, the test reports an error when the field is empty or consists only of digits, which is the reverse of the numeric check you want.
Quote:
then

That is not awk syntax.
Quote:
var1="101"
var2="Missing or wrong First Field"
elif ( length($2) == 0 || $2 !~ /[a-zA-Z]/ ) ## check for null and char value and length(9)

There is no 'then', 'elif', or 'fi' keyword in awk.

You still haven't (even after the syntax is fixed) checked that the length is 9. As written, the test reports an error only when the field is empty or contains no letters at all, so any value with a single letter in it passes.
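For comparison, awk's own conditional syntax is if (...) statement, else if (...) statement, else statement, with no then, elif or fi. A minimal sketch of the two checks discussed above, assuming "numeric(9)" means exactly nine digits and "char(9)" means exactly nine letters (the post does not define either); check_line is a hypothetical name that prints the code and description for one tab-separated line.

```sh
# Print "code description" for the first failing field of one line.
check_line() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "\t" }
    {
        if (length($1) != 9 || $1 ~ /[^0-9]/)          # not exactly 9 digits
            print "101 Missing or wrong First Field"
        else if (length($2) != 9 || $2 ~ /[^A-Za-z]/)  # not exactly 9 letters
            print "102 Missing or wrong Second Field"
        else
            print "0 OK"
    }'
}
```

An empty field has length 0, so the length test also covers the null check.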
Quote:
then
var1="102"
var2="Missing or Wrong Second Field"
fi
}
}' $1)

echo "$var1"
echo "$var2"

I suggest that you start with the code I posted, and tell us what it lacks. (Reply directly to that post, quoting relevant segments.)

#7 Mandab, 05-02-2007
I am totally confused now.
I am a newbie and wrote the above script with the help of this forum.
I'll get a tab-delimited file whose data starts on the 3rd line. The first field is numeric(9) not null, the second field is char(8) not null, the third field is numeric(9) null, and the fourth field is (13) not null. My requirement is first to check whether the file is tab-delimited. If it is not, generate an error by setting var1="101" and var2="Not in tab-delimited format". If it is tab-delimited, check the first field's data type and length and that it is not null; if they don't match, set var1="110" and var2="Mismatch/Wrong Field one". If they match, check the second field, and on failure set var1="120" and var2="Mismatch/Wrong Field two", and so on. I want to use var1 and var2 for other computation. Whatever comments you have written above have gone over my head. Please help me.
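A possible starting point that combines the pieces discussed in this thread: check the tab delimiter, then each field in turn, stop at the first failure, and hand var1 and var2 back to the shell through eval. This is a minimal sketch: validate_file is a hypothetical name, codes 130 and 140 and their messages are invented to continue the "and so on" pattern, and the length rules (maximum lengths, reading numeric(9) and char(8) as column sizes) are assumptions to adjust.

```sh
# Codes 101, 110 and 120 come from the request; 130 and 140 are assumed.
# Assumed rules:
#   field 1 numeric(9) not null -> 1 to 9 digits
#   field 2 char(8)    not null -> 1 to 8 characters
#   field 3 numeric(9) null     -> empty, or up to 9 digits
#   field 4 (13)       not null -> 1 to 13 characters (type not stated)
validate_file() {
    var1=0 var2='OK'
    eval "$(awk 'BEGIN { FS = "\t"; q = "\047" }     # q holds a single quote
        function fail(c, m) { code = c; msg = m; exit }
        NR < 3 { next }                               # data starts on line 3
        !/\t/ { fail(101, "Not in tab-delimited format") }
        length($1) < 1 || length($1) > 9 || $1 ~ /[^0-9]/ { fail(110, "Mismatch/Wrong Field one") }
        length($2) < 1 || length($2) > 8 { fail(120, "Mismatch/Wrong Field two") }
        length($3) > 9 || $3 ~ /[^0-9]/ { fail(130, "Mismatch/Wrong Field three") }
        length($4) < 1 || length($4) > 13 { fail(140, "Mismatch/Wrong Field four") }
        END { if (code) printf "var1=%s var2=%s%s%s\n", code, q, msg, q }' "$1")"
}
```

After validate_file abc.txt, "$var1" and "$var2" hold either 0 and OK or the first error found. For exact lengths instead of maximums, replace the > comparisons with !=.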

All times are GMT -4.
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.