Check whether a given file is in ASCII format and data is tab-delimited


 
# 1  
Old 04-25-2007
Check whether a given file is in ASCII format and data is tab-delimited

Hi All,

Please help me out with a script which checks whether a given file say abc.txt is in ASCII format and data is tab-delimited. If the condition doesn't satisfy then it should generate error code "100" for file not in ASCII format and "105" if it is not in tab-delimited format.
If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not. If not error "101" and field 2, field 3 and field 5 (which are of date data type) have data in date format or not. If the data is not in date format(yyyymmdd) or null, then it should generate an error code 112 if field 2 is not in date format or null and 113 if field 3 is not in date format or null etc., If the field is null then it should generate an error code say 150.

Data starts from 2nd line as first line contains filename,filesize and record count.

sample file: abc.txt
row 1 : abc.txt0824673850572854
row 2 : 545689512<tab>20070424<tab>20070414<tab>456.25<tab>20061121<tab>pqr
row 3 : 602584561<tab>20060726<tab>20060524<tab>800.12<tab><tab>abc
row 4 : 24<tab><tab>05242006<tab>22.15<tab>20050815<tab>xyz
.
.
.
row n : 57<tab>20040425<tab>20041214<tab>486.75<tab>20040628<tab>stv
# 2  
Old 04-25-2007
There is a Unix command that reports the type of a file:
Code:
file <file_name>

# it shows file type...
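The wording of the `file` output differs between systems, so parsing it in a script can be fragile. Here is a minimal sketch of an alternative, assuming "ASCII" means that no byte has the top bit set. The function name `is_ascii` is made up for illustration; the error code 100 comes from the first post.

```shell
#!/bin/sh
# is_ascii FILE: succeed if FILE contains only bytes in the range 0-127.
# The function name is illustrative, not something from the thread.
is_ascii() {
    # Delete every 7-bit byte; whatever is left has the top bit set.
    # The final tr strips the padding some wc implementations add.
    high=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    [ "$high" -eq 0 ]
}

# Possible use, with the error code requested in the first post:
# is_ascii abc.txt || exit 100
```

This only tests for 7-bit bytes. If "ASCII" is meant to exclude control characters as well, the test would need to be stricter.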
# 3  
Old 04-25-2007
Quote:
Originally Posted by Mandab
Hi All,

Please help me out with a script which checks whether a given file say abc.txt is in ASCII format and data is tab-delimited. If the condition doesn't satisfy then it should generate error code "100" for file not in ASCII format

What do you mean by "ASCII format"? Do you mean a file in which no bytes have the top bit set (i.e., all are values less than 128)?

Or do you mean it only contains printable characters?

Quote:
and "105" if it is not in tab-delimited format.
If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not. If not error "101" and field 2, field 3 and field 5 (which are of date data type) have data in date format or not. If the data is not in date format(yyyymmdd)

That is not a date format; that is an integer, and if it happens to contain a date, how are you supposed to tell? You should use the standard date format, YYYY-MM-DD.

Quote:
or null, then it should generate an error code 112 if field 2 is not in date format or null and 113 if field 3 is not in date format or null etc., If the field is null then it should generate an error code say 150.

Data starts from 2nd line as first line contains filename,filesize and record count.

sample file: abc.txt
row 1 : abc.txt0824673850572854
row 2 : 545689512<tab>20070424<tab>20070414<tab>456.25<tab>20061121<tab>pqr
row 3 : 602584561<tab>20060726<tab>20060524<tab>800.12<tab><tab>abc
row 4 : 24<tab><tab>05242006<tab>22.15<tab>20050815<tab>xyz
.
.
.
row n : 57<tab>20040425<tab>20041214<tab>486.75<tab>20040628<tab>stv

Code:
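awk 'BEGIN { FS = "\t" }  ## awk's field separator variable is FS, not IFS
 NR == 1 { next }  ## ignore first line
 !/\t/ { exit 105 }  ## line doesn't contain a tab
 length($1) != 9 || $1 ~ /[^0-9]/ { exit 101 }
  {
     n = 2
     while ( n <= 5 ) {
        ## fields 2, 3 and 5 are dates: reject empty or non-numeric values
        if ( n != 4 && ( length($n) == 0 || $n ~ /[^0-9]/ ) ) exit 110 + n
        ## add other tests if desired
        ++n  ## move to the next field so the loop terminates
     }
  }' abc.txt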

# 4  
Old 04-25-2007
Thank you, cfajohnson, for your quick response. I'll confirm what "ASCII format" means; for now I only know that my script should check whether a file is in ASCII format or not. Regarding the date format, my requirement is to match the data type and the length. The value I'll be getting is, for example, 20070425, and the data type is date, so how do I check it? Is it not possible to check for the date data type if the data comes as yyyymmdd?
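A value such as 20070425 can be checked as a calendar date even without separators. The following is only a sketch under the assumption that the field must be exactly eight digits in yyyymmdd order; the function name `is_yyyymmdd` is hypothetical.

```shell
#!/bin/sh
# is_yyyymmdd VALUE: succeed if VALUE is eight digits forming a real date.
# The function name is illustrative, not something from the thread.
is_yyyymmdd() {
    awk -v d="$1" 'BEGIN {
        # Must be exactly eight digits (written out to avoid {8},
        # which some older awks do not support).
        if (d !~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) exit 1
        y   = substr(d, 1, 4) + 0
        m   = substr(d, 5, 2) + 0
        day = substr(d, 7, 2) + 0
        if (m < 1 || m > 12) exit 1
        # Days per month, with February adjusted for leap years.
        split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
        if (m == 2 && (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)) dim[2] = 29
        if (day < 1 || day > dim[m]) exit 1
        exit 0
    }'
}

# Possible use, with the error code for field 2 from the first post:
# is_yyyymmdd "$field2" || exit 112
```

With this test, the value 05242006 from row 4 of the sample is rejected because it would mean month 20. An empty field is rejected too, so a separate length test is needed if null should produce its own code.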
# 5  
Old 05-01-2007
Hi,
I am trying to execute the following script but I am getting error:
My requirement is to check whether the data in the file is tab delimited and pass error as say "105" to var1 and desc as "not tab delimited" to var2 and also check for the data which starts from 3rd line of the file. If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not and also whether it is null. If not var1 = "101" and var2 desc "Missing/wrong field1", field 2 datatype and length(char(9)) are same or not also for null. if not then var1 ="102" var2 "Missing/wrong field2" and so on. Any help would be appreciated.


Here is the code:
#!/bin/ksh
eval $(awk 'BEGIN { IFS = "\t" }
NR>=3 {print $1}
!/\t/ ## check whether lines contain tab else var1="105" and var2="No Tabs"
{
if ( length($1) == 0 || $1 !~ /[^0-9]/ ) ## check for null and numeric value and length(9)
then
var1="101"
var2="Missing or wrong First Field"
elif ( length($2) == 0 || $2 !~ /[a-zA-Z]/ ) ## check for null and char value and length(9)
then
var1="102"
var2="Missing or Wrong Second Field"
fi
}
}' $1)

echo "$var1"
echo "$var2"
# 6  
Old 05-01-2007
Quote:
Originally Posted by Mandab
Hi,
I am trying to execute the following script but I am getting error:

What is the error?
Quote:
My requirement is to check whether the data in the file is tab delimited and pass error as say "105" to var1 and desc as "not tab delimited" to var2 and also check for the data which starts from 3rd line of the file. If the above condition satisfies it should check whether field 1 datatype and length(numeric(9)) are same or not and also whether it is null. If not var1 = "101" and var2 desc "Missing/wrong field1", field 2 datatype and length(char(9)) are same or not also for null. if not then var1 ="102" var2 "Missing/wrong field2" and so on. Any help would be appreciated.


Here is the code:

If it's code, please put it inside [CODE] tags so that it is properly formatted.
Quote:
#!/bin/ksh
eval $(awk 'BEGIN { IFS = "\t" }

What is the output of the awk script that you expect to eval?

In order to use eval, you need to output valid shell code.
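To illustrate what "valid shell code" means here, this small sketch has awk print two assignments, which the shell then evaluates. The values are hard-coded just for the demonstration; `var1` and `var2` are the names used in the question.

```shell
#!/bin/sh
# awk writes two shell assignments. \047 is the octal escape for a single
# quote, so the values arrive single-quoted in the generated shell code.
assignments=$(awk 'BEGIN {
    printf "var1=\047%s\047\n", 105
    printf "var2=\047%s\047\n", "No Tabs"
}')

# eval runs the generated assignments in the current shell.
eval "$assignments"
echo "$var1"   # prints 105
echo "$var2"   # prints No Tabs
```

Single-quoting the values keeps the shell from splitting or expanding them, but it would break if a message itself contained a single quote.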
Quote:
NR>=3 {print $1}
!/\t/ ## check whether lines contain tab else var1="105" and var2="No Tabs"
{
if ( length($1) == 0 || $1 !~ /[^0-9]/ ) ## check for null and numeric value and length(9)

You haven't checked that the length is 9. Your test is true when the field is empty or when it contains no non-digit characters, that is, when it is entirely numeric, which is the opposite of what you want.
Quote:
then

That is not awk syntax.
Quote:
var1="101"
var2="Missing or wrong First Field"
elif ( length($2) == 0 || $2 !~ /[a-zA-Z]/ ) ## check for null and char value and length(9)

There is no 'then', 'elif', or 'fi' keyword in awk.

You still haven't (even after the syntax is fixed) checked that the length is 9. Your test is true when the field is empty or when it contains no letters at all; it says nothing about the length.
Quote:
then
var1="102"
var2="Missing or Wrong Second Field"
fi
}
}' $1)

echo "$var1"
echo "$var2"

I suggest that you start with the code I posted, and tell us what it lacks. (Reply directly to that post, quoting relevant segments.)
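For reference, awk conditionals use C-style `if (...) { ... } else if (...) { ... }` with braces rather than `then`, `elif` and `fi`. A minimal sketch for field 1 follows, assuming codes 150 (null) and 101 (wrong type or length) as in the first post; the function name `check_field1` is hypothetical.

```shell
#!/bin/sh
# check_field1 VALUE: print an error code for field 1, or nothing if valid.
# The function name is illustrative, not something from the thread.
check_field1() {
    printf '%s\n' "$1" | awk '{
        if (length($0) == 0) {
            print 150        # null field
        } else if (length($0) != 9 || $0 ~ /[^0-9]/) {
            print 101        # wrong length, or a non-digit is present
        }
    }'
}

# check_field1 545689512   prints nothing
# check_field1 24          prints 101
```

Note that the numeric test is `$0 ~ /[^0-9]/` (a non-digit is present), not `!~`.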

# 7  
Old 05-02-2007
I am totally confused now. I am a newbie and wrote the above script with the help of this forum.
I'll get a tab-delimited file whose data starts on the 3rd line. The first field is numeric(9), not null; the second field is char(8), not null; the third field is numeric(9), nullable; and the fourth field is (13), not null. My requirement is first to check whether the file is tab-delimited. If it is not, generate an error with var1="101" and var2="Not in tab-delimited format". If it is, check the first field's data type and length and that it is not null; if it doesn't match, set var1="110" and var2="Mismatch/Wrong Field one". If it matches, check the second field, setting var1="120" and var2="Mismatch/Wrong Field two" on failure, and so on. I want to use var1 and var2 in other computations. The comments you have written above have gone over my head. Please help me.
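One possible shape for this, offered as a sketch rather than a finished answer: awk does the checks and prints shell assignments for var1 and var2, which the calling script evaluates. Assumptions that are not in the thread: char(8) is read as "exactly 8 characters", a passing file yields var1=0 and var2=OK, and the function name `validate` is made up. Only fields 1 and 2 are checked; the remaining fields would follow the same pattern.

```shell
#!/bin/sh
# validate FILE: print shell assignments for var1 (code) and var2 (text).
# Name, success values and the char(8) reading are assumptions.
validate() {
    awk 'BEGIN { FS = "\t"; code = 0; msg = "OK" }
    NR < 3 { next }                  # data starts on the third line
    !/\t/ { code = 101; msg = "Not in tab-delimited format"; exit }
    length($1) != 9 || $1 ~ /[^0-9]/ {
        code = 110; msg = "Mismatch/Wrong Field one"; exit
    }
    length($2) != 8 {
        code = 120; msg = "Mismatch/Wrong Field two"; exit
    }
    # exit jumps here, so the first failure found is the one reported.
    END { printf "var1=\047%s\047\nvar2=\047%s\047\n", code, msg }' "$1"
}

# Possible use in the calling script:
# eval "$(validate abc.txt)"
# echo "$var1"
# echo "$var2"
```

The script stops at the first bad line, so var1 and var2 describe only the first problem found.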