How to: Validate a CSV file using an XSD?


 
# 1  
Old 06-06-2011

Hi All,

I was wondering if there is a utility, Perl library, or other way of validating the contents of a CSV file using an XSD.

For example, a customer CSV (including header and trailer):

Cust_num, Cust_nme, Cust_typ, Cust_act_dte, Cust_loc,
101,Joe's Pizza,Retail,10121979,Detroit,
102,Sony Corp,Commercial,10101946,Tokyo,
103,K-CO Foods,Wholesale,01041987,London,
3

The XSD to validate against would contain:

<xs:element name="Cust_num" type="xs:positiveInteger"/>
<xs:element name="Cust_nme" type="xs:string" minOccurs="0"/>
<xs:element name="Cust_typ" type="xs:string" minOccurs="0"/>
<xs:element name="Cust_act_dte" type="xs:date" />

Please let us have your thoughts?

Many Thanks

Luinzi
# 2  
Old 06-06-2011
DFDL (Data Format Description Language) looks like a match for this sort of thing. https://forge.gridforum.org/sf/projects/dfdl-wg
# 3  
Old 06-06-2011
An XML schema describes the allowed structure and types for an XML document, not a CSV file. The best you can do is create an XML document from the CSV file contents and validate that document against the XML schema.
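As a rough sketch of that approach, the Python below turns the sample customer CSV from post #1 into an XML document using only the standard library. The `Customers` and `Customer` wrapper element names are made up for illustration (the posted XSD fragment does not name them), and the code assumes the last line is a trailer holding the record count.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data copied from the first post: header, three records, trailer.
SAMPLE = """Cust_num, Cust_nme, Cust_typ, Cust_act_dte, Cust_loc,
101,Joe's Pizza,Retail,10121979,Detroit,
102,Sony Corp,Commercial,10101946,Tokyo,
103,K-CO Foods,Wholesale,01041987,London,
3
"""


def csv_to_xml(text):
    """Build an XML tree from CSV text with a header line and a count trailer."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    header, body, trailer = rows[0], rows[1:-1], rows[-1]

    # Each line ends with a comma, which yields an empty last field; drop it.
    names = [h.strip() for h in header if h.strip()]

    # Assumption: the trailer line carries the expected number of records.
    expected = int(trailer[0])
    if expected != len(body):
        raise ValueError(f"trailer says {expected} records, found {len(body)}")

    root = ET.Element("Customers")  # hypothetical wrapper element
    for row in body:
        cust = ET.SubElement(root, "Customer")  # hypothetical record element
        for name, value in zip(names, row):
            ET.SubElement(cust, name).text = value.strip()
    return root


root = csv_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

The resulting document could then be checked with a schema-aware tool, for example `xmllint --noout --schema customers.xsd customers.xml` (file names hypothetical). Note that `xs:date` expects values such as `1979-10-12`, so the raw `10121979` values would need reformatting before they validate.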
# 4  
Old 06-07-2011
CSV does have some rules that might support a validation.
  1. Lines should end in CR-LF, but for most of us that is not critical.
  2. A double quote left open at the end of a line is either an error or a line break embedded in a field. The user should probably have to allow the latter explicitly.
  3. It would be nice if every line had the same field count (a mismatch suggests quoting problems or embedded linefeeds), but that is more of a warning. The user should be able to provide an expected range of fields, or indicate that the field count is not always identical in the data set.
  4. You need to be double-quote sensitive when counting fields; it is not simple comma-delimited text.
  5. You need to be doubled-double-quote sensitive when evaluating double quotes.
Yeah, I always see XSD in the context of parsers like xerces having XML checking against XSD, although for speed I did my own checks. I was not sure if XSD was extensible to CSV format. XML is a slippery world, in terms of what you can do!
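Several of the rules in the list above can be approximated with Python's standard `csv` module, which already understands quoted commas and doubled double quotes. This is only a sketch: `check_csv` and its `min_fields`/`max_fields` parameters are names invented here, and it treats an unterminated quote as an error rather than asking the user whether embedded line breaks are allowed.

```python
import csv
import io


def check_csv(text, min_fields, max_fields):
    """Return a list of (line_number, message) problems found in CSV text."""
    problems = []
    # newline="" keeps CR-LF pairs intact so the csv module sees them as-is.
    # strict=True makes the reader raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        for record in reader:
            count = len(record)
            # Fields are counted after quote handling, so a comma inside
            # a quoted field does not inflate the count.
            if not min_fields <= count <= max_fields:
                problems.append((reader.line_num, f"{count} fields"))
    except csv.Error as exc:
        # For example, a double quote still open at end of file.
        problems.append((reader.line_num, str(exc)))
    return problems


good = 'a,b,c\r\n"x,1","say ""hi""",z\r\n'
short = 'a,b,c\r\n1,2\r\n'
open_quote = 'a,b,c\r\n1,"2,3\r\n'

print(check_csv(good, 3, 3))
print(check_csv(short, 3, 3))
print(check_csv(open_quote, 3, 3))
```

Passing a range rather than a single count follows the suggestion that the user should be able to say how much variation in field count is acceptable.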
# 5  
Old 07-29-2011
Actually, fp, XSD's can define formats for other than XML. This company received a number of XSD's for ANSI X.12 files from Microsoft with our BizTalk software (specifically in my case, an XSD for X.12 HIPAA 5010 837P claims files).

I looked at this thread because, in one instance, we have reason to want to validate the X.12 files against the XSD from code rather than through BizTalk.

Regards
# 6  
Old 07-29-2011
Quote:
Actually, fp, XSD's can define formats for other than XML.
I believe that BizTalk builds an XML representation of the X.12 message before validating it using the appropriate XSD.
# 7  
Old 08-01-2011
Certainly, for generic CSV to XML, the records can be in an XML element, and the fields can be XML strings, field-numbered by attribute within each element. Generally, a CSV record's position in the file is not significant, but records could be numbered, too, so that order is available.
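A minimal sketch of that generic layout in Python, assuming nothing about column names: every record becomes a `record` element and every field a `field` element, each numbered through an `n` attribute. The element and attribute names are arbitrary choices made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def generic_csv_to_xml(text):
    """Wrap every CSV record and field in numbered XML elements."""
    root = ET.Element("csv")  # arbitrary root element name
    reader = csv.reader(io.StringIO(text, newline=""))
    for rec_no, fields in enumerate(reader, start=1):
        # Record numbers preserve the original order of the file.
        rec = ET.SubElement(root, "record", n=str(rec_no))
        for fld_no, value in enumerate(fields, start=1):
            # Every field is kept as a plain string at this stage.
            ET.SubElement(rec, "field", n=str(fld_no)).text = value
    return root


doc = generic_csv_to_xml('101,"Joe\'s Pizza",Retail\r\n102,Sony Corp,Commercial\r\n')
print(ET.tostring(doc, encoding="unicode"))
```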

However, the CSV needs to be valid CSV before it can be converted to anything.
  • I can imagine XML that describes the CSV rules, and
  • if I imagine a little harder, I can imagine XML to describe a specific CSV file, so that the conversion of text fields to other types can be checked, as well as the field counts, and so that columns can be annotated with official labels.
  • But how far does it go? If I imagine really hard, I imagine XSD can do all this, but it feels a bit of a reach, since XSD exists to validate XML, not other file structures. I think it sounds like a nice accessory, but not a subset of XSD, conceptually. You could even write a reverse XML compiler to sniff files and suggest structure, create XML, validate to that XML, interactively, like Excel text to columns but smarter, more tightly typed.