Top Forums > Shell Programming and Scripting > Reading filenames with extension .xml
Post 302160285 by awk on Monday 21st of January 2008 10:49:05 AM
Quote:
Originally Posted by bhalotias
Hi,

I want to write a script to read all the filenames with extension .xml in a directory and pass the name of the file, one by one, to another function.

Please help me out.

Regards.
Saurabh
A word of warning: XML can be tricky to handle with line-oriented tools, for several reasons:
1) The whole document may be on a single line, which many Unix utilities do not handle properly.
2) White space between tags is optional, so it may or may not be present (see #1).
3) It could be encoded as Unicode (UTF-16, for example), in which case every other byte of plain ASCII text is a binary zero.

So have fun with the variations.
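
For the original question itself, here is a minimal sketch of one way to do it in POSIX sh. The names process_file and for_each_xml are made up for illustration; process_file stands in for whatever function should receive each filename. The sketch uses a shell glob rather than parsing ls output, so names containing spaces are passed through intact.

```shell
#!/bin/sh
# Hypothetical handler: replace the body with the real per-file work.
process_file() {
    printf 'processing: %s\n' "$1"
}

# Call process_file once for each *.xml file directly inside a directory.
# The directory defaults to the current one when no argument is given.
for_each_xml() {
    dir=${1:-.}
    for f in "$dir"/*.xml; do
        # When nothing matches, the pattern is left as a literal string;
        # the -f test skips that case (and skips directories named *.xml).
        [ -f "$f" ] || continue
        process_file "$f"
    done
}
```

Calling for_each_xml /some/dir then hands each matching path to process_file, one by one. It does not descend into subdirectories; find would be the usual tool if that is needed.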
 

XMLWF(1)                                                              XMLWF(1)

NAME
       xmlwf - Determines if an XML document is well-formed

SYNOPSIS
       xmlwf [ -s] [ -n] [ -p] [ -x] [ -e encoding] [ -w] [ -d output-dir] [ -c] [ -m] [ -r] [ -t] [ -v] [ file ...]

DESCRIPTION
       xmlwf uses the Expat library to determine if an XML document is well-formed. It is non-validating.

       If you do not specify any files on the command-line, and you have a recent version of xmlwf, the input file will be read from stdin.

WELL-FORMED DOCUMENTS
       A well-formed document must adhere to the following rules:

       o  The file begins with an XML declaration. For instance, <?xml version="1.0" standalone="yes"?>. NOTE: xmlwf does not currently check for a valid XML declaration.

       o  Every start tag is either empty (<tag/>) or has a corresponding end tag.

       o  There is exactly one root element. This element must contain all other elements in the document. Only comments, white space, and processing instructions may come after the close of the root element.

       o  All elements nest properly.

       o  All attribute values are enclosed in quotes (either single or double).

       If the document has a DTD, and it strictly complies with that DTD, then the document is also considered valid. xmlwf is a non-validating parser -- it does not check the DTD. However, it does support external entities (see the -x option).

OPTIONS
       When an option includes an argument, you may specify the argument either separate ("-d output") or mashed ("-doutput"). xmlwf supports both.

       -c     If the input file is well-formed and xmlwf doesn't encounter any errors, the input file is simply copied to the output directory unchanged. This implies no namespaces (turns off -n) and requires -d to specify an output file.

       -d output-dir
              Specifies a directory to contain transformed representations of the input files. By default, -d outputs a canonical representation (described below). You can select different output formats using -c and -m.

              The output filenames will be exactly the same as the input filenames or "STDIN" if the input is coming from STDIN. Therefore, you must be careful that the output file does not go into the same directory as the input file. Otherwise, xmlwf will delete the input file before it generates the output file (just like running cat < file > file in most shells).

              Two structurally equivalent XML documents have a byte-for-byte identical canonical XML representation. Note that ignorable white space is considered significant and is treated equivalently to data. More on canonical XML can be found at http://www.jclark.com/xml/canonxml.html .

       -e encoding
              Specifies the character encoding for the document, overriding any document encoding declaration. xmlwf has four built-in encodings: US-ASCII, UTF-8, UTF-16, and ISO-8859-1. Also see the -w option.

       -m     Outputs some strange sort of XML file that completely describes the input file, including character positions. Requires -d to specify an output file.

       -n     Turns on namespace processing. (describe namespaces) -c disables namespaces.

       -p     Tells xmlwf to process external DTDs and parameter entities. Normally xmlwf never parses parameter entities. -p tells it to always parse them. -p implies -x.

       -r     Normally xmlwf memory-maps the XML file before parsing. -r turns off memory-mapping and uses normal file IO calls instead. Of course, memory-mapping is automatically turned off when reading from STDIN.

       -s     Prints an error if the document is not standalone. A document is standalone if it has no external subset and no references to parameter entities.

       -t     Turns on timings. This tells Expat to parse the entire file, but not perform any processing. This gives a fairly accurate idea of the raw speed of Expat itself without client overhead. -t turns off most of the output options (-d, -m, -c, ...).

       -v     Prints the version of the Expat library being used, and then exits.

       -w     Enables Windows code pages. Normally, xmlwf will throw an error if it runs across an encoding that it is not equipped to handle itself. With -w, xmlwf will try to use a Windows code page. See also -e.

       -x     Turns on parsing external entities. Non-validating parsers are not required to resolve external entities, or even expand entities at all. Expat always expands internal entities (?), but external entity parsing must be enabled explicitly.

              External entities are simply entities that obtain their data from outside the XML file currently being parsed.

              This is an example of an internal entity:

              <!ENTITY vers '1.0.2'>

              And here are some examples of external entities:

              <!ENTITY header SYSTEM "header-&vers;.xml">  (parsed)
              <!ENTITY logo SYSTEM "logo.png" PNG>         (unparsed)

       --     For some reason, xmlwf specifically ignores "--" anywhere it appears on the command line.

       Older versions of xmlwf do not support reading from STDIN.

OUTPUT
       If an input file is not well-formed, xmlwf outputs a single line describing the problem to STDOUT. If a file is well formed, xmlwf outputs nothing. Note that the result code is not set.

BUGS
       According to the W3C standard, an XML file without a declaration at the beginning is not considered well-formed. However, xmlwf allows this to pass.

       xmlwf returns a 0 - noerr result, even if the file is not well-formed. There is no good way for a program to use xmlwf to quickly check a file -- it must parse xmlwf's STDOUT.

       The errors should go to STDERR, not stdout.

       There should be a way to get -d to send its output to STDOUT rather than forcing the user to send it to a file.

       I have no idea why anyone would want to use the -d, -c and -m options. If someone could explain it to me, I'd like to add this information to this manpage.

ALTERNATIVES
       Here are some XML validators on the web:

       http://www.hcrc.ed.ac.uk/~richard/xml-check.html
       http://www.stg.brown.edu/service/xmlvalid/
       http://www.scripting.com/frontier5/xml/code/xmlValidator.html
       http://www.xml.com/pub/a/tools/ruwf/check.html

22 April 2002                                                         XMLWF(1)
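
Since the OUTPUT and BUGS sections above say the result code cannot be relied on, a script has to look at what xmlwf prints. The following is a minimal sketch of that idea, not a definitive wrapper. The function name is_well_formed is made up for illustration, and the sketch assumes the behaviour described in this manpage: one line of output per problem, silence on success. It also treats a non-zero exit status as a failure, in case the installed xmlwf does set one.

```shell
#!/bin/sh
# is_well_formed FILE
# Returns 0 when xmlwf reports no problem for FILE, 1 otherwise.
# Assumption (from the manpage text): xmlwf prints nothing for a
# well-formed document and one descriptive line for a broken one.
is_well_formed() {
    # Capture everything xmlwf prints. If xmlwf cannot be run at all,
    # or exits non-zero, count that as "not well-formed" too.
    report=$(xmlwf "$1" 2>&1) || return 1
    # Silence means the document parsed cleanly.
    [ -z "$report" ]
}
```

In a loop over filenames this could be used as: is_well_formed "$f" || echo "not well-formed: $f". After a failed check, the text xmlwf printed is still available in the variable report.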