Sponsored Content
Top Forums Shell Programming and Scripting Using AWK to separate data from a large XML file into multiple files Post 302362706 by JRy on Saturday 17th of October 2009 02:57:43 AM
Old 10-17-2009
Quote:
Originally Posted by jp2542a
The only thing that I can think of is that split.awk contains more than just the script code. Invisible characters? I've tried this on both Centos and a windows (cygwin) system...
Here's the exact contents of my file if you want to take a look:
share1t.com File Sharing | Download: split.awk
I still feel like I may be doing something wrong.

Thank you again for your help, I really do appreciate it. Smilie
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

multiple smaller files from one large file

I have a file with a simple list of ids. 750,000 rows. I have to break it down into multiple 50,000 row files to submit in a batch process.. Is there an easy script I could write to accomplish this task? (2 Replies)
Discussion started by: rtroscianecki
2 Replies

2. Shell Programming and Scripting

handling multiple files using awk command and wants to get separate out file for each

hai all I am new to the world of shell scripting I wanted to extract two columns from multiple files say around 25 files and i wanted to get the separate outfile for each input file tired using the following command to extract two columns from 25 files awk... (2 Replies)
Discussion started by: hema dhevi
2 Replies

3. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is... (3 Replies)
Discussion started by: Liverpaul09
3 Replies

4. UNIX for Dummies Questions & Answers

awk to match multiple regex and create separate output files

Howdy Folks, I have a list that looks like this: (file2.txt) AAA BBB CCC DDD and there are 24 of these short words. I am matching these patterns to another file with 755795 lines (file1.txt). I have this code for matching: awk -v f2=file2.txt ' BEGIN { while(... (2 Replies)
Discussion started by: heecha
2 Replies

5. Shell Programming and Scripting

How to split a data file into separate files with the file names depending upon a column's value?

Hi, I have a data file xyz.dat similar to the one given below, 2345|98|809||x|969|0 2345|98|809||y|0|537 2345|97|809||x|544|0 2345|97|809||y|0|651 9685|98|809||x|321|0 9685|98|809||y|0|357 9685|98|709||x|687|0 9685|98|709||y|0|234 2315|98|809||x|564|0 2315|98|809||y|0|537... (2 Replies)
Discussion started by: nithins007
2 Replies

6. Shell Programming and Scripting

create separate files from one excel file with multiple sheets

Hi, I have one requirement, create separate files (".csv") from one excel file(xlsx) with multiple sheets. These ".csv" files are my source files. So anybody please suggest me the process. Thanks in Advance. Regards, Harris (3 Replies)
Discussion started by: harris
3 Replies

7. Shell Programming and Scripting

Process multiple large files with awk

Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk. I've got a dataset folder with 10 files (16 millions of row each one - 600MB), and I've got a sorted file with all keys inside. For example: a sample_1 200 a.b sample_2 10 a sample_3 10 a sample_1 10 a... (4 Replies)
Discussion started by: camor
4 Replies

8. Shell Programming and Scripting

Splitting a single xml file into multiple xml files

Hi, I'm having a xml file with multiple xml header. so i want to split the file into multiple files. Sample.xml consists multiple headers so how can we split these multiple headers into multiple files in unix. eg : <?xml version="1.0" encoding="UTF-8"?> <ml:individual... (3 Replies)
Discussion started by: Narendra921631
3 Replies

9. Shell Programming and Scripting

awk - Multiple files - 1 file with multi-line data

Greetings experts, Have 2 input files, of which 1 file has 1 record per line; in 2nd file, multiple lines constitute 1 record; Hence declared the RS=";" Now in the first file which ends with ";" at each line of the line; But \nis also being considered as part of the data due to which I am... (1 Reply)
Discussion started by: chill3chee
1 Replies

10. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Split large xml into mutiple files and with header and footer in file tried below it splits unevenly and also i need help in adding header and footer command : csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1} sample xml <?xml version="1.0" encoding="UTF-8"?><Recipient>... (36 Replies)
Discussion started by: karthik
36 Replies
uri(n)						    Tcl Uniform Resource Identifier Management						    uri(n)

__________________________________________________________________________________________________________________________________________________

NAME
uri - URI utilities SYNOPSIS
package require Tcl 8.2 package require uri ?1.2.1? uri::split url ?defaultscheme? uri::join ?key value?... uri::resolve base url uri::isrelative url uri::geturl url ?options...? uri::canonicalize uri uri::register schemeList script _________________________________________________________________ DESCRIPTION
This package contains two parts. First it provides regular expressions for a number of url/uri schemes. Second it provides a number of com- mands for manipulating urls/uris and fetching data specified by them. For the latter this package analyses the requested url/uri and then dispatches it to the appropriate package (http, ftp, ...) for actual fetching. The package currently does not conform to RFC 2396 (http://www.rfc-editor.org/rfc/rfc2396.txt), but quite likely should be. Patches and other help are welcome. COMMANDS
uri::split url ?defaultscheme? uri::split takes an url, decodes it and then returns a list of key/value pairs suitable for array set containing the constituents of the url. If the scheme is missing from the url it defaults to the value of defaultscheme if it was specified, or http else. Cur- rently only the schemes http, ftp, mailto, urn, news, ldap and file are supported by the package itself. See section EXTENDING on how to expand that range. The set of constituents of an url (= the set of keys in the returned dictionary) is dependent on the scheme of the url. The only key which is therefore always present is scheme. For the following schemes the constituents and their keys are known: ftp user, pwd, host, port, path, type http(s) user, pwd, host, port, path, query, fragment. The fragment is optional. file path, host. The host is optional. mailto user, host. The host is optional. news Either message-id or newsgroup-name. uri::join ?key value?... uri::join takes a list of key/value pairs (generated by uri::split, for example) and returns the canonical url they represent. Cur- rently only the schemes http, ftp, mailto, urn, news, ldap and file are supported. See section EXTENDING on how to expand that range. uri::resolve base url uri::resolve resolves the specified url relative to base. In other words: A non-relative url is returned unchanged, whereas for a relative url the missing parts are taken from base and prepended to it. The result of this operation is returned. For an empty url the result is base. uri::isrelative url uri::isrelative determines whether the specified url is absolute or relative. uri::geturl url ?options...? uri::geturl decodes the specified url and then dispatches the request to the package appropriate for the scheme found in the url. The command assumes that the package to handle the given scheme either has the same name as the scheme itself (including possible capitalization) followed by ::geturl, or, in case of this failing, has the same name as the scheme itself (including possible capi- talization). It further assumes that whatever package was loaded provides a geturl-command in the namespace of the same name as the package itself. This command is called with the given url and all given options. Currently geturl does not handle any options itself. Note: file-urls are an exception to the rule described above. They are handled internally. It is not possible to specify results of the command. They depend on the geturl-command for the scheme the request was dispatched to. uri::canonicalize uri uri::canonicalize returns the canonical form of a URI. The canonical form of a URI is one where relative path specifications, ie. . and .., have been resolved. uri::register schemeList script uri::register registers the first element of schemeList as a new scheme and the remaining elements as aliases for this scheme. It creates the namespace for the scheme and executes the script in the new namespace. The script has to declare variables containing the regular expressions relevant to the scheme. At least the variable schemepart has to be declared as that one is used to extend the variables keeping track of the registered schemes. SCHEMES
In addition to the commands mentioned above this package provides regular expression to recognize urls for a number of url schemes. For each supported scheme a namespace of the same name as the scheme itself is provided inside of the namespace uri containing the variable url whose contents are a regular expression to recognize urls of that scheme. Additional variables may contain regular expressions for parts of urls for that scheme. The variable uri::schemes contains a list of all supported schemes. Currently these are ftp, ldap, file, http, gopher, mailto, news, wais and prospero. EXTENDING
Extending the range of schemes supported by uri::split and uri::join is easy because both commands do not handle the request by themselves but dispatch it to another command in the uri namespace using the scheme of the url as criterion. uri::split and uri::join call Split[string totitle <scheme>] and Join[string totitle <scheme>] respectively. CREDITS
Original code (regular expressions) by Andreas Kupries. Modularisation by Steve Ball, also the split/join/resolve functionality. BUGS, IDEAS, FEEDBACK This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category uri of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation. KEYWORDS
fetching information, file, ftp, gopher, http, ldap, mailto, news, prospero, rfc 2255, rfc 2396, uri, url, wais, www uri 1.2.1 uri(n)
All times are GMT -4. The time now is 01:21 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy