Shell Programming and Scripting: Splitting xml file into several xml files using perl. Post 302951190 by Aia on Monday, 3rd of August 2015, 07:14:57 PM
You did not post your last modified code.
As you found out, split consumes the part that matches, unless you use a lookahead or lookbehind assertion, which matches a position without consuming any characters. I use that technique in this code.


Code:
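#!/usr/bin/perl

use strict;
use warnings;

sub usage {
    print STDERR "Usage: $_[0] <filename>\n";
    exit 1;
}

my $filename = shift or usage $0;

my @content = do {
    open my $fh, '<', $filename or die "Cannot open $filename: $!\n";
    # read the file as one string (slurp mode, limited to this block)
    local $/;
    # split right after each </Document>; the lookbehind keeps the tag
    split /(?<=<\/Document>)/, <$fh>;
};    # $fh is closed automatically when it goes out of scope

# dynamically create numbered files with the content
my $suffix = 0;
for my $block (@content) {
    next if $block =~ /^\s*$/;    # skip whitespace-only leftovers, e.g. the final newline
    $block =~ s/^\s+//;           # drop the newline carried over from the previous chunk
    my $outfile = "$filename." . ++$suffix;
    open my $out, '>', $outfile or die "Cannot open $outfile: $!\n";
    print $out "$block\n";
    close $out;
}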
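To see what the lookbehind buys you, here is a minimal stand-alone comparison of the two split patterns. The two-document input string is made up for illustration and is not from the original thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input: two small documents, each followed by a newline
my $xml = "<Document>a</Document>\n<Document>b</Document>\n";

# Ordinary split: the matched text is the separator, so </Document> is consumed
my @consumed = split /<\/Document>/, $xml;

# Lookbehind split: the pattern matches a zero-width position after
# </Document>, so no characters are consumed and each chunk keeps its tag
my @kept = split /(?<=<\/Document>)/, $xml;

for my $piece (@kept) {
    (my $shown = $piece) =~ s/\n/\\n/g;    # show newlines as \n
    print "kept: [$shown]\n";
}
print "first piece without lookbehind: [$consumed[0]]\n";
```

The lookbehind version yields three pieces: the first document with its closing tag, the second document preceded by the newline that followed the first one, and a lone trailing newline, which is why the script above skips blank blocks. A capturing group, split /(<\/Document>)/, would also keep the tag, but as a separate list element that you would have to glue back on yourself.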


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to parse a XML file using PERL and XML::DOm

I need to know the way. I have got parsing down some nodes. But I was unable to get the child node perfectly. If you have code please send it. It will be very useful for me. (0 Replies)
Discussion started by: girigopal

2. Shell Programming and Scripting

Splitting huge XML Files into fixsized wellformed parts

Hi, I need to split xml-files with sizes greater than 2 gb into smaler chunks. As I dont want to end up with billions of files, I want those splitted files to have configurable sizes like 250 MB. Each file should be well formed having an exact copy of the header (and footer as the closing of the... (0 Replies)
Discussion started by: Malapha

3. Shell Programming and Scripting

splitting huge xml into multiple files

hi all i have a some huge html files (500MB to 1GB). Each file has multiple <html></html> tags <html> ................. .................... .................... </html> <html> ................. .................... .................... </html> <html> .................... (5 Replies)
Discussion started by: uttamhoode

4. Shell Programming and Scripting

splitting a file (xml) into multiple files

To split the files Hi, I'm having a xml file with multiple xml header. so i want to split the file into multiple files. Test.xml --------- <?xml version="UTF_8"> <emp: ....> <name>a</name> <age>10</age> </emp> <?xml version="UTF_8"> <emp: ....> <name>b</name> <age>10</age>... (11 Replies)
Discussion started by: sasi_u

5. Shell Programming and Scripting

Help required in Splitting a xml file into multiple and appending it in another .xml file

HI All, I have to split a xml file into multiple xml files and append it in another .xml file. for example below is a sample xml and using shell script i have to split it into three xml files and append all the three xmls in a .xml file. Can some one help plz. eg: <?xml version="1.0"?>... (4 Replies)
Discussion started by: ganesan kulasek

6. Shell Programming and Scripting

XML Splitting into multi files

Hi , I have a XML file like below file name : sample.xml <?xml version="1.0"?> <catalog> <author>Rajini</author> <title>XML Guide</title> <Text> </Text> <genre>Computer</genre> <price>44.95</price> </catalog> <?xml version="1.0"?> <catalog> ... (5 Replies)
Discussion started by: karthinvk

7. Shell Programming and Scripting

Splitting a single xml file into multiple xml files

Hi, I'm having a xml file with multiple xml header. so i want to split the file into multiple files. Sample.xml consists multiple headers so how can we split these multiple headers into multiple files in unix. eg : <?xml version="1.0" encoding="UTF-8"?> <ml:individual... (3 Replies)
Discussion started by: Narendra921631

8. Shell Programming and Scripting

Splitting CSV into variables then to XML file

I have a text file that looks like this: FIELD1, FIELD2, THIS IS FIELD3, FIELD4 FIELD1, FIELD2, THIS IS FIELD3, FIELD4 FIELD1, FIELD2, THIS IS FIELD3, FIELD4 I need it to turn it into an XML file to run against a custom application. My ultimate goal is for it to look like... (15 Replies)
Discussion started by: jeffs42885

9. Shell Programming and Scripting

Splitting the XML file into three different files

Hello Shell Guru's I have a requirement to split the source xml file into three different text file. And i need your valuable suggestion to finish this. Here is my source xml snippet, here i am using only one entry of <jms-system-resource>. There may be multiple entries in the source file. ... (5 Replies)
Discussion started by: Siv51427882

10. UNIX for Beginners Questions & Answers

Splitting the XML file and renaming the files

Hello Gurus, I have a requirement to split the xml file into different xml files. Can you please help me with that? Here is my Source XML file <jms-system-resource> <name>PS6SOAJMSModule</name> <target>soa_server1</target> <sub-deployment> ... (3 Replies)
Discussion started by: Siv51427882
Bio::Index::Swissprot(3pm)				User Contributed Perl Documentation				Bio::Index::Swissprot(3pm)

NAME
Bio::Index::Swissprot - Interface for indexing one or more Swissprot files.

SYNOPSIS
    # Make an index for one or more Swissprot files:

    use Bio::Index::Swissprot;
    use strict;

    my $index_file_name = shift;
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file_name,
        -write_flag => 1);
    $inx->make_index(@ARGV);

    # Print out several sequences present in the index in Genbank
    # format:

    use Bio::Index::Swissprot;
    use Bio::SeqIO;
    use strict;

    my $out = Bio::SeqIO->new( -format => 'genbank',
                               -fh     => \*STDOUT );
    my $index_file_name = shift;
    my $inx = Bio::Index::Swissprot->new(-filename => $index_file_name);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns a Bio::Seq object
        $out->write_seq($seq);
    }

    # alternatively
    my ($id, $acc);
    my $seq1 = $inx->get_Seq_by_id($id);
    my $seq2 = $inx->get_Seq_by_acc($acc);

DESCRIPTION
By default the index that is created uses the AC and ID identifiers as keys. This module inherits functions for managing dbm files from Bio::Index::Abstract.pm, and provides the basic functionality for indexing Swissprot files and retrieving Sequence objects from them. For best results 'use strict'.

You can also set or customize the unique key used for retrieval by writing your own function and calling the id_parser() method. For example:

    $inx->id_parser(\&get_id);
    # make the index
    $inx->make_index($index_file_name);

    # here is where the retrieval key is specified
    sub get_id {
        my $line = shift;
        $line =~ /^KW\s+([A-Z]+)/i;
        $1;
    }

FEEDBACK
  Mailing Lists
    User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

      bioperl-l@bioperl.org                  - General discussion
      http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

  Support
    Please direct usage questions or support issues to the mailing list:

      bioperl-l@bioperl.org

    rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

  Reporting Bugs
    Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

      https://redmine.open-bio.org/projects/bioperl/

AUTHOR - Ewan Birney
    Also lorenz@ist.org, bosborne at alum.mit.edu

APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with an underscore (_).

  _index_file
    Title   : _index_file
    Usage   : $index->_index_file( $file_name, $i )
    Function: Specialist function to index Swissprot format files.
              Is provided with a filename and an integer
              by make_index in its SUPER class.
    Example :
    Returns :
    Args    :

  id_parser
    Title   : id_parser
    Usage   : $index->id_parser( CODE )
    Function: Stores or returns the code used by record_id to
              parse the ID for record from a string.
              Returns \&default_id_parser (see below) if not set.
              An entry will be added to the index for each string
              in the list returned.
    Example : $index->id_parser( \&my_id_parser )
    Returns : ref to CODE if called without arguments
    Args    : CODE

  default_id_parser
    Title   : default_id_parser
    Usage   : $id = default_id_parser( $line )
    Function: The default parser for Swissprot.pm
              Returns $1 from applying the regexp /^ID\s*(\S+)/
              or /^AC\s+([A-Z0-9]+)/ to the current line.
    Returns : ID string
    Args    : a line string

  _file_format
    Title   : _file_format
    Usage   : Internal function for indexing system
    Function: Provides file format for this database
    Example :
    Returns :
    Args    :

perl v5.14.2                      2012-03-02
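The id_parser entry says an index entry is added for each string in the list the parser returns, so a custom parser can hand back several keys from a single line. The sketch below is self-contained and does not load BioPerl. The name multi_key_parser and the sample record lines are hypothetical, and, like default_id_parser, it assumes the parser is called with one line of the Swissprot file at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical custom ID parser (the name is illustrative, not part of BioPerl).
# Assumed to be called with one line of a Swissprot file; it returns a list
# of keys, and each key would become a separate index entry.
sub multi_key_parser {
    my $line = shift;

    # ID line: the entry name is the first token after "ID"
    return ($1) if $line =~ /^ID\s+(\S+)/;

    # AC line: may carry several accessions, e.g. "AC   P99999; P00001;"
    if ($line =~ /^AC\s+(.+)/) {
        my $accessions = $1;
        return $accessions =~ /([A-Z0-9]+)/g;    # every accession on the line
    }

    return;    # any other line contributes no keys
}

# Registering it would follow the man page's own pattern
# (@swissprot_files is a placeholder):
#   $inx->id_parser(\&multi_key_parser);
#   $inx->make_index(@swissprot_files);

# Made-up sample lines for illustration
for my $line ("ID   CYC_HUMAN   Reviewed;   105 AA.\n",
              "AC   P99999; P00001; Q6NUR2;\n",
              "DE   RecName: Full=Cytochrome c;\n") {
    my @keys = multi_key_parser($line);
    print scalar(@keys), " key(s): @keys\n";
}
```

Returning an empty list for uninteresting lines keeps the parser usable on every line of a record, while the AC branch lets secondary accessions become retrieval keys alongside the primary one.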
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.