split file by delimiter with csplit Post: 302680245

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

split string with multibyte delimiter

Hi, I need to split a string, using awk, cut, or basic Unix commands (no programming), with a multibyte character as the delimiter. Ex: abcd-efgh-ijkl split by -efgh- to get two segments, abcd & ijkl. Is it possible? Thanks A.H.S

2. UNIX for Dummies Questions & Answers

Split files using Csplit

I have an Excel file with more than 65K records... Since Excel does not take more than 65K records, I want to split the file and send it as two Excel files... Could someone help me with how to use csplit by specifying the number of records?

3. UNIX for Dummies Questions & Answers

Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this. For example: split -l 3000000 filename.txt This is very slow and it splits the file with 3 million records in each...

4. Shell Programming and Scripting

How to split a string with no delimiter

Hi; I want to write a shell script that will split a string with no delimiter. Basically the script will read a line from a file. For example the line it read from the file contains: 99234523 These values are never the same but the length will always be 8. How do i split this...

5. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux script and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semi colon “;” Here is the sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment...

6. Shell Programming and Scripting

Split file into multiple files using delimiter

Hi, I have a file which has many URLs delimited by spaces. Now I want to move them to separate files, each one holding 10 URLs. http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html I have used the below code to arrange...

7. Shell Programming and Scripting

How to target certain delimiter to split text file?

Hi, all. I have an input file. I would like to generate 3 types of output files. Input: LG10_PM_map_19_LEnd_1000560 LG10_PM_map_6-1_27101856 LG10_PM_map_71_REnd_20597718 LG12_PM_map_5_chr_118419232 LG13_PM_map_121_24341052 LG14_PM_1a_456799 LG1_MM_scf_5a_opt_abc_9029993 ...

8. UNIX for Advanced & Expert Users

How to split large file with different record delimiter?

Hi, I have received a file which is 20 GB. We would like to split the file into 4 equal parts and process it to avoid memory issues. If the record delimiter is unix new line, I could use split command either with option l or b. The problem is that the line terminator is |##| How to use...

9. UNIX for Beginners Questions & Answers

Shell script to Split matrix file with delimiter into multiple files

I have a large semicolon-delimited file with thousands of columns and many thousands of lines. It looks like: ID1;ID2;ID3;ID4;A_1;B_1;C_1;A_2;B_2;C_2;A_3;B_3;C_3 AA;ax;ay;az;01;02;03;04;05;06;07;08;09 BB;bx;by;bz;03;05;33;44;15;26;27;08;09 I want to split this table into multiple files: ...

File::GlobMapper(3pm)					 Perl Programmers Reference Guide				     File::GlobMapper(3pm)

NAME

       File::GlobMapper - Extend File Glob to Allow Input and Output Files

SYNOPSIS

	   use File::GlobMapper qw( globmap );

	   my $aref = globmap $input => $output
	       or die $File::GlobMapper::Error ;

	   my $gm = new File::GlobMapper $input => $output
	       or die $File::GlobMapper::Error ;

DESCRIPTION

       This module needs Perl 5.005 or better.

       This module takes the existing "File::Glob" module as a starting point and extends it to allow new filenames to be derived from the files
       matched by "File::Glob".

       This can be useful when carrying out batch operations on multiple files that have both an input filename and an output filename, where the
       output filename can be derived from the input filename. Examples of operations where this can be useful include file renaming, file
       copying and file compression.

   Behind The Scenes
       To help explain what "File::GlobMapper" does, consider what code you would write if you wanted to rename all files in the current directory
       that ended in ".tar.gz" to ".tgz". So say these files are in the current directory

	   alpha.tar.gz
	   beta.tar.gz
	   gamma.tar.gz

       and they need to be renamed to this

	   alpha.tgz
	   beta.tgz
	   gamma.tgz

       Below is a possible implementation of a script to carry out the rename (error cases have been omitted)

	   foreach my $old ( glob "*.tar.gz" )
	   {
	       my $new = $old;
	       $new =~ s#(.*)\.tar\.gz$#$1.tgz# ;

	       rename $old => $new
		   or die "Cannot rename '$old' to '$new': $!\n";
	   }

       Notice that a file glob pattern "*.tar.gz" was used to match the ".tar.gz" files, then a fairly similar regular expression was used in the
       substitute to allow the new filename to be created.

       Given that the file glob is just a cut-down regular expression and that it has already done a lot of the hard work in pattern matching the
       filenames, wouldn't it be handy to be able to use the patterns in the fileglob to drive the new filename?

       Well, that's exactly what "File::GlobMapper" does.

       Here is the same snippet of code rewritten using "globmap"

	   for my $pair (globmap '<*.tar.gz>' => '<#1.tgz>' )
	   {
	       my ($from, $to) = @$pair;
	       rename $from => $to
		   or die "Cannot rename '$from' to '$to': $!\n";
	   }

       So how does it work?

       Behind the scenes the "globmap" function does a combination of a file glob to match existing filenames followed by a substitute to create
       the new filenames.

       Notice how both parameters to "globmap" are strings that are delimited by <>.  This is done to make them look more like file globs - it is
       just syntactic sugar, but it can be handy when you want the strings to be visually distinctive. The enclosing <> are optional, so you don't
       have to use them - in fact the first thing globmap will do is remove these delimiters if they are present.

       The first parameter to "globmap", "*.tar.gz", is an Input File Glob.  Once the enclosing "< ... >" is removed, this is passed (more or
       less) unchanged to "File::Glob" to carry out a file match.

       Next the fileglob "*.tar.gz" is transformed behind the scenes into a full Perl regular expression, with the additional step of wrapping
       each transformed wildcard metacharacter sequence in parentheses.

       In this case the input fileglob "*.tar.gz" will be transformed into this Perl regular expression

	   ([^/]*)\.tar\.gz

       Wrapping with parentheses allows the wildcard parts of the Input File Glob to be referenced by the second parameter to "globmap", "#1.tgz",
       the Output File Glob. This parameter operates just like the replacement part of a substitute command. The difference is that the "#1"
       syntax is used to reference sub-patterns matched in the input fileglob, rather than the $1 syntax that is used with Perl regular
       expressions. In this case "#1" is used to refer to the text matched by the "*" in the Input File Glob. Using "#1" rather than $1 makes it
       easier to use this module where the parameters to "globmap" are typed at the command line.

       The final step involves passing each filename matched by the "*.tar.gz" file glob through the derived Perl regular expression in turn and
       expanding the output fileglob using it.
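
       The steps above can be sketched in plain Perl without the module. This is only an illustration under stated assumptions: the helper
       names glob_to_regex, expand_output and map_names are hypothetical, only the "*", "?" and "." rules are handled, and the module's real
       internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple glob into a regex string, wrapping each
# wildcard in a capture group. Only '*', '?' and '.' are handled in this sketch.
sub glob_to_regex {
    my ($glob) = @_;
    my %map = ( '*' => '([^/]*)', '?' => '([^/]?)', '.' => '\.' );
    my $re = '';
    for my $char (split //, $glob) {
        $re .= exists $map{$char} ? $map{$char} : quotemeta($char);
    }
    return $re;
}

# Hypothetical helper: replace '#1', '#2', ... in the output glob with the
# corresponding captured text (empty string if there is no such capture).
sub expand_output {
    my ($out, @captures) = @_;
    $out =~ s{\#([1-9]\d*)}{ defined $captures[$1 - 1] ? $captures[$1 - 1] : '' }ge;
    return $out;
}

# Hypothetical helper: pair each matching name with its derived output name.
# Takes a plain list of names instead of calling File::Glob.
sub map_names {
    my ($in_glob, $out_glob, @names) = @_;
    my $re = glob_to_regex($in_glob);
    my @pairs;
    for my $name (@names) {
        my @captures = ( $name =~ /^$re$/ );
        next unless @captures;    # name did not match the input glob
        push @pairs, [ $name => expand_output($out_glob, @captures) ];
    }
    return \@pairs;
}

my $pairs = map_names('*.tar.gz', '#1.tgz', qw(alpha.tar.gz beta.tar.gz notes.txt));
printf "%s -> %s\n", @$_ for @$pairs;
```

       Working on a list of names rather than on the filesystem keeps the sketch side-effect free; "notes.txt" is skipped because it does not
       match the input glob.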

       The end result of all this is a list of pairs of filenames. By default that is what is returned by "globmap". In this example the data
       structure returned will look like this

	    ( ['alpha.tar.gz' => 'alpha.tgz'],
	      ['beta.tar.gz'  => 'beta.tgz' ],
	      ['gamma.tar.gz' => 'gamma.tgz']
	    )

       Each pair is an array reference with two elements - namely the from filename, that "File::Glob" has matched, and a to filename that is
       derived from the from filename.
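
       A list of pairs in this shape can be walked without touching any files, which is a cheap way to check a mapping before acting on it.
       The sketch below hard-codes the sample data from above rather than calling "globmap", and the helper name dry_run_lines is
       hypothetical.

```perl
use strict;
use warnings;

# Sample data in the shape shown above; a real call to globmap would supply it.
my @pairs = (
    [ 'alpha.tar.gz' => 'alpha.tgz' ],
    [ 'beta.tar.gz'  => 'beta.tgz'  ],
    [ 'gamma.tar.gz' => 'gamma.tgz' ],
);

# Hypothetical helper: describe each pair as "from -> to" without renaming anything.
sub dry_run_lines {
    my (@list) = @_;
    my @lines;
    for my $pair (@list) {
        my ($from, $to) = @$pair;    # each pair is a two-element array reference
        push @lines, "$from -> $to";
    }
    return @lines;
}

print "$_\n" for dry_run_lines(@pairs);
```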

   Limitations
       "File::GlobMapper" has been kept simple deliberately, so it isn't intended to solve all filename mapping operations. Under the hood
       "File::Glob" (or for older versions of Perl, "File::BSDGlob") is used to match the files, so you will never have the flexibility of full
       Perl regular expressions.

   Input File Glob
       The syntax for an Input FileGlob is identical to "File::Glob", except for the following

       1.   No nested {}

       2.   Whitespace does not delimit fileglobs.

       3.   Parentheses can be used to capture parts of the input filename.

       4.   If an Input glob matches the same file more than once, only the first will be used.

       The syntax

       ~
       ~user
       .    Matches a literal '.'.  Equivalent to the Perl regular expression

		\.

       *    Matches zero or more characters, except '/'. Equivalent to the Perl regular expression

		[^/]*

       ?    Matches zero or one character, except '/'. Equivalent to the Perl regular expression

		[^/]?

       \    Backslash is used, as usual, to escape the next character.

       []   Character class.

       {,}  Alternation

       ()   Capturing parentheses that work just like Perl's

       Any other character is taken literally.

   Output File Glob
       The Output File Glob is a normal string, with 2 glob-like features.

       The first is the '*' metacharacter. This will be replaced by the complete filename matched by the input file glob. So

	   *.c *.Z

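       The second is the "#1" syntax described above, which refers to sub-patterns matched in the Input File Glob.

       "*"  The "*" character will be replaced with the complete input filename.

       #1   Patterns of the form /#d/ will be replaced with the text matched by the corresponding sub-pattern of the Input File Glob.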

   Returned Data
EXAMPLES

   A Rename script
       Below is a simple "rename" script that uses "globmap" to determine the source and destination filenames.

	   use File::GlobMapper qw(globmap) ;
	   use File::Copy;

	   die "rename: Usage rename 'from' 'to'\n"
	       unless @ARGV == 2 ;

	   my $fromGlob = shift @ARGV;
	   my $toGlob	= shift @ARGV;

	   my $pairs = globmap($fromGlob, $toGlob)
	       or die $File::GlobMapper::Error;

	   for my $pair (@$pairs)
	   {
	       my ($from, $to) = @$pair;
	       move $from => $to ;
	   }

       Here is an example that renames all c files to cpp.

	   $ rename '*.c' '#1.cpp'
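
       Before running a rename like this, it can help to preview what "globmap" would produce. The sketch below is one possible way to do
       that, not part of the module: it builds a scratch directory with "File::Temp", and the file names and the helper name preview are
       made up for illustration. No files are moved.

```perl
use strict;
use warnings;
use File::GlobMapper qw(globmap);
use File::Temp qw(tempdir);

# Scratch directory holding a few empty, made-up files; removed at exit.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(main.c util.c notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create '$dir/$name': $!\n";
    close $fh;
}

# Hypothetical helper: return "from -> to" lines for a globmap without moving anything.
sub preview {
    my ($from_glob, $to_glob) = @_;
    my $pairs = globmap($from_glob, $to_glob)
        or die $File::GlobMapper::Error;
    my @lines;
    for my $pair (@$pairs) {
        my ($from, $to) = @$pair;
        push @lines, "$from -> $to";
    }
    return @lines;
}

print "$_\n" for preview("$dir/*.c", "$dir/#1.cpp");
```

       Only the two .c files should appear in the output; notes.txt does not match the input glob.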

   A few example globmaps
       Below are a few examples of globmaps

       To copy all your .c files to a backup directory

	   '</my/home/*.c>'    '</my/backup/#1.c>'

       If you want to compress all

	   '</my/home/*.[ch]>'	  '<*.gz>'

       To uncompress

	   '</my/home/*.[ch].gz>'    '</my/home/#1.#2>'

SEE ALSO

       File::Glob

AUTHOR

       The File::GlobMapper module was written by Paul Marquess, pmqs@cpan.org.

COPYRIGHT AND LICENSE

       Copyright (c) 2005 Paul Marquess. All rights reserved.  This program is free software; you can redistribute it and/or modify it under the
       same terms as Perl itself.

perl v5.18.2							    2013-11-04						     File::GlobMapper(3pm)