Full Discussion: File splitting script help
Post 302943283 by Don Cragun on Thursday 7th of May 2015 04:01:25 AM
Using:
Code:
awk -v POS="$Position" -v NXT="$NextPostion" -v PRF="$TargetfilePrefix" '{print >  PRF "_" substr($0, POS, NXT-POS+1) ".txt"}'  $Filename

could have a problem on some versions of awk because the precedence between concatenation of strings and the output redirection operator in print statements is not specified by the standards. Some implementations of awk will treat this code as:
Code:
awk -v POS="$Position" -v NXT="$NextPostion" -v PRF="$TargetfilePrefix" '{(print >  PRF) "_" substr($0, POS, NXT-POS+1) ".txt"}'  $Filename

(possibly giving a syntax error, and certainly not producing the output files you want) and others will treat it as:
Code:
awk -v POS="$Position" -v NXT="$NextPostion" -v PRF="$TargetfilePrefix" '{print > (PRF "_" substr($0, POS, NXT-POS+1) ".txt")}'  $Filename

Since it is working on your system, we can assume that your version of awk is doing the latter.

This code doesn't close and reopen files, so if it doesn't fail with a "too many open files" error, it should run pretty quickly. It might run slightly faster if you move appending the "_" out of the per-record action and do it once instead of two million times:
Code:
awk -v POS="$Position" -v NXT="$NextPostion" -v PRF="${TargetfilePrefix}_" '{print >  (PRF substr($0, POS, NXT-POS+1) ".txt")}'  $Filename
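
If you want to try the parenthesized form on a small scale first, here is a minimal sketch. The sample records, the temporary directory, and the values 1 and 2 for the two position variables are made up for illustration; only the awk command itself comes from the code above.

```shell
# Hypothetical sample data: the split key is in columns 1-2 of each record.
workdir=$(mktemp -d)
printf 'AA first\nBB second\nAA third\n' > "$workdir/input.txt"

# Assumed values; substitute the real ones from your script.
Position=1
NextPostion=2
TargetfilePrefix="$workdir/out"
Filename="$workdir/input.txt"

# The parentheses make the whole concatenation the redirection target,
# so every awk should parse this the same way.
awk -v POS="$Position" -v NXT="$NextPostion" -v PRF="${TargetfilePrefix}_" '
{ print > (PRF substr($0, POS, NXT-POS+1) ".txt") }
' "$Filename"

cat "$workdir/out_AA.txt"   # the two AA records
cat "$workdir/out_BB.txt"   # the one BB record
```

With this input you should end up with one output file per distinct key, out_AA.txt and out_BB.txt.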

As was noted before, if this fails on your two million record file with a "too many open files" error, you'll have to build in code to close and reopen files. If you close and reopen a file for each output line, that will run considerably slower. Doing anything smarter than that would require you to evaluate the input file: are lines directed to the same output file closely grouped in the input, are there common runs of adjacent lines that will be directed to the same output file, and so on. The answers can be used to make smarter decisions about when to close a file and when (if ever) a file needs to be reopened.
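
One possible middle ground, sketched here under the assumption that lines for the same output file tend to arrive in runs, is to close a file only when the target changes. The sample data and the temporary directory are hypothetical. This is not necessarily the best strategy for your data, just the simplest one that keeps a single file open at a time.

```shell
# Hypothetical sample data; the key in columns 1-2 recurs after a gap.
workdir=$(mktemp -d)
printf 'AA 1\nAA 2\nBB 3\nAA 4\n' > "$workdir/input.txt"

awk -v POS=1 -v NXT=2 -v PRF="$workdir/out_" '
{
    out = (PRF substr($0, POS, NXT-POS+1) ".txt")
    if (out != prev) {
        # The target changed: close the previous file to free its descriptor.
        if (prev != "") close(prev)
        prev = out
    }
    # Append, so a file that gets reopened keeps the lines written earlier.
    print >> out
}
' "$workdir/input.txt"

cat "$workdir/out_AA.txt"
```

Note the >> instead of >: a file reopened with > would be truncated and lose the lines written before it was closed. The flip side is that >> also appends to output files left over from an earlier run, so remove those before rerunning.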

Since you couldn't even guess at how many output files would be produced from your input file, I assume you have not tried to evaluate any of the above questions that might help produce more efficient code if you do have to close and reopen files.
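
A rough way to answer those questions before writing any close/reopen logic is to count the distinct keys and the number of runs of adjacent lines that share a key. The sketch below uses hypothetical sample data and assumes the key is in columns 1-2; the variable name summary is also just for illustration.

```shell
# Hypothetical sample data with the key in columns 1-2.
workdir=$(mktemp -d)
printf 'AA 1\nAA 2\nBB 3\nAA 4\n' > "$workdir/input.txt"

summary=$(awk -v POS=1 -v NXT=2 '
{
    key = substr($0, POS, NXT-POS+1)
    # Count each distinct key once: one output file per key.
    if (!(key in seen)) { seen[key] = 1; files++ }
    # A new run starts whenever the key differs from the previous line.
    if (NR == 1 || key != prev) runs++
    prev = key
}
END { printf "records=%d files=%d runs=%d\n", NR, files, runs }
' "$workdir/input.txt")

echo "$summary"   # records=4 files=2 runs=3
```

If files is comfortably below your open file limit (ulimit -n shows the shell's limit, though some awks impose their own), you probably don't need to close anything. If runs is close to files, the input is well grouped and closing on every key change costs almost nothing. If runs is close to records, the keys are interleaved and a close-on-change approach will do nearly as much work as closing after every line.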
 
