Split files with formatted numbers


 
# 8  
Old 06-27-2014
I guess another print command needs to be placed before and after
Quote:
print > fn
for the header and trailer. But how?


Expected result.
Code:
split.001.txt
=============
001 of n files
record 1
record 2
....
record 100,000
date

Code:
split.002.txt
=============
002 of n files
record 1
record 2
....
record 100,000
date

# 9  
Old 06-27-2014
That section executes for every line (as the comment says) - you would probably want the header in a (new) NR%100000==1 section and the trailer in the NR%100000==0 section before the close. You'd also need an END section to handle the trailer for the last file (since it's unlikely to end on exactly 100000 lines, I assume).

If you have GNU awk you can use strftime() to get the date.

Getting the total number of files is an issue though, since awk won't know that until it has processed the entire file. It might be easiest to work that out in a shell script wrapper and just pass it in as a variable.
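
A minimal sketch of that wrapper idea, not a tested production script: the shell counts the lines, computes the number of output files, takes one timestamp, and passes all three to awk with -v. The function name split_with_trailer and the variable names lpf, nf and stamp are assumptions made for illustration. Unlike strftime(), this stamps every file with the same date, taken when the wrapper starts.

```shell
#!/bin/sh
# Hypothetical wrapper: split_with_trailer FILE LINES_PER_FILE
split_with_trailer() {
	in=$1
	lpf=$2
	lc=$(wc -l < "$in")               # total number of input lines
	nf=$(( (lc + lpf - 1) / lpf ))    # number of output files, rounded up
	stamp=$(date)                     # one timestamp shared by all trailers
	awk -v lpf="$lpf" -v nf="$nf" -v stamp="$stamp" '
	(NR - 1) % lpf == 0 {
		# First line of an output file: pick the name and write the header.
		fn = sprintf("split.%03d.txt", ++ofc)
		printf("%03d of %03d files\n", ofc, nf) > fn
	}
	{	# Every line goes to the current output file.
		print > fn
	}
	NR % lpf == 0 {
		# Last line of a full output file: write the trailer and close it.
		print stamp > fn
		close(fn)
	}
	END {	# The last file is usually short, so it still needs its trailer.
		if (NR % lpf) {
			print stamp > fn
			close(fn)
		}
	}' "$in"
}
```

Calling split_with_trailer split.txt 100000 would write split.001.txt, split.002.txt and so on into the current directory.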

Last edited by CarloM; 06-27-2014 at 04:31 PM.. Reason: Corrected
This User Gave Thanks to CarloM For This Post:
# 10  
Old 06-27-2014
If you don't have GNU awk (or if you want code that should work on any system), you could try something like:
Code:
#!/bin/ksh
lc=$(wc -l < split.txt)
awk -v lc="$lc" '
BEGIN {	lpf = 100000	# Lines per output file.
}
FNR == 1 {
	# This is not in the BEGIN section to allow the default value of lpf
	# to be overridden by an assignment before the filename operand.
	nf = int((lc + lpf - 1) / lpf)	# Total number of files to be created.
}
NR % lpf == 1 {
	# 1st line of output file:
	fn=sprintf("split.%03d.txt", ++ofc)
	printf("%03d of %03d files\n", ofc, nf) > fn
}
{	# all lines:
	print > fn
}
NR % lpf == 0 {
	# Last line of output file:
	trailer()
}
END {	if(NR % lpf)
		trailer()
}
function trailer() {
	close(fn)
	cmd = sprintf("date >> \"%s\"\n", fn)
	system(cmd)
}' "$@" split.txt

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Although I normally use the Korn shell (and this was tested using ksh), it will work with any shell that supports basic POSIX shell standard syntax (such as bash and ksh).
This User Gave Thanks to Don Cragun For This Post:
# 11  
Old 06-30-2014
I am on SunOS, and plain awk did not work; it threw the following error:
Quote:
awk: syntax error near line 1
awk: bailing out near line 1
However,
Code:
/usr/xpg4/bin/awk 
worked.
# 12  
Old 07-01-2014
Another issue.
I have another server, running AIX.
There, awk, nawk and /usr/xpg4/bin/awk all fail,
throwing the following error:
Code:
syntax error The source line is 3.
 The error context is
                   BEGIN >>>
 <<<
 awk: Quitting
 The source line is 3.

# 13  
Old 07-01-2014
Post your AIX version of the script.
# 14  
Old 07-02-2014
Fixed by putting the opening curly brace on the same line as "BEGIN" and "END" instead of on the next line.
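
For reference, a small sketch of the layout that works: awk requires the action's opening brace to sit on the same line as BEGIN or END, because a newline ends the pattern. The function name show_lpf and the printed value are only illustrative.

```shell
#!/bin/sh
# Hypothetical demo: the opening brace shares a line with BEGIN.
show_lpf() {
	awk 'BEGIN {
		lpf = 100000	# Lines per output file.
		print lpf
	}'
}
show_lpf
```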

Besides that, I have another issue. Initially I had a requirement to include the date, and now the date is no longer needed. But at the time I was trying to format the date. From Don Cragun's code
cmd = sprintf("date >> \"%s\"\n", fn)
I changed it to
cmd = sprintf("date \+\'\%Y\%m\%d\'>> \"%s\"\n", fn) and
cmd = sprintf("date \+\"\%Y\%m\%d\">> \"%s\"\n", fn).
Both versions fail.
With double quotes, I get the following error:
Code:
awk: There are not enough parameters in printf statement date +"%Y%m%d" >> "%s"
.

 The input line number is 2. The file is tstSplit.
 The source line number is 28.

With single quotes, I get the following error:
Code:
        cmd = sprintf("date \+\'\%Y\%m\%d\' >> \"%s\"\n", fn)
        system(cmd)
} ' "$@" tstSplit
split.ksh[3]: syntax error at line 32 : `"' unmatched

However, if I replace date with echo and some text in double quotes, it works.
Code:
cmd = sprintf("echo \"Trailer\" >> \"%s\"\n", fn)

I am more interested in the reason than in a corrected version of the code (of course, I would also like to know how the date can be formatted).
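
A likely explanation, offered as a sketch rather than a verified AIX fix: the first argument of sprintf is a format string, so %Y, %m and %d are read as conversion specifications. The backslashes do not protect them (the error message above shows the format with the backslashes already gone), and the lone fn argument runs out before %s is reached. Writing %% produces one literal % for date to see. The single-quote variant fails earlier, in the shell: the awk program sits inside single quotes, a backslash does not escape a quote there, so the first ' ends the program text and ksh then finds an unmatched ". The echo version works because it contains no extra % characters. The helper name build_trailer_cmd below is hypothetical, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the command string awk would hand to system().
build_trailer_cmd() {
	awk -v fn="$1" 'BEGIN {
		# Each %% yields one literal %, so date receives +%Y%m%d.
		# The only real conversion left is %s, which is filled with fn.
		cmd = sprintf("date +%%Y%%m%%d >> \"%s\"", fn)
		print cmd
	}'
}
build_trailer_cmd split.001.txt
# prints: date +%Y%m%d >> "split.001.txt"
```

The format +%Y%m%d contains no spaces, so it needs no quotes of its own. Another option is to skip sprintf and build the string by concatenation, where % has no special meaning.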