Split files with formatted numbers


 
# 1  
Old 06-20-2014

How can I split a file so that the output file names carry a zero-padded numeric suffix?

I tried the following code:
Code:
awk '{filename="split."int((NR-1)/2)".txt"; print >> filename}' split.txt

Current Result
Quote:
split.0.txt
split.1.txt
split.2.txt
split.3.txt
split.4.txt
Expected Result
Quote:
split.000.txt
split.001.txt
split.002.txt
split.003.txt
split.004.txt
# 2  
Old 06-20-2014
You should use a printf statement with the d conversion specifier, like this:
Code:
{printf("split.%03d.txt\n", 1)}
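
As a quick check of what the %03d format does, here is a minimal sketch run from the shell. The variable name `name` and the value 7 are only illustrative and are not part of the original post.

```shell
# %03d pads the integer with leading zeros to a width of three digits.
name=$(awk 'BEGIN { printf("split.%03d.txt", 7) }')
echo "$name"    # prints split.007.txt
```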

# 3  
Old 06-20-2014
How do I embed the printf in awk? I also need the content of the original file to end up in the split files, divided according to the number of lines I want in each one.
# 4  
Old 06-20-2014
sprintf prints to a string.
Code:
filename=sprintf("split.%03d.txt",(NR-1)/2)
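
Dropped into the original one-liner, this might look like the sketch below. The five-line sample input is made up so the example can be run on its own, and the explicit int() is an addition that makes the truncation visible (the %d conversion would truncate the value anyway).

```shell
# Made-up sample input so the example is self-contained.
printf '%s\n' a b c d e > split.txt

# Remove leftovers from earlier runs, because ">>" appends to existing files.
rm -f split.[0-9][0-9][0-9].txt

# Two lines per output file; sprintf builds the zero-padded name.
awk '{
    filename = sprintf("split.%03d.txt", int((NR - 1) / 2))
    print >> filename
}' split.txt
```

With this input, the result should be split.000.txt (a, b), split.001.txt (c, d), and split.002.txt (e).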

This User Gave Thanks to MadeInGermany For This Post:
# 5  
Old 06-20-2014
Since you're specifying 3 digit file sequence numbers, I assume you expect that you'll be producing more than a hundred files with this script. There is a good chance that awk will run out of file descriptors if you keep all of them open. You might want to consider something like:
Code:
awk '
NR%2 {	# Odd lines:
	fn = sprintf("split.%03d.txt", (NR - 1) / 2)
}
{	# all lines:
	print >> fn
}
(NR%2) == 0 {
	# Even lines:
	close(fn)
}' split.txt

If there are existing split.xxx.txt files when you start this script, do you really want to append data to them, or do you want to remove any data that was there before and keep only what you find in the current input file?

If you want to append to existing files, the script above should work.

If you want to replace data instead of appending data, change:
Code:
	print >> fn

to:
Code:
	print > fn

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 06-26-2014
I get files of 2 to 10 million records, and I have to split them into files of 100,000 records each. Assuming I mostly get 3 million records, I have to split the file into 300 files. How many file descriptors can awk handle at once?

Also, how do I add a header (n records) and a trailer containing the file number or some other content?
# 7  
Old 06-26-2014
Quote:
Originally Posted by bobbygsk
I get files of 2 to 10 million records, and I have to split them into files of 100,000 records each. Assuming I mostly get 3 million records, I have to split the file into 300 files. How many file descriptors can awk handle at once?

Also, how do I add a header (n records) and a trailer containing the file number or some other content?
Simple, you slightly modify the code I gave you to put 100000 lines per output file instead of 2 lines per output file. The code I gave you already closes files when it is done with them so it only keeps one output file open at a time.
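
One way that modification might look is sketched below. The variable n, the file names records.txt and chunk.NNN.txt, and the seven-line sample input are all assumptions made so the example runs quickly; for the real data you would pass n=100000 and your own input file.

```shell
# Small stand-in input (7 lines); the real file would hold millions of records.
printf '%s\n' 1 2 3 4 5 6 7 > records.txt

# n is the number of lines per output file: 3 for this demo, 100000 for real data.
awk -v n=3 '
(NR - 1) % n == 0 {
    # First line of a chunk: close the previous file, then name the next one.
    if (fn != "") close(fn)
    fn = sprintf("chunk.%03d.txt", int((NR - 1) / n))
}
{
    # ">" truncates the file when it is first opened; later prints keep writing to it.
    print > fn
}' records.txt
```

Because each file is closed before the next one is opened, only one output file is open at any time, regardless of how many chunks the input produces.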

You're going to have to give us a lot more than "get header (n records) and trailer with file number or some content in it" to guess at what you want to put as headers and trailers in your files. Show us sample input and show us sample output! How is your script supposed to identify which lines are headers, which lines are trailers, and what data you want added to or removed from those headers as you copy parts of the input file to your hundreds of output files?