When you use the *printf() family of functions and want to print a literal percent sign (%) rather than have it introduce a format specifier, you need to write %%, as in:
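A minimal sketch of this in awk (the function name `percent_line` and the message text are made up for illustration):

```shell
# Hypothetical helper: print a progress line containing a literal percent sign.
percent_line() {
    # %d consumes the pct argument; %% emits one literal % character.
    awk -v pct="$1" 'BEGIN { printf("%d%% complete\n", pct) }'
}

percent_line 75
```

With a single % in that position, printf would try to read the following characters as another conversion specification instead of printing the sign.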
If the date format operand needs to include characters that have to be quoted to protect them from the shell (for example, if you want spaces between fields in the date output), it would be something like:
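As an illustration only (the format string, the `HDR~` prefix, and the function name `header_stamp` are assumptions, not the code from this thread), quoting the whole operand keeps the shell from splitting it at the space:

```shell
# Hypothetical helper: build a header line that carries a date with a space in it.
header_stamp() {
    # Single quotes protect the space from the shell, so date receives
    # one operand: +%Y-%m-%d %H:%M:%S
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    # Pass the result to awk as data and print it with %s, so nothing in it
    # is interpreted as part of the printf format.
    awk -v stamp="$stamp" 'BEGIN { printf("HDR~%s\n", stamp) }'
}

header_stamp
```

Without the quotes, date would see two operands and reject the second one.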
Don Cragun's code works fine. But since I rarely use Unix, I'm not an expert in awk.
My requirement has changed: the header now needs to show the number of records in each file.
Though we are splitting at 100,000 records, the last file might have fewer than 100,000.
So to display the number of records in each split file, I guess I have to use FNR (the record number in the current file). But how do I print it? The final FNR is known only after the last record, and we print the header first, followed by all the records (lines).
So the header of each split file should look like the following, with ~ as the delimiter:
Even though you're not an expert in awk, which line in the code I supplied do you think needs to be changed? Did you make any attempt at changing that line to meet your new requirements? What part of what you tried is not working?
Do you want the line count in the header of each file to include the header and trailer in that file in the count, or just the number of lines in that file from the file that is being split?
Do you still want a 3 digit number (with leading zeros) for the "Total number of files" field at the end of the header line?
I tried adding the following before the NR % lpf == 1 rule:
I do not know where to increment it.
In the header of each split file, I need the number of records (lines) the file contains, excluding the header and footer.
I guess I missed something. Generally, I think it is better to use a command that does what you want than to write a script; in this case, csplit is a possible choice. It is educational to write a script, but it is a better idea to use known good commands for production work.
Explanation: split csprap01.logscan into five files named splitz000..splitz004
-f splitz: prefix for the numbered file names (splitz000 .. splitz999)
-n: number of decimal digits in the number; -n 3 means use zero-filled numbers with 3 digits for the output filenames
10000: split at line 10000, counting from where you are in the file (usually the beginning), so lines 1-9999 are in the first split and lines 10000-19999 in the second.
{5}: repeat five times. {*} (Linux csplit) means keep on repeating. This last option will cause you to overwrite the splitz000 file (and others) if you create more than 999 files as splits.
The line in red means the last file came up short of lines. With -k you lose no lines in the splits in case of error.
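A command consistent with the options explained above might look like the sketch below. The wrapper name `split_by_lines` is hypothetical, and the operand order is an assumption based on the standard csplit synopsis rather than a copy of the command used in this post.

```shell
# Hypothetical wrapper around csplit.
#   $1 = input file
#   $2 = line number at which to split
#   $3 = how many additional times to repeat that split
split_by_lines() {
    # -k         keep the pieces already written if a later split fails
    # -f splitz  name the pieces splitz000, splitz001, ...
    # -n 3       use three zero-filled digits in the numeric suffix
    csplit -k -f splitz -n 3 "$1" "$2" "{$3}"
}

# Example (not run here): split_by_lines csprap01.logscan 10000 5
```

csplit reports the size of each piece on standard output, and it exits with an error if the input runs out before all the requested splits are made, which is where -k matters.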
OK. Unfortunately, you can't count how many lines you have written into a file before you write those lines into the file. So using cntRec like you tried can only show you how many lines were written into previous files.
But, since we know how many lines we've read and how many lines are in the input file, we can calculate how many lines we are going to write into this file before we write the header record. So, remove the new action you added:
and just change the printf() statement you changed to something like:
If the current line number (which is the 1st line in an output file) - 1 + the maximum number of lines that we will write to a file is less than or equal to the number of lines in the input file, print the maximum number of lines to write to a file; otherwise, print the number of lines left over (which will only happen on the last file, and only then if there are fewer than lpf lines left to go into that file).
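The calculation described above can be sketched as follows. This is not the script from this thread: the function name `split_with_counts`, the `HDR~` header layout, and the use of `wc -l` to obtain the total line count are assumptions made for illustration, and the trailer record is left out.

```shell
# Hypothetical helper: split INPUT into pieces of at most LPF lines, each
# starting with a header that states how many data lines follow it.
#   $1 = input file, $2 = lines per file (lpf), $3 = output name prefix
split_with_counts() {
    input=$1 lpf=$2 prefix=$3
    # Arithmetic expansion strips the padding some wc implementations add.
    total=$(( $(wc -l < "$input") ))
    awk -v lpf="$lpf" -v total="$total" -v prefix="$prefix" '
    # True on the first line of each output file; same as NR % lpf == 1
    # for lpf > 1, and it also works when lpf is 1.
    (NR - 1) % lpf == 0 {
        if (out != "") close(out)
        out = sprintf("%s%03d", prefix, ++n)
        # A full file gets lpf lines; the last file gets what is left over.
        cnt = (NR - 1 + lpf <= total) ? lpf : total - NR + 1
        printf("HDR~%d\n", cnt) > out
    }
    # Copy every input line to the current output file.
    { print > out }
    ' "$input"
}
```

With a 7-line input and lpf set to 3, this writes three files whose headers read HDR~3, HDR~3, and HDR~1. The sketch assumes the input ends with a newline, since `wc -l` counts newline characters.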
Jim,
The reason for the script is that csplit doesn't add the desired headers and trailers to the split files. And, yes, that could be done in a second step after csplit did the big part of the job; but why read and write the data again to add a header when awk can do it in one pass?
I need to sum up the values in field nr 5 in a data file that contains some file listing. The 5th field denotes the size of each file and following are some sample values.
1,775,947,633
4,738
7,300
16,610
15,279
0
0
I tried the following code in a shell script.
awk '{sum+=$5} END{print... (4 Replies)
Hello,
I need to split a file by number of records and rename each split file with actual filename pre-pended with 3 digit split number.
What I have tried is the below command with 2 digit numeric value
split -l 3 -d abc.txt F (# Will Produce split Files as F00 F01 F02)
How to produce... (19 Replies)
I would like to split a string of numbers "1-2,4-13,16,19-20,21-25,31-32" and output these with awk into
-dFirstPage=1 -dLastPage=2 file.pdf -dFirstPage=4 -dLastPage=13 file.pdf -dFirstPage=16 -dLastPage=16 file.pdf file.pdf -dFirstPage=19 -dLastPage=20 file.pdf -dFirstPage=21 -dLastPage=25... (3 Replies)
Hi All
I have one query. Say I have a requirement that the data below should be
moved to different files, each with a maximum of 10 lines. Say the example below consists of 14 lines.
This should be split logically, using the data in the first column, into file1 and file2. The data of first... (2 Replies)
Hey,
I've been trying to break a massive FASTA-formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code, but I've received errors every time:
for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i
done (1 Reply)
Hello,
I have a file of text and numbers from which I want to extract certain fields and write it to a new file. I would use awk but unfortunately the input data isn't always formatted into the correct columns. I am using tcsh.
For example, given the following data
I want to extract:
and... (3 Replies)
Hello,
I use the following command to split a file:
split -Number_of_Lines Input_File MyPrefix_
output is
MyPrefix_a
MyPrefix_b
MyPrefix_c
......
Instead, how can I get numerical values like:
MyPrefix_1
MyPrefix_2
MyPrefix_3
...... (2 Replies)
Given that I have a log file of the format:
DATE ID LOG_LEVEL | EVENT
2009-07-23T14:05:11Z T-4030097550 D | MessX
2009-07-23T14:10:44Z T-4030097550 D | MessY
2009-07-23T14:34:08Z T-7298651656 D | MessX
2009-07-23T14:41:00Z T-7298651656 D | MessY
2009-07-23T15:05:10Z T-4030097550 D | MessZ... (5 Replies)
I have been trying to remove some improperly formatted lines of output from Fortran code I have been using. The problem is that I have some singularities in the math for some points that cause an incorrectly large value to be reported that exceeds the normal formatting set in the code, resulting in... (2 Replies)