04-01-2014
For high performance, you need a language that can seek to the halfway point and then search forward or back for 5 consecutive linefeeds (assuming no DOS CR-LF or old Mac CR-only line endings). As I recall, ftruncate() (or, on some systems, an fcntl() command such as F_FREESP) can truncate the original file to size after the new file is successfully written, if you like the efficiency of less copying. Using mmap() makes both examining and writing simpler, as the file becomes a char[] in virtual memory that you can walk through, and you can write() the tail in a single call. You only need to mmap() the second (upper) half if the first half is to be the bigger one, and with mmap() there is no lseek()!
Does it have to be half by items, by lines, or by byte count?
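A minimal sketch of the mmap approach described above, in Python rather than C for brevity. It assumes "half" means half by byte count, rounded to the nearest item boundary, and that items are separated by exactly 5 linefeeds (4 blank lines). The function name `split_near_middle` and its parameters are made up for illustration.

```python
import mmap
import os

SEPARATOR = b"\n" * 5  # assumption: exactly 4 blank lines between items


def split_near_middle(path, tail_path, separator=SEPARATOR):
    """Move the tail of `path` into `tail_path`, cutting just after the
    separator nearest the byte midpoint. Returns the cut offset, or None
    if the file is empty or contains no separator."""
    size = os.path.getsize(path)
    if size == 0:
        return None  # an empty file cannot be mapped
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            mid = size // 2
            # First separator starting at or after the midpoint.
            fwd = m.find(separator, mid)
            # Last separator starting before the midpoint (it may straddle it).
            back = m.rfind(separator, 0, min(size, mid + len(separator) - 1))
            cuts = [p + len(separator) for p in (fwd, back) if p != -1]
            if not cuts:
                return None
            cut = min(cuts, key=lambda c: abs(c - mid))
            with open(tail_path, "wb") as out:
                out.write(m[cut:])  # the slice copies the tail, then one write
        # Only after the new file is written, shrink the original in place.
        f.truncate(cut)
    return cut
```

Calling `split_near_middle("data.txt", "data.tail")` leaves the first part in `data.txt` and the rest in `data.tail`. The slice `m[cut:]` copies the tail into memory, so for very large files a chunked write would be kinder; in C, write() can be handed the mapped address directly.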
---------- Post updated at 03:36 PM ---------- Previous update was at 03:27 PM ----------
Taking a completely different tack, one could split it in sed or awk or such by writing alternate items to two different files. Are the exactly 4 blank lines between items reliable? Are they also there at EOF?
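A rough sketch of the alternate-items idea, written in Python here instead of sed or awk. It assumes items are separated by exactly 4 blank lines (5 linefeeds) and does not care whether the separator is present at EOF. The name `split_alternating` is hypothetical, and unlike a streaming awk script it reads the whole file into memory.

```python
def split_alternating(path, out_a, out_b, separator=b"\n" * 5):
    """Write items 1, 3, 5, ... to out_a and items 2, 4, 6, ... to out_b.
    Returns the number of items found."""
    with open(path, "rb") as f:
        data = f.read()
    # Drop empty chunks, e.g. the one after a separator at EOF.
    items = [chunk for chunk in data.split(separator) if chunk.strip()]
    with open(out_a, "wb") as a, open(out_b, "wb") as b:
        for i, item in enumerate(items):
            target = a if i % 2 == 0 else b
            # Re-terminate every item with the full separator so both
            # output files keep the same format as the input.
            target.write(item.rstrip(b"\n") + separator)
    return len(items)
```

This gives two files with nearly equal item counts, though not necessarily equal byte sizes, and the items end up interleaved rather than split into a first and second half.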
Last edited by DGPickett; 04-01-2014 at 04:35 PM..