Howdy folks, I've got a very large plain text file that I need to split into many smaller files. My script-fu is not powerful enough for this, so any assistance is much appreciated.
The file is a database dump from Cyrus IMAP server. It's basically a bunch of emails (thousands) all concatenated into one huge file. There is a delimiter line between each email. It looks something like this
So as you can see, the start of each email is preceded by a line that begins with "--dump".
What I'm looking for is:
1. To split this monolithic file into many smaller files, where each smaller file contains a single email.
2. Where each smaller file should contain all of the lines of text after a "--dump" delimiter, up until the next "--dump" delimiter (or end of file).
3. And the "--dump" delimiter line itself should not be included in each smaller file.
I feel like some awk/grep/sed magic could do this, but I'm not enough of a wizard to write this script.
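One way to approach the three requirements above is a short awk script. This is only a sketch under assumptions that are not in the post: the function name split_dump, the input name dump.txt and the output names msg_00001.txt, msg_00002.txt, ... are all made up for illustration, and the delimiter is assumed to start in column one. Any text before the first "--dump" line is skipped.

```shell
#!/bin/sh
# Sketch: split a concatenated mail dump into one file per message.
# Hypothetical names: split_dump, dump.txt, msg_NNNNN.txt.
split_dump() {
    # $1 = input file, $2 = output directory
    mkdir -p "$2" &&
    awk -v dir="$2" '
        /^--dump/ {                       # delimiter line: start a new output file
            if (out != "") close(out)     # close the previous file to free its handle
            out = sprintf("%s/msg_%05d.txt", dir, ++n)
            next                          # do not copy the delimiter line itself
        }
        out != "" { print > out }         # copy message lines; skip text before the first delimiter
    ' "$1"
}

# Small demonstration on made-up data.
demo_dir=$(mktemp -d)
printf '%s\n' '--dump 1' 'From: a' 'body a' '--dump 2' 'From: b' > "$demo_dir/dump.txt"
split_dump "$demo_dir/dump.txt" "$demo_dir/out"
ls "$demo_dir/out"
```

Closing each file before opening the next matters with thousands of messages, since many awk implementations only allow a limited number of open files.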
Could one of you shed some light on this:
I need to split the file by determining the record count and then splitting it up into 4 files. Please note, this is not a fixed record length but rather a "|" delimited file.
I am not sure how to handle the remainder/offset for the 4th file.
For... (4 Replies)
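A possible sketch for the four-way split, assuming one record per line and that the leftover records should simply go into the 4th file. The function name split_in_four, the input name records.txt and the prefix part_ are hypothetical.

```shell
#!/bin/sh
# Sketch: split a file into 4 parts by record count; the remainder lands in part 4.
# Hypothetical names: split_in_four, records.txt, part_1 .. part_4.
split_in_four() {
    # $1 = input file, $2 = output prefix
    total=$(wc -l < "$1")
    base=$(( total / 4 ))                 # records per file, rounded down
    if [ "$base" -lt 1 ]; then base=1; fi # fewer than 4 records: one per file
    awk -v base="$base" -v prefix="$2" '
        {
            part = int((NR - 1) / base) + 1
            if (part > 4) part = 4        # leftover records go to the 4th file
            print > (prefix part)
        }
    ' "$1"
}

# Demonstration with 10 made-up "|" delimited records.
demo_dir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 10; i++) print i "|x|y" }' > "$demo_dir/records.txt"
split_in_four "$demo_dir/records.txt" "$demo_dir/part_"
wc -l "$demo_dir"/part_*
```

With 10 records this gives 2 records in each of the first three files and 4 in the last one.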
Hi all, I need help splitting a tab-delimited list into separate files by the filename field. The list is already sorted in ascending order by filename. An example list would look like this:
filename001 word1 word2
filename001 word3 word4
filename002 word1 word2
filename002 word3 word4... (4 Replies)
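For a list like this, awk can write each line to the file named in its first field. This is a sketch that assumes the whole line (including the filename field) should be kept in the output, which the truncated post does not say; the function name split_by_first_field and the input name list.txt are made up.

```shell
#!/bin/sh
# Sketch: write each tab-delimited line to the file named in field 1.
# Hypothetical names: split_by_first_field, list.txt.
split_by_first_field() {
    # $1 = input file; output files are created in the current directory
    awk -F'\t' '
        $1 != prev {                      # input is sorted, so a new name means a new file
            if (prev != "") close(prev)
            prev = $1
        }
        { print > ($1) }                  # assumption: keep the whole line
    ' "$1"
}

# Demonstration with the four example lines.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'filename001\tword1\tword2\nfilename001\tword3\tword4\nfilename002\tword1\tword2\nfilename002\tword3\tword4\n' > list.txt
split_by_first_field list.txt
ls
```

Closing the previous file relies on the list being sorted; with unsorted input a name that reappears later would be truncated when reopened.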
Hello
We have a text file with 400,000 lines and need to split it into multiple files of 5,000 lines each (which will result in 80 files).
I had an idea of using the head and tail commands in a loop, but that did not look efficient.
Please advise on a simple yet effective way to do it.
TIA... (3 Replies)
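The split utility with -l handles this case directly, without a head/tail loop. A minimal sketch, where the input name big.txt and the prefix chunk_ are made up and the 400,000 lines are generated as a stand-in for the real file:

```shell
#!/bin/sh
# Sketch: split a 400,000-line file into pieces of 5,000 lines each.
# Hypothetical names: big.txt, chunk_.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Stand-in for the real file.
awk 'BEGIN { for (i = 1; i <= 400000; i++) print i }' > big.txt

# Pieces are named chunk_aa, chunk_ab, ... (two-letter suffix by default).
split -l 5000 big.txt chunk_

ls chunk_* | wc -l    # number of pieces created
```

The default two-letter suffix allows 676 pieces, which is enough for the 80 files expected here.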
Hello Gurus,
I have a text file containing nearly 12,000 tab-delimited characters with 4000 rows. If the file is small, Excel can convert the text into columns. However, the file that I have is very big. Can somebody help me solve this problem?
The input file example,
... (6 Replies)
Hello,
Please help me. I have hundreds of text files composed of several rows of information and I need to separate each row into a new text file. I was trying to figure out how to split the text file into different text files, based on each row of text in the original text file. Here is an... (2 Replies)
I have a text file with irregular spacing between values, which makes it really difficult to manipulate. Is there an easy way to convert it into a space-delimited text file, so that all the spaces, double spaces, triple spaces and tabs between numbers are converted into single spaces? The file looks like this:... (5 Replies)
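One common awk idiom for this is to reassign a field, which makes awk rebuild the line with single spaces between fields. A sketch, with a made-up function name normalize_spaces and made-up sample values; note that it also drops leading and trailing blanks, which may or may not be wanted:

```shell
#!/bin/sh
# Sketch: collapse runs of spaces and tabs into single spaces.
# Hypothetical name: normalize_spaces (reads stdin, writes stdout).
normalize_spaces() {
    # Assigning $1 to itself forces awk to rebuild the record
    # using the output field separator, a single space by default.
    awk '{ $1 = $1; print }'
}

# Demonstration on made-up values.
printf '1.5   2.25\t\t3\n  4 \t 5  6\n' | normalize_spaces
```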
Hi,
I have a requirement to pass a string to a shell script and split the string into multiple fields.
The string is copied from a row of three columns (name, age, address) in an Excel sheet.
The three columns (from Excel) are separated with a tab when pasted at the command prompt, but when the ... (2 Replies)
Hi, I have a requirement in Unix as below.
I have a text file separated by the | symbol, and I need to generate an Excel file through Unix commands or a script so that each value goes into its own column.
ex:
Input Text file:
1|A|apple
2|B|bottle
excel file to be generated as output as... (9 Replies)
Hi,
I have a requirement involving 50-60 million records, where we need to split a delimited string (the delimiter is a newline) into rows.
Source Data:
SerialID UnidID GENRE
100 A11 AAAchar(10)BBB
200 B11 CCCchar(10)DDD(10)ZZZZ
The field 'GENRE' is a string with a newline as the delimiter and not sure... (5 Replies)
Hello,
I have some large text files that look like,
putrescine
Mrv1583 01041713302D
6 5 0 0 0 0 999 V2000
2.0928 -0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
5.6650 0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3.5217 ... (3 Replies)
Discussion started by: LMHmedchem
split
SPLIT(1) BSD General Commands Manual SPLIT(1)
NAME
split -- split a file into pieces
SYNOPSIS
split -d [-l line_count] [-a suffix_length] [file [prefix]]
split -d -b byte_count[K|k|M|m|G|g] [-a suffix_length] [file [prefix]]
split -d -n chunk_count [-a suffix_length] [file [prefix]]
split -d -p pattern [-a suffix_length] [file [prefix]]
DESCRIPTION
The split utility reads the given file and breaks it up into files of 1000 lines each (if no options are specified), leaving the file
unchanged. If file is a single dash ('-') or absent, split reads from the standard input.
The options are as follows:
-a suffix_length
Use suffix_length letters to form the suffix of the file name.
-b byte_count[K|k|M|m|G|g]
Create split files byte_count bytes in length. If k or K is appended to the number, the file is split into byte_count kilobyte
pieces. If m or M is appended to the number, the file is split into byte_count megabyte pieces. If g or G is appended to the
number, the file is split into byte_count gigabyte pieces.
-d Use a numeric suffix instead of an alphabetic suffix.
-l line_count
Create split files line_count lines in length.
-n chunk_count
Split file into chunk_count smaller files.
-p pattern
The file is split whenever an input line matches pattern, which is interpreted as an extended regular expression. The matching line
will be the first line of the next output file. This option is incompatible with the -b and -l options.
If additional arguments are specified, the first is used as the name of the input file which is to be split. If a second additional argument
is specified, it is used as a prefix for the names of the files into which the file is split. In this case, each file into which the file is
split is named by the prefix followed by a lexically ordered suffix using suffix_length characters in the range ``a-z''. If -a is not
specified, two letters are used as the suffix.
If the prefix argument is not specified, the file is split into lexically ordered files named with the prefix ``x'' and with suffixes as
above.
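The naming rules described above can be illustrated with a small sketch. The input name input.txt and the prefix part_ are made up; the options used (-l, -a, -d) are the ones documented above.

```shell
#!/bin/sh
# Sketch: how prefix, -a and -d affect output file names.
# Hypothetical names: input.txt, part_.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' one two three four five > input.txt

# No prefix given: pieces are named with prefix "x" and a two-letter suffix.
split -l 2 input.txt

# Prefix "part_" with a three-character numeric suffix.
split -d -a 3 -l 2 input.txt part_

ls
```

With five input lines and -l 2, each command produces three pieces: xaa, xab, xac and part_000, part_001, part_002.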
ENVIRONMENT
The LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables affect the execution of split as described in environ(7).
EXIT STATUS
The split utility exits 0 on success, and >0 if an error occurs.
SEE ALSO
csplit(1), re_format(7)
STANDARDS
The split utility conforms to IEEE Std 1003.1-2001 (``POSIX.1'').
HISTORY
A split command appeared in Version 3 AT&T UNIX.
BUGS
The maximum line length for matching patterns is 65536.
BSD May 9, 2013 BSD