split file by delimiter with csplit: Post 302680257 by drl on Wednesday 1st of August 2012 02:28:59 PM
Hi.

There is some preliminary coding to display my environment. The core of the solution is:
Code:
bunch input file into super lines -> split into separate files of n (=2 in this demo) -> expand the individual files

The bunching is done with an awk script, gather. Each "bunch" is one awk record, delimited by ">" (the Record Separator, RS=">"). Within each bunch, every newline is replaced with "=", which turns the bunch into a single "super line".
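The gather script itself is not shown in this post, so the following is only a minimal sketch of what such a bunching step might look like. It assumes every record starts with ">" and that "=" never occurs in the data; the function body is a guess at the idea, not the original code.

```shell
# gather: fold each ">"-delimited record into one "super line".
# Assumption: "=" never appears in the data, since it stands in for newlines.
gather() {
  awk 'BEGIN { RS = ">" }
       $0 == "" { next }    # skip the empty record before the first ">"
       {
         sub(/\n$/, "")     # drop the trailing newline of the record
         gsub(/\n/, "=")    # fold the remaining newlines into "="
         print ">" $0       # put back the ">" that RS consumed
       }' "$@"
}

# Two records of three and two lines become two super lines.
printf '>a\n1\n2\n>b\n3\n' | gather
```

Dropping the trailing newline before the substitution keeps each super line from ending in a stray "=", which would otherwise turn into a blank line when the files are expanded again.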

split works on lines, so it does not know that each super line really stands for many lines. For each group of n (=2) super lines, split writes a new file: xaa, xab, etc.
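As a small illustration of this step, the sketch below feeds three made-up super lines to split with n=2. The file name super.txt and the temporary directory are assumptions for the demo only.

```shell
# Work in a scratch directory so the demo leaves no files behind elsewhere.
workdir=$(mktemp -d)

# Three super lines, as the bunching step would produce them.
printf '%s\n' '>a=1=2' '>b=3' '>c=4=5' > "$workdir/super.txt"

# Split into pieces of 2 lines each; the prefix "x" yields xaa, xab, ...
split -l 2 "$workdir/super.txt" "$workdir/x"

# xaa holds the first two super lines, xab the remaining one.
wc -l "$workdir"/xa*
```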

For all those files xa*, replace "=" with newlines, and re-write the files.
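One way to do the expansion is with tr and a temporary file, sketched below. The helper name expand_files and the sample file contents are made up for illustration, and the sketch again assumes "=" appears only as the stand-in for a newline.

```shell
# expand_files: turn "=" back into newlines in each named file, in place.
expand_files() {
  for f in "$@"; do
    # Write to a temporary file first, then replace the original.
    tr '=' '\n' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Demo: one split output file holding two super lines.
workdir=$(mktemp -d)
printf '%s\n' '>a=1=2' '>b=3' > "$workdir/xaa"

expand_files "$workdir"/xa*
cat "$workdir/xaa"
```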

Note that this is a demonstration of a technique. There are certainly other solutions.

Voila! ... cheers, drl
 
