How to split large file with different record delimiter?


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users How to split large file with different record delimiter?
# 1  
Old 01-29-2017
How to split large file with different record delimiter?

Hi,
I have received a file which is 20 GB. We would like to split the file into 4 equal parts and process it to avoid memory issues.

If the record delimiter is unix new line, I could use split command either with option l or b.

The problem is that the line terminator is |##|
How to use split command in this regards or do we have any other alternative?
Thanks
# 2  
Old 01-29-2017
If you have GNU awk or mawk, you could try first converting it to regular newline termination like so:
Code:
gawk 1 RS='[|]##[|]' file > newfile

Perhaps that would solve your memory issues as well.
# 3  
Old 01-29-2017
Hi, My data has unix new line terminators in the data lelvel. I should not convert the line terminator. I have to split the file into equal parts with out line terminator conversion
# 4  
Old 01-29-2017
Could you post a sample of your input file?
# 5  
Old 01-30-2017
Code:
awk 'NR==FNR {records=NR; next}
FNR==1 {
   Split=(records % Split) ? (int(records/Split)+1) : (records/Split);
   split_file="split_file1";
}
{
 printf $0 RS > split_file;
 if (! (FNR % Split)) {
   if (split_file) close(split_file);
   split_file="split_file" 1 + ++file_c;
 }
}
' RS="[|]##[|]" datafile Split=4 datafile

# 6  
Old 01-31-2017
Quote:
Originally Posted by Scrutinizer
Could you post a sample of your input file?
Here is format of the file.

| - field delimiter
|##| - record delimiter

Code:
100|COL1|COL2|COL3|##|200|COL1|COL2|COL3|##|300|COL1|COL2|COL3|##|400|COL1|COL2|COL3|##|500|COL1|COL2|COL3

Thanks



Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 01-31-2017 at 06:22 AM.. Reason: Added CODE tags.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to check record delimiter of a file ?

My requirment is for every record of a particular file I've to check for a record delimeter (e.g. "\n") and if any row doesn't have "\n" then report it in error file . Please suggest me to go through this. (4 Replies)
Discussion started by: manab86
4 Replies

2. Shell Programming and Scripting

How to target certain delimiter to split text file?

Hi, all. I have an input file. I would like to generate 3 types of output files. Input: LG10_PM_map_19_LEnd_1000560 LG10_PM_map_6-1_27101856 LG10_PM_map_71_REnd_20597718 LG12_PM_map_5_chr_118419232 LG13_PM_map_121_24341052 LG14_PM_1a_456799 LG1_MM_scf_5a_opt_abc_9029993 ... (5 Replies)
Discussion started by: huiyee1
5 Replies

3. Shell Programming and Scripting

Split a large file in n records and skip a particular record

Hello All, I have a large file, more than 50,000 lines, and I want to split it in even 5000 records. Which I can do using sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'Now I need to add one more condition that is not to break the file at 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech
20 Replies

4. Shell Programming and Scripting

Split file into multiple files using delimiter

Hi, I have a file which has many URLs delimited by space. Now i want them to move to separate files each one holding 10 URLs per file. http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html I have used the below code to arrange... (6 Replies)
Discussion started by: vel4ever
6 Replies

5. Shell Programming and Scripting

split file by delimiter with csplit

Hello, I want to split a big file into smaller ones with certain "counts". I am aware this type of job has been asked quite often, but I posted again when I came to csplit, which may be simpler to solve the problem. Input file (fasta format): >seq1 agtcagtc agtcagtc ag >seq2 agtcagtcagtc... (8 Replies)
Discussion started by: yifangt
8 Replies

6. Shell Programming and Scripting

How to delete 1 record in large file!

Hi All, I'm a newbie here, I'm just wondering on how to delete a single record in a large file in unix. ex. file1.txt is 1000 records nikki1 nikki2 nikki3 what i want to do is delete the nikki2 record in file1.txt. is it possible? Please advise, Thanks, (3 Replies)
Discussion started by: nikki1200
3 Replies

7. Shell Programming and Scripting

split record based on delimiter

Hi, My inputfile contains field separaer is ^. 12^inms^ 13^fakdks^ssk^s3 23^avsd^ 13^fakdks^ssk^a4 I wanted to print only 2 delimiter occurence i.e 12^inms^ 23^avsd^ (4 Replies)
Discussion started by: Jairaj
4 Replies

8. Shell Programming and Scripting

Split Large File

HI, i've to split a large file which inputs seems like : Input file name_file.txt 00001|AAAA|MAIL|DATEOFBIRTHT|....... 00001|AAAA|MAIL|DATEOFBIRTHT|....... 00002|BBBB|MAIL|DATEOFBIRTHT|....... 00002|BBBB|MAIL|DATEOFBIRTHT|....... 00003|CCCC|MAIL|DATEOFBIRTHT|.......... (1 Reply)
Discussion started by: AMARA
1 Replies

9. Shell Programming and Scripting

Pivot variable record length file and change delimiter

Hi experts. I got a file (500mb max) and need to pivot it (loading into ORCL) and change BLANK delimiter to PIPE |. Sometimes there are multipel BLANKS (as a particular value may be BLANK, or simply two BLANKS instead of one BLANK). thanks for your input! Cheers, Layout... (3 Replies)
Discussion started by: thomasr
3 Replies

10. Shell Programming and Scripting

Split A Large File

Hi, I have a large file(csv format) that I need to split into 2 files. The file looks something like Original_file.txt first name, family name, address a, b, c, d, e, f, and so on for over 100,00 lines I need to create two files from this one file. The condition is i need to ensure... (4 Replies)
Discussion started by: nbvcxzdz
4 Replies
Login or Register to Ask a Question