split file problem


 
# 1  
Old 09-02-2009
split file problem

Hi All,

I have a parent file with 10 million records. I want to split the parent file into child files.

Each child file contains 5000 records.

I am using the following command for splitting:

split -5000 parentfile.txt childfile.1

It splits the parent file into childfile.1aa, childfile.1ab, ... childfile.1zz.
That is only 676 child files (=> 3.38 million records).

I am not able to split the parent file into more than 676 child files in a single shot. Please provide your suggestions for splitting.


Regards
Hanuma
# 2  
Old 09-02-2009
"man split" it says "up to a maximum of 676 files" (26x26=676), but in some unixes you can have powers of 26 more.

If you don't have the "-a suffix_length" switch to "split" (which would fix the problem) and assuming you have unlimited disc space, I guess you could do two passes.

Split into 500 files of 20,000 lines.
Split each of the 500 files into 4 parts of 5,000 lines.
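
A rough sketch of that two-pass approach, assuming your split supports -l and using parentfile.txt from your post (the "part." prefix and the tidy-up step are just illustrative, not from the thread):

#!/bin/sh
# Pass 1: 500 intermediate files of 20,000 lines each (well under the 676 limit).
split -l 20000 parentfile.txt part.

# Pass 2: cut each intermediate file into 4 chunks of 5,000 lines,
# then delete the intermediate copy to reclaim disc space.
for f in part.*
do
        split -l 5000 "$f" "childfile.$f."
        rm "$f"
done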
# 3  
Old 09-02-2009
Quote:
Originally Posted by methyl
"man split" it says "up to a maximum of 676 files" (26x26=676), but in some unixes you can have powers of 26 more.

If you don't have the "-a suffix_length" switch to "split" (which would fix the problem) and assuming you have unlimited disc space, I guess you could do two passes.

Split into 500 files of 20,000 lines.
Split each of the 500 files into 4 parts of 5,000 lines.
I think splitting twice is double the work, and there can be no manual activity either: the script will run from a cron job.
# 4  
Old 09-02-2009
What Operating System and version do you use?
Does your version of the "split" command have the "-a" switch? See "man split".

Splitting the files twice is indeed twice the work. Using a shell script for serious data processing has its merits for quick development of one-off jobs and prototypes. For a regular job on this scale I would personally choose a high-level language which has no issues closing file descriptors mid-process.

I am intrigued as to why you would want to break 10,000,000 records into 5,000-record chunks. To my mind it just creates complications in volume processing.
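
As an illustration of the single-pass idea (a hedged sketch, not tested on your data): nawk can close each child file as soon as it is full, so the suffix limit never applies. The output names and the %04d numbering are made up for the example.

nawk 'NR % 5000 == 1 { if (out) close(out); out = sprintf("childfile.%04d", ++n) }
      { print > out }' parentfile.txt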
# 5  
Old 09-02-2009
Quote:
Originally Posted by methyl
What Operating System and version do you use?
Does your version of the "split" command have the "-a" switch? See "man split".

Splitting the files twice is indeed twice the work. Using a shell script for serious data processing has its merits for quick development of one-off jobs and prototypes. For a regular job on this scale I would personally choose a high-level language which has no issues closing file descriptors mid-process.

I am intrigued as to why you would want to break 10,000,000 records into 5,000-record chunks. To my mind it just creates complications in volume processing.
The other integration applications do not support large files. For that reason we split the large file into smaller ones and pass those on.

We are using Sun Solaris 5.10.
# 6  
Old 09-02-2009
Please look at your "man split" and advise whether you have the "-a" switch available. This would allow you to extend the range of suffixes by further powers of 26.
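
For example (assuming your split accepts -a together with -l), a 3-character suffix gives 26^3 = 17,576 possible names, comfortably more than the 2,000 child files that 10,000,000 / 5,000 requires:

split -a 3 -l 5000 parentfile.txt childfile.1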
# 7  
Old 09-03-2009
Quote:
Originally Posted by methyl
Please look at your "man split" and advise whether you have the "-a" switch available. This would allow you to extend the range of suffixes by further powers of 26.
Thanks for your support.
"-a" option will work successfully. But we are not able to fix the "-a suffixlength". Any dynamic updation of the suffixlength for every time.