Split a file into 10 different files

# 1 (12-19-2017)

OS : RHEL 6.7
Shell : bash

I have a text file with 5.97 million lines.

I want to split this big file into 12 different files (in sequential order) so that each file contains roughly 500K lines. I tried the following awk command after googling, but it only created 2 files (huge_data.txt11 and huge_data.txt12) from the source file.

Any idea how I can split the file into 12 different files?



Code:
$ wc -l huge_data.txt
5970387 huge_data.txt

$ awk -vLN=500000 '{print > ("huge_data.txt" 12-(NR>LN))}' huge_data.txt
$
$ ls -lh
total 6.5G
-rw-rw-r-- 1 appusr appusr 3.3G Dec 16 17:04 huge_data.txt
-rw-rw-r-- 1 appusr appusr 3.0G Dec 19 11:45 huge_data.txt11
-rw-rw-r-- 1 appusr appusr 276M Dec 19 11:45 huge_data.txt12
$
$
$ wc -l huge_data.txt11
5470387 huge_data.txt11
$
$ wc -l huge_data.txt12
500000 huge_data.txt12
$
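Why only two files appeared: `(NR>LN)` is a comparison that evaluates to 1 when true and 0 when false, so `12-(NR>LN)` is 12 for the first 500000 lines and 11 for every line after that. That matches the output above (500000 lines in huge_data.txt12, the remaining 5470387 in huge_data.txt11). A minimal sketch on a hypothetical 20-line `demo.txt` with LN=5, first reproducing the problem and then deriving the suffix from the block number instead:

```shell
#!/bin/sh
# Hypothetical small input standing in for huge_data.txt
seq 1 20 > demo.txt

# Original expression: (NR > LN) is 0 or 1, so only two suffixes (11 and 12) ever appear
awk -v LN=5 '{ print 12 - (NR > LN) }' demo.txt | sort -u

# Fixed expression: one suffix per block of LN lines (1, 2, 3, 4)
awk -v LN=5 '{ print > ("demo.txt" (1 + int((NR - 1) / LN))) }' demo.txt

wc -l demo.txt1 demo.txt2 demo.txt3 demo.txt4
```

Unpadded suffixes like these stop sorting in sequence once you pass 9 files (demo.txt10 sorts before demo.txt2), which is one reason the zero-padded sprintf() names in the awk reply below are preferable.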

# 2 (12-19-2017)
Hi,

I would advise that you look at the man pages for your system; try man split, it's nearly always there.

To put it all back together, look at man cat for starters.
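A minimal sketch of the split/cat round trip, assuming GNU coreutils split as shipped with RHEL. `-l` sets the number of lines per piece, `-d` asks for numeric suffixes and `-a` sets the suffix length. The 25-line `demo.txt` and the `demo_part_` prefix are hypothetical; for the file in the question the equivalent would be something like `split -l 500000 -d -a 2 huge_data.txt huge_data_`.

```shell
#!/bin/sh
# Hypothetical 25-line input standing in for huge_data.txt
seq 1 25 > demo.txt

# -l 10 : 10 lines per piece (use 500000 for the real file)
# -d    : numeric suffixes (00, 01, ...) instead of aa, ab, ...
# -a 2  : two-digit suffixes, so at most 100 pieces; use -a 3 for up to 1000
split -l 10 -d -a 2 demo.txt demo_part_

wc -l demo_part_*        # demo_part_00 and demo_part_01 have 10 lines, demo_part_02 has 5

# Rejoin: with fixed-width numeric suffixes the glob expands in sequence
cat demo_part_* > demo_rejoined.txt
cmp demo.txt demo_rejoined.txt && echo "identical"
```

Fixed-width suffixes are what make the plain `cat demo_part_*` rejoin safe: the pieces sort in the same order they were written.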

Regards

Gull04
# 3 (12-19-2017)
I agree with gull04 that split is a better way to do this (without reinventing the wheel). If you must do it with awk, you might want to try something more like:
Code:
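awk -v LN=500000 '
# At the first line of each block of LN lines, switch to a new output file
!((NR - 1) % LN) {
	if(NR > 1) close(f)	# close the previous file so awk does not run out of file descriptors
	f = sprintf("huge_data%03d.txt", 1 + int((NR - 1) / LN))	# huge_data001.txt, huge_data002.txt, ...
}
# Write every line to the current output file
{	print > f
}' huge_data.txt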

With the three-digit filenames generated by this script (huge_data001.txt through huge_data999.txt), you can split a file into up to 999 files and easily process them in sequential order.

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
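Since the question asks for a number of files rather than a number of lines, here is a sketch that computes LN from the desired file count and then runs the same awk technique. The 100-line `demo.txt`, the `demo%03d.txt` names and the variables `N`, `total` are hypothetical. Rounding LN up guarantees everything fits in at most N files; for some line counts it yields fewer (for example 9 lines with N=4 gives LN=3 and only 3 files). For the 5970387-line file in the question and N=12 it gives LN=497533 and 12 files.

```shell
#!/bin/sh
# Hypothetical 100-line input standing in for huge_data.txt
seq 1 100 > demo.txt

N=12                              # desired number of output files
total=$(wc -l < demo.txt)         # total line count
LN=$(( (total + N - 1) / N ))     # lines per file, rounded up (9 here)

# Same technique as the awk script above, with LN computed instead of hard-coded
awk -v LN="$LN" '
!((NR - 1) % LN) {
	if (NR > 1) close(f)
	f = sprintf("demo%03d.txt", 1 + int((NR - 1) / LN))
}
{	print > f
}' demo.txt

ls demo0*.txt | wc -l             # 12 files: 11 with 9 lines, the last with 1 line
```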
# 4 (12-19-2017)
Hi,

Just a quick update, now that I have a bit of time.

Code:
-bash-3.2$ ls -l test*
-rw-r-----   1 e415243  other    4037689 Dec 19 14:35 test_file_01.txt
-bash-3.2$ wc -l test*
  139186 test_file_01.txt
-bash-3.2$ split -l 14000 test_file_01.txt
-bash-3.2$ ls x*
xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj
-bash-3.2$ cat xa* >> test_file_02.txt
-bash-3.2$ diff test_file_01.txt test_file_02.txt
-bash-3.2$

As you can see, the file was split by lines and joined back together using cat; diff printing nothing confirms the rejoined file is identical to the original. Adjust the values to suit.
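If you want a fixed number of pieces rather than a fixed line count, newer GNU split can do that directly with `-n l/N`. As far as I know the `-n`/`--number` option appeared in coreutils 8.8, after the 8.4 release RHEL 6 ships, so check `split --help` on your system first. `l/12` means 12 pieces without breaking any line; the pieces are balanced by size in bytes, so line counts are only roughly equal. The 100-line `demo.txt` and the `demo_chunk_` prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical 100-line input standing in for huge_data.txt
seq 1 100 > demo.txt

# -n l/12 : exactly 12 output files, never splitting a line across two files
# -d      : numeric suffixes demo_chunk_00 .. demo_chunk_11
split -n l/12 -d demo.txt demo_chunk_

wc -l demo_chunk_*

# Verify that the pieces rejoin to the original
cat demo_chunk_* | cmp -s - demo.txt && echo "identical"
```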

Regards

Gull04
# 5 (12-20-2017)
Thank you, Don and gull04.
For some reason, clicking the 'Thanks' button isn't registering, except on gull04's last post.
I am using Google Chrome; I will try Firefox later.
# 6 (12-21-2017)
I appreciate your thanks even if the Thanks button isn't working.

It seems that the code used to apply Thanks relies on old, deprecated code that is starting to fail with newer revisions of some browsers. When the Thanks button disappears from a post but your user name doesn't appear in the list of users who said Thank You, you can sometimes copy the URL that was generated when you hit the Thanks button into a new browser tab, submit it, and get your Thanks applied to that post.

The code for this site is being upgraded from PHP 5.3.x to PHP 7 (a long, tedious process). When that has been completed, everything should be working again (and running on a new, faster server), but we don't have a completion date for that project yet.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split file into 20000 files

I want to split one file (>200000000 lines) into 20000 files, but when I use split -l 23360 -d file it reports "output file suffixes exhausted"; the maximum number seems to be 100. How can I solve it? (1 Reply)
Discussion started by: wanliushao

2. Shell Programming and Scripting

How to split file into multiple files using awk based on 1 field in the file?

Good day all, I need some help. Say that I have data like below, each field separated by a tab DATE NAME ADDRESS 15/7/2012 LX a.b.c 15/7/2012 LX1 a.b.c 16/7/2012 AB a.b.c 16/7/2012 AB2 a.b.c 15/7/2012 LX2 a.b.c... (2 Replies)
Discussion started by: alexyyw

3. Shell Programming and Scripting

Split a file into multiple files based on first two digits of file.

Hi, I have a fixed-width flat file that has data for 10 different datasets, each identified by the first two digits in the flat file. 01 in the first two digit positions refers to Set A, 02 refers to Set B, and so on. I want to generate 10 different files from my... (6 Replies)
Discussion started by: okkadu

4. Shell Programming and Scripting

How to split a data file into separate files with the file names depending upon a column's value?

Hi, I have a data file xyz.dat similar to the one given below, 2345|98|809||x|969|0 2345|98|809||y|0|537 2345|97|809||x|544|0 2345|97|809||y|0|651 9685|98|809||x|321|0 9685|98|809||y|0|357 9685|98|709||x|687|0 9685|98|709||y|0|234 2315|98|809||x|564|0 2315|98|809||y|0|537... (2 Replies)
Discussion started by: nithins007

5. UNIX for Advanced & Expert Users

Split a big file into two others files

Hello, I have a very big file of more than 80 MBytes (100 MBytes). My CVS application cannot commit this file (too big) because it must be < 80 MBytes. How can I split this file into two other files? I think the AIX Unix command split -b can do that, but what is the right... (2 Replies)
Discussion started by: steiner

6. Shell Programming and Scripting

How to split a file into exactly two files by timestamp?

2009-10-29 03:39:11,720 INFO - Optimize cache for minimal puts: disabled 2009-10-29 03:39:11,720 INFO - Structured second-level cache entries: disabled 2009-10-29 03:39:22,687 WARN - Problem starting service jboss.web.deployment:war=dt-sp-fabric-delegate-ws-war-3.5.0.war,id=1483428821... (3 Replies)
Discussion started by: maheshshinde

7. Shell Programming and Scripting

split a file into many files

Hello, here is another one. The file type is almost the same, many lines and many fields. What I need to do is extract each line of the old file and make it a new file; in the new file, field1 will be the file name and the rest of the fields will be transposed into lines. Say, 1, field1 field2 ... (8 Replies)
Discussion started by: ssshen

8. UNIX for Dummies Questions & Answers

split a file into a specified number of files

I have been googling on the 'split' unix command to see if it can split a large file into 'n' number of files. Can anyone spare an example or a code snippet? Thanks, - CB (2 Replies)
Discussion started by: ChicagoBlues

9. Shell Programming and Scripting

Split A File Into 2 Files

I want to split a file which has 250 columns, and the delimiter is '|'. Can somebody help me with the command I have to use to split the file into two files? Thanks (7 Replies)
Discussion started by: dummy_needhelp

10. UNIX for Dummies Questions & Answers

Split a file into 2 or more files

Dear friends: I have a data file that contains 1 to 40 lines (the count can vary between 1 and 40). I want to split the data file into smaller files: if the data file has 40 lines or more, file1 contains lines 1 to 12, file2 contains lines 13 to 25, file3 contains lines 26 to 28, file4 contains line 29... (4 Replies)
Discussion started by: bobo