The UNIX and Linux Forums  
#1  madhunk, 12-13-2007
Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this.

For example:

Code:
split -l 3000000 filename.txt
This is very slow, and it splits the file into pieces of 3 million records each. I would rather give the number of output files as a parameter and choose the output file names myself, instead of getting xaa, xab and so on.

I am also trying awk, which I expect to be fast and simple. The awk examples I found in the forum all split the file on a specific pattern, and I don't need any pattern.

Please give me your input on this.
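
Editor's sketch of the request above (number of files as a parameter, user-chosen names), not code from the thread. It assumes a POSIX `split` that accepts `-l`, `-a` and a name prefix; `split_into_n` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: split a file into a requested number of pieces with a chosen prefix.
# split_into_n is a hypothetical helper name, not a standard tool.
split_into_n() {
    file=$1      # input file
    parts=$2     # desired number of output files
    prefix=$3    # output name prefix, e.g. "part_"

    # Count lines; the arithmetic expansion also strips any padding from wc.
    total=$(( $(wc -l < "$file") ))

    # Round up so that no more than $parts files are produced.
    per_file=$(( (total + parts - 1) / parts ))
    [ "$per_file" -gt 0 ] || per_file=1   # guard against an empty input file

    # -l: lines per piece, -a: suffix length, then input file and name prefix.
    split -l "$per_file" -a 3 "$file" "$prefix"
}
```

For example, `split_into_n filename.txt 4 part_` writes part_aaa, part_aab and so on. The suffix is still alphabetic; GNU split, where available, can produce numeric suffixes with `-d`. Counting the lines costs one extra pass over the file.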
#2  Smiling Dragon (Forum Advisor), 12-13-2007
I would have thought dd would be a more appropriate choice for this?
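
Editor's sketch of what a dd-based split could look like, not code from the thread. dd cuts at byte offsets, so a record that straddles a boundary is divided between two pieces; that only suits data that does not have to stay line-aligned. `dd_split` and its parameters are hypothetical names.

```shell
#!/bin/sh
# Sketch: cut a file into fixed-size byte chunks with dd.
# dd_split is a hypothetical helper name, not a standard tool.
dd_split() {
    file=$1       # input file
    block=$2      # dd block size in bytes
    blocks=$3     # number of blocks per output piece
    prefix=$4     # output name prefix

    size=$(( $(wc -c < "$file") ))
    chunk=$(( block * blocks ))   # bytes per output piece
    i=0
    while [ $(( i * chunk )) -lt "$size" ]; do
        # Zero-padded piece number keeps the outputs in order when globbed.
        out=$(printf '%s%03d' "$prefix" "$i")
        # skip= counts input blocks of size bs; count= limits blocks copied.
        dd if="$file" of="$out" bs="$block" skip=$(( i * blocks )) count="$blocks" 2>/dev/null
        i=$(( i + 1 ))
    done
}
```

For example, `dd_split bigfile 1048576 1024 part_` would write pieces of 1 GiB each, named part_000, part_001 and so on.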
#3  madhunk, 12-14-2007
If you can recommend a fast way like awk, that would be very much appreciated. The split is taking up a lot of time.
#4  jim mcnamara (Forum Staff), 12-14-2007
If disk I/O is not what makes split "too slow", then try awk. But consider that a long I/O request queue on that filesystem is a more likely cause of slow splitting than split itself being a bad performer.
awk version of split:
Code:
awk ' {
          # records 1-300000, 300001-600000, and everything after 600000
          if (NR <= 300000) { print $0 > "smallfile1" }
          else if (NR <= 600000) { print $0 > "smallfile2" }
          else { print $0 > "smallfile3" }
       }'  bigfile
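
Editor's sketch generalizing the fixed ranges above, not code from the thread: the records per file and the output prefix become parameters, and each finished output file is closed. It assumes an awk that supports `-v` (nawk or /usr/xpg4/bin/awk on Solaris); `awk_split` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: awk splitter with lines-per-file and output prefix as parameters.
# awk_split is a hypothetical helper name, not a standard tool.
awk_split() {
    file=$1     # input file
    lines=$2    # records per output file
    prefix=$3   # output name prefix, e.g. "smallfile"

    awk -v lines="$lines" -v prefix="$prefix" '
        # Start a new output file on the first record of every block.
        (NR - 1) % lines == 0 {
            if (out != "") close(out)   # avoid running out of open files
            out = prefix (++n)
        }
        { print > out }
    ' "$file"
}
```

For example, `awk_split bigfile 300000 smallfile` writes smallfile1, smallfile2 and so on, 300000 records each. To ask for a number of files instead, the `lines` value can be derived from a line count of the input first.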
#5  radoulov (Forum Staff), 12-14-2007
Another approach - you can pass multiple arguments and control the filenames:

Code:
awk 'FNR == 1 { c = 1 }
{ close(FILENAME c-1)
	print > (FILENAME (!(FNR%30000000) ? ++c : c))
}'  file_1 file_2 ... file_n
or:

Code:
awk 'FNR == 1 { c = 1 }
	      { print > (FILENAME c) }
!(FNR%30000000) { close(FILENAME c); ++c }
' file_1 file_2 ... file_n

Use nawk or /usr/xpg4/bin/awk on Solaris.

Last edited by radoulov; 12-14-2007 at 08:32 PM..
#6  madhunk, 12-17-2007
Thank you, Radoulov. When I ran your code, it reported that file1, file2 or file3 was not found. It seems the code assumes those are the input files. Jim's code, however, is working fine.

The whole environment is on Windows, but I am using MKS Toolkit and invoking the bash shell to execute awk. I have never worked on Windows before, and it is not very pleasant.
#7  radoulov (Forum Staff), 12-17-2007
Sorry, I just realized I misread your question (you don't want to pass multiple input files).