

Split a large file in n records and skip a particular record


 
# 1  

Hello All,
I have a large file of more than 50,000 lines, and I want to split it into chunks of 5,000 records each, which I can do using:
Code:
sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'
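A scaled-down run makes the pieces of that pipeline visible. The sketch below assumes a chunk size of 5 instead of 5000 and a generated 12-line file named `demo.txt` (both illustrative, not from the thread): `sed '1d;$d'` drops the first and last lines (presumably a header and trailer), and `NR%5==1` switches output to a new file `F1`, `F2`, ... on records 1, 6, 11, and so on.

```shell
#!/bin/sh
# Scaled-down demo of the original split: chunk size 5 instead of 5000.
# demo.txt is an illustrative stand-in for the real file.
cd "$(mktemp -d)" || exit 1

awk 'BEGIN { for (k = 1; k <= 12; k++) print k }' > demo.txt

# sed drops line 1 and the last line; awk starts a new output file
# F1, F2, ... whenever the record number is 1, 6, 11, ...
sed '1d;$d' demo.txt | awk 'NR%5==1{x="F"++i}{print > x}'

ls F*
```

With these assumptions `F1` should contain the numbers 2 through 6 and `F2` the numbers 7 through 11.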

Now I need to add one more condition: do not break the file at the 5000th record if that record starts with "3". Instead, keep reading until I encounter a record that does not start with 3, and split there.

Below is a sample of the huge file:
Code:
  100000035900015300007538   172359500000000000AA000000000Y000000000Y00
  100000035900015300007538   172359500000000000AA000000000Y000000000Y00
  100000035900015300007538   1166231200000000000AA000000000Y000000000Y00
  200000035900015300007538   11029684830A   000000000Y000000000Y01YA 
  200000035900015300007538   0127862850000000000000Y000000000Y00YY 
  200000035900015300007538   01282938700000000000AA000000000Y000000000Y00    
  300000035900015300007538   01282938701025828658A   000000000Y000000000Y01   
  300000035900015300007538   1282938700000000000AA000000000Y000000000Y00
  300000035900015300007538   1282938703028860515A   000000000Y000000000Y03   
  100000035900015300007538   172359500000000000AA000000000Y000000000Y00Y     
  100000035900015300007538   172359500000000000AA000000000Y000000000Y00Y        
  200000035900015300007538   1166231201029684830A   000000000Y000000000Y01YA 
  200000035900015300007538   01278628500000000000AA000000000Y000000000Y00YY

Any help is much appreciated.

Last edited by ibmtech; 11-27-2013 at 05:25 PM.
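A minimal sketch of the rule described in this post, assuming POSIX awk, a chunk size of 5 so the effect shows on a tiny input, and hypothetical names (`sample.txt`, output prefix `part_`, awk variables `out` and `pending`). A split becomes due every `n` records, but it is deferred until the first record whose first non-blank character is not `3`; `[ \t]*` covers the leading blanks seen in the sample.

```shell
#!/bin/sh
# Sketch: split into chunks of about $chunk records, but never start a new
# chunk on a record whose first non-blank character is "3".
# sample.txt, the part_ prefix and chunk=5 are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

# Tiny stand-in for the real file: header, 1/2/3 records, trailer.
cat > sample.txt <<'EOF'
HEADER
  1a
  2a
  2b
  3a
  3b
  3c
  1b
  2c
  1c
  2d
TRAILER
EOF

chunk=5
sed '1d;$d' sample.txt |
awk -v n="$chunk" '
  NR == 1 { out = "part_" (++i) }              # first file, so print always has a name
  NR > 1 && NR % n == 1 { pending = 1 }        # a split is now due
  pending && $0 !~ /^[ \t]*3/ {                # ...but only at a non-3 record
      close(out); out = "part_" (++i); pending = 0
  }
  { print > out }
'

wc -l part_*
```

On this stand-in input the split due at record 6 (`  3c`) is deferred to record 7, so `part_1` should get 6 records and `part_2` the remaining 4. For the real file, set `chunk=5000` and use the actual file name. Later splits are still counted from the overall record number, so a file that absorbed extra 3-records is followed by a correspondingly shorter one.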
# 2  
Try this. Note that I've added a close() statement because, with larger files, you may run out of open file handles (depending on your OS and awk version):

Code:
sed '1d;$d;' <filename> | awk 'NR%5000==1{N++} N&&!/^\s*3/{if(x) close(x);x="F"++i;N=0}{print > x}'


Last edited by Chubler_XL; 11-27-2013 at 08:50 PM. Reason: Updated to support blanks at the front of the record.
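On portability: `\s` in `/^\s*3/` is a GNU awk (gawk) regexp extension rather than part of POSIX awk, so other awk implementations, possibly including AIX awk, may not read it as "any whitespace". A bracket expression states the same test portably. The two input records below are shortened from the sample in post #1, and the variable name `result` is only for illustration.

```shell
#!/bin/sh
# Same test as /^\s*3/, written portably: optional blanks or tabs, then "3".
# ([[:space:]]* is the POSIX character-class spelling, but some older awk
# builds may not support character classes.)
result=$(printf '  300000035900015300007538\n  100000035900015300007538\n' |
    awk '/^[ \t]*3/ { print "starts with 3: record " NR }')
echo "$result"    # starts with 3: record 1
```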
# 3  
Thanks, Chubler.
When I substitute my actual file name and run it, I get the following:
Code:
sed '1d;$d;' fiscal13 | awk 'NR%5==1{N++} N&&!/^\s*3/{if(x) close(x);x="Fa"++i;N=0}{print > x}'
awk: 0602-576 A print or getline function must have a file name.
 The input line number is 1.
 The source line number is 1.


Any help is much appreciated!
# 4  
Try nawk.
# 5  
Is nawk available on AIX? I think it's for Solaris.

FYI, I am using AIX 7.1.

Thanks,
# 6  
nawk is available on many systems, but I think I've spotted the error now:

Code:
awk 'BEGIN{x="Fa"++i } NR%5==1{N++} N&&!/^\s*3/{if(x) close(x);x="Fa"++i;N=0}{print > x}'
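The `BEGIN` block matters when no output name has been set by the time awk reaches its first `print`. In the #2 one-liner `x` is only assigned inside the split branch, so one way to hit an empty file name (the condition behind the AIX message in #3) is a first record that starts with 3. A minimal check, assuming a hypothetical input `recs.txt` whose first two records start with 3, a chunk size of 5 as in #3, the `Fa` prefix for every file including the fallback, and `[ \t]` in place of `\s`:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical input: the first two records start with 3 (after blanks).
printf '  3a\n  3b\n  1a\n  2a\n  1b\n  3c\n  1c\n' > recs.txt

# BEGIN gives x a file name before the first print, so the leading
# 3-records go to Fa1 instead of producing an empty-file-name error.
awk 'BEGIN{x="Fa" (++i)} NR%5==1{N++} N&&!/^[ \t]*3/{if(x) close(x);x="Fa" (++i);N=0}{print > x}' recs.txt

ls Fa*
```

Here `Fa1` should receive the two leading 3-records, `Fa2` the next four (the split due at record 6, `  3c`, is deferred), and `Fa3` the last one.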

# 7  
Try:

Code:
$ awk 'NR==1{f="File_"++i".tmp"} NR>1 && NR%5000==1{s=1} s && !/^[ \t]*3/{close(f);f="File_"++i".tmp";s=0}{print > f}' file

Let me know if I missed something.


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Trying To Split a Large File

Trying to split a 35 GB file into 1000 MB parts. My research shows I should use this: split -b 1000m file.txt and the result is "split: cannot open 'crunch1.txt' for reading: No such file or directory" so I tried split -b 1000m Documents/Wordlists/file.txt and I get nothing other than the cursor just... (3 Replies)
Discussion started by: sub terra
3 Replies

2. UNIX for Advanced & Expert Users

How to split large file with different record delimiter?

Hi, I have received a file which is 20 GB. We would like to split the file into 4 equal parts and process it to avoid memory issues. If the record delimiter is unix new line, I could use split command either with option l or b. The problem is that the line terminator is |##| How to use... (5 Replies)
Discussion started by: Ravi.K
5 Replies

3. Shell Programming and Scripting

How to split one record to multiple records?

Hi, I have one tab-delimited file which has multiple store_ids in the first column, separated by a pipe. I want to split the file on the basis of store_id (separating the 1st record into 2 records). I tried some options like the one below, using split, awk, etc., but was not able to get the proper output. can... (1 Reply)
Discussion started by: jaggy
1 Replies

4. UNIX for Dummies Questions & Answers

Using awk to skip record in file

I need to amend the code below such that it reads a "black list" before the "print" statement; if "substr($1,1,6)" is found in the "blacklist" it will ignore that record and continue. The code is from an awk script that is being called from a shell script which passes the input values. BEGIN { "date... (5 Replies)
Discussion started by: bazel
5 Replies

5. UNIX for Dummies Questions & Answers

Split single record to multiple records

Hi Friends, source .... col1,col2,col3 a,b,1;2;3 here the column delimiter is comma(,). Here we don't know the max length of col3, meaning now we have 1;2;3 but next time I may receive 1;2;3;4;5;etc... required output .............. col1,col2,col3 a,b,1 a,b,2 a,b,3 please give me... (5 Replies)
Discussion started by: bab.galary
5 Replies

6. Shell Programming and Scripting

How to delete 1 record in large file!

Hi All, I'm a newbie here, I'm just wondering on how to delete a single record in a large file in unix. ex. file1.txt is 1000 records nikki1 nikki2 nikki3 what i want to do is delete the nikki2 record in file1.txt. is it possible? Please advise, Thanks, (3 Replies)
Discussion started by: nikki1200
3 Replies

7. Shell Programming and Scripting

Split a single record to multiple records & add folder name to each line

Hi Gurus, I need to cut a single record in the file (asdf) into multiple records based on the number of bytes (44 characters), so every record will have 44 characters. All the records should be in the same file. To each of these lines I need to add the folder (<date>) name. I have a dir. in which... (20 Replies)
Discussion started by: ram2581
20 Replies

8. Shell Programming and Scripting

Split a large file

I have a 3 GB text file that I would like to split. How can I do this? It's a giant comma-separated list of numbers. I would like to make it into about 20 files of ~100 MB each, with a custom header and footer. The file can only be split on commas, but they're plentiful. Something like... (3 Replies)
Discussion started by: CRGreathouse
3 Replies

9. Shell Programming and Scripting

Split Large File

HI, i've to split a large file which inputs seems like : Input file name_file.txt 00001|AAAA|MAIL|DATEOFBIRTHT|....... 00001|AAAA|MAIL|DATEOFBIRTHT|....... 00002|BBBB|MAIL|DATEOFBIRTHT|....... 00002|BBBB|MAIL|DATEOFBIRTHT|....... 00003|CCCC|MAIL|DATEOFBIRTHT|.......... (1 Reply)
Discussion started by: AMARA
1 Replies

10. Shell Programming and Scripting

Split A Large File

Hi, I have a large file(csv format) that I need to split into 2 files. The file looks something like Original_file.txt first name, family name, address a, b, c, d, e, f, and so on for over 100,00 lines I need to create two files from this one file. The condition is i need to ensure... (4 Replies)
Discussion started by: nbvcxzdz
4 Replies
