The problem is that all the conditions must be true at the same time to change the file name, so if record 5001 starts with "3", the test is not repeated on record 5002.
My solution failed because record 1 started with "3", so no initial file name was set.
My understanding of Corona's code:
1. It sets x at the beginning.
2. When NR becomes 6, the remainder is 1, so N is incremented.
3. If N is set and the line doesn't start with 3, it checks whether x is set; if so, it closes x. Then it increments i, makes x the new file name, and resets N.
4. Finally, it writes the line to file x.
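For comparison, here is a minimal sketch of those four steps as an awk script. Corona's actual code is not shown in this thread, so this is a hypothetical reconstruction: the names file1, file2, ..., the 5-line period, the NR > 1 guard, and the sample input are all assumptions.

```sh
#!/bin/sh
# Hypothetical reconstruction of the steps described for Corona's script;
# the original code is not shown. Period of 5 lines ("NR becomes 6"),
# output files named file1, file2, ... (assumed names).
cd "$(mktemp -d)" || exit 1          # scratch directory for the demo

# Sample input (assumed): line 6 starts with 3, so the switch must wait.
printf '%s\n' 10 11 12 13 14 30 15 16 17 18 19 > input

awk '
BEGIN { x = "file" ++i }          # step 1: first output file is set up front
NR > 1 && NR % 5 == 1 { N++ }     # step 2: lines 6, 11, ... flag a pending switch
                                  # (NR > 1 is assumed so line 1 does not trigger one)
N && !/^3/ {                      # step 3: switch at the first line not starting with 3
    if (x != "") close(x)
    x = "file" ++i
    N = 0
}
{ print > x }                     # step 4: write the line to the current file
' input
```

With this input, line 6 starts with 3, so N stays set and the switch happens on line 7; file1 gets lines 1 to 6. Because the trigger is tied to NR rather than to a per-file line count, the next switch still comes at line 11, so file2 gets only 4 lines.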
My code:
1. On NR == 1, close f (since f is not set yet, close(f) has no effect), increment i to 1, and set f to the new file name. I used NR == 1 instead of a BEGIN block.
2. When NR becomes 5001, the remainder is 1, so check whether the line starts with the digit 3; if not, close f and increment i, which changes the file name.
3. Write the line to file f.
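A hypothetical reconstruction of those three steps (the real script isn't quoted here) makes the failure easy to see if the 5000-line period is shrunk. The lpf variable, the file1/file2 names and the sample input are assumptions for the demo.

```sh
#!/bin/sh
# Hypothetical reconstruction of the steps described for Akshay's script,
# with the 5000-line period replaced by an assumed lpf=3.
cd "$(mktemp -d)" || exit 1

# Sample input (assumed): line 4 starts with 3, so the one-time test at NR == 4 fails.
printf '%s\n' 10 11 12 30 13 14 15 16 > input

awk -v lpf=3 '
# Steps 1 and 2: open the first file on line 1, and try to switch only on
# lines 1, lpf+1, 2*lpf+1, ... when that exact line does not start with 3.
NR == 1 || (NR % lpf == 1 && !/^3/) {
    close(f)                  # no effect on line 1, when f is still empty
    f = "file" ++i
}
{ print > f }                 # step 3: write every line to the current file
' input
```

Line 4 is the only candidate for the first switch, and because it starts with 3 the next attempt is line 7, so file1 ends up with 6 lines instead of 4.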
Let me know if my understanding is wrong.
Hi Akshay,
Yes, you set x in the beginning; that isn't the problem. The problem is that if line (5000 * n) + 1 starts with a 3, you won't attempt to switch files again until you have added another 5000 lines to the file. The request is to put 5000 lines in each file, but then keep adding single lines to that file as needed so that the first line of an output file never starts with a 3 (with the possible exception of the first file).
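The rule described above can be sketched with a per-file line counter: once the current file has at least lpf lines, every following line is a candidate, and the first one that does not start with 3 opens the next file. This is not the script posted in this thread (that code is not reproduced here); lpf=3, the Fxx names and the sample input are assumptions for the demo.

```sh
#!/bin/sh
# Minimal sketch: switch files at the first line not starting with 3 once the
# current file is full, retrying on every line until one qualifies.
# lpf=3, F01/F02/... names and the input are assumed for the demo.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 10 11 12 30 31 13 14 15 16 > input

awk -v lpf=3 '
n >= lpf && !/^3/ { close(f); f = "" }        # file is full and this line may start a new one
f == "" { f = sprintf("F%02d", ++i); n = 0 }  # open the next output file
{ print > f; n++ }                            # count lines written to the current file
' input
```

Lines 4 and 5 start with 3, so F01 takes 5 lines and F02 starts at line 6.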
Another (more complicated, but more efficient) way to do this is:
I use the Korn shell, but any shell that recognizes basic Bourne shell syntax will also work for this script.
This script is more efficient because it reads the input file only once. Rather than using sed to delete the first and last lines and then awk to split the remaining lines, it uses awk alone to skip the first and last lines and split the others.
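The single-pass trick for the first and last lines can be sketched like this: hold each line in a variable and only process it when the next line arrives, so the final line is never processed, and line 1 is skipped by the NR test. This is a sketch of the idea, not the posted script; the file names input and body are assumptions.

```sh
#!/bin/sh
# Sketch: drop the first and last lines inside awk instead of piping through sed.
# Each line is held in prev and handled one line late, so the last line never is.
cd "$(mktemp -d)" || exit 1
printf '%s\n' header a b c trailer > input

awk '
NR > 2 { print prev }   # emits lines 2 .. N-1
{ prev = $0 }
' input > body
```

The result is the same as sed '1d;$d' input, but without a second process reading the data.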
It also uses Fxx as the output file name format in case the input is a little more than 50000 lines, which would produce F1, F2, ... F10. With two digits, the output file names sort in sequence, so there is no need for special handling of an order like F1, F10, F2, F3, ... F9.
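The effect of zero-padding on sort order is easy to check from the shell. In awk the padded name would come from something like sprintf("F%02d", n); the loops below just print the names for 1, 2 and 10.

```sh
#!/bin/sh
# Compare how one-digit and zero-padded file names sort.
plain=$(for i in 1 2 10; do printf 'F%d\n' "$i"; done | LC_ALL=C sort)
padded=$(for i in 1 2 10; do printf 'F%02d\n' "$i"; done | LC_ALL=C sort)
printf '%s\n' "$plain"    # F1, F10, F2: F10 sorts before F2
printf '%s\n' "$padded"   # F01, F02, F10: numeric order
```

Two digits are enough up to F99; a run that could produce 100 or more files would need %03d.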
If you name this script tester, make it executable, and invoke it as follows:
it should split the submitter's real input file into chunks of approximately 5000 lines.
If the test input file is named file and you invoke the script as follows:
it will produce 3 files; F01 containing:
F02 containing:
and F03 containing:
(The lpf=3 operand overrides the default of 5000 lines per file set in the BEGIN clause.) Note that F02 contains 5 lines instead of 3 to avoid splitting a multi-line record across files (assuming that a line starting with a 3 is some kind of continuation line in a multi-line record), even though 5 is not a multiple of 3.
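Putting the pieces together, a script in the spirit of tester might look like the sketch below. The actual script is not shown in this thread, so everything here is an assumption: the awk program, the sample input file, and the use of /bin/sh. It does show the mechanism described above: lpf = 5000 is set in BEGIN, and an lpf=3 assignment operand, which awk applies after BEGIN but before reading file, overrides it.

```sh
#!/bin/sh
# Hypothetical sketch: one awk pass that skips the first and last lines,
# writes about lpf lines per file, never starts a file with a line
# beginning with 3, and names the files F01, F02, ...
cd "$(mktemp -d)" || exit 1

cat > tester <<'EOF'
#!/bin/sh
# Usage: tester [lpf=N] file   (lpf=N is an awk assignment operand)
awk '
BEGIN { lpf = 5000 }          # default; an lpf=N operand is applied after BEGIN
NR > 2 {                      # handle the previous line; first and last are never handled
    if (n >= lpf && prev !~ /^3/) { close(f); f = "" }
    if (f == "") { f = sprintf("F%02d", ++i); n = 0 }
    print prev > f
    n++
}
{ prev = $0 }
' "$@"
EOF
chmod +x tester

# Hypothetical test input: a header, 10 data lines, a trailer.
printf '%s\n' header 1a 1b 1c 2a 2b 2c 3a 3b 4a 4b trailer > file
sh ./tester lpf=3 file        # same as ./tester lpf=3 file
```

With this input, F01 gets 1a to 1c, F02 gets 2a to 2c plus the two continuation lines 3a and 3b (5 lines), and F03 gets 4a and 4b; the header and trailer lines are skipped.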