Split a large file in n records and skip a particular record


 
# 15  
Old 12-01-2013
Thank you Don Cragun.
# 16  
Old 12-02-2013
Hello All,
I appreciate all of your input, but I have one more slightly complicated requirement to add.


My code started with something like the following:
sed '1d;$d;' XXXXXX
This deleted the header and trailer (the first and last lines) from the huge file. But I now need to add that header and trailer to each output file.

Any help is appreciated.
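
To make the request concrete, here is a minimal sketch of what that sed command does. The sample contents and the variable names are made up for illustration; they are not taken from the real file.

```shell
# Hypothetical sample: one header, three records, one trailer.
sample=$(mktemp)
printf '%s\n' HEADER rec1 rec2 rec3 TRAILER > "$sample"

# '1d' deletes the first line, '$d' deletes the last line.
body=$(sed '1d;$d' "$sample")
printf '%s\n' "$body"

rm -f "$sample"
```

Only rec1, rec2 and rec3 survive, so the header and trailer have to be captured separately if they are to be written to every output file.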
# 17  
Old 12-02-2013
In Don's solution, add the following:

Code:
awk -v head="$(head -1 file)" '
function nf() {
        x = sprintf("F%02d", ++ofc)
        print head >x
        cnt = 0
}
.................
.................
.................
' file

# 18  
Old 12-02-2013
If that's a huge file, head -1 might not be the most efficient approach. Why not something like:
Code:
BEGIN {lpf=5000} NR==1 {head = $0; nf()}

For the tail, I'm not sure if it's more efficient to open the huge file "from the end" like tac does or to reopen all output files produced so far and append the tail (e.g. with echo >>).
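
A rough sketch of the second option (save the header inside awk, then append the trailer to every output file afterwards) might look like the following. It is not Don's script: the sample data, the work directory, the variable names and the use of sed '$d' to drop the trailer before splitting are all assumptions made for illustration. Here lpf counts data lines only, and the rule about lines starting with 3 is left out.

```shell
# Hypothetical sample input in a scratch directory.
work=$(mktemp -d)
printf '%s\n' HEADER r1 r2 r3 r4 r5 TRAILER > "$work/input.txt"

# Grab the trailer (last line) once, up front.
trailer=$(tail -n 1 "$work/input.txt")

# Drop the trailer with sed, then let awk split what is left.
sed '$d' "$work/input.txt" | awk -v lpf=2 -v dir="$work" '
NR == 1    { head = $0; next }                      # remember the header
cnt == 0   { out = sprintf("%s/F%02d", dir, ++ofc)  # start a new output file
             print head > out }
           { print > out; cnt++ }                   # copy one data line
cnt >= lpf { close(out); cnt = 0 }                  # this file is full
'

# Append the trailer to every output file produced above.
for f in "$work"/F[0-9][0-9]; do
    printf '%s\n' "$trailer" >> "$f"
done

first=$(cat "$work/F01")
last=$(cat "$work/F03")
printf '%s\n' "$first"
rm -rf "$work"
```

With this sample the split yields three files, and F01 holds HEADER, r1, r2 and TRAILER. The price is that the input is read more than once (tail, then sed), which may or may not matter for a huge file.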
# 19  
Old 12-02-2013
Thanks for the input guys.
# 20  
Old 12-02-2013
The head and tail utilities should be pretty efficient (and not read the entire file) to extract the 1st and last lines, respectively, of your input file, but if I had been given this set of requirements to start with, I would have done something more like:
Code:
#!/bin/ksh
IAm=${0##*/}
if [ $# -gt 2 ] || [ $# -lt 1 ]
then    printf "Usage: %s file [lines_per_file]\n" "$IAm" >&2
        exit 1
fi
file=${1}
lpf=${2:-5000}  # Set lines to be included in each output file.
awk -v lpf="${lpf}" '
function nf(fn) {
        x = sprintf("F%02d", fn)
}
NR == 1 {
        # Save header from 1st line.
        h = $0
        next
}
NR > 2 {if(cnt == 0) {
                # Get next output file and add the header line.
                nf(++ofc)
                print h > x
                cnt = 2 # Count the header line and reserve space for the trailer line to be added later.
        }
        # Add previous input line to current output file.
        print last > x
        cnt++
}
{       # Save current line.  Do not print it yet so we can skip the last line.
        # When we hit EOF, last will contain the trailer to be added to all of
        # the output files.
        last = $0
}
cnt >= lpf && ! /^ *3/ {
        # If we have a full file and current line does not start with a 3,
        # close current output file and clear output line count.
        close(x)
        cnt = 0
}
END {   # Add trailer line to all output files.
        while(ofc) {
                nf(ofc--)
                print last >> x
                close(x)
        }
}' "$file"

Depending on which version of awk you're using, you could comment out the first close(x) statement to avoid having to reopen the files as long as you don't run out of file descriptors. If you try it and you get a diagnostic about too many open files, put that close statement back in.
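
As a small illustration of why the END block uses >> after the files have been closed: once close() has been called on a file, a later > would truncate it when it is reopened, while >> reopens it in append mode. The file name demo.txt and the variable names below are made up for this sketch.

```shell
work=$(mktemp -d)

awk -v f="$work/demo.txt" 'BEGIN {
    print "first" > f     # opens (and truncates) the file
    close(f)              # releases the file descriptor
    print "second" >> f   # reopens in append mode, so "first" is kept
    close(f)
}'

result=$(cat "$work/demo.txt")
printf '%s\n' "$result"
rm -rf "$work"
```

Both lines end up in the file, in order. Closing each output file as soon as it is full keeps the number of simultaneously open files at one, at the cost of reopening every file once in the END block.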
# 21  
Old 12-03-2013
Thanks a lot Don!