Breaking large file into small files


 
# 8  
Old 03-06-2015
Hello emily,

I'm not sure I understand your complete requirement, but could you please try the following and let me know if it helps?
Code:
echo $3 | awk '{FILENAME=$3"_"int((NR-1)/200)".txt";print >> FILENAME}'

You can replace your current command with the one shown above.


Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 9  
Old 03-06-2015
The awk variable FILENAME is provided by awk and contains the name of the input file that is currently being processed. Redefining it is not a good idea. Try something like this instead:
Code:
awk '{outfile=FILENAME int((NR-1)/200) ".txt";print >> outfile}' $3

Note, however, that both your script and the above script consume a file descriptor for each output file created and don't free any file descriptors until awk exits. If you need to create several files, you may have to close files when you're done writing to them to avoid a "too many open files" error. Even if you don't "have to", it is usually a good habit to close files you no longer need open. And, if you have a lot of files with numbers in them that might be more than one digit, you may want to add some leading zeros so the files will appear in numeric order when output by ls...
Code:
awk '
BEGIN {	outfile = sprintf("%s%03d.txt", FILENAME, 0)
}
{	print > outfile
}
(NR % 200) == 0 {
	close(outfile)
	outfile = sprintf("%s%03d.txt", FILENAME, int(NR/200))
}' $3

And, just out of curiosity, why does your script bother defining:
Code:
PATHNAME=$1
CONSTANT=rfio:
GREP=$2
OUTPUT=$3

when none of them are ever referenced in your script?

Note that I also changed the print >> outfile to print > outfile. If you ever need to regenerate the split files because the base file has changed, you will want to overwrite the old files instead of appending to the end of them. (Note, however, that this won't remove any trailing files that are no longer needed if your updated base file is smaller than it was before.) If that is a concern, you could add a line to your script before invoking awk:
Code:
# Remove any earlier versions of the split output files.
rm -f "$3"[0-9][0-9][0-9].txt
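
As a small illustration of the difference between > and >> described above, here is a minimal sketch. The file names letters.txt, overwrite.txt and append.txt are arbitrary examples, not names from the original script.

```shell
# Throwaway input; the file names here are arbitrary examples.
printf 'a\nb\nc\n' > letters.txt
rm -f overwrite.txt append.txt

# ">" truncates the file when awk first opens it, then keeps writing to it during that run.
awk '{ print > "overwrite.txt" }' letters.txt
awk '{ print > "overwrite.txt" }' letters.txt

# ">>" never truncates, so every rerun adds another copy of the data.
awk '{ print >> "append.txt" }' letters.txt
awk '{ print >> "append.txt" }' letters.txt

wc -l overwrite.txt append.txt
```

After the two runs of each command, overwrite.txt should hold 3 lines and append.txt should hold 6.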

This User Gave Thanks to Don Cragun For This Post:
# 10  
Old 03-06-2015
Hello Ravinder and Don,
Here is my modified script [1] and its output. Why am I getting a filename like this:
Code:
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 _0.txt



[1]
Code:
#!/bin/bash

OUTPUT=InputFile_
GREP=root
EOSPATH="srm://dcache-se-cms.desy.de:8443SingleMuonGun/SingleMuMinus_Fall14_FlatPt-0to200_MCRUN2_72_V3_GEN_SIM_DIGI_RECO_L1/150127_084421/"
FILEPATH[1]=$EOSPATH/0001
FILEPATH[2]=$EOSPATH/0002
#FILEPATH[3]=$EOSPATH/0003
#FILEPATH[4]=$EOSPATH/0004

## copy the FileName from eos to $3
for FileNameIndx in "${FILEPATH[@]}"
  do
    if [[ ! -e "dest_path/$FileNameIndx" ]]; then
        echo "Copying fileName \"$FileNameIndx  | grep root\" to $OUTPUT"
        Index=$(echo $FileNameIndx | awk '{split($FileNameIndx, a, "000"); print "000"a[2]}')
        srmls $FileNameIndx --count 99999 --offset 2 | grep $GREP | awk -F'tier2' '{print string path $GREP}' string="" path=""  > $OUTPUT$Index
        FINALFILE=$OUTPUT$Index
        echo $FINALFILE
        echo "progressing ... please be patient..."

        awk '
        BEGIN {outfile = sprintf("%s_%01d.txt", FILENAME, 0)
        }
        {print > outfile
        }
        (NR % 200) == 0 {
            close(outfile)
            outfile = sprintf("%s_%01d.txt", FILENAME, int(NR/200))
        }'  $FINALFILE

    fi
done

It is working, but it produces output like this:
Code:
-rwxr-xr-x 1 emily af-cms   1820  6. Mär 11:18 copyTextFromCastor.sh
-rw-r--r-- 1 emily af-cms 271184  6. Mär 11:18 InputFile_0001
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0001_1.txt
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0001_2.txt
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0001_3.txt
-rw-r--r-- 1 emily af-cms  53584  6. Mär 11:18 InputFile_0001_4.txt
-rw-r--r-- 1 emily af-cms 271456  6. Mär 11:18 InputFile_0002
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 _0.txt
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0002_1.txt
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0002_2.txt
-rw-r--r-- 1 emily af-cms  54400  6. Mär 11:18 InputFile_0002_3.txt
-rw-r--r-- 1 emily af-cms  53856  6. Mär 11:18 InputFile_0002_4.txt


Last edited by emily; 03-06-2015 at 06:32 AM..
# 11  
Old 03-06-2015
Sorry. My mistake. FILENAME isn't defined yet in the BEGIN clause...

Change:
Code:
        BEGIN {outfile = sprintf("%s_%01d.txt", FILENAME, 0)

to:
Code:
        NR==1 {outfile = sprintf("%s_%01d.txt", FILENAME, 0)
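
With that change applied, the complete command might look like the sketch below. The function name split_by_lines and its optional chunk-size argument are hypothetical, added only for illustration; the awk logic is the same as in your script with NR==1 in place of BEGIN.

```shell
# Hypothetical helper: split_by_lines FILE [LINES_PER_CHUNK]
# Writes FILE_0.txt, FILE_1.txt, ... with LINES_PER_CHUNK lines each (default 200).
split_by_lines() {
    awk -v n="${2:-200}" '
    # FILENAME is set by the time the first record is read, so name the first chunk here.
    NR == 1 { outfile = sprintf("%s_%01d.txt", FILENAME, 0) }
    { print > outfile }
    # After every n-th line, close the finished chunk and name the next one.
    (NR % n) == 0 {
        close(outfile)
        outfile = sprintf("%s_%01d.txt", FILENAME, int(NR / n))
    }' "$1"
}
```

Calling split_by_lines "$FINALFILE" inside your loop should then name the first chunk InputFile_0001_0.txt rather than _0.txt.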

This User Gave Thanks to Don Cragun For This Post:
# 12  
Old 03-06-2015
In awk, FILENAME is only defined after the first file has been opened, which is after the BEGIN section has been finished. Within the BEGIN section FILENAME is empty.
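
A quick way to check this on your own system is the sketch below. The file name demo.txt is an arbitrary example. Note that POSIX leaves the value of FILENAME inside BEGIN undefined, so the first line of output may vary between awk implementations; with common ones such as gawk, the brackets come out empty.

```shell
# Create a small throwaway input file (the name demo.txt is arbitrary).
printf 'one\ntwo\n' > demo.txt

# BEGIN runs before any input is opened; the NR == 1 rule runs once the first record is read.
awk 'BEGIN   { printf "in BEGIN: [%s]\n", FILENAME }
     NR == 1 { printf "at NR==1: [%s]\n", FILENAME }' demo.txt
```

The second line of output shows the real file name, which is why moving the assignment from BEGIN to NR==1 fixes the _0.txt problem.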
This User Gave Thanks to RudiC For This Post:
# 13  
Old 03-06-2015
It is working fine now.

Thanks, everyone, for your useful suggestions.

Last edited by emily; 03-06-2015 at 07:38 AM..