Split File based on number of rows: Post 302926293 by kpk_ds on Sunday 23rd of November 2014 02:43:28 AM
Hi Chubler_XL

Thanks, I will try this now and let you know how it works. I will put it in a script that I can call from my DataStage process, and then post the results here.

In the meantime, if others have any other suggestions, please keep posting; I will try everything. This is really great.

Thanks again for everyone's help.

---------- Post updated 11-23-14 at 02:06 AM ---------- Previous update was 11-22-14 at 07:38 PM ----------

Hi, this is what I have done.
I'm using a 3 GB file to test my process, but the script hangs after this line: echo "Checking ${2} file size now:"

I'm not sure what to do; please correct me if I've done something wrong.
Here is the script:

Code:
 
#!/bin/bash
# usage:

# check for input:
if [ ! $# == 3 ]; then
  echo "Input Parameter missing."
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
 
 echo "Split file directory Exist, So deleting Directory and its contents \n"
 rm -rf ${3};
else
 echo "No Split file directory present \n";
fi
 
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir ${3}
chmod 777 ${3}
if [ -d "${3}" ];
then
 echo "Split file directory created successfully \n"
 echo "Split file directory Permission set to 777 \n"
else
 echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
 ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
 echo "Total file size in Byetes: $ipsize \n"
 if [ $ipsize -gt 1000000000 ]
 then
  lines=$(wc -l < "$iflie")
  let avg=ipsize/lines
  let splitcount=5000000000/avg
  split -l $splitcount -a1 -verbose "$ifile" "${3}/TT_$2"
 fi
done

echo "Total Row Count in ${2}: $lines \n"
echo "Average Row lenght in ${2}: $avg \n"
echo "Row count per split file is: $splitcount \n"
echo "Total split files and row counts \n"
wc -l ${3}/TT_$2*

---------- Post updated at 02:43 AM ---------- Previous update was at 02:06 AM ----------

Hi

I made some changes to the script, since the split command didn't work properly. Now it's working fine:

Code:
 
#!/bin/bash
# usage:
# sh ./[script] [source dir] [inputfile] [split dir]
# check for input:
if [ $# -ne 3 ]; then
  echo "Input Parameter missing."
  exit 1
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
 
 echo "Split file directory Exist, So deleting Directory and its contents \n"
 rm -rf "${3}";
else
 echo "No Split file directory present \n";
fi
 
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir "${3}"
chmod 777 "${3}"
if [ -d "${3}" ];
then
 echo "Split file directory created successfully \n"
 echo "Split file directory Permission set to 777 \n"
else
 echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
 ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
 echo "Total file size in Byetes: $ipsize \n"
 
 if [ $ipsize -gt 1000000000 ]
 then
 
   lines=$(wc -l < "$ifile")
   echo "Total Row Count in ${2}: $lines \n"
 
  avg=$(( ipsize / lines ))
   echo "Average Row lenght in ${2}: $avg \n"  
 
  let splitcount=1000000000/avg
   echo "Row count per split file is: $splitcount \n"  
 
  split -l $splitcount "$ifile" "${3}/TT_$2"
  #-a1 --verbose 
 echo "Total split files and row counts \n"
 wc -l ${3}/TT_$2*

 fi
done

and then I get the following results:

Code:
**********************************************************************************************
Main Source file is located in: /some/dir/path
Currently processing file: inputfile.dat
All the split files will be located at: /some/dir/path/splitdir
**********************************************************************************************
Split file directory Exist, So deleting Directory and its contents

Create New directory to place split files.
Split file directory created successfully
Split file directory Permission set to 777
Checking inputfile.dat file size now:
Total file size in Byetes: 3329056768
Total Row Count in inputfile.dat:  2684723
Average Row lenght in inputfile.dat: 1240
Row count per split file is: 806451
Total split files and row counts
  806451 /some/dir/path/splitdir/TT_inputfile.dataa
  806451 /some/dir/path/splitdir/TT_inputfile.datab
  806451 /some/dir/path/splitdir/TT_inputfile.datac
  265370 /some/dir/path/splitdir/TT_inputfile.datad
 2684723 total
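
The numbers in this run can be cross-checked with plain shell arithmetic. This is only a sanity check that reuses the figures printed above; the names target, full and rest are made up for the illustration and do not appear in the script.

```shell
#!/bin/bash
# Reproduce the arithmetic from the run above using the reported figures.
ipsize=3329056768     # file size in bytes, taken from the output above
lines=2684723         # row count, taken from the output above
target=1000000000     # desired size of each split file (about 1 GB)

avg=$(( ipsize / lines ))         # integer division: average bytes per row
splitcount=$(( target / avg ))    # rows that fit in roughly 1 GB
full=$(( lines / splitcount ))    # number of completely filled split files
rest=$(( lines % splitcount ))    # rows left over for the last split file

echo "avg=$avg splitcount=$splitcount full=$full rest=$rest"
```

This prints avg=1240 splitcount=806451 full=3 rest=265370, which matches the three files of 806451 rows and the last file of 265370 rows listed above.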

Can somebody help me add some additional features? I would like to log all the messages or steps. If the file size is less than 1 GB, I want to send a note saying so and exit. Also, whenever this script fails, I want to capture all the steps that were executed and send them in an email.
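
A rough sketch of how those three additions might look, not a tested solution for this environment. Everything named here (LOGFILE, MAILTO, MIN_BYTES, log, big_enough, send_log, on_exit) is hypothetical. It assumes bash, uses wc -c instead of istat to get the size, and assumes mailx is the mail client; the email step is skipped when no recipient is set or mailx is not installed.

```shell
#!/bin/bash
# Sketch only: LOGFILE, MAILTO and MIN_BYTES are assumed names, not from the script above.
LOGFILE="${LOGFILE:-/tmp/split_file.$$.log}"
MAILTO="${MAILTO:-}"          # leave empty to skip the email step
MIN_BYTES=1000000000          # the 1 GB threshold used in the script above

# Append a timestamped message to the log and also print it.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$LOGFILE"
}

# Log the size of a file; succeed only when it is larger than MIN_BYTES.
big_enough() {
  local size
  size=$(wc -c < "$1") || return 2
  size=$(( size ))            # strip any padding some wc versions add
  log "Size of $1 is $size bytes"
  [ "$size" -gt "$MIN_BYTES" ]
}

# Mail the log of executed steps if a recipient is set and mailx exists.
send_log() {
  if [ -n "$MAILTO" ] && command -v mailx >/dev/null 2>&1; then
    mailx -s "$1" "$MAILTO" < "$LOGFILE"
  fi
}

# On any non-zero exit, record the status and send the log.
on_exit() {
  local status=$?
  if [ "$status" -ne 0 ]; then
    log "Script ended with status $status"
    send_log "split script failed"
  fi
}
trap on_exit EXIT

# Example flow, run only when an input file is passed as the first argument.
if [ -n "${1:-}" ]; then
  if ! big_enough "$1"; then
    log "$1 is not larger than 1 GB, nothing to split"
    send_log "split script: file smaller than 1 GB"
    exit 0
  fi
  log "File is large enough, the split steps would go here"
fi
```

Each existing echo in the script could be replaced by a call to log, so that the log file holds every step executed before a failure.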

thanks