On AIX you could try something like the script below.
Code:
cd /fol1/fol2
for file in *.dat
do
SIZE=$(istat "$file"| awk '/Length/ {print $(NF-1)}')
if [ $SIZE -gt 90000000000 ]
then
LINES=$(wc -l < "$file")
let AVG=SIZE/LINES
let SPLIT=5000000000/AVG
# --verbose is a GNU split option; the stock AIX split may reject it, so it is left out here
split -l $SPLIT -a1 "$file" "TT_$file"
fi
done
Note that this reads the file twice (once to count the lines and once to split the file). Using the average line size is also quite inaccurate, and shell integer arithmetic could make it even more hit-and-miss. You may be better off writing a tailored split command in awk.
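A minimal sketch of the awk idea, for illustration only: it reads the file once and starts a new chunk whenever the next line would push the current chunk past a byte limit. The function name `split_by_size`, the three-digit numeric suffix, and the use of `LC_ALL=C` (so that `length` counts bytes rather than characters) are assumptions, not something from the posts above, and an older awk may behave differently on very long lines.

```shell
#!/bin/sh
# split_by_size FILE MAXBYTES PREFIX   (hypothetical helper, not a standard command)
# Writes PREFIX000, PREFIX001, ... each holding whole lines and at most
# MAXBYTES bytes, unless a single line is itself longer than MAXBYTES.
split_by_size() {
    LC_ALL=C awk -v max="$2" -v prefix="$3" '
        BEGIN { part = 0; bytes = 0; out = sprintf("%s%03d", prefix, part) }
        {
            len = length($0) + 1              # +1 for the newline
            # Start a new chunk if this line would exceed the limit
            if (bytes > 0 && bytes + len > max) {
                close(out)
                part++
                bytes = 0
                out = sprintf("%s%03d", prefix, part)
            }
            print > out
            bytes += len
        }
    ' "$1"
}
```

It could be called as `split_by_size bigfile.dat 5000000000 TT_bigfile.dat.` for chunks of roughly 5 GB. Because the size is tracked line by line, no average line length is needed and the file is not read a second time.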
Thanks, I will try this now and let you know how it works. I will put this in a script that I can call from my DataStage process, and then post the results here.
In the meantime, if others have any other suggestions, please keep posting. I will try everything. This is really great.
Thanks again for everyone's help.
---------- Post updated 11-23-14 at 02:06 AM ---------- Previous update was 11-22-14 at 07:38 PM ----------
Hi, this is what I have done.
I'm using a 3 GB file to test my process, but the script hangs after the line: echo "Checking ${2} file size now:"
Not sure what to do, please correct me if I've done something wrong.
Here is the script:
Code:
#!/bin/bash
# usage:
# check for input:
if [ ! $# == 3 ]; then
echo "Input Parameter missing."
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
echo "Split file directory Exist, So deleting Directory and its contents \n"
rm -rf ${3};
else
echo "No Split file directory present \n";
fi
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir ${3}
chmod 777 ${3}
if [ -d "${3}" ];
then
echo "Split file directory created successfully \n"
echo "Split file directory Permission set to 777 \n"
else
echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
echo "Total file size in Byetes: $ipsize \n"
if [ $ipsize -gt 1000000000 ]
then
lines=$(wc -l < "$iflie")
let avg=ipsize/lines
let splitcount=5000000000/avg
split -l $splitcount -a1 -verbose "$ifile" "${3}/TT_$2"
fi
done
echo "Total Row Count in ${2}: $lines \n"
echo "Average Row lenght in ${2}: $avg \n"
echo "Row count per split file is: $splitcount \n"
echo "Total split files and row counts \n"
wc -l ${3}/TT_$2*
---------- Post updated at 02:43 AM ---------- Previous update was at 02:06 AM ----------
Hi
I made some changes to the script, since the split command didn't work properly. Now it's working fine:
Code:
#!/bin/bash
# usage:
# ./[script] [source dir] [input file] [split dir]
# check for input:
if [ $# -ne 3 ]; then
echo "Input Parameter missing."
exit 1
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
echo "Split file directory Exist, So deleting Directory and its contents \n"
rm -rf "${3}"
else
echo "No Split file directory present \n";
fi
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir "${3}"
chmod 777 "${3}"
if [ -d "${3}" ];
then
echo "Split file directory created successfully \n"
echo "Split file directory Permission set to 777 \n"
else
echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
echo "Total file size in Byetes: $ipsize \n"
if [ $ipsize -gt 1000000000 ]
then
lines=$(wc -l < "$ifile")
echo "Total Row Count in ${2}: $lines \n"
avg=$((ipsize / lines))
echo "Average Row lenght in ${2}: $avg \n"
let splitcount=1000000000/avg
echo "Row count per split file is: $splitcount \n"
split -l $splitcount "$ifile" "${3}/TT_$2"
#-a1 --verbose
echo "Total split files and row counts \n"
wc -l ${3}/TT_$2*
fi
done
and then I get the following results:
Code:
**********************************************************************************************
Main Source file is located in: /some/dir/path
Currently processing file: inputfile.dat
All the split files will be located at: /some/dir/path/splitdir
**********************************************************************************************
Split file directory Exist, So deleting Directory and its contents
Create New directory to place split files.
Split file directory created successfully
Split file directory Permission set to 777
Checking inputfile.dat file size now:
Total file size in Byetes: 3329056768
Total Row Count in inputfile.dat: 2684723
Average Row lenght in inputfile.dat: 1240
Row count per split file is: 806451
Total split files and row counts
806451 /some/dir/path/splitdir/TT_inputfile.dataa
806451 /some/dir/path/splitdir/TT_inputfile.datab
806451 /some/dir/path/splitdir/TT_inputfile.datac
265370 /some/dir/path/splitdir/TT_inputfile.datad
2684723 total
Can somebody help me add some additional features? I would like to log all the messages or steps. If the file size is less than 1 GB, I want to send a note saying so and exit. Also, whenever this script fails, I want to capture all the steps that were executed and send them in an email.
Depending on which shell you use and which commands are installed on your system, and assuming positional parameter 1 holds the file name and parameter 2 the target work file size, you could use the following for the items in post #7:
Code:
2. - 5.: read LN CH << EOF
$(wc -lc < $1)
EOF
6. - 7.: split -l$(($2 *LN / CH)) -d $1 workfile
8.: echo $(($(stat -c"%s+" workfile*) 0))
10.: rename the workfiles when finished with them so the next iteration will pick up from where you left off.
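As a sketch of the line-count step above: multiplying the target size by the line count before dividing by the byte count avoids first truncating the average line length to an integer. The function name `lines_per_chunk` and the floor of one line per chunk are assumptions added for illustration, and the arithmetic relies on the shell using 64-bit integers.

```shell
#!/bin/sh
# lines_per_chunk FILE TARGET_BYTES   (hypothetical helper)
# Prints how many lines to pass to "split -l" so that each chunk is
# roughly TARGET_BYTES long, based on the file's overall line and byte counts.
lines_per_chunk() {
    file=$1
    target=$2
    # wc -lc on standard input prints "<lines> <bytes>" with no file name
    read -r ln ch <<EOF
$(wc -lc < "$file")
EOF
    if [ "$ch" -eq 0 ]; then
        echo "empty file: $file" >&2
        return 1
    fi
    # Multiply first, divide last, to keep as much precision as possible
    n=$(( target * ln / ch ))
    # Never ask split for zero lines per chunk
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}
```

It might be used as `split -l "$(lines_per_chunk "$1" "$2")" "$1" workfile`. With the figures from the run shown earlier (2684723 lines, 3329056768 bytes, 1000000000-byte target), this arithmetic gives 806451 lines per chunk.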