hi all,
I was able to write a script that gathers a few files and sorts them.
here it is:
my problems:
- when the line count in a file exceeds a few million, NR comes out in scientific notation instead of a normal number, so I cannot use it with sort and cannot guarantee the line order;
- the server has a very heavy I/O load, so I would like to do the whole process in memory only (there is spare CPU and memory);
- can I feed the output of several gzcat commands into only one awk script, or is that not possible?
- can I use a pipe to send each intermediate result to the next command without writing the "final" file?
- when it gets to the sort step, I/O usage goes from 30% to 100% while memory usage stays the same; why?
can someone help me out with any of these questions?
it is getting really hard for a newbie like me to find a solution to my problems, because a process that should take one day is now taking five, and I'm trying to find answers in areas I really don't understand yet.
Quote:
Originally Posted by naoseionome
- when the line count in a file exceeds a few million, NR comes out in scientific notation instead of a normal number, so I cannot use it with sort and cannot guarantee the line order;
That's really pesky. You can avoid the scientific format with printf, but if the line numbers exceed the precision of the numeric type awk uses internally (a double-precision float in most implementations), the output will still be bogus.
So the only workaround I can suggest is to switch to Perl. There is a script a2p in the Perl distribution which can convert awk scripts to Perl scripts, although I hear it's not perfect.
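To illustrate the printf workaround: a minimal sketch (the file path is made up) that prints a large counter as a plain integer instead of scientific notation. The caveat above still applies: past awk's floating-point precision (around 2^53) even this formatting cannot recover the exact count.

```shell
# Force integer formatting of a large counter in awk; without "%.0f"
# many awks would print a value like 12345678901 as 1.23457e+10.
awk 'BEGIN { n = 12345678901; printf "%.0f\n", n }' > /tmp/nr_fmt.txt
cat /tmp/nr_fmt.txt
```

The same `printf "%.0f\n", NR` form works in a per-line rule when numbering real input.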
Quote:
Originally Posted by naoseionome
- the server has a very heavy I/O load, so I would like to do the whole process in memory only (there is spare CPU and memory);
I'm sorry, I can't find a question in that. Can you rephrase?
Quote:
Originally Posted by naoseionome
- can I feed the output of several gzcat commands into only one awk script, or is that not possible?
The scripts seem to be different for each file, so it seems a bit dubious. Certainly you could try to refactor the code to reduce duplication. It seems hard to write an awk script which could decide which fields to select purely based on the looks of the input (remember, file names are not visible when you receive data from a pipe), but if you know how to do that, by all means give it a try. Perhaps you could marshal the output from gzcat into a form where you can also include headers with information about which field numbers to use, or something. (Think XML format, although you don't have to use the specifics of XML, of course. Something simple like a prefix on each line which says which fields to look at is probably a lot easier to code and understand.)
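A minimal sketch of the prefix idea just described, with made-up file names and field positions: each stream is tagged before it reaches awk, so one script can pick different fields per source (file names are invisible on a pipe, but the tag survives).

```shell
# Build two tiny compressed inputs with different layouts (illustrative only).
printf 'k1 v1\n'   | gzip > /tmp/a.gz   # value in field 2
printf 'x k2 v2\n' | gzip > /tmp/b.gz   # value in field 3
# Tag each stream, then let one awk script branch on the tag.
# After tagging, original field N becomes field N+1.
{
  gzip -dc /tmp/a.gz | sed 's/^/A /'
  gzip -dc /tmp/b.gz | sed 's/^/B /'
} | awk '$1 == "A" { print $3 }
         $1 == "B" { print $4 }' > /tmp/merged.txt
cat /tmp/merged.txt
```

The tag costs a few bytes per line but keeps everything in one awk process and one pipeline.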
Quote:
Originally Posted by naoseionome
- can I use a pipe to send each intermediate result to the next command without writing the "final" file?
Group the commands into a subshell and pipe the output from that shell to sort.
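For example, a minimal sketch (with dummy commands standing in for the real ones): the subshell's combined output goes straight to sort, and nothing touches the disk until sort writes its result.

```shell
# Run the producing commands in a subshell; their concatenated stdout
# is piped to sort, so no intermediate "final" file is needed.
(
  printf 'banana\n'
  printf 'apple\n'
) | sort > /tmp/grouped_sorted.txt
cat /tmp/grouped_sorted.txt
```

Brace grouping `{ ...; }` works the same way without spawning an extra subshell.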
Quote:
Originally Posted by naoseionome
- when it gets to the sort step, I/O usage goes from 30% to 100% while memory usage stays the same; why?
I usually find printf "%.0f\n", variablename does the trick in awk.
sort usually has a command-line option to change the amount of memory it will allocate... usually the default is quite small, so you may see some benefit by increasing it. You can also sometimes control where it will store temporary files, so you may be able to specify some faster disks, or some that do not contain the original data so that they are not competing with each other. See man sort for details...
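As a sketch of those two options (assuming GNU sort; the flag names differ on some Unix implementations, so check man sort on your system): -S raises the in-memory buffer and -T redirects the temporary files.

```shell
# Illustrative input; on the real data -S would be set to something
# like 2G rather than 64M.
printf '3\n1\n2\n' > /tmp/nums.txt
# -S: memory buffer size; -T: directory for temporary spill files
# (point it at a fast disk, or one that does not hold the input).
sort -n -S 64M -T /tmp /tmp/nums.txt > /tmp/nums.sorted
cat /tmp/nums.sorted
```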
hi,
I'm already making some changes.
I'm running the test now with printf and trying to figure out the right amount of memory for sort (1, 2 or 3 GB :P).
I wanted to say that the hard disk is working at its maximum while there is still memory and CPU available, so I will start giving sort some of the available memory.
I'm planning to gunzip the files at the beginning. That way I can feed all the files into the same script and just need an if per file, along the lines of: if (FILENAME == "file1") { ...per-file code... }. Then I can pipe the result to sort instead of writing the "final" file.
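That FILENAME plan can be sketched like this (file names and fields are placeholders): one awk pass over all decompressed files, a branch per file, and the result piped straight to sort.

```shell
# Two illustrative inputs with different layouts.
printf 'x b\n' > /tmp/part1.txt   # wanted value in field 2
printf 'a 1\n' > /tmp/part2.txt   # wanted value in field 1
# FILENAME tells awk which input file the current record came from,
# so each file can get its own extraction rule.
awk 'FILENAME ~ /part1/ { print $2 }
     FILENAME ~ /part2/ { print $1 }' /tmp/part1.txt /tmp/part2.txt \
  | sort > /tmp/fname_out.txt
cat /tmp/fname_out.txt
```

Note FILENAME is only set for named file arguments; it stays empty when awk reads a pipe, which is why the prefix-tag trick is needed in that case.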
Quote:
Originally Posted by naoseionome
hi,
I'm already making some changes.
I'm running the test now with printf and trying to figure out the right amount of memory for sort (1, 2 or 3 GB :P).
Some architectures/OSes allow only 2 GB of memory per process. Keep that in mind.
Also, if your system starts swapping, you'll lose the memory advantage.
Quote:
I wanted to say that the hard disk is working at its maximum while there is still memory and CPU available, so I will start giving sort some of the available memory.
You can also sort within awk. Just load the values into an array and loop over it with for (key in array) to get them out again, though note that the traversal order is undefined in standard awk (gawk lets you control it). It uses more memory, but less CPU time.
In that case, using an external sort would be better.
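For small inputs, sorting inside awk can also be done by hand in the END block. A minimal, portable sketch using an insertion sort (fine for a few thousand lines, but the external sort above scales far better):

```shell
# Collect every line into an array, insertion-sort it in END,
# then print the lines back out in order.
printf 'b\na\nc\n' | awk '
  { a[NR] = $0 }
  END {
    for (i = 2; i <= NR; i++) {
      v = a[i]
      for (j = i - 1; j >= 1 && a[j] > v; j--) a[j + 1] = a[j]
      a[j + 1] = v
    }
    for (i = 1; i <= NR; i++) print a[i]
  }' > /tmp/awk_sorted.txt
cat /tmp/awk_sorted.txt
```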
Quote:
I'm planning to gunzip the files at the beginning. That way I can feed all the files into the same script and just need an if per file, along the lines of: if (FILENAME == "file1") { ...per-file code... }. Then I can pipe the result to sort instead of writing the "final" file.