HELP: If Doesn't Work in AWK


 
# 1  
Old 09-06-2010
HELP: If Doesn't Work in AWK

Hi!

I have a fairly large file (almost 3000 lines and thirteen columns). Some lines have no values at all or are at least incomplete. Column values with no data are marked with a "-", and any line containing such a value should be discarded and not used.

Here is an image of what the data looks like:

http://img689.imageshack.us/img689/5793/imagetih.png

Due to text formatting problems, I cannot post an excerpt of the real data here. Sorry.

The first and second columns don't interest us, since they are just the countries' names (first column) and the year (second column).

I take two columns per calculation, and if either column's value is missing (that is, it is a "-"), the corresponding line must be discarded.

In order to discard the line, I use a simple conditional statement:

Code:
{if($COL1 != "-" && $COL2 != "-") { print $COL1,"\t",$COL2; } }

That is, if either column has a "-", the line is not used. COL1 and COL2 are set on the command line.
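
A minimal sketch of the filter described above, run on made-up sample data rather than the real all.dat (the three sample lines and the choice of columns 3 and 4 are assumptions for illustration only). It prints the two fields joined by a tab through string concatenation, which avoids the extra spaces that print adds around "\t" when the items are separated by commas:

```shell
# Hypothetical three-line sample standing in for all.dat.
sample='Brazil 2001 1.5 2.5
Chile 2001 - 3.0
Peru 2001 4.0 -'

# COL1=3 and COL2=4 are assignment operands: awk sets them before it starts
# reading the next input operand ("-" here, meaning standard input).
result=$(printf '%s\n' "$sample" |
    awk '$COL1 != "-" && $COL2 != "-" { print $COL1 "\t" $COL2 }' COL1=3 COL2=4 -)

# Only the Brazil line survives, because the other two contain a "-".
printf '%s\n' "$result"
```

On clean input like this the condition behaves as intended, which suggests looking at what the fields really contain in the actual file.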

Using the third and thirteenth columns of the dataset above, I got this result:

http://img409.imageshack.us/img409/927/image2w.png

It didn't discard the lines with the no-value marker (the "-"). Why?

If you want to try the AWK code and the dataset, please download them from here:

2shared - download data_and_program.zip

Then, use:
Code:
 awk -f ./reg.awk COL1=3 COL2=13 all.dat

Please don't worry about the comments in the code.

Since the download link is somewhat hidden, here is an image showing where to find it:

http://img811.imageshack.us/img811/4670/image3yb.png

Here is the complete code (the comments, originally in Brazilian Portuguese, are shown in English):

Code:
#!/usr/bin/awk -f
BEGIN {
	sumx = 0; # Sum of the 1st data column (x)
	sumy = 0; # Sum of the 2nd data column (y)
	mx   = 0; # Mean of x
	my   = 0; # Mean of y
	sdx  = 0; # Standard deviation of x
	sdy  = 0; # Standard deviation of y
	i    = 0; # Counter
	cov  = 0; # Covariance
	r    = 0; # Pearson correlation coefficient
	r2   = 0; # Coefficient of determination
	aux1 = 0; # Auxiliary variable 1 for computing the linear model
	aux2 = 0; # Auxiliary variable 2 for computing the linear model
	aux3 = 0; # Auxiliary variable 3 for computing the linear model
	aux4 = 0; # Auxiliary variable 4 for computing the linear model
	tr   = 0; # Trend value of the linear model
	sl   = 0; # Slope value of the linear model
	pop  = 0; # Number of samples (population)
}

{if($COL1 !="-" && $COL2 !="-") # ***** THIS IS THE IF STATEMENT THAT DOES NOT WORK!!!! *****
{
	print $COL1,"\t",$COL2;
	if(min1 == "")
	{
		min1 = max1 = $COL1;  # Minimum (min1) and maximum (max1) values of the 1st column (x)
	}
	if($COL1 > max1)
	{
		max1 = $COL1;
	}
	if($COL1 < min1)
	{
		min1 = $COL1;
	}

	if(min2 == "")
	{
		min2 = max2 = $COL2; # Minimum (min2) and maximum (max2) values of the 2nd column (y)
	}
	if($COL2 > max2)
	{
		max2 = $COL2;
	}
	if($COL2 < min2)
	{
		min2 = $COL2;
	}

	sumx += $COL1;
	sumy += $COL2;
	vetx[i] = $COL1;
	vety[i] = $COL2;
	i++;
}}

END {
	pop = i;
	i = 0;
	mx = sumx/pop;
	my = sumy/pop;
	for(i = 0; i < pop; i++)
	{
		sdx += (vetx[i] - mx)*(vetx[i] - mx);
		sdy += (vety[i] - my)*(vety[i] - my);
	}

	for(i = 0; i < pop; i++)
	{
		cov += (vetx[i] - mx)*(vety[i] - my);
	}

	for(i=0; i< pop; i++)
	{
	    aux1 += vety[i];
	    aux2 += vetx[i]*vetx[i];
	    aux3 += vetx[i];
	    aux4 += vetx[i]*vety[i];
	}

	tr  = (aux1*aux2 - aux3*aux4)/(pop*aux2 - aux3*aux3);
	sl  = (pop*aux4 - aux3*aux1)/(pop*aux2 - aux3*aux3);
	sdx = sqrt(sdx/pop);
	sdy = sqrt(sdy/pop);
	cov = cov/pop;
	r   = cov/(sdx*sdy);
	r2  = r*r;

	printf "\n Population \t= %d\n\n Column X\t\t\t\t Column Y\n\n Minimum \t= %.4f\t\t Minimum \t= %.4f\n Maximum \t= %.4f\t\t Maximum \t= %.4f\n Range   \t= %.4f\t\t Range   \t= %.4f\n\n Max/Min \t= %.4f\t\t Max/Min \t= %.4f\n Min/Max \t= %.4f\t\t Min/Max \t= %.4f\n\n Mean    \t= %.4f\t\t Mean    \t= %.4f\n StdDev  \t= %.4f\t\t StdDev  \t= %.4f\n\n Sum     \t= %.4f\t\t Sum     \t= %.4f\n\n Other Parameters:\n\n Covariance \t\t= %.4f\n Pearson Corr. \t\t= %.4f\n Coef. of Determination \t= %.4f\n Trend = %.4f\n Slope = %.4f\n y = %.4f + %.4f*x\n\n", pop, min1, min2, max1, max2, max1 - min1, max2 - min2, max1/min1, max2/min2, min1/max1, min2/max2, mx, my, sdx, sdy, sumx, sumy, cov, r, r2, tr, sl, tr, sl;
}

Many thanks in advance!

Marcelo
# 2  
Old 09-06-2010
It appears that you are trying to take COL1 and COL2 from the environment; I don't see them defined in your programme, so that was my assumption. I'm not sure that is possible. I don't usually use #!/bin/awk, and in the few tests I tried with the version I have installed here (GNU Awk 3.1.6) I was not able to use COL1 or COL2 from the environment in this manner.

I generally wrap my awk programmes inside a shell script, and then the variables from the environment can be passed in. Something like this small example will print lines from your file that don't have a dash in the desired columns:

Code:
#!/usr/bin/env ksh
# COL1 and COL2 come from the environment and are handed to awk with -v
awk -v col1="$COL1" -v col2="$COL2" '
{
     if( $(col1) != "-"  &&  $(col2) != "-" )
           print;
}' input-file

Sorry if I've misunderstood.
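
For what it's worth, POSIX awk can also read exported environment variables directly through its ENVIRON array. A small sketch, assuming COL1 and COL2 are exported by the calling shell and using made-up input lines (both the values and the data are illustrative, not taken from the thread):

```shell
# Export the column numbers so that awk inherits them in its environment.
export COL1=2 COL2=3

result=$(printf 'a 1 2\nb - 2\nc 3 -\n' |
    awk '{
        c1 = ENVIRON["COL1"] + 0   # "+ 0" converts the string to a number
        c2 = ENVIRON["COL2"] + 0
        if ($c1 != "-" && $c2 != "-") print
    }')

# Only the first line has no "-" in fields 2 and 3.
printf '%s\n' "$result"
```

Passing the values with -v, as in the wrapper above, keeps the awk program independent of the caller's environment, which is usually easier to reason about.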
# 3  
Old 09-06-2010
I tested your code in my environment (Cygwin + GNU Awk 3.1.8) without any problem.

I agree with agama. Use the code below; you don't need to change anything in reg.awk:
Code:
c1=3
c2=13

awk -v COL1=$c1 -v COL2=$c2  -f ./reg.awk all.dat
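
A short sketch of how the two ways of setting a variable differ: -v assignments are made before the BEGIN block runs, whereas a var=value operand placed after the program is only processed when awk reaches it in the argument list, which is after BEGIN. The message text is made up for the demonstration, and /dev/null simply stands in for an input file:

```shell
# -v: the variable already has its value inside BEGIN.
with_v=$(awk -v COL1=3 'BEGIN { print "BEGIN sees COL1=[" COL1 "]" }')

# Assignment operand: still unset while BEGIN runs.
as_operand=$(awk 'BEGIN { print "BEGIN sees COL1=[" COL1 "]" }' COL1=3 /dev/null)

printf '%s\n%s\n' "$with_v" "$as_operand"
```

For reg.awk this difference should not matter, since COL1 and COL2 are only used in the main rule, by which time either form has taken effect.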

# 4  
Old 09-06-2010
Still Not Working. . .

Hi!

Thank you for your reply!

I tried your suggestion and, unfortunately, it did not work...

Here is the output:

http://img36.imageshack.us/img36/9224/imagefd.png

I used the code you provided from the command line:

Code:
awk -v col1=$3 -v col2=$13 '{ if($(col1) != "-" && $(col2) != "-") { print; }}' all.dat

In the code above, I set col1 to the third column and col2 to the thirteenth column of my dataset. Therefore, if either of those columns has a "-", that value should be discarded, along with the corresponding line.

As you can see in the image, lines whose thirteenth (last) column is a "-" are printed, and they should not be.

Quote:
It appears that you are trying to take COL1 and COL2 from the environment;
Then what should I do to set COL1 and COL2 from the environment?

Thank you so much for your reply!

Marcelo

---------- Post updated at 11:41 PM ---------- Previous update was at 11:28 PM ----------

Quote:
Originally Posted by rdcwayx
I tested your code in my environment (Cygwin + GNU Awk 3.1.8) without any problem.

I agree with agama. Use the code below; you don't need to change anything in reg.awk:
Code:
c1=3
c2=13

awk -v COL1=$c1 -v COL2=$c2  -f ./reg.awk all.dat

Hi!

I tried what you said and I got this as a result:

http://img20.imageshack.us/img20/4959/image2ke.png

I used this code from the command line:

Code:
awk -v COL1=$3 -v COL2=$13  -f ~/reg.awk all.dat

As you can see in the image, characters are missing from the countries' names and the first and second columns are mixed together. Worse, values from the thirteenth column are printed even when they are the no-value marker "-".

Thank you for your reply!

Marcelo
# 5  
Old 09-07-2010
I think the problem is the $ in the -v assignments: you don't need it to assign literal values. The shell expands $3 and $13 itself before awk runs, so awk never receives 3 and 13. Try it this way:
Code:
awk -v col1=3 -v col2=13 '{ if($(col1) != "-" && $(col2) != "-") { print; }}' all.dat
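
To see what the earlier commands with $3 and $13 actually handed to awk, here is a small sketch. It assumes a POSIX-style shell with no positional parameters set, as in a typical interactive session; the 'a b c' input line is made up:

```shell
# Clear the positional parameters, as in a typical interactive shell.
set --

# $3 expands to nothing; $13 is parsed as ${1}3, an empty $1 followed by a literal 3.
expanded=$(printf '[%s] [%s]' "$3" "$13")
printf '%s\n' "$expanded"

# With col1 empty, $(col1) is $0 (the whole line), and col2 ends up as 3, not 13.
fields=$(echo 'a b c' |
    awk -v col1="$3" -v col2="$13" '{ print "(" $(col1) ") (" $(col2) ")" }')
printf '%s\n' "$fields"
```

Under those assumptions the test compared the whole line and the third field against "-", so the thirteenth column was never checked at all.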


If that doesn't work, I'd replace the print statement with the one below, but try the previous change first.
Code:
     printf( "%d=(%s)  %d=(%s)\n", col1, $(col1), col2, $(col2) );

This shows exactly what awk believes to be in the fields designated by col1 and col2.
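
Wrapping each field in parentheses helps because it exposes characters that are otherwise invisible. As a purely hypothetical illustration (nothing in this thread confirms that this is what is wrong with all.dat), a file saved with DOS/Windows line endings leaves a carriage return glued to the last field, so it no longer equals "-". The sample line below is made up:

```shell
# Made-up line whose last field is "-" followed by a carriage return (\r).
line=$(printf 'Brazil 2001 -\r')

# The comparison fails because the field is really "-\r".
raw=$(printf '%s\n' "$line" |
    awk -v col=3 '{ r = ($(col) == "-") ? "match" : "no match"; print r }')

# Stripping a trailing \r from the record first makes the comparison succeed.
clean=$(printf '%s\n' "$line" |
    awk -v col=3 '{ sub(/\r$/, ""); r = ($(col) == "-") ? "match" : "no match"; print r }')

printf 'raw: %s\nclean: %s\n' "$raw" "$clean"
```

If the diagnostic printf shows a closing parenthesis in an odd place, a stray character of this kind is a likely suspect.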

You can take the values from the environment, but you'll have to assign them using the -v options. There are other ways, but they involve messy quoting and can be the cause of odd problems as a result.
# 6  
Old 09-07-2010
Quote:
Originally Posted by Marcelo de Brit
Hi!

Thank you for your reply!

Code:
awk -v COL1=$3 -v COL2=$13  -f ~/reg.awk all.dat

Marcelo
Code:
awk -v COL1=3 -v COL2=13  -f ~/reg.awk all.dat
