awk string-function


 
# 1  
Old 07-01-2014
awk string-function

Sorry for setting foot, as a mere technical user, on holy ground here again to ask and learn. After trying strings and arrays I decided to go for an if-else-if ladder for a database, just because it looks a little easier to me, but as it happens, the result is not the desired one.
So here is the scheme for the if-else-if ladder in awk. Given a database.txt with four rows, the fourth one is my aim.
The last two lines of database.txt:
Code:
webster    3:48, 29.06.2014 Jun
webster    3:50, 29.06.2014 Jun

That's the scheme for the if-else-if-ladder

Code:
 if(conditional-expression1)
    {action1 ; action 2;}
    else if(conditional-expression2)
    {action1 ; action 2;}
    else if(conditional-expression3)
    {action1 ; action 2;}
    .
    .
    else
    action n;
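
For reference, the scheme above can be written as runnable awk. This is only a minimal sketch, assuming the month abbreviation sits in field 4 as in the two sample lines; the messages printed in each branch are placeholders, not the actions from the original post.

```shell
# Feed the two sample lines to awk and branch on the 4th field.
out=$(printf '%s\n' \
    'webster    3:48, 29.06.2014 Jun' \
    'webster    3:50, 29.06.2014 Jun' |
awk '{
    # The whole ladder lives inside one action block.
    if ($4 == "Jan") {
        print FNR ": first month"
    } else if ($4 == "Jun") {
        print FNR ": sixth month"
    } else if ($4 == "Dec") {
        print FNR ": last month"
    } else {
        print FNR ": some other month (" $4 ")"
    }
}')
printf '%s\n' "$out"
```

The point of the sketch is that if/else if/else are statements, so they have to appear inside the braces of an action rather than at the pattern level.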

That's my first step into it.
Code:
#declaring myarray as row four with the months
awk '(string["JanFebMarAprMayJunJulAugSepOctNovDec"]=$4); 

#defining mysubstring to be substring(named submonth) of string with startposition and length
mysubstring=submonth[string("JanFebMarAprMayJunJulAugSepOctNovDec",1,3)],

Code:
#guess here my trouble starts, no matter what if-condition, the interpreter complains about a syntax error
#it should start in row four, position one, maximumlength, matching for example [Jun]
#choosing submonth as a term, because substring is a named function
#so I put the following line, not sure about it, 'cause it doesn't work
b=split(submonth,array,search)
search="submonth"
(mysubstring($4,1,3)==submonth[Jun])
#and by now it gets something like a for-condition rather then the if-else-if-ladder
#followed by the two actions. 
($4>max) max==$4 {OFMT="%6f"; ORS ":"; print max; sum=sum+$4};
{OFMT="%-7.4f\n"; ORS ":"; print sum/NR}  ; '                 database.txt

But the sad thing is that the output tells me something like this: there is no function named string defined. How do I define it? I guess it would be split. When I use split, the interpreter tells me that 'search' is still not a definition for the string.

Can anybody give me a hint on that, regards and thanks in advance.
# 2  
Old 07-02-2014
Firstly I think you need to get some terms right. You talk about "rows" but I feel you mean fields. For the last line of your database.txt we have:

Code:
field 1 = "webster"
field 2 = "3:50,"
field 3 = "29.06.2014"
field 4 = "Jun"

index() is the function that matches strings and returns the integer position of the string within the text. I can demonstrate this with these simple awk programs:

Code:
$ awk 'BEGIN { print index("JanFebMarAprMayJunJulAugSepOctNovDec", "Jan") }'
1

$ awk 'BEGIN { print index("JanFebMarAprMayJunJulAugSepOctNovDec", "Jun") }'
16

$ awk 'BEGIN { print index("JanFebMarAprMayJunJulAugSepOctNovDec", "Zoo") }'
0

So here you can see we are getting close to finding the month number for a string and zero for an invalid month.

From here we could add a couple of dummy characters (ones we never expect to see in field number 4 like space) to the front of the string. Now index of Jan will be 3 and Feb 6 and Mar 9, etc. so if we divide by 3 we will have the month number:

Code:
$ awk 'BEGIN { print index("  JanFebMarAprMayJunJulAugSepOctNovDec", "Oct") / 3}'
10


Now let's put this together into a working demo awk program with its own user-defined get_month_num() function. This should help you get along the way to doing what you want:
Code:
awk '
function get_month_num(mth) {
    return index("  JanFebMarAprMayJunJulAugSepOctNovDec", mth) / 3
}
{
    print "Month number for row " NR " is " get_month_num($4)
    if (get_month_num($4) > 5) {
        print "  This month is later than May"
    } else {
        print "  This month is May or earlier"
    }
}' database.txt
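
One caveat with the index() approach: it also matches substrings that straddle two month names (for example "anF"), which would give a fractional result. A possible stricter variant is sketched below; month_num is a hypothetical name, and the rule "exactly three characters, starting at position 1, 4, 7, ..." is an assumption about what counts as a valid abbreviation.

```shell
out=$(printf '%s\n' Jan Jun Dec anF Zoo Ju |
awk '
# month_num is a hypothetical helper, not part of the reply above.
function month_num(mth,    pos) {
    if (length(mth) != 3) return 0          # abbreviations are 3 characters
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", mth)
    if (pos == 0 || pos % 3 != 1) return 0  # must start at 1, 4, 7, ...
    return (pos + 2) / 3
}
{ print $1, month_num($1) }')
printf '%s\n' "$out"
```

With this variant the three real abbreviations map to 1, 6 and 12, and everything else maps to 0.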

# 3  
Old 07-02-2014
Quote:
Originally Posted by 1in10
Code:
#declaring myarray as row four with the months
awk '(string["JanFebMarAprMayJunJulAugSepOctNovDec"]=$4); 

#defining mysubstring to be substring(named submonth) of string with startposition and length
mysubstring=submonth[string("JanFebMarAprMayJunJulAugSepOctNovDec",1,3)],

The first code segment quoted above, (string["JanFebMarAprMayJunJulAugSepOctNovDec"]=$4), defines an array named string. On each line read from database.txt it sets the element of that array whose subscript is the string of abbreviated month names to the value of the 4th field. Because this assignment is used as a pattern with no action, awk copies the line to standard output whenever the line has four or more fields and the 4th field is neither empty nor numerically zero.

The second code segment, string("JanFebMarAprMayJunJulAugSepOctNovDec",1,3), calls a function named string() with three arguments. But, as awk told you, you have not defined a function named string() and the awk language doesn't provide one. Since awk doesn't know what string() is supposed to do, it can't run your script and it prints a diagnostic message.
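
Both points can be reproduced in a small sketch. The first awk call uses an assignment as a pattern with no action, so a line is printed only when its 4th field is non-empty; the array name seen and the three-field second input line are made up for illustration. The second call uses the built-in substr(), which is only a guess at what string(...,1,3) was meant to do.

```shell
# Assignment used as a pattern: only the line with a 4th field is printed.
printed=$(printf '%s\n' \
    'webster 3:48, 29.06.2014 Jun' \
    'webster 3:52, 29.06.2014' |
awk '(seen["months"] = $4)')
printf '%s\n' "$printed"

# substr(string, start, length) is the built-in for extracting a substring.
first3=$(awk 'BEGIN { print substr("JanFebMarAprMayJunJulAugSepOctNovDec", 1, 3) }')
printf '%s\n' "$first3"
```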

You have shown us code that isn't doing what you want it to do. But, you have not told us what you are trying to do, you have only shown us two lines from your four line input file, and you have not shown us what output you are trying to produce.

You have said that you want an if-else-if-ladder, but you haven't said what you want that if-else-if-ladder to do. We can't help you write code if we don't know what that code is supposed to do.

I do not understand how "Webster" or "Jun" is a length and what the maximum value of a string is in this context.

Show us ALL four lines of your input file.
Show us the output you are trying to produce.
Explain to us (in English) what your script is supposed to do to convert your sample input into that desired output.
# 4  
Old 07-02-2014
second attempt

@Don Cragun It is just about the last column of database.txt, which happens to be row four or $4, not any of the fields ahead of it.
This is truly all of the input as of the last run of the script. My database looks just like this. This is the input file, named database.txt:


Code:
webster    3:48,  29.06.2014 Jun
webster    3:50,  29.06.2014 Jun
webster    23:11, 02.07.2014 Jul
webster    3:45,  02.07.2014 Jul

That means:
user, uptime, numeric date, abbreviated name of the month.
Part of my confusion is due to the three languages used here: locales.gen is set up for three different users, in English, Portuguese and German.
I prefer to work on the login with the German locale. And from squeeze to wheezy its location may have changed.
Reading your answer, I see that I am defining an array instead of a string. Furthermore, the element of that array is again a string. And this is due to these square brackets [ ]!? So I learn that square brackets [] create, execute and show me the func. I deliberately say func because I have now learned that function is a reserved word in awk. **(1)observation see below example**
Cutting a long story short, this is what it should do:
1. create a string with the full names of the month
Code:
submonth=(date +%B)
#re-using the locale variable submonth for full name of the month, this is literally the fourth row or column in the 
#database.txt-file or input-file
awk  -v submonth=$(date +%B) '
#creating the string (" ..") and setting him equal to row or column number four of the database.txt or the input-file, 
#here the german version

string("JanuarFebruarMärzAprilMaiJuniJuliAugustSeptemberOktoberNovemberDezember")=$4;

2. make submonth("Januar.....Dezember",1,9) with string,startpos,maxlen for the substring.
Code:
mysubstring=submonth($4,1,9),

3. set the first if-statement with the substring; that's where my embarrassment starts. So this is pseudo-code, but I'll put it as code.
Code:
if mysubstring("Januar") in submonth
{execute function1 ; function 2} and print result 1 + result 2 to stdout ;
else if mysubstring("Februar") in submonth
    {execute function1 ; function 2} and print result 1 + result 2 to stdout;
        else if mysubstring("März") in submonth
            {execute function1 ; function 2} and print result 1 + result 2 to stdout ;

and so on, until it reaches (Dez) to finish column $4, which happens to be field four of every record ($0).
I am aware that this is prose; that's why I am asking here.
What is called a func(tion) here is, to me, any calculation, as mentioned above, but it seems the print statement is a function too.

**(1)observation**
Here is an example that even I understood: the use of a string and square brackets for an array, with a separator called search.

Code:
#!/usr/bin/awk -f
BEGIN {
# this script breaks up the sentence into words, using 
# a space as the character separating the words

string="January February March April May June July August September October November December ?";
    search=" ";
    n=split(string,array,search);
    for (i=1;i<=n;i++) {
        printf("Word[%d]=%s\n",i,array[i]);
    }
    exit;
}
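
Building on the split() example, the array it fills can be inverted into a lookup table, which would avoid a twelve-branch ladder. The following is a sketch only: it assumes the German full month names given earlier in this post, separated by spaces, and the array names month and num are hypothetical.

```shell
out=$(printf '%s\n' Juni Juli März Foo |
awk '
BEGIN {
    names = "Januar Februar März April Mai Juni Juli August September Oktober November Dezember"
    n = split(names, month, " ")
    # Invert the array: month name -> month number.
    for (i = 1; i <= n; i++) num[month[i]] = i
}
{
    if ($1 in num) {
        print $1, num[$1]
    } else {
        print $1, "unknown"
    }
}')
printf '%s\n' "$out"
```

Here square brackets only index an array; the work is done by the built-in split() and by the in operator, which tests whether a subscript exists.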

# 5  
Old 07-03-2014
It is obvious that we have a huge language barrier here. My understanding of German is minimal. (I'm fluent in English, Standardese, and several computer languages.)

Row 4 of your database:
Code:
webster    3:48,  29.06.2014 Jun
webster    3:50,  29.06.2014 Jun
webster    23:11, 02.07.2014 Jul
webster    3:45,  02.07.2014 Jul

is the last line (webster 3:45, 02.07.2014 Jul). The 4th column of your database is the list of abbreviated month names, i.e. the 4th field on each line.

$0 expands to the contents of the line that is currently being processed by awk.

$4 expands to the contents of the 4th field on the line that is currently being processed by awk.

FNR is the line number of the line that is currently being processed within the file that is currently being read by awk.
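
A short sketch that prints these three values, plus the field count NF, for two of the sample lines may make the terms concrete. The output format is chosen purely for illustration.

```shell
out=$(printf '%s\n' \
    'webster    3:48,  29.06.2014 Jun' \
    'webster    23:11, 02.07.2014 Jul' |
awk '{ print "line " FNR ": $4=" $4 ", NF=" NF ", $0=[" $0 "]" }')
printf '%s\n' "$out"
```

Each input line is one record ($0) with four whitespace-separated fields, and $4 is the month abbreviation on that record.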

The pseudo-code you have shown us is confusing us more than helping us understand what you are trying to do. With the four line database shown above, what output are you trying to produce?

Are the abbreviated month names in your database English (or C Locale) abbreviations or German abbreviations?

What awk functions are you trying to define? What arguments is each function supposed to take? What output is each function supposed to produce?

You seem to want to print "result 1" and "result 2". Are these literal strings? If not, where do they come from?
# 6  
Old 07-03-2014
function call, integers

The details are turning out to be bigger than they were before.
Referring to my pseudo-code snippet, the if-else-if ladder shall
Code:
find mysubstring("Januar") in submonth
    {execute function 1; execute function 2} and print the result of both to stdout;
    else if mysubstring("Februar") in submonth
                   {execute function 1; execute function 2} and print the result of both to stdout;

On each machine there are only the abbreviated month names within each user's database.txt file. So if user carla logs in, the Portuguese abbreviated name is written to her own input file, while frank uses the English version on his account, and I use the German one on mine. There is no server running here. I want to measure the total running uptime and the average uptime. I am still miles away from pushing this to each of their accounts or joining the files. So, reading again about rows, FNR and NR, it is about the fourth field $4 in all entries of the database.txt input. The results of both functions, one and two, are decimal numbers coming from a function call within those curly brackets { }.
I try to reproduce the syntax by defining the function, which is introduced by the keyword function but named something else:
Code:
function 
    f_summing_up($4>max,max==$4,sum=sum+$4)    

#column $4 is greater than maximum and is itself 
#the maximum, f_suming_up
#handles the parameters for the first calculation    

        {OFMT="%6f"; ORS ":"; printf max; sum=sum+$4};  
#printing the default format with variable ORS":"

    f_average(sum=sum+$4)/NR))      
           
#division of sum of column four $4 
#by the number of entries
#which may contains the error, that NR refers to all NR but 
#but not the specific ones of column $4   

        {OFMT="%-7.4f\n"; ORS ":"; print sum/NR}
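
For comparison, here is how a user-defined function and a running total look in valid awk syntax. This is a sketch under stated assumptions, not a confirmed solution: it assumes the uptime is field 2 in hours:minutes form with a trailing comma, that the total, maximum and average are wanted in minutes, and to_minutes is a hypothetical helper name.

```shell
out=$(printf '%s\n' \
    'webster    3:48,  29.06.2014 Jun' \
    'webster    3:50,  29.06.2014 Jun' \
    'webster    23:11, 02.07.2014 Jul' \
    'webster    3:45,  02.07.2014 Jul' |
awk '
# to_minutes is a hypothetical helper: "3:48," -> 228
function to_minutes(hhmm,    part) {
    sub(/,$/, "", hhmm)           # drop the trailing comma
    split(hhmm, part, ":")
    return part[1] * 60 + part[2]
}
{
    m = to_minutes($2)            # uptime of this record in minutes
    total += m
    if (m > max) max = m
}
END {
    if (NR > 0)
        printf "total=%d max=%d average=%.2f\n", total, max, total / NR
}')
printf '%s\n' "$out"
```

A function definition takes plain parameter names, not conditions or assignments; the comparisons and the summing belong in the action that calls it, and the final figures are printed once in an END block.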

# 7  
Old 07-04-2014
Most of what you have said above makes absolutely no sense to me.

I repeat: With the following four lines in your database:
Code:
webster    3:48,  29.06.2014 Jun
webster    3:50,  29.06.2014 Jun
webster    23:11, 02.07.2014 Jul
webster    3:45,  02.07.2014 Jul

exactly what output do you hope to produce?