Shell script to apply functions to multiple columns dynamically

# 8  
Old 1 Week Ago
Quote:
Originally Posted by RudiC
Please be aware that the md5sum of '10,abc' will NEVER be 73aca49763216fb96bbc2acef7b60afb as it is case sensitive.
Looks like you want a comma included. Try
Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 ("echo -n " substr (TMP, 2) " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME" file
ID|NAME|AGE|GENDER|HASHED COLUMNS|HASHVALUE
10|ABC|30|M|ID,NAME|73aca49763216fb96bbc2acef7b60afb
20|DEF|20|F|ID,NAME|9d6555fe65eb60b2f7d9174b56f667f5

Yes, thank you, you are right. I should be more careful with the case, but the output is now as expected.
Also, can the last line MCOL="ID,NAME" be parameterized? For example:

Code:
mcols=$1
filename=$2
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 ("echo -n " substr (TMP, 2) " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="$(mcols)" file

and call the script like sh myscript.sh "ID,NAME"? (I think it is a silly question, but nevertheless I am asking.)

Also, I am trying to understand your code line by line. Could you point me to how to debug this code? I am not asking you to explain it line by line, but can you please point me in the right direction so I can better understand the code?

Thanks.
# 9  
Old 1 Week Ago
Quote:
Originally Posted by mkathi
Yes, thank you, you are right. I should be more careful with the case
Yes.

Quote:
Also, can the last line MCOL="ID,NAME" be parameterized? For example . . . and call the script like sh myscript.sh "ID,NAME"? (I think it is a silly question, but nevertheless I am asking.)
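How about just trying MCOL="$1"? The syntax you used, $(mcols), is called "command substitution": it runs a command named mcols and inserts its output, whereas "$mcols" or "$1" expands a variable.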
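To make the difference concrete, here is a minimal sketch. The variable name mcols comes from the question; the sample value stands in for "$1" and is assumed for illustration only.

```shell
#!/bin/sh
# Stand-in for the first script argument ("$1"); the value is assumed.
mcols="ID,NAME"

# "$mcols" (or "${mcols}") is parameter expansion: it yields the variable's value.
expanded="$mcols"

# "$(mcols)" is command substitution: the shell tries to RUN a command named
# mcols and capture its output. Assuming no such command is installed, the
# result is empty (the error message is discarded here).
substituted="$( { mcols; } 2>/dev/null )" || true

echo "expansion:    [$expanded]"
echo "substitution: [$substituted]"
```

So in the script, MCOL="$mcols" (or simply MCOL="$1") passes the column list to awk, while MCOL="$(mcols)" passes an empty string.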
Quote:
Also, I am trying to understand your code line by line. Could you point me to how to debug this code? I am not asking you to explain it line by line, but can you please point me in the right direction so I can better understand the code?
Thanks.
When operating on the first line (the header), I collect the target column numbers in an array. For all remaining lines, I assemble these column values, separated by commas, into a TMP variable, execute echo ... | md5sum on it, getline the result into a variable, and, after some massaging, print the desired output line.
Aside, if your file has more lines than awk allows for open files, you'd need to close the system calls after each use...
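One way to follow those steps while debugging is to print what the script collects instead of running md5sum. The following is only a sketch: the sample rows are the ones shown earlier in the thread, and the temporary file and the debug messages are made up for illustration.

```shell
#!/bin/sh
# Sample input taken from the thread; the temporary file name is arbitrary.
tmpfile=$(mktemp)
printf '%s\n' 'ID|NAME|AGE|GENDER' '10|ABC|30|M' '20|DEF|20|F' > "$tmpfile"

debug_out=$(awk -F\| '
NR == 1 {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") {
             COL[++CNT] = i
             # Report which header field was selected and where it was stored.
             print "header: COL[" CNT "] = " i " (" $i ")"
         }
         next
        }
        {TMP = ""
         for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
         # Show the command string instead of executing it.
         print "line " NR ": echo -n " substr (TMP, 2) " | md5sum"
        }
' MCOL="ID,NAME" "$tmpfile")

rm -f "$tmpfile"
echo "$debug_out"
```

With the sample data this prints two "header:" lines (columns 1 and 2) followed by the exact command that would be run for each data line.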
# 10  
Old 1 Week Ago
Quote:
Originally Posted by RudiC
Yes.

How about just trying MCOL="$1"? The syntax you used, $(mcols), is called "command substitution".
When operating on the first line (the header), I collect the target column numbers in an array. For all remaining lines, I assemble these column values, separated by commas, into a TMP variable, execute echo ... | md5sum on it, getline the result into a variable, and, after some massaging, print the desired output line.
Aside, if your file has more lines than awk allows for open files, you'd need to close the system calls after each use...
Yes, I will try command substitution when I get to work tomorrow; online Unix terminals are giving me a hard time.

Thanks for explaining the code. After a lot of googling, I am at the stage where I can understand 80% of the code, except why the "," are needed in "," MCOL "," ~ "," $i "," in this if statement. But I am learning awk and will figure it out soon.

I don't quite understand what
Quote:
you'd need to close the system calls after each use...
actually means. Is it syntax, or is it like opening and closing cursors in PL/SQL? (Sorry, bad example, but SQL is the only language I am comfortable with for now.)

Thanks.
# 11  
Old 1 Week Ago
Quote:
Originally Posted by mkathi
...

why the "," are needed in "," MCOL "," ~ "," $i "," in this if statement
Print out the two and compare / apply the matching operator ~ .
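As a rough illustration of that comparison, the sketch below uses a hypothetical column list in which one name is a substring of another (PAID contains ID). Both names are invented for this example.

```shell
#!/bin/sh
# Hypothetical column list and header field, chosen so that one name is a
# substring of another.
match_report=$(awk 'BEGIN {
    MCOL = "PAID,NAME"; field = "ID"
    # Without delimiters, ~ finds "ID" inside "PAID": a false positive.
    bare     = (MCOL ~ field)                 ? "match" : "no match"
    # With commas on both sides, only a whole list entry can match.
    anchored = ("," MCOL "," ~ "," field ",") ? "match" : "no match"
    print "without commas: " bare
    print "with commas:    " anchored
}')
echo "$match_report"
```

The surrounding commas turn a substring test into a whole-entry test, which is why they appear on both sides of the ~ operator.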


Quote:
actually means. Is it syntax, or is it like opening and closing cursors in PL/SQL? (Sorry, bad example, but SQL is the only language I am comfortable with for now.)

...
awk allows for a not too small but limited number of open files, of which each echo ... | md5sum consumes one. So, once you reach that limit, action needs to be taken.
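The cursor analogy is fairly close. A small sketch of the behaviour, using a throwaway echo command rather than the md5sum pipeline from the script:

```shell
#!/bin/sh
# Each distinct command string used with "cmd | getline" opens a pipe that awk
# keeps open until close(cmd) is called.
results=$(awk 'BEGIN {
    cmd = "echo hello"
    r1 = (cmd | getline a)   # 1: first line read from the pipe
    r2 = (cmd | getline b)   # 0: same pipe, already at end of input
    close(cmd)               # release the pipe (and its file descriptor)
    r3 = (cmd | getline c)   # 1: the command is started afresh
    print r1, r2, r3
}')
echo "$results"
```

This prints 1 0 1. In the hashing script almost every input line produces a different command string, so without close() each line leaves one more pipe open.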
# 12  
Old 1 Week Ago
Quote:
Originally Posted by RudiC
Print out the two and compare / apply the matching operator ~ .

Makes sense, thanks.

awk allows for a not too small but limited number of open files, of which each echo ... | md5sum consumes one. So, once you reach that limit, action needs to be taken.
My files are huge, ranging from 10000 records to 30000000, but I am going to use one file at a time, I mean open one file at a time. Will that still be an issue?
# 13  
Old 1 Week Ago
That script consumes an "open file" for every line in your input file. 10000 may be, but 3000000 definitely will be, too many. Try



Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = "echo -n " $(COL[1])
                 for (i=2; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 TMP = TMP " | md5sum"
                 TMP | getline MD5
                 close (TMP)
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
 ' OFS="|" MCOL="ID,NAME,AGE" file


or


Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = "echo -n "
                 for (i=1; i<=CNT; i++) TMP = TMP $(COL[i]) ","
                 sub (/,$/, " | md5sum", TMP)
                 TMP | getline MD5
                 close (TMP)
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME,AGE" file
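One caveat with both variants: the field values are pasted into a shell command unquoted, so a value containing spaces, quotes, ; or $ would change what gets hashed (or what gets executed). A possible hardening is sketched below. The helper name shquote and the sample row are invented for this example, and printf %s is swapped in for echo -n, which is not portable across shells.

```shell
#!/bin/sh
# Sample data with a value that the shell would otherwise mangle (assumed).
tmpfile=$(mktemp)
printf '%s\n' 'ID|NAME|AGE' '10|A B;C|30' > "$tmpfile"

hashed=$(awk -F\| '
# shquote (hypothetical helper): wrap a string in single quotes for the shell.
# \047 is the octal escape for a single quote.
function shquote(s,    n, parts, i, out) {
    n = split(s, parts, "\047")
    out = parts[1]
    for (i = 2; i <= n; i++) out = out "\047\\\047\047" parts[i]
    return "\047" out "\047"
}
NR == 1 {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
         next
        }
        {TMP = $(COL[1])
         for (i=2; i<=CNT; i++) TMP = TMP "," $(COL[i])
         CMD = "printf %s " shquote(TMP) " | md5sum"
         MD5 = ""
         CMD | getline MD5
         close (CMD)
         sub (/ *-/, "", MD5)
         print TMP, MD5
        }
' OFS="|" MCOL="ID,NAME" "$tmpfile")

rm -f "$tmpfile"
echo "$hashed"
```

This still starts one md5sum process per input line, so it addresses correctness of the hashed string, not speed.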

# 14  
Old 1 Week Ago
Quote:
Originally Posted by RudiC
That script consumes an "open file" for every line in your input file. 10000 may be, but 3000000 definitely will be, too many. Try . . .

Thanks, I will have a chance to run this tomorrow against a large dataset, and I will get back with the results.
Note: this script taught me a lot about indexes, thanks for that.