Shell script to apply functions to multiple columns dynamically


 
# 8  
Old 11-11-2018
Quote:
Originally Posted by RudiC
Please be aware that the md5sum of '10,abc' will NEVER be 73aca49763216fb96bbc2acef7b60afb as it is case sensitive.
Looks like you want a comma included. Try
Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 ("echo -n " substr (TMP, 2) " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME" file
ID|NAME|AGE|GENDER|HASHED COLUMNS|HASHVALUE
10|ABC|30|M|ID,NAME|73aca49763216fb96bbc2acef7b60afb
20|DEF|20|F|ID,NAME|9d6555fe65eb60b2f7d9174b56f667f5

Yes, thank you. You are right, I should be more careful with the case, but the output is now as expected.
Also, can the last line, MCOL="ID,NAME", be parameterized? For example:

Code:
mcols=$1
filename=$2
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 ("echo -n " substr (TMP, 2) " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="$(mcols)" file

and call the script like sh myscript.sh "ID,NAME"? (I think it is a silly question, but nevertheless I am asking.)

Also, I am trying to understand your code line by line. Could you point me to how to debug this code? I am not asking you to explain it line by line, but could you please point me in the right direction so that I can understand the code better?

Thanks.
# 9  
Old 11-11-2018
Quote:
Originally Posted by mkathi
Yes, thank you. You are right, I should be more careful with the case
Yes.

Quote:
Also, can the last line, MCOL="ID,NAME", be parameterized . . . and call the script like sh myscript.sh "ID,NAME"? (I think it is a silly question, but nevertheless I am asking.)
How about just trying it, using MCOL="$1"? The syntax you used, $(mcols), is called "command substitution": it runs a command named mcols and inserts its output, whereas "$mcols" expands the variable.
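
A minimal sketch of the parameterized script, assuming the column list arrives as the first argument and the file name as the second. The function name hash_columns is made up for illustration, printf %s stands in for echo -n, and the field values are assumed to contain no spaces or shell metacharacters:

```shell
#!/bin/sh
# Sketch only: hash_columns is a made-up name, not part of the posted script.
hash_columns() {
    mcols=$1       # comma-separated header names, e.g. "ID,NAME"
    filename=$2    # pipe-delimited file whose first line is the header
    awk -F\| '
    NR == 1 {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
             print $0, "HASHED COLUMNS", "HASHVALUE"
             next
            }
            {TMP = ""
             for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
             CMD = "printf %s " substr(TMP, 2) " | md5sum"
             CMD | getline MD5
             close(CMD)
             sub(/ *-/, "", MD5)
             print $0, MCOL, MD5
            }
    ' OFS="|" MCOL="$mcols" "$filename"   # "$mcols" expands the variable; "$(mcols)" would run a command
}

# Inside myscript.sh this would be called as: hash_columns "$1" "$2"
```

With this layout the script would be run as sh myscript.sh "ID,NAME" file, i.e. with the file name as a second argument.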
Quote:
Also, I am trying to understand your code line by line. Could you point me to how to debug this code? I am not asking you to explain it line by line, but could you please point me in the right direction so that I can understand the code better?
Thanks.
When operating on the first line (the header), I collect the target column numbers in an array. For all remaining lines, I assemble these column values, separated by commas, into a TMP variable, execute echo ... | md5sum on it, getline the result into a variable, and, after some massaging, print the desired output line.
As an aside: if your file has more lines than awk allows open files, you'd need to close the system calls after each use...
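
One way to follow the script step by step is to print the intermediate values instead of running md5sum. The following is only a sketch under that assumption; trace_columns is a made-up helper name, and the output format is invented for readability:

```shell
#!/bin/sh
# Sketch only: trace_columns is a made-up helper name for illustration.
trace_columns() {
    awk -F\| '
    NR == 1 {for (i=1; i<=NF; i++)
                 if ("," MCOL "," ~ "," $i ",") {
                     COL[++CNT] = i
                     # Show which header matched and which field number it has
                     printf "header %s -> field %d\n", $i, i
                 }
             next
            }
            {TMP = ""
             for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
             # Show the command that would be run, without running it
             print "line " NR ": echo -n " substr(TMP, 2) " | md5sum"
            }
    ' MCOL="$1" "$2"
}

# Example call: trace_columns "ID,NAME" file
```

Reading the printed field numbers and command strings next to the input file should make it clear what COL, CNT and TMP hold at each step.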
# 10  
Old 11-12-2018
Quote:
Originally Posted by RudiC
Yes.

How about just trying and using MCOL="$1"? The syntax you used is called "command substitution".
When operating on the first line (the header), I collect the target column numbers in an array. For all remaining lines, I assemble these column values, separated by commas, into a TMP variable, execute echo ... | md5sum on it, getline the result into a variable, and, after some massaging, print the desired output line.
As an aside: if your file has more lines than awk allows open files, you'd need to close the system calls after each use...
Yes, I will try that when I get to work tomorrow; online Unix terminals are giving me a hard time.

Thanks for explaining the code. After a lot of googling I am at the stage where I can understand 80% of the code as written, except for the "," in "," MCOL "," ~ "," $i "," in the if statement. But I am learning awk and will figure it out soon.

I don't quite understand what
Quote:
you'd need to close the system calls after each use...
actually means. Is it a syntax, or is it like opening and closing cursors in PL/SQL? (Sorry, bad example, but SQL is the only language I am comfortable with for now.)

Thanks.
# 11  
Old 11-12-2018
Quote:
Originally Posted by mkathi
...

why "," MCOL "," ~ "," $i "," the "," in this if statement
Print out the two operands and compare them, then apply the matching operator ~ .
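
A small sketch of that comparison. The header name NA is invented here as an example of a column that should not be selected when MCOL is "ID,NAME":

```shell
#!/bin/sh
# Sketch only: NA is an invented header name that is a substring of NAME.
# Without the commas, the regex NA matches inside ID,NAME.
bare=$(awk 'BEGIN {MCOL = "ID,NAME"
                   print ((MCOL ~ "NA") ? "match" : "no match")}')
# With the commas, the regex ,NA, must match a whole entry of ,ID,NAME,
wrapped=$(awk 'BEGIN {MCOL = "ID,NAME"
                      print (("," MCOL "," ~ "," "NA" ",") ? "match" : "no match")}')
echo "without commas: $bare"
echo "with commas:    $wrapped"
```

The surrounding commas turn a substring test into a whole-name test, so a header is only picked up when it appears as a complete entry in the list.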


Quote:
actually means. Is it a syntax, or is it like opening and closing cursors in PL/SQL? (Sorry, bad example, but SQL is the only language I am comfortable with for now.)

...
awk allows for a not too small but limited number of open files, of which each echo ... | md5sum consumes one. So, once you reach that limit, action needs to be taken.
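
Besides the limit on open files, a command that is never closed also stays at end of input, so running the same command string again returns nothing new. A sketch of that effect, assuming three one-field input lines of which the first and third are identical (the input values are invented):

```shell
#!/bin/sh
# Sketch only: three input lines, the first and third identical.
no_close=$(printf 'A\nB\nA\n' | awk '
    {CMD = "printf %s " $1 " | md5sum"
     CMD | getline MD5     # third line: this pipe is still open and already at end of input
     print $1, MD5         # so MD5 keeps the value read for the previous line
    }')
with_close=$(printf 'A\nB\nA\n' | awk '
    {CMD = "printf %s " $1 " | md5sum"
     CMD | getline MD5
     close(CMD)            # the next use of the same command string starts a fresh process
     print $1, MD5
    }')
echo "without close():"; echo "$no_close"
echo "with close():";    echo "$with_close"
```

In the first run the third line repeats the hash printed for B; in the second run it matches the first line again.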
# 12  
Old 11-12-2018
Quote:
Originally Posted by RudiC
Print out the two and compare / apply the matching operator ~ .

Makes sense, thanks.

awk allows for a not too small but limited number of open files, of which each echo ... | md5sum consumes one. So, once you reach that limit, action needs to be taken.
My files are huge, ranging from 10000 records to 30000000, but I am going to process one file at a time. Will that still be an issue?
# 13  
Old 11-12-2018
That script consumes an "open file" for every line in your input file. 10000 may, but 3000000 definitely will, be too many. Try



Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = "echo -n " $(COL[1])
                 for (i=2; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 TMP = TMP " | md5sum"
                 TMP | getline MD5
                 close (TMP)
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
 ' OFS="|" MCOL="ID,NAME,AGE" file


or


Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = "echo -n "
                 for (i=1; i<=CNT; i++) TMP = TMP $(COL[i]) ","
                 sub (/,$/, " | md5sum", TMP)
                 TMP | getline MD5
                 close (TMP)
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME,AGE" file
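
If the selected values can contain spaces, quotes or other shell metacharacters, the command string has to quote them. The following is a sketch only: shquote and hash_columns_quoted are made-up names, Q is a helper variable holding a single quote, and printf %s stands in for echo -n:

```shell
#!/bin/sh
# Sketch only: shquote and hash_columns_quoted are made-up names.
hash_columns_quoted() {
    awk -F\| -v Q="'" '
    # Wrap a value in single quotes for the shell, escaping embedded single quotes
    function shquote(s) {gsub(Q, Q "\\" Q Q, s); return Q s Q}
    NR == 1 {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
             print $0, "HASHED COLUMNS", "HASHVALUE"
             next
            }
            {TMP = ""
             for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
             CMD = "printf %s " shquote(substr(TMP, 2)) " | md5sum"
             CMD | getline MD5
             close(CMD)
             sub(/ *-/, "", MD5)
             print $0, MCOL, MD5
            }
    ' OFS="|" MCOL="$1" "$2"
}

# Example call: hash_columns_quoted "ID,NAME,AGE" file
```

Each input line still starts one shell and one md5sum process, so very large files will take a while either way.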

# 14  
Old 11-12-2018
Quote:
Originally Posted by RudiC
That script consumes an "open file" for every line in your input file. 10000 may, but 3000000 definitely will, be too many. Try




Thanks, I will have a chance to run this tomorrow against a large dataset and I will get back with the results.
Note: this script taught me a lot about indexes, thanks for that.