Shell script to apply functions to multiple columns dynamically


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Shell script to apply functions to multiple columns dynamically
# 1  
Old 11-10-2018
Shell script to apply functions to multiple columns dynamically

Hello,

I have a requirement to apply hashing algorithm on flat file on one or more columns dynamically based on header

sample input file
Code:
ID|NAME|AGE|GENDER
10|ABC|30|M
20|DEF|20|F

say if i want multiple columns based on the header example id,name or id,age or name,gender and hash and store the output value in a new column for each row

example
Code:
echo -n '10,abc'|md5sum =hashvalue

expected sample output:
Code:
ID|NAME|AGE|GENDER|HASHEDCOLUMNS|HASHVALUE
10|abc|30|M|ID,NAME|hashvalue
20|def|20|F|ID,NAME|hashvalue

or

Code:
ID|NAME|AGE|GENDER|HASHEDCOLUMNS|HASHVALUE
10|abc|30|M|ID,gender|hashvalue
20|def|20|F|ID,gender|hashvalue

I am having hard time visualizing this
I have written something but i dont know if its going to work or not really lost here.

Code:
#!/usr/bin/sh

target="id"   # here i m passing the header column to be md5 applied ideally this is parameterized and can send more than one column name seperated by comma "
target_idx=   # the column number of that field

{
  # reading header.
  IFS=, read -r -a header
  for idx in "${!header[@]}"; do  # and look for the target field in it
    [[ ${header[$idx]} = $target ]] && { target_idx=$idx; break; }
  done
  # then, iterate over lines
  while IFS=, read -r -a line; do
    a=echo-n ${line[$target_idx]}|md5sum # if my input in target is "id"
  done
} <testfile.txt

echo$line|"$a"

Thanks,
kathi

Last edited by RudiC; 11-11-2018 at 08:37 AM..
# 2  
Old 11-10-2018
Quote:
Originally Posted by mkathi
say if i want multiple columns based on the header example id,name or id,age or name,gender and hash and store the output value in a new column for each row
If you want multiple columns you need to make the variable target_id and target_idx both arrays, no?

Then there is this:
Code:
  # then, iterate over lines
  while IFS=, read -r -a line; do
    a=echo-n ${line[$target_idx]}|md5sum # if my input in target is "id"
  done
} <testfile.txt

echo$line|"$a"

You iterate over the input lines, yes, but you produce output (the echo-line) only once: after you have read all the input. This must be wrong, no? Apart from the fact that after "echo" there needs to be a space (like after any command), but i suppose that was simply a typing error.

Also this line is not correct:
Code:
a=echo-n ${line[$target_idx]}|md5sum # if my input in target is "id"

It should perhaps read:
Code:
a=$(echo -n ${line[$target_idx]}|md5sum) # if my input in target is "id"

For parametrisation via the commandline i suggest to read the man page for getopts. I have just written an example in another thread you might want to use as a starting point.

I hope this helps.

bakunin
# 3  
Old 11-10-2018
Hello you are right

I have to pass the input as arrays yes and i did make a typo near the echo

yes the echo at the end is outside the loop so it will only print out the last row i think

issue is even if i pass the target as variable the code is only picking the first value passed generating the md5 and outputting the md5 for that line. I am trying to figure out how can i loop through all the columns passed inside the array and concatenate them using, and then calculate their md5 and then write it back to the file as a new column.

I know its a strange requirement Please point me to the write direction

I m trying something like this with no avail


Code:
#!/usr/bin/sh

target=("id,name")  
target_idx=()   # the column number of that field
{
  # reading header.
for i in $(echo $target | sed "s/,/ /g");do # trying to pass the comma seperated values inside target variable
  IFS=, read -r -a header
  for idx in "${!header[@]}"; do  # and look for the target field in it
    [[ ${header[$idx]} = $target ]] && { target_idx=$idx; break; }
  done
 done
  # then, iterate over lines
  while IFS=, read -r -a line; do
    a=$(echo -n ${line[$target_idx]}|md5sum)
    echo "$line"
  done
}

Moderator's Comments:
Mod Comment edit by bakunin: i think you posted this to the wrong thread. Pushing it to the right one.

Last edited by bakunin; 11-11-2018 at 03:16 AM..
# 4  
Old 11-10-2018
Hello you are right

I have to pass the input as arrays yes and i did make a typo near the echo

yes the echo at the end is outside the loop so it will only print out the last row i think

issue is even if i pass the target as variable the code is only picking the first value passed generating the md5 and outputting the md5 for that line. I am trying to figure out how can i loop through all the columns passed inside the array and concatenate them using, and then calculate their md5 and then write it back to the file as a new column.

I know its a strange requirement Please point me to the write direction

I m trying something like this with no avail


Code:
#!/usr/bin/sh

target=("id,name")  
target_idx=()   # the column number of that field
{
  # reading header.
for i in $(echo $target | sed "s/,/ /g");do # trying to pass the comma seperated values inside target variable
  IFS=, read -r -a header
  for idx in "${!header[@]}"; do  # and look for the target field in it
    [[ ${header[$idx]} = $target ]] && { target_idx=$idx; break; }
  done
 done
  # then, iterate over lines
  while IFS=, read -r -a line; do
    a=$(echo -n ${line[$target_idx]}|md5sum)
    echo "$line"
  done
}

------ Post updated at 01:45 AM ------

Just a thought may be i should change my thinking. Just throwing it out i still did not figure out how to write the code but i may have a starting point

step one is if i can concatenate all the column names i send as a parameter to the script and then apply md5 function on it and then store the result of the output in a column in the end does this make any sense.
# 5  
Old 11-11-2018
Not sure I fully understand what you're after, but try
Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP $(COL[i])
                 ("echo -n " TMP " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME" file
ID|NAME|AGE|GENDER|HASHED COLUMNS|HASHVALUE
10|ABC|30|M|ID,NAME|f2e88b4a581adf6c7833d95799b7a08c
20|DEF|20|F|ID,NAME|09b01f45ed9092b4c3c155911571bf31


and report back how far it gets you...
# 6  
Old 11-11-2018
Quote:
Originally Posted by RudiC
Not sure I fully understand what you're after, but try
Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP $(COL[i])
                 ("echo -n " TMP " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME" file
ID|NAME|AGE|GENDER|HASHED COLUMNS|HASHVALUE
10|ABC|30|M|ID,NAME|f2e88b4a581adf6c7833d95799b7a08c
20|DEF|20|F|ID,NAME|09b01f45ed9092b4c3c155911571bf31

and report back how far it gets you...
hello firstly thanks for the time.

Yes this is exactly what I am looking for but the generated hash values do not match in the above output for example the first row
Code:
10|ABC|30|M|ID,NAME|f2e88b4a581adf6c7833d95799b7a08c

the hash value is f2e88b4a581adf6c7833d95799b7a08c but here I am expecting output hash value as 73aca49763216fb96bbc2acef7b60afb i.e

Code:
10|ABC|30|M|ID,NAME|73aca49763216fb96bbc2acef7b60afb

the way i arrived to this value is

echo -n '10,abc'|md5sum and this returns me 73aca49763216fb96bbc2acef7b60afb

thanks

Last edited by RudiC; 11-11-2018 at 01:18 PM..
# 7  
Old 11-11-2018
Please be aware that the md5sum of '10,abc' will NEVER be 73aca49763216fb96bbc2acef7b60afb as it is case sensitive.
Looks like you want a comma included. Try
Code:
awk -F\| '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0, "HASHED COLUMNS", "HASHVALUE"
                 next
                }
                {TMP = ""
                 for (i=1; i<=CNT; i++) TMP = TMP "," $(COL[i])
                 ("echo -n " substr (TMP, 2) " | md5sum") | getline MD5
                 sub (/ *-/, "", MD5)
                 print $0, MCOL, MD5
                }
' OFS="|" MCOL="ID,NAME" file
ID|NAME|AGE|GENDER|HASHED COLUMNS|HASHVALUE
10|ABC|30|M|ID,NAME|73aca49763216fb96bbc2acef7b60afb
20|DEF|20|F|ID,NAME|9d6555fe65eb60b2f7d9174b56f667f5

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to print multiple required columns dynamically in a file using the header name?

Hi All, i am trying to print required multiple columns dynamically from a fie. But i am able to print only one column at a time. i am new to shell script, please help me on this issue. i am using below script awk -v COLT=$1 ' NR==1 { for (i=1; i<=NF; i++) { ... (2 Replies)
Discussion started by: balu1234
2 Replies

2. UNIX for Beginners Questions & Answers

How to apply the update statement in multiple servers on multiple dbs at a time .?

Hi , Can any please help the below requirement on all multiple servers and multiple dbs. update configuration set value='yes' ;1) the above statement apply on 31 Databases at a time on different Ip address eg : 10.104.1.12 (unix ip address ) the above ip box contains 4 db's eg : db... (2 Replies)
Discussion started by: venkat918
2 Replies

3. Shell Programming and Scripting

How can I apply 'date' command to specific columns, in a BASH script?

Hi everyone, I have a situation in which I have multiple (3 at last count) date columns in a CSV file (, delim), which need to be changed from: January 1 2017 (note, no comma after day) to: YYYY-MM-DD So far, I am able to convert a date using: date --date="January 12, 1990" +%Y-%m-%d ... (7 Replies)
Discussion started by: richardsantink
7 Replies

4. Shell Programming and Scripting

Read Two Columns - Apply Condition on Six other columns

Hello All, Here is my input univ1 chr1 100 200 - GeneA 500 1 0 0.1 0.2 0.3 0.4 0.5 univ1 chr1 100 200 - GeneA 600 1 0 0.0 0.0 0.0 0.0 0.1 univ1 chr1 100 200 - GeneA 700 1 0 0.4 0.4 ... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

5. Shell Programming and Scripting

Creating IN list in PLSQL script dynamically by using shell script

Hi all, I have a PLSQL script which has a IN list where it takes some ids as input. For example SELECT * FROM EMPLOYEE WHERE EMPLOYEE_ID IN (comma separated list ) I want to run this quest inside a shell script but I would like to prepare the IN list dynamically where the employee ids... (1 Reply)
Discussion started by: LoneRanger
1 Replies

6. Shell Programming and Scripting

How to run multiple functions in Background in UNIX Shell Scripting?

Hi, I am using ksh , i have requirement to run 4 functions in background , 4 functions call are available in a case that case is also in function, i need to execute 1st function it should run in background and return to case and next i will call 2nd function it should run in background and... (8 Replies)
Discussion started by: karthikram
8 Replies

7. Shell Programming and Scripting

how to create the files dynamically in c shell script

how can i CREATE a txt file dynamically in c shell: for instance: #! /bin/csh for each i (*) cat>file$i.txt for each j do .... (1 Reply)
Discussion started by: jdsignature88
1 Replies

8. Shell Programming and Scripting

Shell script dynamically case in VAR

Hallo, I am working on a kdialog. This shall be able to load the required commands from a .conf file. First step runs good by loading the entries (selectabel entries) in a variable: MIRRORSELECT=$(kdialog --radiolist "Select your nearest mirror" $VAR1) The kdialog is accordingly correct... (2 Replies)
Discussion started by: ACTGADE
2 Replies

9. Shell Programming and Scripting

Managing dynamically multiple shell

I want to launch some shell scripts. I would have the possibility to change the number of shell scripts launched dynamically by modifying a variable, or a configuration file. For example, I start to launch 4 scripts at the same time, and after that, by modifying a variable, 6 scripts are... (0 Replies)
Discussion started by: gonzo38
0 Replies

10. Programming

reallocating structures dynamically in functions

I've recently started using structures, but I am having problems in allocating the structure dynamically. In the code below if i allocate the structure in the main program it works fine, and i get the expected output. However if i use the function rper below to increase the size of the structure i... (0 Replies)
Discussion started by: cezaryn
0 Replies
Login or Register to Ask a Question