Help with Shell Script in Masking particular columns in .csv file or .txt file using shell script


 
# 1  
Old 11-06-2016
Hello Unix Shell Script Experts,

I have a script that masks columns in .csv or .txt files.

First the script extracts (untars) the .zip files from the Archive folder into the work folder, processes them there, and finally pushes the masked .csv files into the Feed folder.

Two parameters are passed:

1) Line of business, for example: VA

2) Date in YYYYMMDD format, for example: 20161101
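
A minimal sketch of checking the two parameters before any work starts. The function name `validate_args`, the script name in the usage message, and the accepted patterns (upper-case letters for the line of business, eight digits for the date) are assumptions and are not taken from the attached script.

```shell
#!/bin/bash
# Hypothetical helper: check the two parameters described above.
# Returns 0 if both look valid, 1 otherwise, and explains why on stderr.
validate_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: mask.sh <line_of_business> <YYYYMMDD>" >&2
        return 1
    fi
    lob=$1
    run_date=$2
    case $lob in
        ''|*[![:upper:]]*)
            echo "bad line of business: $lob" >&2
            return 1 ;;
    esac
    case $run_date in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *)
            echo "bad date (expected YYYYMMDD): $run_date" >&2
            return 1 ;;
    esac
    return 0
}

validate_args VA 20161101 && echo "parameters OK"
```

This only checks the shape of the values; it does not verify that the date is a real calendar date.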

The script reads from a database table that holds the positions of the columns to be masked for each of the .csv or .txt files.


The objective is to mask all the .csv or .txt files, delimited with comma or pipe, that have columns like tax_id and DOB.

For one line of business it works properly across the folders: it untars the archives and generates the files with the columns masked.

However, for other lines of business it does not mask at the correct positions.

DOB format : 9999-12-12 (All dob columns will have this value)
Tax id : xxxx-xx-xx (All tax related id columns will have this format value)

Example: if the columns at positions 4 and 30 are TAX_ID and DOB, then the script must mask these two columns.

There can be more columns to be masked.

Say the 10th, 11th, 12th, 40th and 120th positions; all of these columns need to be masked.
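
A rough sketch of masking by position with a single awk call, assuming the first line is a header that must be kept and that fields never contain the delimiter inside quotes. The function name `mask_columns` and its argument order are made up for illustration; the mask strings are the ones quoted in this post.

```shell
#!/bin/bash
# mask_columns DELIM DOB_POSITIONS TAX_POSITIONS < input > output
# Positions are comma-separated, 1-based field numbers, e.g. "10,11,12".
# Assumption: line 1 is a header and is copied unchanged.
mask_columns() {
    awk -v d="$1" -v dob="$2" -v tax="$3" '
        # Overwrite every listed field that exists on this line.
        function mask(cols, n, value,    i, p) {
            for (i = 1; i <= n; i++) {
                p = cols[i] + 0
                if (p >= 1 && p <= NF) $p = value
            }
        }
        BEGIN {
            FS = OFS = d
            nd = split(dob, dcol, ",")
            nt = split(tax, tcol, ",")
        }
        NR > 1 {
            mask(dcol, nd, "9999-12-12")
            mask(tcol, nt, "xxxx-xx-xx")
        }
        { print }
    '
}

printf '%s\n' 'id,name,dob,tax_id' '1,Ann,19990205,123468' | mask_columns , 3 4
```

Positions beyond the last field of a line are skipped rather than padding the line with empty fields.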


The script has 5 arrays, each of which stores columns that are to be masked.

Problem
In the arrays I am declaring all the columns of all the .csv files that are to be masked.
Array 1 contains all Date of Birth related columns.
Array 2 contains all Tax id related columns.
Now one of the .csv files has columns named DATE-OF-BIRTH and TAX-ID.
How do I add these 2 columns to the arrays along with the other columns?
I have tried adding single and double quotes, but that did not work.
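
For reference, a hyphen is not special inside a bash array element, so names like these can be listed as they are; only names containing spaces need quotes. The sketch below assumes bash, and the array names and the `is_in_list` helper are hypothetical, not the ones from the attached script. The usual pitfall is expanding the array without quotes, which splits names that contain spaces.

```shell
#!/bin/bash
# Hypothetical arrays of header names; hyphenated names need no quoting.
dob_columns=(DOB DateOfBirth DATE-OF-BIRTH "Date of Birth")
tax_columns=(TAX_ID TaxID TAX-ID)

# is_in_list NAME LIST...: succeed if NAME equals one of the list elements.
is_in_list() {
    local needle=$1 item
    shift
    for item in "$@"; do
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

# Always expand as "${array[@]}" so each element stays one word.
if is_in_list "DATE-OF-BIRTH" "${dob_columns[@]}"; then
    echo "DATE-OF-BIRTH is a DOB column"
fi
```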

Finally, the script is not behaving the way it is expected to.
The objective of the script is: whenever it identifies one of the hard-coded columns related to Dateof Birth or Tax id, that column must be masked.

Attaching the code for reference

Please let me know
Thanks
Mahesh G



Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 11-06-2016 at 10:02 AM.. Reason: Added ICODE tags.
# 2  
Old 11-06-2016
This request is very difficult to read/interpret. Examples of input and desired output data, demonstrating the logic to be applied, would definitely help. And, maybe a demo run with some defined values, e.g. for filesetcode.

As a general remark, that script seems a bit overcomplicated...

And, what do you mean by "mask" - eliminate column? Overwrite with a constant?
# 3  
Old 11-06-2016
Data Sample

Say one of the .csv files contains two columns, DOB and TAX, at the 5th and 10th positions. The script has these two column names hard coded.

Now the .csv file has data as shown below:

DOB format : 9999-12-12 (All dob columns will have this value)

Tax id : xxxx-xx-xx (All tax related id columns will have this format value)

Before masking

Code:
19990205  123468
20150314  524278
20061231  569874

Now after masking

Code:
9999-12-12 xxx-xx-xx
9999-12-12 xxx-xx-xx
9999-12-12 xxx-xx-xx


Please let me know if more information is required


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 11-06-2016 at 11:47 AM.. Reason: Added CODE tags.
# 4  
Old 11-06-2016
This just doesn't make sense. Why should masking a field change an eight digit field to a ten character field containing eight digits and two hyphens??? Why should masking a field change a six digit field to a nine character field containing seven <x>s and two hyphens???

Your text says that a column with the text DOB (apparently in the 1st line of the field) identifies a field number and that field should be changed on every line in the file to 9999-12-12. So, you are asking us to change the heading in that field from DOB to 9999-12-12??? Do you really want 9999-12-12, or should it be 9999-12-31???

Please show us some representative sample input data (with sanitized, but not masked, data) in CODE tags AND show the corresponding output that should be produced by your script in CODE tags.

Note that you refer to DOB, dob, Date of Birth, Dateof Birth, and DATE-OF-BIRTH in the above quoted text; I assume that you realize that all five of these column headings are different and none of them will compare equal to any of the others!

Are we supposed to assume that all of the files you will be processing will have the same fields in the same order? Or is each file different?

You say that you have <comma> and <vertical-bar> separated files. Does a single file ever contain both delimiters? If you have a <comma> delimited file, will the data in that file ever contain a <vertical-bar> character as data (not as a field delimiter)?
# 5  
Old 11-06-2016
Quote:
This just doesn't make sense. Why should masking a field change an eight digit field to a ten character field containing eight digits and two hyphens??? Why should masking a field change a six digit field to a nine character field containing seven <x>s and two hyphens???
That was only an example.
Say the column name in the .csv file is TaxID and it has the values shown below:
Code:
TaxID
1259755
5485656
4874455

Then after masking, the values under the TaxID column inside the .csv file are expected to be:
Code:
xxx-xx-xx
xxx-xx-xx
xxx-xx-xx

Please note the point is not that a six digit field becomes a nine character field containing seven <x>s and two hyphens. In future the Tax id values can be more than 10 digits. The intent here is: if the column name is Tax_ID, then mask it. There can be more than one tax related column in the .csv file, for example TaxID_1, TaxID_2, Tax_ID3 and so on. All we are looking for is the column names that will be hard coded in the script.

How do I handle a column name like TAX-ID or DATE-OF-BIRTH?

As I said earlier, the script reads its input from a table that holds the file name, the column names and the positions of the columns where masking needs to happen.


Quote:
Your text says that a column with the text DOB (apparently in the 1st line of the field) identifies a field number and that field should be changed on every line in the file to 9999-12-12. So, you are asking us to change the heading in that field from DOB to 9999-12-12??? Do you really want 9999-12-12, or should it be 9999-12-31???
Yes, you are right, 9999-12-31 should be the one. It was a typing error on my end earlier. I am not asking to replace the heading, only the values in that column. As I said, the table indicates the position number of the column that needs to be masked.


Quote:
Please show us some representative sample input data (with sanitized, but not masked, data) in CODE tags AND show the corresponding output that should be produced by your script in CODE tags.

Note that you refer to DOB, dob, Date of Birth, Dateof Birth, and DATE-OF-BIRTH in the above quoted text; I assume that you realize that all five of these column headings are different and none of them will compare equal to any of the others!
The .csv file can contain more than one column related to DOB, Date of Birth or DATE-OF-BIRTH.
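
Since these spellings do not compare equal as plain strings, one possible approach is to normalize each header before comparing it with the hard-coded names. The sketch below is an illustration only: `find_columns` is a hypothetical name, and the normalization rule (upper-case, then drop everything except letters and digits) is an assumption about which spellings should count as the same column.

```shell
#!/bin/bash
# find_columns DELIM NAMES < file
# NAMES is a comma-separated list of already normalized header names.
# Prints the 1-based positions of the matching header fields, comma-separated.
find_columns() {
    awk -v d="$1" -v names="$2" '
        # Upper-case a header and drop everything except letters and digits,
        # so "Date of Birth", "DATE-OF-BIRTH" and "date_of_birth" all become
        # "DATEOFBIRTH".
        function norm(s) {
            s = toupper(s)
            gsub(/[^A-Z0-9]/, "", s)
            return s
        }
        BEGIN {
            FS = d
            n = split(names, list, ",")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        NR == 1 {
            out = ""
            for (i = 1; i <= NF; i++) {
                key = norm($i)
                if (key in want) out = out (out == "" ? "" : ",") i
            }
            print out
            exit
        }
    '
}

printf '%s\n' 'id,Date of Birth,name,TAX-ID' | find_columns , DOB,DATEOFBIRTH,TAXID
```

For the sample header this prints `2,4`. Numbered variants such as TaxID_1 normalize to TAXID1 and would need their own entries in the list.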



Quote:
Are we supposed to assume that all of the files you will be processing will have the same fields in the same order? Or is each file different?

You say that you have <comma> and <vertical-bar> separated files. Does a single file ever contain both delimiters? If you have a <comma> delimited file, will the data in that file ever contain a <vertical-bar> character as data (not as a field delimiter)?
A given file is either comma delimited or pipe delimited, but never both.
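
If the delimiter is not recorded anywhere, it could be guessed from the header line. This is a sketch under assumptions: `detect_delimiter` is a hypothetical helper, and checking for the pipe first assumes that a pipe-delimited file might still carry commas inside its data, which this thread does not confirm either way.

```shell
#!/bin/bash
# detect_delimiter FILE: print "|" or "," based on the first line of FILE.
# Returns 1 if the file cannot be read or neither character is present.
detect_delimiter() {
    header=$(head -n 1 "$1") || return 1
    case $header in
        *'|'*) echo '|' ;;
        *,*)   echo ',' ;;
        *)     echo "no known delimiter in $1" >&2
               return 1 ;;
    esac
}
```

The result can then be passed to whatever does the masking, for example as the value of awk's `-F` option.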

As I shared in the script code earlier, the flow is as follows:
First the script reads the table that contains the column positions for the dob and tax related columns. Say the positions are 5 and 10 for the column names DateOfBirth and Tax_Id respectively; these two column names are hard coded in an array inside the script. It searches for these two column names and, if they are found, masks them. Finally, in the masked output file, positions 5 and 10 will have 9999-12-31 and xxx-xx-xx.

thanks
Mahesh G

Last edited by Don Cragun; 11-07-2016 at 12:05 AM.. Reason: Add QUOTE, CODE, and ICODE tags.
# 6  
Old 11-07-2016
You have a script. (It is grossly inefficient, throws away all diagnostic output, and never checks for errors; so you and we have no idea where something is going wrong if something is going wrong.) You have not said what it is doing wrong. You have not shown us sample CSV files from your database extractions. You have not shown us the output you are getting from your script. And, you have not shown us the output you are hoping to get from your script.
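
One way to keep diagnostics and catch failures is to route each step through a small wrapper. This is only a sketch; `run_step` is a hypothetical name and nothing here is taken from the attached script.

```shell
#!/bin/bash
# run_step DESCRIPTION COMMAND [ARGS...]
# Runs the command with its stderr left visible and reports the step name
# and exit status on failure instead of silently continuing.
run_step() {
    desc=$1
    shift
    if "$@"; then
        echo "ok: $desc" >&2
    else
        status=$?
        echo "FAILED ($status): $desc" >&2
        return "$status"
    fi
}

run_step "demo step" true
```

In a real script each stage (unzip, mask, move to the Feed folder) would be called as `run_step "description" command args || exit 1`, without redirecting stderr to /dev/null.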

I would guess that instead of invoking awk once for each field to be updated in each file to be updated, you could just invoke awk once to update all of the fields in all of the files. And, this would make your script run much faster. But, without samples and a clear statement of what you are trying to change in your existing script; we don't know what needs to be done.
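
A sketch of that idea, assuming every file passed in one call uses the same delimiter and the same column positions, and that line 1 of each file is a header. The function name `mask_files` and its argument order are made up for illustration; the mask values are the corrected ones from post #5.

```shell
#!/bin/bash
# mask_files OUTDIR DELIM DOB_POSITIONS TAX_POSITIONS FILE...
# One awk process handles every field in every file; each masked copy is
# written to OUTDIR under the same base name as its input.
mask_files() {
    outdir=$1 delim=$2 dob=$3 tax=$4
    shift 4
    awk -v outdir="$outdir" -v d="$delim" -v dob="$dob" -v tax="$tax" '
        # Overwrite every listed field that exists on this line.
        function mask(cols, n, value,    i, p) {
            for (i = 1; i <= n; i++) {
                p = cols[i] + 0
                if (p >= 1 && p <= NF) $p = value
            }
        }
        BEGIN {
            FS = OFS = d
            nd = split(dob, dcol, ",")
            nt = split(tax, tcol, ",")
        }
        FNR == 1 {                     # first line of each input file
            if (out != "") close(out)
            name = FILENAME
            sub(/.*\//, "", name)      # strip the directory part
            out = outdir "/" name
            print > out                # header is copied unchanged
            next
        }
        {
            mask(dcol, nd, "9999-12-31")
            mask(tcol, nt, "xxx-xx-xx")
            print > out
        }
    ' "$@"
}
```

A call might look like `mask_files "$feed_dir" , 5 10 "$work_dir"/*.csv`, where `feed_dir` and `work_dir` are placeholder variables. Files with different layouts would need separate calls or a per-file position lookup.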

Please help us help you!