Large file masking happening incorrectly: Ç delimiter issue


 
# 1  
Old 02-07-2019

The OS version is Red Hat Enterprise Linux Server release 6.10.
I have a script that masks some columns with **** in a data file delimited with Ç.
I am using awk for the masking. When I mask a small file, awk works fine and masks the required columns,
but when the file is large, â<96><92> gets appended to the masked file and the masking does not happen properly.

The data in the file looks like this:
Code:
"D"Ç"20181224"Ç183593739656ÇÇ"C"ÇÇ865Ç"Test TEST"ÇÇÇÇÇÇÇÇÇ"1262548446"ÇÇÇ"CLIENT"Ç"Y"ÇÇÇÇÇÇÇ"009171562000"ÇÇXXXÇÇÇ4Ç"Status Not Known"Ç2738000.000000000000000ÇÇ"SSS"ÇÇÇÇ2843382.526000000000000ÇÇÇÇÇ0.050000000ÇÇÇÇÇÇÇ"912810QU51"ÇÇÇÇ"SS"Ç"SSSSSS"ÇÇÇXXXÇÇÇ"99991231"ÇÇÇÇÇÇXXXÇÇÇÇÇÇÇÇÇÇÇÇÇÇÇÇ"531648568"Ç19ÇÇ"31648568"ÇÇ"PARTY"Ç"1648568"Ç"4"Ç"COMB"Ç"D2792331"Ç"D2812619"


The script is as below:
Code:
columnArray=(30 61)   # columns to mask; bash array elements are separated by spaces, not commas
for i in "${columnArray[@]}"
do
    echo "replacing values for column number $i"
    position=$i
    # Print the first and last lines unchanged; mask column $position on every line in between
    awk -v col="$position" -v var="$replaceval" -v last="$file_count" -F "Ç" 'BEGIN {OFS = FS} NR==1; NR > 1 && NR < last {$col = var; print}; END{print}' "$filename" > "$filename_pre"_masked."$filename_ext"
    chmod 775 "$filename_pre"_masked."$filename_ext"
    mv "$filename_pre"_masked."$filename_ext" "$filename"
done

I have attached the complete sh file for reference.
Any quick help is appreciated.

Last edited by vbe; 02-07-2019 at 04:30 AM.. Reason: code tags
# 2  
Old 02-07-2019
Moving thread from "how to post..." to appropriate forum
To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)



Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums
# 3  
Old 02-07-2019
Quote:
Originally Posted by LinuxUser8092
... I have attached the complete sh file for refrence. ...
Sure? The variables "replaceval", "file_count", "$filename_pre", "$filename_ext", and "$filename" seem undefined.


For your problem:
- any error messages?

- for how many lines does the script work correctly? When do the errors start? Any structural difference between the last line working and the first one failing?

- that "â<96><92>" is appended where?
- "does not happen properly" means what? Partial replacement? No replacement?
# 4  
Old 02-07-2019
Please find my answers below.

Sure? The variables "replaceval", "file_count", "$filename_pre", "$filename_ext", and "$filename" seem undefined.
Yes, they are working.

For your problem:
- any error messages?
No.

- for how many lines does the script work correctly?
It is a delimiter issue; it also shows up with only a few lines.
- When do the errors start?
There are no error messages.
- Any structural difference between the last line working and the first one failing?
No.

- that "â<96><92>" is appended where?
At the end of each line.
- "does not happen properly" means what? Partial replacement? No replacement?
No replacement.

However, when I ran the command below on my data file first, my script worked fine:
Code:
iconv -f ISO-8859-1 -t UTF-8 testact.data_orig.data >testact.data

Can you explain why it works after running this command on my file?
# 5  
Old 02-07-2019
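In ISO 8859-1, Ç is the single byte C7, not a multibyte sequence; in UTF-8 it is the two bytes C3 87. If the script itself is saved as UTF-8, -F "Ç" looks for those two bytes, which never occur in an ISO 8859-1 data file. Garbage in, garbage out.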
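If you want to see the difference yourself, here is a small sketch, assuming od and iconv are available (the variable names are only for illustration). It dumps the bytes of the delimiter before and after the same conversion you ran on the file:

```shell
# Ç as a single ISO 8859-1 byte (octal 307 = hex C7), dumped as hex
latin1_hex=$(printf '\307' | od -An -tx1 | tr -d ' \n')

# The same character after converting it to UTF-8, dumped as hex
utf8_hex=$(printf '\307' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1: $latin1_hex"
echo "UTF-8:      $utf8_hex"
```

One byte in the first case, two in the second, so a separator typed in one encoding will not match data stored in the other.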
# 6  
Old 02-08-2019
Is there any way to resolve this issue without running this command?
Code:
iconv -f ISO-8859-1 -t UTF-8 testact.data_orig.data >testact.data
# 7  
Old 02-08-2019
Using the ISO-8859-1 field separator, -F$'\xC7', perhaps?

That is bash syntax; other shells might need VAR=$(printf '\307') or the like to get that byte into a string, so you can use -F"$VAR".

Or convince whatever's generating these files to use your preferred character set.
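A minimal sketch of the byte-separator approach, assuming the data really is ISO 8859-1 (where Ç is the byte 0xC7). The function name mask_column and the sample line are made up for illustration, and unlike your script this masks every line rather than skipping the first and last. LC_ALL=C is there so awk treats the input as plain bytes instead of trying to decode it as UTF-8:

```shell
# Hypothetical helper: mask one column of input whose delimiter is the byte 0xC7.
# $1 = column number, $2 = replacement text; reads stdin, writes stdout.
mask_column() {
    # printf with an octal escape produces the raw byte in any POSIX shell
    LC_ALL=C awk -v col="$1" -v var="$2" -F "$(printf '\307')" \
        'BEGIN { OFS = FS } { $col = var; print }'
}

# Tiny sample: three fields separated by the byte 0xC7.
# The separator is swapped for | afterwards only to make the result easy to read.
masked=$(printf 'a\307b\307c\n' | mask_column 2 '****' | LC_ALL=C tr '\307' '|')
echo "$masked"
```

To apply it to a file you would redirect as usual, for example mask_column 30 '****' < input > output, and add your own first/last line handling back in.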

Last edited by Corona688; 02-08-2019 at 11:40 AM..