Shifting of data because of special characters


 
# 1  
Old 06-15-2018

Hi Forum.

I have a unique problem that I'm hoping someone can assist me with.

I'm generating a fixed-width file, and one of the output columns (person_name, at column position 483, defined as string(36)) sometimes contains French characters in the name, which causes the next column of data to shift to the left.

For example, the first record below is valid, whereas the second record contains French characters and its next field is shifted to the left.
Code:
...0860LNAM01Tom*Brown                             999999999.....
...0860LNAM01RENꧪL蕅SQU瞠                     999999999

To me, it appears that the person_name column is not filling all 36 characters of its string(36) definition.

Any advice?

Thank you.

Last edited by pchang; 06-15-2018 at 10:42 AM.. Reason: update
# 2  
Old 06-15-2018
Can you show us the output from od -x for the above lines? That should give us the hex character codes to consider and maybe we will see something.
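
In case it is useful while gathering that, here is a minimal Python sketch of a byte-at-a-time dump, similar in spirit to od -t x1. od -x prints 16-bit words, so on a little-endian machine each pair of bytes shows up swapped; printing one byte at a time avoids that. The helper name hex_bytes is made up for illustration, and the sample is just the first 16 bytes of the valid record quoted in the first post.

```python
def hex_bytes(data: bytes, width: int = 16) -> list[str]:
    """Format data as an octal offset plus one hex pair per byte, in file order."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:07o} " + " ".join(f"{b:02x}" for b in chunk))
    return lines


# First 16 bytes of the valid record from the first post.
sample = b"0860LNAM01Tom*Br"
for line in hex_bytes(sample):
    print(line)
# 0000000 30 38 36 30 4c 4e 41 4d 30 31 54 6f 6d 2a 42 72
```

To dump a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the same helper.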



Thanks, in advance,
Robin
# 3  
Old 06-15-2018
Hope this helps - Thanks.

Code:
0000000 3830 3036 4e4c 4d41 3130 6f54 2a6d 7242
0000020 776f 206e 2020 2020 2020 2020 2020 2020
0000040 2020 2020 2020 2020 2020 2020 2020 3920
0000060 3939 3939 3939 3939 2020 2020 2020 2020
0000100 2020 3020 3031 3231 3030 5030 4553 464c
0000120 3030 2030 2020 2020 2020 2020 2020 2020
0000140 2020 2020 2020 2020 2020 2020 2020 2020
*
0000300 2020 2020 2050 2020 2020 2020 2020 2020
0000320 2020 2020 2020 2020 2020 2020 2020 2020
0000340 2020 2020 3020 3030 3030 3030 3030 3030
0000360 3231 3133 3332 3534 3736 2020 2020 2020
0000400 2020 2020 2020 2020 2020 2020 2020 2020
*
0000540 4520 554e 2020 2020 2020 2020 2020 2020
0000560 2020 2020 2020 2020 2020 2020 2020 2020
*
0000740 2020 2020 2020 4320 4441 2020 2020 2020
0000760 2020 2020 2020 2020 2020 2020 2020 2020
*
0001500 2020 2020 2020 2020 2020 3635 3730 3430
0001520 3638 2020 2020 2020 2020 2020 3020 3832
0001540 4c37 4441 3052 3031 3031 4831 4d4f 2045
0001560 2020 3231 2033 4241 2043 4452 2020 2020
0001600 2020 2020 2020 2020 2020 2020 2020 2020
*
0002020 2020 4f4d 544e c352 41a9 2d4c 4f4e 4452
0002040 2020 2020 2020 2020 2020 4351 4820 4831
0002060 3420 3259 2020 2020 2020 2020 2020 2020
0002100 2020 2020 2020 2020 2020 2020 2020 2020
0002120 3030 3030 3030 204e 2020 2020 2020 2020
0002140 2020 2020 2020 2020 2020 2020 2020 2020
0002160 2020 2020 2020 2020 2020 2020 3230 3430
0002200 444c 4345 4150 5350 4142 4354 2048 4154
0002220 474e 4b42 2020 2020 2020 2020 2020 2020
0002240 2020 2020 2020 2020 2020 2020 2020 2020
*
0002500 2020 2020 2020 2020 3030 3331 544c 5432
0002520 3031 3030 3130 3432 4c30 5355 3052 2031
0002540 2020 2020 2020 2020 2031 4e20 2020 2020
0002560 2020 2020 2020 2020 2020 2020 2020 2020
*
0003240 2020 2020 2020 2020 2020 2020 2020 3030
0003260 3030 3030 3030 2020 3043 3030 3030 3331
0003300 3436 3030 3130 3037 3030 3034 2020 2020
0003320 2020 2020 2020 2020 2020 2020 2020 2020
*
0004000 2020 3030 3030 2020 2020 2020 2020 2020
0004020 2020 2020 2020 2020 2020 2020 2020 2020
*
0004140 2020 2020 2020 2030 2020 2020 2020 2020
0004160 2020 2020 2020 2020 2020 2020 2020 2020
*
0004260 3020 3030 3030 3030 3030 3330 3030 3030
0004300 3030 3030 3030 3030 3030 3030 3030 3030
0004320 3030 3630 3032 3030 3030 3030 3030 3030
0004340 2030 2020 2020 2020 2020 2020 2020 2020
0004360 2020 2020 2020 2020 3020 3030 3030 3030
0004400 3030 3030 3030 3030 3030 3030 3030 3030
0004420 2030 2020 2020 2020 2020 2020 2020 2020
0004440 2020 2020 2020 2020 3020 3030 3030 3030
0004460 3030 3035 3030 3030 3030 3030 3030 3030
0004500 2030 2020 2020 2020 2020 2020 3020 3030
0004520 3030 3030 3030 3030 3039 3030 3030 3030
0004540 3030 3334 3033 3030 3030 3030 3030 3936
0004560 3035 3030 3030 3030 3030 3236 3035 3030
0004600 3030 3030 3030 3030 3030 3030 3030 3030
0004620 3030 3030 2030 2020 2020 2020 2020 2020
0004640 3020 3030 3030 3030 3030 3030 3030 3030
0004660 3030 3030 3030 3030 3030 3030 3030 3030
*
0004720 3030 3030 3030 3030 3030 3030 3030 3135
0004740 3238 3130 2038 2020 2020 2020 2020 2020
0004760 2020 2020 2020 2020 2020 2020 2020 2020
*
0005040 2020 2020 2020 2020 2020 2020 3120 3432
0005060 4c30 5355 3052 2032 2020 2020 2020 2020
0005100 2020 2020 2020 2020 2020 2020 2020 2020
*
0006600 2020 2020 2020 2020 3020 3030 3030 3030
0006620 3030 3334 3033 3030 3030 3030 3030 3334
0006640 3033 3030 3030 3030 3030 3035 3030 3030
0006660 3030 3030 3030 3030 3030 3030 3030 3030
*
0006740 3030 3030 3030 3630 3037 3030 3030 3030
0006760 3030 3630 3037 3030 3030 3030 3030 3030
0007000 3030 3030 3030 3030 3030 3030 3030 3030
*
0007040 3030 3030 2030 2020 2020 2020 2020 2020
0007060 3020 3030 3030 3030 3030 3035 2030 2020
0007100 2020 2020 2020 2020 2020 2020 2020 2020
*
0007400 2020 2020 3020 3030 5438 4e45 0d44 300a
0007420 3638 4c30 414e 304d 5231 4e45 a7ea 4caa
0007440 95e8 5385 5551 9ee7 20a0 2020 2020 2020
0007460 2020 2020 2020 2020 2020 2020 3920 3939
0007500 3939 3939 3939 2020 2020 2020 2020 2020
0007520 3020 3031 3231 3030 5030 4553 464c 3030
0007540 2030 2020 2020 2020 2020 2020 2020 2020
0007560 2020 2020 2020 2020 2020 2020 2020 2020
*
0007720 2020 2050 2020 2020 2020 2020 2020 2020
0007740 2020 2020 2020 2020 2020 2020 2020 2020
0007760 2020 3020 3030 3030 3030 3030 3030 3231
0010000 3133 3332 3534 3736 2020 2020 2020 2020
0010020 2020 2020 2020 2020 2020 2020 2020 2020
*
0010140 2020 2020 2020 2020 2020 2020 2020 4520
0010160 554e 2020 2020 2020 2020 2020 2020 2020
0010200 2020 2020 2020 2020 2020 2020 2020 2020
*
0010360 2020 2020 4320 4441 2020 2020 2020 2020
0010400 2020 2020 2020 2020 2020 2020 2020 2020
*
0011120 2020 2020 2020 2020 3335 3638 3433 3433
0011140 2020 2020 2020 2020 2020 3020 3832 4c37
0011160 4441 3052 3031 3031 4831 4d4f 2045 2020
0011200 4f4d 544e ea52 81a8 204c 2020 2020 2020
0011220 2020 2020 2020 2020 2020 2020 2020 2020
*
0011440 4f4d 544e e852 8c80 2020 2020 2020 2020
0011460 2020 2020 2020 2020 4351 4820 5431 3320
0011500 334e 2020 2020 2020 2020 2020 2020 2020
0011520 2020 2020 2020 2020 2020 2020 2020 3030
0011540 3030 3030 204e 2020 2020 2020 2020 2020
0011560 2020 2020 2020 2020 2020 2020 2020 2020
0011600 2020 2020 2020 2020 2020 3230 3430 444c
0011620 4345 4150 5350 4142 4354 2048 4154 474e
0011640 4b42 2020 2020 2020 2020 2020 2020 2020
0011660 2020 2020 2020 2020 2020 2020 2020 2020
*
0012120 2020 2020 2020 3030 3331 544c 5432 3031
0012140 3030 3130 3432 4c30 5355 3052 2031 2020
0012160 2020 2020 2020 2031 4e20 2020 2020 2020
0012200 2020 2020 2020 2020 2020 2020 2020 2020
*
0012660 2020 2020 2020 2020 2020 2020 3030 3030
0012700 3030 3030 2020 3043 3030 3030 3331 3436
0012720 3030 3130 3636 3034 3737 2020 2020 2020
0012740 2020 2020 2020 2020 2020 2020 2020 2020
*
0013420 3030 3030 2020 2020 2020 2020 2020 2020
0013440 2020 2020 2020 2020 2020 2020 2020 2020
*
0013560 2020 2020 2030 2020 2020 2020 2020 2020
0013600 2020 2020 2020 2020 2020 2020 2020 2020
*
0013660 2020 2020 2020 2020 2020 2020 2020 3020
0013700 3030 3030 3030 3030 3430 3033 3030 3030
0013720 3030 3030 3030 3030 3030 3030 3030 3030
0013740 3030 3030 3030 3030 3030 3030 3131 2032
0013760 2020 2020 2020 2020 2020 2020 2020 2020
0014000 2020 2020 2020 3020 3030 3030 3030 3030
0014020 3030 3030 3030 3030 3030 3030 3030 2030
0014040 2020 2020 2020 2020 2020 2020 2020 2020
0014060 2020 2020 2020 3020 3030 3030 3030 3030
0014100 3035 3030 3030 3030 3030 3030 3030 2030
0014120 2020 2020 2020 2020 2020 3020 3030 3030
0014140 3030 3030 3332 3036 3030 3030 3030 3030
0014160 3230 3034 3030 3030 3030 3030 3237 3036
0014200 3030 3030 3030 3030 3236 3037 3030 3030
0014220 3030 3030 3030 3030 3030 3030 3030 3030
0014240 3030 2030 2020 2020 2020 2020 2020 3020
0014260 3030 3030 3030 3030 3030 3030 3030 3030
*
0014340 3030 3030 3030 3030 3030 3030 3135 3238
0014360 3130 2038 2020 2020 2020 2020 2020 2020
0014400 2020 2020 2020 2020 2020 2020 2020 2020
*
0014460 2020 2020 2020 2020 2020 3120 3432 4c30
0014500 5355 3052 2032 2020 2020 2020 2020 2020
0014520 2020 2020 2020 2020 2020 2020 2020 2020
*
0016220 2020 2020 2020 3020 3030 3030 3030 3030
0016240 3230 3034 3030 3030 3030 3030 3230 3034
0016260 3030 3030 3030 3030 3035 3030 3030 3030
0016300 3030 3030 3030 3030 3030 3030 3030 3030
*
0016360 3030 3030 3734 3036 3030 3030 3939 3939
0016400 3939 3039 3030 3030 3030 3030 3030 3030
0016420 3030 3030 3030 3030 3030 3030 3030 3030
*
0016460 3030 2030 2020 2020 2020 2020 2020 3020
0016500 3030 3030 3030 3030 3030 2030 2020 2020
0016520 2020 2020 2020 2020 2020 2020 2020 2020
*
0017020 2020 3020 3030 5438 4e45 0d44 000a
0017035

# 4  
Old 06-15-2018
What encoding / character set do you use? What locale? Are your (text) tools multibyte encoding capable?
# 5  
Old 06-16-2018
Hi pchang...

First observations are:
  1. The padding in your flat file appears to be plain spaces, ASCII character 0x20.
  2. The od -x printout is little-endian, so the two bytes inside each four-digit group appear swapped.
  3. The file has 2 occurrences of a Windows-style <CR><LF> pair: 0d44 300a and 0d44 000a, that is, xx0d 0axx once the bytes are swapped back.
  4. The Unicode characters are doing exactly what they are supposed to do and fill up your spaces. However...
  5. Because those spaces determine your layout, each Unicode character eats up anything from 1 to 3 of your layout spaces, not counting the displayed character itself, depending on the character.

I would suggest looking into RudiC's reply as a starting point for us to carry on...
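
To make the byte swapping and the byte-versus-character counts concrete, here is a rough Python sketch. It assumes the dump came from a little-endian machine and that the name bytes are UTF-8; the dump is consistent with both, but nobody in the thread has confirmed either. The function name words_to_bytes is made up for illustration, and the hex words are copied from the dump at offsets 0007420 and 0007440.

```python
def words_to_bytes(words: str) -> bytes:
    """Undo the byte swap that od -x shows on a little-endian host."""
    out = bytearray()
    for word in words.split():
        value = int(word, 16)
        out.append(value & 0xFF)  # low byte comes first in the file
        out.append(value >> 8)    # high byte comes second
    return bytes(out)


# Words copied from the dump at offsets 0007420 and 0007440.
raw = words_to_bytes(
    "3638 4c30 414e 304d 5231 4e45 a7ea 4caa "
    "95e8 5385 5551 9ee7 20a0"
)

prefix = raw[:9]   # b"860LNAM01" (the leading "0" sits in the previous dump line)
name = raw[9:25]   # the name bytes, up to but not including the first padding space
text = name.decode("utf-8")  # assumption: the bytes are UTF-8

print(prefix)
print(len(name), "bytes,", len(text), "characters")
# 16 bytes, 10 characters
```

Under those assumptions the name holds three 3-byte sequences, so it occupies 16 bytes but displays as only 10 characters. Whether the tool that wrote the file counts the field width in bytes or in characters is not something this sketch can tell you.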

Last edited by rbatte1; 06-19-2018 at 04:00 AM.. Reason: Formatted numbered list with LIST=1 tags
# 6  
Old 06-16-2018
Quote:
Originally Posted by RudiC
What encoding / character set do you use? What locale? Are your (text) tools multibyte encoding capable?
Thanks for all of your replies.

We are using an ETL tool (Informatica) to generate the file. The codepage is currently MS Windows Latin 1, and there are other codepages we can select from.

Not sure what the locale is - how do I find that out?
# 7  
Old 06-17-2018
What are the OS and shell versions? Are you on a *nix system at all? If yes, the locale command will output your settings.
There seem to be a few non-ASCII characters in your file (as expected, BTW). Try the iconv or recode *nix commands to convert from your "codepage" to your locale's character encoding.
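
One thing to keep in mind with any conversion is that it changes byte counts but not character counts. Below is a small Python sketch of that effect, not a description of what Informatica actually does. It assumes "MS Windows Latin 1" corresponds to code page 1252 and that the target encoding is UTF-8. The helpers pad_chars and pad_bytes are hypothetical names, and the sample value is what the bytes at offset 0002020 of the dump decode to if read as UTF-8.

```python
FIELD_WIDTH = 36  # person_name is defined as string(36) in the first post


def pad_chars(value: str, width: int = FIELD_WIDTH) -> str:
    """Pad with spaces to a fixed number of characters."""
    return value.ljust(width)


def pad_bytes(value: str, encoding: str, width: int = FIELD_WIDTH) -> bytes:
    """Pad with spaces to a fixed number of encoded bytes."""
    encoded = value.encode(encoding)
    if len(encoded) > width:
        raise ValueError("value does not fit in the field")
    return encoded + b" " * (width - len(encoded))


value = "MONTR\u00e9AL-NORD"  # 13 characters, one of them non-ASCII

# Field padded by characters, written in a single-byte codepage,
# then converted to UTF-8 the way an iconv-style conversion would.
as_cp1252 = pad_chars(value).encode("cp1252")
as_utf8 = as_cp1252.decode("cp1252").encode("utf-8")
print(len(as_cp1252), len(as_utf8))
# 36 37

# Field padded by bytes directly in UTF-8.
byte_padded = pad_bytes(value, "utf-8")
print(len(byte_padded), len(byte_padded.decode("utf-8")))
# 36 35
```

So a field that is 36 characters wide grows to 37 bytes once converted to UTF-8, while a field padded to 36 bytes in UTF-8 displays as only 35 characters. Whichever unit the consumer of the file counts in, the producer needs to pad in the same unit.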