Positional insertion for multibyte characters


 
# 1  
Old 01-21-2016

Hi

I need to insert a dot "." after a fixed position in each line, say the 110th position.

I have written the command below for this.

Code:
cat filename | sed 's/./&\./110' > new_filename

The command works fine for single-byte data, but when the input file contains multibyte (2- or 3-byte) characters, the dot is not inserted at the 110th position. Is there a way to resolve this?
# 2  
Old 01-21-2016
What operating system and version of sed are you using?

What codeset is being used to encode the multi-byte characters in your input file?

What locale was being used when you ran the command above?

What do you mean by the 110th location? Do you want to insert a period as the 111th character on the line or do you want to insert a period as the 111th byte on the line?
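To make the character-versus-byte distinction concrete, here is a minimal sketch. The sample string and the names `sample`, `byte_count` and `char_count` are made up for illustration; the string is written with octal escapes so the result does not depend on how this snippet itself is encoded.

```shell
# Sample line: "abc", U+00E4 (2 bytes in UTF-8), U+20AC (3 bytes in UTF-8), "xyz".
sample() { printf 'abc\303\244\342\202\254xyz'; }

# wc -c counts bytes, whatever the locale is.
byte_count=$(sample | wc -c | tr -d '[:space:]')
echo "bytes: $byte_count"

# wc -m counts characters according to the current locale (LC_CTYPE).
# It only reports 8 here when a UTF-8 locale is actually in effect.
char_count=$(sample | wc -m | tr -d '[:space:]')
echo "characters in current locale: $char_count"
```

The byte count is always 11 for this string. If the two numbers are equal on your system, the tools are most likely working byte by byte, which is what you would see in the C locale.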

Using cat in this pipeline wastes system resources and slows down your script:
Code:
sed 's/./&\./110' < filename > new_filename

but fixing that won't change the problem you are reporting.
# 3  
Old 01-21-2016
Hi,

Please find my responses below.

Quote:
What operating system and version of sed are you using?
Code:
Linux 2.6.32-573.7.1.el6.x86_64
GNU sed version 4.2.1

Quote:
What codeset is being used to encode the multi-byte characters in your input file?
Code:
I am unaware of how the source file was processed


Quote:
What locale was being used when you ran the command above?
Code:
en_US.UTF-8

Quote:
What do you mean by the 110th location? Do you want to insert a period as the 111th character on the line or do you want to insert a period as the 111th byte on the line?
Code:
I wanted the period at 110th location

# 4  
Old 01-21-2016
I'm a bit surprised: when I try to reproduce your problem, it does not occur with my sed (GNU sed) 4.2.2 unless I run it in the C locale:
Code:
sed 'p;s/./&\./17' file
1234567890123456789
12345678901234567.89
abcdefghijklmnopqrs
abcdefghijklmnopq.rs
abc§€§€hijklmnopqrs
abc§€§€hijklmnopq.rs
äöüßÄÖÜߧ€äöüßÄÖÜߧ
äöüßÄÖÜߧ€äöüßÄÖÜ.ߧ
abcdefghijklmnopqrs
abcdefghijklmnopq.rs
LC_ALL=C sed 'p;s/./&\./17' file
1234567890123456789
12345678901234567.89
abcdefghijklmnopqrs
abcdefghijklmnopq.rs
abc§€§€hijklmnopqrs
abc§€§€hijk.lmnopqrs
äöüßÄÖÜߧ€äöüßÄÖÜߧ
äöüßÄÖÜß�.�€äöüßÄÖÜߧ
abcdefghijklmnopqrs
abcdefghijklmnopq.rs

Could you be somewhat more specific with your problem?
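One way to narrow this down is to check which character encoding the shell is actually using and whether the input file really is valid UTF-8. The sketch below is only a suggestion: `is_valid_utf8` and `report_encoding` are hypothetical helper names, and the idea that a mismatch between file encoding and locale is behind the problem is an assumption, not something established in this thread.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print a one-line verdict for a file.
report_encoding() {
    if is_valid_utf8 "$1"; then
        echo "$1: valid UTF-8"
    else
        echo "$1: not valid UTF-8"
    fi
}

# Show the character encoding of the locale this shell is running in.
locale charmap
```

Running `report_encoding filename` on the real input file, together with the `locale charmap` output, would show whether the data and the locale agree.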