sed replacement in unicode file


 
# 1  
Old 03-07-2009

Hi there,
I have a file exported from the Windows registry (it's Unicode) and I can't get sed to do some replacements on it. I want to join every line that ends with a backslash with the line that follows it.
Code:
santiago@ks354286:~$ cat win.reg
ÿþWindows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
santiago@ks354286:~$ sed ':a /\\$/N; s/\\  //; ta' win.reg
ÿþWindows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00

The output should be something like :
Code:
ÿþWindows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00

How would you do that? Remember it's unicode!
Thanks for your help
Santiago
# 2  
Old 03-08-2009
Unicode has nothing to do with the problem to hand. You can easily do what you want to do using an ed script.
Code:
2,4j
2s/ //g
2s/\\//g
wq

Save the above ed commands to a file, e.g. ed.script, and execute it as follows:
Code:
ed win.reg < ed.script

# 3  
Old 03-08-2009
Thanks fpmurphy for your answer.
I'm sorry, but I think the following trace demonstrates that Unicode does make a difference. win.reg is my original Unicode file (shown here with cat -v; I had colorized the non-printing characters). win2.reg is a copy of win.reg converted to ASCII (also shown with cat -v).
The same sed replacement clearly works in one case and not in the other.
Code:
santiago@ks354286:~$ cat -v win.reg
M-^?M-~W^@i^@n^@d^@o^@w^@s^@ ^@R^@e^@g^@i^@s^@t^@r^@y^@ ^@E^@d^@i^@t^@o^@r^@ ^@V^@e^@r^@s^@i^@o^@n^@ ^@5^@.^@0^@0^@^M^@
^@^M^@
^@[^@H^@K^@E^@Y^@_^@L^@O^@C^@A^@L^@_^@M^@A^@C^@H^@I^@N^@E^@\^@H^@A^@R^@D^@W^@A^@R^@E^@\^@A^@C^@P^@I^@\^@F^@A^@C^@S^@]^@^M^@
^@"^@0^@0^@0^@0^@0^@0^@0^@0^@"^@=^@h^@e^@x^@:^@4^@6^@,^@4^@1^@,^@4^@3^@,^@5^@3^@,^@4^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@f^@d^@,^@0^@1^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@1^@,^@\^@^M^@
^@ ^@ ^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@\^@^M^@
^@ ^@ ^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@,^@0^@0^@^M^@
santiago@ks354286:~$ cat -v win2.reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
santiago@ks354286:~$ sed ':a /\\$/N; s/\\\n  //; ta' win.reg
ÿþWindows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
  00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
santiago@ks354286:~$ sed ':a /\\$/N; s/\\\n  //; ta' win2.reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00

I did test your suggestion, though, and it doesn't seem to work:
Code:
santiago@ks354286:~$ ed win2.reg.bak < ed.script
293
?
santiago@ks354286:~$ ed win.reg.bak < ed.script
599
?

Any other idea?
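
A minimal sketch (not from the original posts) of why the `/\\$/` address never matches in the cat -v trace above. It assumes a system with iconv, od and sed; `sample.reg` and `workdir` are hypothetical names used only for illustration. In UTF-16LE every ASCII character is followed by a NUL byte, so the bytes just before each line feed are backslash, NUL, CR, NUL rather than a bare backslash:

```shell
# sample.reg is a hypothetical two-line file written as UTF-16LE with CRLF line ends
workdir=$(mktemp -d)
printf 'a,\\\r\n  b\r\n' | iconv -f UTF-8 -t UTF-16LE > "$workdir/sample.reg"

# Dump the raw bytes: each character is followed by a NUL byte
od -c "$workdir/sample.reg"

# Count the lines that a byte-oriented sed sees as ending in a backslash
ends_in_backslash=$(LC_ALL=C sed -n '/\\$/p' "$workdir/sample.reg" | wc -l | tr -d ' ')
echo "lines ending in backslash: $ends_in_backslash"
```

Since no line ends in a backslash as far as sed is concerned, the `N` command is never run and the file passes through unchanged.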
# 4  
Old 03-08-2009
perl -p -e 's/\\\n//g;s/ +//g' win.reg > output.txt
# 5  
Old 03-08-2009
Thanks ldapswandog,
Again, it works with ascii but not with unicode.
# 6  
Old 03-09-2009
You can try this to join the lines and remove non-ascii characters:
Code:
perl -pe 's/\000/ /g; s/[^[:ascii:]]//g; s/\\\n//g;s/ +//g' win.reg

Btw, it seems more like you have plenty of ^@ (NUL) characters, which are actually ASCII. So 's/\000/ /g' is there to get rid of the NUL characters. Let us know how it goes.
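
A small sketch, assuming iconv is available, of where those NUL bytes come from. The sample string is hypothetical. Each ASCII character takes two bytes in UTF-16LE, the second of which is NUL, and decoding with iconv removes them without touching the text:

```shell
text='hex:46'   # hypothetical six-character sample

# Byte counts before and after encoding as UTF-16LE
ascii_len=$(printf '%s' "$text" | wc -c | tr -d ' ')
utf16_len=$(printf '%s' "$text" | iconv -f UTF-8 -t UTF-16LE | wc -c | tr -d ' ')

# Encoding and then decoding again gives back the original text
round_trip=$(printf '%s' "$text" | iconv -f UTF-8 -t UTF-16LE | iconv -f UTF-16LE -t UTF-8)

echo "$ascii_len bytes as ASCII, $utf16_len bytes as UTF-16LE, round trip: $round_trip"
```

Decoding the file this way, rather than turning NULs into spaces, keeps the real spaces distinguishable from the padding bytes.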
# 7  
Old 03-11-2009
Thanks rikxik for considering my request.

Your command doesn't work:
1) It removes ALL spaces, and I only want to remove the first two spaces at the beginning of a line when that line is merged with the previous one.
2) It doesn't join lines
3) It changes the file to ascii and I need the file to remain unicode.

Here is the output:
Code:
santiago@ks354286:~$ perl -pe 's/\000/ /g; s/[^[:ascii:]]//g; s/\\\n//g;s/ +//g' win.reg
WindowsRegistryEditorVersion5.00

[HKEY_LOCAL_MACHINE\HARDWARE\ACPI\FACS]
"00000000"=hex:46,41,43,53,40,00,00,00,fd,01,00,00,00,00,00,00,00,00,00,00,01,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,\
00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00

Any other idea?
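
One possible approach, offered as a sketch rather than a tested answer from the thread: decode the file to UTF-8, drop the carriage returns, run the sed join that already works on the ASCII copy, then restore the CRLF line ends, the UTF-16LE encoding and the byte order mark. It assumes iconv, tr, sed and awk are available and that the input starts with a BOM; `join_reg_lines`, `workdir` and the sample file names are hypothetical.

```shell
# join_reg_lines IN OUT  (hypothetical helper)
# IN: UTF-16 file with BOM and CRLF line ends; OUT: same format, continuation lines joined
join_reg_lines() {
    {
        printf '\377\376'                                # UTF-16LE byte order mark
        iconv -f UTF-16 -t UTF-8 "$1" |                  # decode; the BOM is consumed here
            tr -d '\r' |                                 # CRLF -> LF so the backslash ends the line
            sed -e ':a' -e '/\\$/N' -e 's/\\\n  //' -e 'ta' |  # join, dropping "\", newline, 2 spaces
            awk '{ printf "%s\r\n", $0 }' |              # LF -> CRLF
            iconv -f UTF-8 -t UTF-16LE                   # encode again
    } > "$2"
}

# Try it on a small hypothetical sample
workdir=$(mktemp -d)
{
    printf '\377\376'
    printf 'k=hex:46,\\\r\n  41,\\\r\n  43\r\n' | iconv -f UTF-8 -t UTF-16LE
} > "$workdir/sample.reg"

join_reg_lines "$workdir/sample.reg" "$workdir/joined.reg"
joined_text=$(iconv -f UTF-16 -t UTF-8 "$workdir/joined.reg" | tr -d '\r')
echo "$joined_text"
```

Only the two spaces that directly follow a joined backslash are removed, the lines are joined, and the output stays in UTF-16LE, which covers the three points listed above. Writing to a separate output file leaves the original export untouched.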