11-11-2016
I'm afraid it's not that easy: in UTF-8 (and other multibyte) encoded files, characters outside the ASCII set are represented by more than one byte, and every single one of those bytes will be replaced by a space when running the above command. Using the -s option, on the other hand, will squeeze any number of adjacent non-ASCII characters into one single space.
Last edited by RudiC; 11-11-2016 at 07:15 AM..
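To make the point above concrete, here is a minimal sketch. The command referred to in the post is not shown here, so the `tr '\200-\377' ' '` call below is only an assumption about what it looked like; `LC_ALL=C` is set so that `tr` works byte by byte regardless of implementation. The sample string is "aéèb", where "é" and "è" each take two bytes in UTF-8.

```shell
#!/bin/sh
# Sample input "aéèb": 'é' is the bytes \303\251, 'è' is \303\250 in UTF-8.
input=$(printf 'a\303\251\303\250b')

# Assumed form of the command under discussion: every byte >= 0x80
# is replaced by a space, so two characters become four spaces.
plain=$(printf '%s' "$input" | LC_ALL=C tr '\200-\377' ' ')

# With -s, the run of spaces is squeezed, so both characters
# collapse into a single space.
squeezed=$(printf '%s' "$input" | LC_ALL=C tr -s '\200-\377' ' ')

printf '[%s]\n' "$plain"      # prints [a    b]
printf '[%s]\n' "$squeezed"   # prints [a b]
```

Neither result gives one space per character: without -s there are too many, and with -s adjacent non-ASCII characters are merged. Note that -s also squeezes any genuine runs of spaces already present in the input.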
GB18030(5) BSD File Formats Manual GB18030(5)
NAME
gb18030 -- GB 18030 encoding method for Chinese text
SYNOPSIS
ENCODING "GB18030"
DESCRIPTION
The GB18030 encoding implements GB 18030-2000, a PRC national standard for the encoding of Chinese characters. It is a superset of the older
GB 2312-1980 and GBK encodings, and incorporates Unicode's Unihan Extension A completely. It also provides code space for all Unicode 3.0
code points.
Multibyte characters in the GB18030 encoding can be one byte, two bytes, or four bytes long. There are a total of over 1.5 million code
positions.
GB 11383-1981 (ASCII) characters are represented by single bytes in the range 0x00 to 0x7F.
Chinese characters are represented as either two bytes or four bytes. Characters that are represented by two bytes begin with a byte in the
range 0x81-0xFE and end with a byte either in the range 0x40-0x7E or 0x80-0xFE.
Characters that are represented by four bytes begin with a byte in the range 0x81-0xFE, have a second byte in the range 0x30-0x39, a third
byte in the range 0x81-0xFE and a fourth byte in the range 0x30-0x39.
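The byte ranges described above can be sketched as a small shell helper. This is only an illustration, not part of any library: the function name `gb18030_len` and its hex-argument interface are hypothetical, and it inspects only the first two bytes, so the third and fourth bytes of a four-byte sequence are not validated. The arithmetic at the end checks the "over 1.5 million code positions" figure from the same ranges.

```shell
#!/bin/sh
# gb18030_len LEAD [SECOND]
# Hypothetical helper: takes the first one or two bytes as hex (e.g. 81 30)
# and prints the sequence length (1, 2 or 4), or "invalid".
gb18030_len() {
    gb_b1=$((0x$1))
    gb_b2=$((0x${2:-0}))
    if [ "$gb_b1" -le 127 ]; then
        echo 1                                   # 0x00-0x7F: single byte
    elif [ "$gb_b1" -ge 129 ] && [ "$gb_b1" -le 254 ]; then
        if [ "$gb_b2" -ge 48 ] && [ "$gb_b2" -le 57 ]; then
            echo 4                               # second byte 0x30-0x39
        elif [ "$gb_b2" -ge 64 ] && [ "$gb_b2" -le 126 ]; then
            echo 2                               # second byte 0x40-0x7E
        elif [ "$gb_b2" -ge 128 ] && [ "$gb_b2" -le 254 ]; then
            echo 2                               # second byte 0x80-0xFE
        else
            echo invalid
        fi
    else
        echo invalid                             # 0x80 and 0xFF are not lead bytes
    fi
}

# Count the code positions implied by the ranges.
single=$((0x7F - 0x00 + 1))                              # 128
lead=$((0xFE - 0x81 + 1))                                # 126
trail=$(( (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1) ))       # 63 + 127 = 190
digit=$((0x39 - 0x30 + 1))                               # 10
two=$((lead * trail))                                    # 23940
four=$((lead * digit * lead * digit))                    # 1587600
total=$((single + two + four))

echo "$total"                                            # prints 1611668
```

For example, `gb18030_len 41` prints 1, `gb18030_len 81 40` prints 2, and `gb18030_len 81 30` prints 4.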
SEE ALSO
euc(5), gb2312(5), gbk(5), utf8(5)
Chinese National Standard GB 18030-2000: Information Technology -- Chinese ideograms coded character set for information interchange --
Extension for the basic set, March 2000.
The Unicode Standard, Version 3.0, The Unicode Consortium, 2000.
STANDARDS
The GB18030 encoding is believed to be compatible with GB 18030-2000.
BSD
August 10, 2003 BSD