In principle not too difficult, but you should be aware that these character-graphics characters usually belong to a multibyte character set such as UTF-8, which may impose restrictions.
Hi,
How do I remove the lines where special characters or Unicode characters appear?
The following query does work but I wonder if there is a better way.
cat test.txt | egrep -v '\)|#|,|&|-|\(|\\|\/|\.'
The following lines show that my query is incomplete.
Warning: The word "*Khan" is... (1 Reply)
Hi there,
I'd like to write a script that removes any set of characters from any string. The first argument would be the string, the second argument would be the characters to remove. For example:
$ myscript "My name's Santiago. What's yours?" "atu"
My nme's Snigo. Wh's yors?
I wrote the... (11 Replies)
Hello,
Is there a simpler way to remove special characters (color codes) from each line in a log file?
I use sed as in the example below, but I think there should be a simpler way to achieve the same result:
$ cat -vet file1
^, , , ,
Maybe to convert the file somehow?
... (5 Replies)
Dear Members,
We have a file which contains some special characters. I need to replace this special character with a newline character (\n).
The special character is \x85.
I am not sure what this character means and how we can remove it.
Any inputs are greatly appreciated.
Thanks... (5 Replies)
hello all
I am writing Perl code and I wish to remove the special characters from text.
I wish to remove all extended ASCII characters. If the list of special characters is huge, how can I do this using the substitute command
s/specialcharacters/null/g
I really want to code like... (3 Replies)
Hi All,
I have a variable like
AVAIL="\
BACK:bkpstg:testdb3.iad.expertcity.com:backtest|\
#AUTH:authstg:testdb3.iad.expertcity.com:authiapd|\
TEST:authstg:testdb3.iad.expertcity.com:authiapd|\
"
What I want to do here is that if I find # before any entry, remove the entire string... (5 Replies)
Hi,
I have a input of the form:
..., word1, word2, word3...
I want output of the form
word1, word2, word3
I tried echo '..., word1, word2, word3...' | tr -d '...,'
but that takes out the commas in the middle too so I get
word1 word2 word3
but I want the commas in the middle.
... (3 Replies)
Hi,
I have a string like this: ="Lookup Procedure"
But I want the output like this: Lookup Procedure
The = and " characters should be removed.
Please suggest me the solution.
Regards,
Madhuri (2 Replies)
Hi Gurus,
I have a file which contains some Unicode characters like "ü". I want to replace them with some other characters. I searched the internet and found the command sed "s/ü/-/g", but I don't know how to type ü on the Unix command line.
Please help me with this one.
Thanks in advance (7 Replies)
Hi,
I have a "|" delimited file that is exported from a database.
There is one column in the file which has descriptions/comments entered by some application user. It has "Control-M" and "New Line" characters in between the text.
Hence, when I export the data, this record with the new... (4 Replies)
Discussion started by: tarun.trehan
MULTIBYTE(3) BSD Library Functions Manual MULTIBYTE(3)
NAME
multibyte -- multibyte and wide character manipulation functions
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>
DESCRIPTION
The basic elements of some written natural languages, such as Chinese, cannot be represented uniquely with single C chars. The C standard
supports two different ways of dealing with extended natural language encodings: wide characters and multibyte characters. Wide characters
are an internal representation which allows each basic element to map to a single object of type wchar_t. Multibyte characters are used for
input and output and code each basic element as a sequence of C chars. Individual basic elements may map into one or more (up to MB_LEN_MAX)
bytes in a multibyte character.
The current locale (setlocale(3)) governs the interpretation of wide and multibyte characters. The locale category LC_CTYPE specifically
controls this interpretation. The wchar_t type is wide enough to hold the largest value in the wide character representations for all
locales.
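As a small illustration of this locale dependence, the sketch below selects an LC_CTYPE locale and reports the longest multibyte character that locale can produce. The helper name max_char_bytes is hypothetical and not part of libc; MB_CUR_MAX and MB_LEN_MAX are the standard macros from <stdlib.h> and <limits.h>.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libc): select the LC_CTYPE locale
 * named by locale_name and return the maximum number of bytes in one
 * multibyte character for it, or 0 if the locale is not available. */
size_t max_char_bytes(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return 0;               /* locale unknown or not installed */
    return (size_t)MB_CUR_MAX;  /* never exceeds MB_LEN_MAX */
}
```

Passing "" as the locale name picks up the locale from the environment. A UTF-8 locale typically reports a value larger than 1, but the exact number is implementation-specific.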
Multibyte strings may contain 'shift' indicators to switch to and from particular modes within the given representation. If explicit bytes
are used to signal shifting, these are not recognized as separate characters but are lumped with a neighboring character. There is always a
distinguished 'initial' shift state. Some functions (e.g., mblen(3), mbtowc(3) and wctomb(3)) maintain static shift state internally,
whereas others store it in an mbstate_t object passed by the caller. Shift states are undefined after a call to setlocale(3) with the
LC_CTYPE or LC_ALL categories.
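The caller-supplied shift state can be sketched as follows: a character counter that walks a multibyte string with mbrtowc(3) and keeps its own mbstate_t instead of relying on hidden static state. The function name count_mb_chars is an assumption made for illustration only, and the result depends on the LC_CTYPE locale in effect when it is called.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of libc): count the characters, not
 * the bytes, in the multibyte string s under the current LC_CTYPE
 * locale. Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete character */
        if (n == 0)
            break;                      /* decoded the null wide character */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

For plain ASCII input the count equals strlen(s) in any locale; for other input the two differ only if a multibyte locale such as a UTF-8 one has been selected with setlocale(3) first.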
For convenience in processing, the wide character with value 0 (the null wide character) is recognized as the wide character string
terminator, and the character with value 0 (the null byte) is recognized as the multibyte character string terminator. Null bytes are not
permitted within multibyte characters.
The C library provides the following functions for dealing with multibyte characters:
Function        Description
mblen(3)        get number of bytes in a character
mbrlen(3)       get number of bytes in a character (restartable)
mbrtowc(3)      convert a character to a wide-character code (restartable)
mbsrtowcs(3)    convert a character string to a wide-character string (restartable)
mbstowcs(3)     convert a character string to a wide-character string
mbtowc(3)       convert a character to a wide-character code
wcrtomb(3)      convert a wide-character code to a character (restartable)
wcstombs(3)     convert a wide-character string to a character string
wcsrtombs(3)    convert a wide-character string to a character string (restartable)
wctomb(3)       convert a wide-character code to a character
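One possible way to combine these functions for the kind of task discussed in the threads above is a round trip: convert to wide characters with mbstowcs(3), filter, and convert back with wcstombs(3). The sketch below drops everything outside 7-bit ASCII. The name strip_non_ascii and its return convention are assumptions for illustration, not an existing API, and multibyte input is only decoded correctly once a matching locale has been set with setlocale(3).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of libc): copy src to dst, dropping
 * every character outside 7-bit ASCII. Returns 0 on success, -1 on
 * invalid input, allocation failure, or if dst is too small. */
int strip_non_ascii(const char *src, char *dst, size_t dstsize)
{
    /* A wide string never has more elements than the byte count. */
    size_t cap = strlen(src) + 1;
    wchar_t *wide = malloc(cap * sizeof *wide);
    size_t n, i, kept = 0, written;

    if (wide == NULL)
        return -1;
    n = mbstowcs(wide, src, cap);       /* multibyte -> wide */
    if (n == (size_t)-1) {
        free(wide);
        return -1;                      /* invalid multibyte sequence */
    }
    for (i = 0; i < n; i++)             /* keep ASCII characters only */
        if ((unsigned long)wide[i] < 128)
            wide[kept++] = wide[i];
    wide[kept] = L'\0';

    written = wcstombs(dst, wide, dstsize);   /* wide -> multibyte */
    free(wide);
    if (written == (size_t)-1 || written >= dstsize)
        return -1;                      /* error, or no room for the null byte */
    return 0;
}
```

Working on wide characters means the filter sees one element per character, so it cannot cut a multibyte sequence in half the way a byte-oriented filter might.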
SEE ALSO
mklocale(1), setlocale(3), stdio(3), big5(5), euc(5), gb18030(5), gb2312(5), gbk(5), mskanji(5), utf8(5)
STANDARDS
These functions conform to ISO/IEC 9899:1999 (``ISO C99'').
BSD April 8, 2004 BSD