11-11-2005
Multibyte characters to ASCII
Hello,
Is there any UNIX utility/command/executable that will convert multibyte characters to standard single-byte ASCII characters in a given file?
and
Is there any UNIX utility/command/executable that will recognize multibyte characters in a given file name?
The multibyte character sets we are most likely to encounter are Chinese and/or Japanese.
Thanks
Jerardfjay
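One possible starting point for the second question, offered as a sketch rather than an answer from the thread: every byte of a multibyte character in the common Chinese and Japanese encodings (and in UTF-8) starts with a byte above 127, so a name that still has bytes left after all 7-bit ASCII bytes are deleted contains at least one non-ASCII character. The helper name has_non_ascii and the sample file names are made up for illustration.

```shell
#!/bin/sh
# has_non_ascii (hypothetical helper): succeeds if the argument contains
# at least one byte outside the 7-bit ASCII range 0-127.
has_non_ascii() {
    # LC_ALL=C makes tr work on single bytes. Delete bytes 0-127 and
    # count what is left; $(( )) strips any padding wc may print.
    left=$(( $(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177' | wc -c) ))
    [ "$left" -gt 0 ]
}

# Sample names (assumptions): one plain ASCII, one with a UTF-8 "é".
for name in "report.txt" "$(printf 'caf\303\251.txt')"; do
    if has_non_ascii "$name"; then
        echo "non-ASCII bytes found"
    else
        echo "ASCII only"
    fi
done
```

The same tr pipeline can read a file's contents from standard input instead of a name. For the first question, iconv(1) is the usual tool for converting between encodings, but Chinese and Japanese characters have no ASCII equivalents, so what a conversion to ASCII produces for them depends on the implementation.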
LEARN ABOUT OPENDARWIN
cut
CUT(1) BSD General Commands Manual CUT(1)
NAME
cut -- select portions of each line of a file
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
DESCRIPTION
The cut utility selects portions of each line (as specified by list) from each file and writes them to the standard output. If no file
arguments are specified, or a file argument is a single dash ('-'), cut reads from the standard input. The items specified by list can be
in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1.
The list option argument is a comma or whitespace separated set of increasing numbers and/or number ranges. Number ranges consist of a
number, a dash ('-'), and a second number and select the fields or columns from the first number to the second, inclusive. Numbers or number
ranges may be preceded by a dash, which selects all fields or columns from 1 to the first number. Numbers or number ranges may be followed
by a dash, which selects all fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated,
overlapping, and in any order. It is not an error to select fields or columns not present in the input line.
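The list syntax described above can be sketched with a ten-byte ASCII line. The sample text is an assumption chosen so that byte and character positions coincide; the variable names are illustrative only.

```shell
#!/bin/sh
line='abcdefghij'

# A closed range plus an open-ended range: bytes 1-3, then 7 to end of line.
ranges=$(printf '%s\n' "$line" | cut -b 1-3,7-)
echo "$ranges"        # abcghij

# A leading dash selects from position 1 up to the given number.
leading=$(printf '%s\n' "$line" | cut -b -3)
echo "$leading"       # abc

# Repeated, overlapping, out-of-order entries select the same positions;
# the output keeps the order of the input line.
shuffled=$(printf '%s\n' "$line" | cut -b 7-,2-3,1-2)
echo "$shuffled"      # abcghij

# Selecting positions past the end of the line is not an error.
printf 'abc\n' | cut -b 5- >/dev/null
past_end_status=$?
echo "exit status: $past_end_status"
```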
The options are as follows:
-b list
The list specifies byte positions.
-c list
The list specifies character positions.
-d delim
Use the first character of delim as the field delimiter character instead of the tab character.
-f list
The list specifies fields, delimited in the input by a single tab character. Output fields are separated by a single tab character.
-n Do not split multi-byte characters.
-s Suppress lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified.
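A short sketch of how -d, -f and -s interact, using made-up colon-separated input in which one line contains no delimiter at all:

```shell
#!/bin/sh
# Sample input (an assumption for illustration): two delimited lines and
# one line without any ':' character.
data=$(printf 'alice:x:1000\nno delimiter here\nbob:x:1001')

# Without -s, the undelimited line is passed through unmodified.
with_passthrough=$(printf '%s\n' "$data" | cut -d : -f 1,3)
echo "$with_passthrough"
# alice:1000
# no delimiter here
# bob:1001

# With -s, lines that contain no delimiter are suppressed.
suppressed=$(printf '%s\n' "$data" | cut -d : -f 1,3 -s)
echo "$suppressed"
# alice:1000
# bob:1001
```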
ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of cut if the -n option is specified. Their effect is described in
environ(7).
EXAMPLES
Extract users' login names and shells from the system passwd(5) file as ``name:shell'' pairs:
cut -d : -f 1,7 /etc/passwd
Show the names and login times of the currently logged in users:
who | cut -c 1-16,26-38
DIAGNOSTICS
The cut utility exits 0 on success, and >0 if an error occurs.
SEE ALSO
paste(1)
STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2'').
HISTORY
A cut command appeared in AT&T System III UNIX.
BUGS
The -c option is a synonym for the -b option, which causes incorrect behaviour in locales that support multibyte characters.
When operating on fields (-f option is specified), cut does not recognise multibyte characters, and the delim character is recognised in the
middle of multibyte sequences.
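Both problems can be reproduced with a byte-oriented cut. The sketch below forces LC_ALL=C so that the behaviour does not depend on the implementation or the caller's locale; the sample bytes and variable names are assumptions for illustration.

```shell
#!/bin/sh
# "é" is two bytes in UTF-8 (octal 303 251), so "été" is 5 bytes.
word=$(printf '\303\251t\303\251')

# Selecting position 1 by byte returns only the first half of "é".
# wc -c counts that byte plus the newline, hence the "- 1".
split_bytes=$(( $(printf '%s\n' "$word" | LC_ALL=C cut -b 1 | wc -c) - 1 ))
echo "bytes in first column: $split_bytes"      # 1, not the 2 bytes of "é"

# Shift_JIS encodes KATAKANA LETTER SO as 0x83 0x5C, and 0x5C is also the
# ASCII backslash. With backslash as the delimiter, the character is split
# in the middle and the rest of the line becomes field 2.
field2=$(printf '\203\134:end\n' | LC_ALL=C cut -d '\' -f 2)
echo "$field2"                                  # :end
```

Since ASCII bytes never occur inside a UTF-8 multibyte sequence, the delimiter problem affects encodings such as Shift_JIS rather than UTF-8.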
BSD
June 6, 1993 BSD