UTF-8,16,32 character lengths using awk Post: 302961607

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Using grep to find strings of certain lengths?

I am trying to use grep to find strings of certain lengths that all start with the same letter. Is this possible?

2. UNIX for Advanced & Expert Users

Convert UTF-8 encoded hex value to a character

Hi, I have a non-ASCII character (Ŵ), which is represented in UTF-8 by the hex bytes \xC5\xB4. Is there a function in Unix to convert this hex value back into the displayed character?

3. Solaris

limit on Solaris username lengths?

Hi, this question applies to Solaris 8, 9, 10 and OpenSolaris, as my environment includes all of these. Is there a limit on the size of the username (in /etc/passwd), or does there come a point where, like the 8-character limitation of passwords, the system receives the input but...

4. Shell Programming and Scripting

Read lines with different lengths in while loop

Hi there ! I need to treat files with variable line length, and process the tab-delimited words of each line. The tools I know are some basic bash scripting and sed ... I haven't got to python or perl yet. So my file looks like this obj1 0.01953 0.34576 0.04418 0.01249 obj2 0.78140...

5. Shell Programming and Scripting

Merging data from 2 files of different lengths?

Hi all, Sorry if someone has answered something like this already, but I have a problem. I am not brilliant with "awk" but think it should be the command to use to get what I am after. I have 2 files: job-file (several hundred lines like): 1018003,LONG MU WAN,1113S 1018004,LONG MU...

6. Shell Programming and Scripting

How to modify character to UTF-8 in shell script?

I have a shell script that loads data from a text file into a database. The text file contains some non-ASCII characters like �. How can I convert these characters to UTF-8 codes before loading them into the DB?

7. UNIX for Dummies Questions & Answers

Issue with UTF-8 BOM character in text file

Sometimes we receive Excel files containing French/Japanese characters by mail, and these files are manually transferred to the server using SFTP (security is not a huge concern here). The data is converted to text format in Notepad before the transfer. Problem is: When saving...

8. Shell Programming and Scripting

Merge two files with different lengths

Hi there, I have two very long files like: file1: two fields 1 123 1 125 1 234 2 123 2 234 2 300 2 312 3 10 3 215 4 56 ...

9. Linux

Help to Convert file from UNIX UTF-8 to Windows UTF-16

Hi, I have tried to convert a UTF-8 file to Windows UTF-16 format from a Unix machine, as below: unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt and I am getting some Chinese characters when I open the converted file on a Windows machine. LANG=en_US.UTF-8...

10. Shell Programming and Scripting

Paste files of varying lengths

I have three files of varying lengths and different number of columns. How can I paste all three with all columns aligned? File1 ---- 123 File2 ---- 234 345 678 File3 ---- 456 789 Output should look like: 123 234 456 345 789


Tcl_UtfToUpper(3)					      Tcl Library Procedures						 Tcl_UtfToUpper(3)

__________________________________________________________________________________________________________________________________________________

NAME

       Tcl_UniCharToUpper,  Tcl_UniCharToLower, Tcl_UniCharToTitle, Tcl_UtfToUpper, Tcl_UtfToLower, Tcl_UtfToTitle - routines for manipulating the
       case of Unicode characters and UTF-8 strings

SYNOPSIS

       #include <tcl.h>

       Tcl_UniChar
       Tcl_UniCharToUpper(ch)

       Tcl_UniChar
       Tcl_UniCharToLower(ch)

       Tcl_UniChar
       Tcl_UniCharToTitle(ch)

       int
       Tcl_UtfToUpper(str)

       int
       Tcl_UtfToLower(str)

       int
       Tcl_UtfToTitle(str)

ARGUMENTS

       int ch (in)	       The Tcl_UniChar to be converted.

       char *str (in/out)      Pointer to UTF-8 string to be converted in place.
_________________________________________________________________

DESCRIPTION

       The first three routines convert the case of individual Unicode characters:

       If ch represents a lower-case character, Tcl_UniCharToUpper returns the corresponding upper-case character.  If no upper-case character is
       defined, it returns the character unchanged.

       If ch represents an upper-case character, Tcl_UniCharToLower returns the corresponding lower-case character.  If no lower-case character is
       defined, it returns the character unchanged.

       If ch represents a lower-case character, Tcl_UniCharToTitle returns the corresponding title-case character.  If no title-case character is
       defined, it returns the corresponding upper-case character.  If no upper-case character is defined, it returns the character unchanged.
       Title-case is defined for a small number of characters that have a different appearance when they are at the beginning of a capitalized
       word.
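The fallback order described above (title-case, then upper-case, then unchanged) can be sketched in plain C. This is only an illustration, not Tcl's implementation: the sketch_* names are hypothetical, and the mapping is deliberately limited to the ISO 8859-1 range mentioned under BUGS, where no distinct title-case forms exist.

```c
/* Hypothetical helpers (not part of Tcl): case mapping for code points
 * in the ISO 8859-1 range only; anything else is returned unchanged. */

/* ASCII a-z and Latin-1 0xE0-0xFE (except the division sign 0xF7)
 * sit exactly 0x20 above their upper-case forms. */
int sketch_to_upper(int ch)
{
    if (ch >= 'a' && ch <= 'z')
        return ch - 0x20;
    if (ch >= 0xE0 && ch <= 0xFE && ch != 0xF7)
        return ch - 0x20;
    return ch; /* no upper-case form inside Latin-1: unchanged */
}

/* Mirror image: A-Z and 0xC0-0xDE (except the multiplication sign 0xD7). */
int sketch_to_lower(int ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return ch + 0x20;
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
        return ch + 0x20;
    return ch;
}

/* Latin-1 has no distinct title-case forms, so fall back to upper-case. */
int sketch_to_title(int ch)
{
    return sketch_to_upper(ch);
}
```

With these helpers, sketch_to_upper(0xE9) (é) yields 0xC9 (É), while a code point such as 0x0101 comes back unchanged because it lies outside the range the sketch handles.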

       The next three routines convert the case of UTF-8 strings in place in memory:

       Tcl_UtfToUpper changes every UTF-8 character in str to upper-case.  Because changing the case of a character may change its size, the byte
       offset of each character in the resulting string may differ from its original location.  Tcl_UtfToUpper writes a null byte at the end of
       the converted string.  Tcl_UtfToUpper returns the new length of the string in bytes.  This new length is guaranteed to be no longer than
       the original string length.
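To see why a case change can alter a character's size, it helps to look at how many bytes UTF-8 needs per code point. The helper below is a hypothetical sketch, not a Tcl function, and it does not reject surrogate values. In full Unicode, for example, U+0131 (dotless i) takes two bytes while its upper-case form U+0049 (I) takes one; within the ISO 8859-1 range noted under BUGS, the sizes happen to stay the same.

```c
/* Number of bytes UTF-8 uses to encode one code point (hypothetical
 * helper, not a Tcl function). Returns 0 for values beyond U+10FFFF. */
int sketch_utf8_len(unsigned long cp)
{
    if (cp < 0x80)     return 1;   /* U+0000..U+007F   */
    if (cp < 0x800)    return 2;   /* U+0080..U+07FF   */
    if (cp < 0x10000)  return 3;   /* U+0800..U+FFFF   */
    if (cp < 0x110000) return 4;   /* U+10000..U+10FFFF */
    return 0;
}
```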

       Tcl_UtfToLower is the same as Tcl_UtfToUpper except it turns each character in the string into its lower-case equivalent.

       Tcl_UtfToTitle is the same as Tcl_UtfToUpper except it turns the first character in the string into its title-case equivalent and all
       following characters into their lower-case equivalents.
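The in-place contract described above (convert inside the caller's buffer, write a terminating null byte, return the new byte length) can be sketched without Tcl as follows. The function names are hypothetical, and the sketch assumes the Latin-1-only behavior noted under BUGS: only 1-byte and 2-byte sequences that decode to U+0000..U+00FF are mapped, and all other bytes are copied through untouched.

```c
/* Hypothetical Latin-1-only upper-case mapping (not Tcl's table). */
static int sketch_upper(int ch)
{
    if ((ch >= 'a' && ch <= 'z') ||
        (ch >= 0xE0 && ch <= 0xFE && ch != 0xF7))
        return ch - 0x20;
    return ch;
}

/* Upper-case a null-terminated UTF-8 string in place and return its
 * new length in bytes. The write pointer never passes the read
 * pointer, so the result is never longer than the input. */
int sketch_utf8_to_upper(char *str)
{
    unsigned char *src = (unsigned char *)str;
    unsigned char *dst = (unsigned char *)str;

    while (*src != '\0') {
        if (*src < 0x80) {
            /* Single-byte (ASCII) character. */
            *dst++ = (unsigned char)sketch_upper(*src++);
        } else if ((*src == 0xC2 || *src == 0xC3) &&
                   (src[1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: decode, map, re-encode. */
            int ch = ((src[0] & 0x1F) << 6) | (src[1] & 0x3F);
            ch = sketch_upper(ch);
            *dst++ = (unsigned char)(0xC0 | (ch >> 6));
            *dst++ = (unsigned char)(0x80 | (ch & 0x3F));
            src += 2;
        } else {
            /* Outside the handled range: copy the byte unchanged. */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (int)(dst - (unsigned char *)str);
}
```

Called on a writable buffer holding "café ok" in UTF-8 (8 bytes), the sketch rewrites it to "CAFÉ OK" and returns 8. The buffer must be modifiable; passing a string literal directly would be undefined behavior.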

BUGS

       At this time, the case conversions are only defined for the ISO8859-1 characters.  Unicode characters above 0x00ff are not modified by
       these routines.

KEYWORDS

       utf, unicode, toupper, tolower, totitle, case

Tcl									8.1							 Tcl_UtfToUpper(3)
