Post 302598612 by gimley on Wednesday 15th of February 2012 02:19:35 AM
Help with Unicode identification using Perl or AWK

Hello,
I have a large UTF-8 file with over 200,000 strings written in many different scripts (Unicode blocks/code pages).
I need to extract only the following from the file:
All strings with Basic Latin characters: U+0021 to U+007E
All strings in the Devanagari range: U+0900 to U+097F
Has anyone written a script in Perl or AWK to handle this? I do not want to reinvent the wheel, hence the request.
Many thanks in advance. I have never tried character identification in Perl or AWK before.
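If no ready-made Perl or AWK script turns up, the selection is short in any Unicode-aware language. The sketch below is in Python rather than Perl or AWK. The function names, the file-name argument, and the rule that a string qualifies only when all of its non-whitespace characters fall in one range are assumptions, since the question does not say what to do with strings that mix scripts. The essential step is decoding the file as UTF-8 before comparing code points: Devanagari characters occupy three bytes each in UTF-8, so comparing raw bytes against U+0900 to U+097F would not work.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple

# Inclusive code-point ranges taken from the question.
BASIC_LATIN: Tuple[int, int] = (0x0021, 0x007E)  # printable ASCII without space
DEVANAGARI: Tuple[int, int] = (0x0900, 0x097F)


def in_range(ch: str, bounds: Tuple[int, int]) -> bool:
    """True if the code point of ch lies within bounds (inclusive)."""
    low, high = bounds
    return low <= ord(ch) <= high


def script_of(s: str) -> Optional[str]:
    """Classify s as 'latin' or 'devanagari'.

    Assumption: a string qualifies only if every non-whitespace character
    is in the same range; mixed, other-script, or blank strings give None.
    """
    chars = [c for c in s if not c.isspace()]
    if not chars:
        return None
    if all(in_range(c, BASIC_LATIN) for c in chars):
        return "latin"
    if all(in_range(c, DEVANAGARI) for c in chars):
        return "devanagari"
    return None


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines (without line endings) that script_of() accepts."""
    for line in lines:
        s = line.rstrip("\r\n")
        if script_of(s) is not None:
            yield s


if __name__ == "__main__" and len(sys.argv) > 1:
    # Decode input and encode output as UTF-8 so ord() sees code points.
    sys.stdout.reconfigure(encoding="utf-8")
    with open(sys.argv[1], encoding="utf-8") as f:
        for s in filter_lines(f):
            print(s)
```

It could be run as python filter_scripts.py input.txt > output.txt (hypothetical file names), assuming one string per line. Replacing all(...) with any(...) in script_of would instead keep strings that merely contain a character from a range. Devanagari text sometimes includes U+200C or U+200D (zero width non-joiner/joiner), which lie outside U+0900 to U+097F, so strings containing them are rejected under this rule. The same logic ports to Perl or gawk once the input is decoded as UTF-8.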
 

XSTR(1)                General Commands Manual                XSTR(1)

NAME
    xstr - extract strings from C programs to implement shared strings

SYNOPSIS
    xstr [ -c ] [ - ] [ file ]

DESCRIPTION
    Xstr maintains a file strings into which strings in component parts of a large program are hashed. These strings are replaced with references to this common area. This serves to implement shared constant strings, most useful if they are also read-only.

    The command

        xstr -c name

    will extract the strings from the C source in name, replacing string references by expressions of the form (&xstr[number]) for some number. An appropriate declaration of xstr is prepended to the file. The resulting C text is placed in the file x.c, to then be compiled. The strings from this file are placed in the strings data base if they are not there already. Repeated strings and strings which are suffixes of existing strings do not cause changes to the data base.

    After all components of a large program have been compiled, a file xs.c declaring the common xstr space can be created by a command of the form

        xstr

    This xs.c file should then be compiled and loaded with the rest of the program. If possible, the array can be made read-only (shared), saving space and swap overhead.

    Xstr can also be used on a single file. A command

        xstr name

    creates files x.c and xs.c as before, without using or affecting any strings file in the same directory.

    It may be useful to run xstr after the C preprocessor if any macro definitions yield strings or if there is conditional code which contains strings which may not, in fact, be needed. Xstr reads from its standard input when the argument `-' is given. An appropriate command sequence for running xstr after the C preprocessor is:

        cc -E name.c | xstr -c -
        cc -c x.c
        mv x.o name.o

    Xstr does not touch the file strings unless new items are added, thus make can avoid remaking xs.o unless truly necessary.

FILES
    strings     Data base of strings
    x.c         Massaged C source
    xs.c        C source for definition of array `xstr'
    /tmp/xs*    Temp file when `xstr name' doesn't touch strings

SEE ALSO
    mkstr(1)

BUGS
    If a string is a suffix of another string in the data base, but the shorter string is seen first by xstr, both strings will be placed in the data base, when just placing the longer one there will do.

3rd Berkeley Distribution        May 7, 1986        XSTR(1)
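The suffix sharing described in the xstr page above, and the ordering limitation noted under BUGS, can be modeled in a few lines. This is a minimal Python sketch of such a shared string area, not xstr's actual implementation: the class name StringPool, the linear search, and the NUL-terminated byte-string layout are illustrative assumptions. Offsets play the role of the number in &xstr[number].

```python
from typing import List, Tuple


class StringPool:
    """Hypothetical model of a shared string area like xstr's data base.

    Strings are stored back to back, each followed by a NUL byte, and a
    caller refers to a string by its byte offset (cf. &xstr[number]).
    """

    def __init__(self) -> None:
        self.entries: List[Tuple[int, bytes]] = []  # (offset, stored string)
        self.size = 0  # total bytes used, including NUL terminators

    def add(self, s: bytes) -> int:
        """Return the offset of s, storing it only if it is new."""
        for offset, stored in self.entries:
            # A repeated string or a suffix of a stored string reuses it:
            # the suffix starts len(stored) - len(s) bytes into the entry.
            if stored.endswith(s):
                return offset + len(stored) - len(s)
        offset = self.size
        self.entries.append((offset, s))
        self.size += len(s) + 1  # +1 for the NUL terminator
        return offset

    def blob(self) -> bytes:
        """Return the contents of the shared array."""
        return b"".join(stored + b"\0" for _, stored in self.entries)


if __name__ == "__main__":
    pool = StringPool()
    print(pool.add(b"hello world"))  # 0
    print(pool.add(b"world"))        # 6: suffix of "hello world"
    print(pool.add(b"hello world"))  # 0: repeated string

    # The BUGS case: the shorter string arrives first, so both are stored.
    late = StringPool()
    late.add(b"world")
    print(late.add(b"hello world"))  # 6
    print(late.size)                 # 18 bytes instead of 12
```

Because add only checks whether a stored string ends with the new one, a longer string arriving later never absorbs an earlier shorter entry, which is the behavior the BUGS section describes.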
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.