12-07-2012
Hi.
Quote, originally posted by gimley:
... Does PERL give problems with Unicode? ...
You might want to start with:
perldoc perlunitut, then
man perlunicode
You seem to be using Windows. I have used the utf8 facilities on GNU/Linux systems, but I don't know whether they are available in ActiveState Perl.
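Here is a minimal sketch of those utf8 facilities in core Perl (5.8 or later). It assumes the script file itself is saved as UTF-8, and it has not been checked against ActiveState Perl specifically. The helper `char_count` is hypothetical and exists only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                             # string literals in this file are UTF-8 text
use open qw(:std :encoding(UTF-8));   # UTF-8 layers on STDIN/STDOUT/STDERR and on open()

# Hypothetical helper: number of characters (not bytes) in a string.
sub char_count {
    my ($s) = @_;
    return length $s;
}

my $word = "café";                    # 4 characters, but 5 bytes once encoded as UTF-8
print "$word: ", char_count($word), " characters\n";
```

Without `use utf8`, the same literal would be read as 5 separate bytes.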
An advanced search here for "perl utf8" yields about 50 hits, some of which may be useful.
Best wishes ... cheers, drl
( Edit 1: add note about advanced search )
Last edited by drl; 12-07-2012 at 07:43 AM.
PERLTW(1) Perl Programmers Reference Guide PERLTW(1)
NAME
perltw - Perl guide in Traditional Chinese
DESCRIPTION
Welcome to the world of Perl!
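Starting with version 5.8.0, Perl has full Unicode support, and with it support for many encodings beyond the Latin ones; CJK (Chinese, Japanese, Korean) is one of them. Unicode is an international standard that aims to cover every character in the world: the West, the East, and everything in between (Greek, Syriac, Arabic, Hebrew, Indic, Native American scripts, and so on). It also accommodates many operating systems and platforms (such as the PC and the Macintosh).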
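Perl itself operates in Unicode. Perl's internal string data can be represented in Unicode, and Perl's functions and operators (for example, regular-expression matching) also work on Unicode. For input and output, Perl provides the Encode module, which lets you easily read and write data stored in pre-Unicode encodings.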
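The decode-on-input, encode-on-output model just described can be sketched with Encode's `decode` and `encode` functions. This is a minimal illustration, not code from the guide itself. The Big5 bytes are generated inside the script (U+4E2D U+6587, "中文") as a stand-in for data read from a legacy file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for bytes read from a Big5 file: U+4E2D U+6587 encoded as Big5.
my $big5_bytes = encode('big5', "\x{4E2D}\x{6587}");

my $text       = decode('big5', $big5_bytes);   # legacy bytes -> Perl characters
my $utf8_bytes = encode('UTF-8', $text);        # Perl characters -> UTF-8 bytes

printf "%d Big5 bytes -> %d characters -> %d UTF-8 bytes\n",
    length $big5_bytes, length $text, length $utf8_bytes;
```

Inside the program, all string operations happen on `$text`. The byte forms exist only at the edges.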
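The Encode extension supports the following Traditional Chinese encodings ('big5' means 'big5-eten'):
big5-eten Big5 encoding (with the ETen extensions)
big5-hkscs Big5 + Hong Kong Supplementary Character Set, 2001 edition
cp950 Code page 950 (Big5 + characters added by Microsoft)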
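You can check that these encodings are available, and that 'big5' really resolves to 'big5-eten', by asking Encode for each name with `find_encoding`. It returns an encoding object, or undef for an unknown name. A small sketch:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Look up each name and print the canonical name Encode resolves it to.
for my $name (qw(big5 big5-eten big5-hkscs cp950)) {
    my $enc = find_encoding($name)
        or die "Encoding '$name' is not available\n";
    printf "%-10s -> %s\n", $name, $enc->name;
}
```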
For example, to convert a Big5-encoded file to Unicode, just type:
perl -Mencoding=big5,STDOUT,utf8 -pe1 < file.big5 > file.utf8
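The encoding pragma used in that one-liner was later deprecated (from Perl 5.18). Below is a sketch of the same Big5-to-UTF-8 conversion that uses PerlIO `:encoding()` layers instead. The sub name `big5_to_utf8_stream` and the script name `big5_to_utf8.pl` are hypothetical.

```perl
#!/usr/bin/env perl
# Sketch: Big5 -> UTF-8 with PerlIO layers instead of the encoding pragma.
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out. The :encoding() layers do
# the work: Big5 bytes are decoded on read, characters are encoded on write.
sub big5_to_utf8_stream {
    my ($in, $out) = @_;
    binmode $in,  ':encoding(big5)';
    binmode $out, ':encoding(UTF-8)';
    while (my $line = <$in>) {
        print {$out} $line;
    }
    return;
}

# Act as a filter only when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    big5_to_utf8_stream($in, \*STDOUT);
    close $in;
}
```

Usage: `perl big5_to_utf8.pl file.big5 > file.utf8`.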
Perl also ships with "piconv", a character-conversion utility written entirely in Perl. Use it like this:
piconv -f big5 -t utf8 < file.big5 > file.utf8
piconv -f utf8 -t big5 < file.utf8 > file.big5
The encoding pragma also lets you easily write code that works in units of characters, as shown below:
#!/usr/bin/env perl
# Enable big5 string parsing; STDIN and STDOUT are set to big5 as well
use encoding 'big5', STDIN => 'big5', STDOUT => 'big5';
print length("XX"); # 2 (double quotes: characters)
print length('XX'); # 4 (single quotes: bytes)
print index("XXXX", "XX"); # -1 (substring not found)
print index('XXXX', 'XX'); # 1 (match starts at the second byte)
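In the last line, the second byte of the first character and the first byte of the second character happen to form, in Big5, the first character of the search string. The second byte of the second character and the first byte of the third happen to form its second character. False matches like this come up often with Big5.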
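You can reproduce the same character-versus-byte distinction without the encoding pragma by keeping decoded text and raw Big5 bytes in separate variables. This sketch uses its own two characters (U+6587 U+4E2D, "文中") rather than the example strings above. It takes a two-byte slice that straddles the character boundary; which character that slice decodes to does not matter for the result.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{6587}\x{4E2D}";        # two characters
my $bytes = encode('big5', $chars);    # the same text as four Big5 bytes

# Second byte of the first character plus first byte of the second character:
# a two-byte slice that straddles a character boundary.
my $straddle = substr($bytes, 1, 2);

my $byte_pos = index($bytes, $straddle);                  # byte-level search
my $char_pos = index($chars, decode('big5', $straddle));  # character-level search

print length($chars), ' ', length($bytes), "\n";   # 2 4
print "$byte_pos $char_pos\n";                      # 1 -1
```

The byte-level search "finds" a match across the boundary. The character-level search, working on decoded text, correctly finds none.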
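Additional Chinese encodings
If you need more Chinese encodings, you can download the Encode::HanExtra module from CPAN (<http://www.cpan.org/>). It currently provides the following encodings: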
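cccii Chinese Character Code for Information Interchange, 1980
euc-tw Extended Unix Code, covering CNS11643 planes 1-7
big5plus Big5+ from the Chinese Foundation for Digitization Technology
big5ext Big5e from the Chinese Foundation for Digitization Technology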
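Encode::HanExtra comes from CPAN rather than the core distribution, so a script may want to check whether these encodings are present before relying on them. Here is a sketch; the helper name `encoding_available` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encode::HanExtra comes from CPAN, so loading it may fail; that is not fatal here.
my $hanextra_loaded = eval { require Encode::HanExtra; 1 } ? 1 : 0;
print 'Encode::HanExtra ', ($hanextra_loaded ? 'loaded' : 'not installed'), "\n";

# Hypothetical helper: true if Encode can resolve the given encoding name.
sub encoding_available {
    my ($name) = @_;
    return defined find_encoding($name);
}

for my $name (qw(cccii euc-tw big5plus big5ext)) {
    printf "%-9s %s\n", $name, encoding_available($name) ? 'available' : 'missing';
}
```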
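In addition, the Encode::HanConvert module provides two encodings for converting between Simplified and Traditional Chinese:
big5-simp Big5 Traditional Chinese <-> Unicode Simplified Chinese
gbk-trad GBK Simplified Chinese <-> Unicode Traditional Chinese
To convert between GBK and Big5, see the b2g.pl and g2b.pl programs that come with the module, or use the following in your code: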
use Encode::HanConvert;
$euc_cn = big5_to_gb($big5); # from Big5 to GBK
$big5 = gb_to_big5($euc_cn); # from GBK to Big5
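Sometimes only the byte encoding has to change and the characters themselves should stay as they are. In that case core Encode can transcode between Big5 and GBK (Encode calls it cp936) without any CPAN module. Unlike the Simplified/Traditional conversion that Encode::HanConvert is listed for above, this sketch leaves every character's form unchanged.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

my $text   = "\x{4E2D}\x{6587}";   # two characters that exist in both Big5 and GBK
my $octets = encode('big5', $text);

# from_to() converts the byte string in place (Big5 -> GBK, which Encode names
# cp936) and returns the new length in bytes, or undef on failure.
defined(from_to($octets, 'big5', 'cp936'))
    or die "Big5 -> GBK conversion failed\n";

my $round_trip = decode('cp936', $octets);
print(($round_trip eq $text) ? "same characters, new encoding\n" : "mismatch\n");
```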
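Further information
Perl comes with a large body of documentation (unfortunately all in English) where you can learn more about Perl and about using Unicode. There is, however, a wealth of external resources: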
Websites with Perl resources
<http://www.perl.com/>
The Perl home page (maintained by O'Reilly)
<http://www.cpan.org/>
The Comprehensive Perl Archive Network
<http://lists.perl.org/>
A directory of Perl mailing lists
Websites for learning Perl
<http://www.oreilly.com.tw/product_perl.php?id=index_perl>
O'Reilly Perl books in Traditional Chinese
<http://groups.google.com/groups?q=tw.bbs.comp.lang.perl>
Taiwanese Perl discussion group (the Perl boards of Taiwan's major BBS systems)
Perl user groups
<http://www.pm.org/groups/taiwan.html>
A list of Taiwan Perl Mongers groups
<irc://irc.freenode.org/#perl.tw>
Perl.tw online chat channel
Unicode-related websites
<http://www.unicode.org/>
The Unicode Consortium (the body behind the Unicode Standard)
<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux
Chinese localization
Chinese software localization alliance
<http://www.cpatch.org/>
Linux Chinese documentation project
<http://www.linux.org.tw/CLDP/>
SEE ALSO
Encode, Encode::TW, encoding, perluniintro, perlunicode
AUTHORS
Jarkko Hietaniemi <jhi@iki.fi>
Audrey Tang (唐鳳) <audreyt@audreyt.org>
perl v5.16.2 2012-10-11 PERLTW(1)