02-16-2006
try setting locale to univ.utf8
10 More Discussions You Might Find Interesting
1. UNIX for Advanced & Expert Users
I have a string like echo "a|b|c". I want to count the | symbols in this string. How can I do this? Please tell me the command. (11 Replies)
Discussion started by: kamesh83
2. Shell Programming and Scripting
Hi Folks,
I have an input file in the format below.
~~~OLKIT~OLKIT~1~~TBD~BEST PAGER & WIRELESS~4899 COMMON MARKET PLACE~~~DUBLIN~KS~43016~I~Y~DIRECT~D~~0
BPGRWRLS~~~OLKIT~OLKIT~1~~TBD~BEST PAGER & WIRELESS~4899 COMMON MARKET PLACE~~~DUBLIN~KS~43016~I~Y~DIRECT~D~~0... (12 Replies)
Discussion started by: srikanthgr1
3. Shell Programming and Scripting
I'm trying to count the occurrences of 2 specific characters in a very large file. I'd like to avoid using gsub because it's taking too long.
I was thinking something like:
awk '-F' { t += NF - 1 } END {print t}' infile > outfile
which isn't working
Any ideas would be great. (3 Replies)
Discussion started by: dcfargo
4. UNIX for Dummies Questions & Answers
Hello,
I want to count the occurrences of a specific word in a .txt file in the bash shell.
Can somebody help me, please?
Thanks!!! (2 Replies)
Discussion started by: mskart
5. Shell Programming and Scripting
Hello,
I have a text file with n lines in the following format (9 column fields):
Example:
contig00012 149606 G C 49 68 60 18 c$cccccacccccccccc^c
I need to count the number of lower-case and upper-case occurrences in column 9, respectively, of the... (3 Replies)
Discussion started by: s052866
6. Shell Programming and Scripting
We have a log file, the format is similar to this:
08/04/2011 05:03:08 Connection Success
08/04/2011 05:13:18 Connection Success
08/04/2011 05:23:28 Connection Fail
08/04/2011 05:33:38 Connection Success
08/04/2011 06:14:18 Connection Success
08/04/2011 06:24:28 Connection Fail
08/04/2011... (6 Replies)
Discussion started by: clu
7. UNIX for Advanced & Expert Users
Hi,
I need help counting a specific word or character per line and validating the count against a specific number, e.g. 10. If the count equals the specific number, then that line will be part of the output.
Specific number = 6
Specific word or char = ||
Sample data:... (1 Reply)
Discussion started by: janzper
8. Shell Programming and Scripting
Hello,
I am trying to sort the occurrence counts stored in an array using awk, but I can't find the right command. That's why I'm asking for your help! :)
Please see below the command that I run:
awk '{ for ( i=1; i<=length; i++ ) arr++ }END{ for ( i in arr ) { print i, arr } }' dictionnary.txt
... (3 Replies)
Discussion started by: destin45
9. UNIX for Dummies Questions & Answers
Hi all, I have a file that contains characters. How do I get the total count of a specific character from that file and save the count to a variable for use in a calculation?
data.txt
1
2
2
2
2
3
3
4
5
6
7
8
5
4
3
4 (5 Replies)
Discussion started by: weslyarfan
10. Shell Programming and Scripting
I would appreciate your help with this script in a Solaris environment.
Scenario:
I have 2 files:
1) /tmp/TRANSACTIONS_DAILY_20180730.txt:
201807300000000004
201807300000000005
201807300000000006
201807300000000007
201807300000000008
2)... (10 Replies)
Discussion started by: teokon90
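Several of the threads listed above ask variations of the same question: how to count a character or a word in a string or file, and how to keep the result in a variable. The following is only a minimal sketch of common approaches, not the answers given in those threads. The function names are hypothetical. It assumes a single-byte (ASCII) character that has no special meaning to tr (so not a backslash, hyphen, or bracket), and it assumes a grep that supports -o and -w, which are not part of POSIX and may be missing from the stock Solaris grep.

```shell
#!/bin/sh

# Count occurrences of one character in a string.
# tr -cd deletes every byte except the given character; wc -c counts what remains.
# $1 = string, $2 = single character to count
count_char() {
    printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '
}

# Same idea for a file; avoids a per-line gsub() on very large inputs.
# $1 = file, $2 = single character to count
count_char_in_file() {
    tr -cd "$2" < "$1" | wc -c | tr -d ' '
}

# awk variant: use the character as the field separator and sum NF - 1.
# The NF guard skips empty lines, where NF is 0 and NF - 1 would be -1.
# $1 = file, $2 = single character to count
count_char_awk() {
    awk -F"$2" 'NF { t += NF - 1 } END { print t + 0 }' "$1"
}

# Count whole-word occurrences of a word in a file (one match per output line).
# $1 = file, $2 = word
count_word() {
    grep -o -w -- "$2" "$1" | wc -l | tr -d ' '
}

count_char 'a|b|c' '|'    # prints 2
```

To save a count in a variable for later arithmetic, capture the output with command substitution, for example n=$(count_char_in_file data.txt 3), and then use $((n * 2)) or similar.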
LEARN ABOUT DEBIAN
marc::charset
MARC::Charset(3pm) User Contributed Perl Documentation MARC::Charset(3pm)
NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8
SYNOPSIS
# import the marc8_to_utf8 function
use MARC::Charset 'marc8_to_utf8';
# prepare STDOUT for utf8
binmode(STDOUT, 'utf8');
# print out some marc8 as utf8
print marc8_to_utf8($marc8_string);
DESCRIPTION
MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single byte character encoding that predates
Unicode, and allows you to put non-Roman scripts in MARC bibliographic records.
http://www.loc.gov/marc/specifications/spechome.html
EXPORTS
ignore_errors()
Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records
that contain both MARC8 and UNICODE characters.
my $ignore = MARC::Charset->ignore_errors();
MARC::Charset->ignore_errors(1); # ignore errors
MARC::Charset->ignore_errors(0); # DO NOT ignore errors
assume_unicode()
Tells MARC::Charset whether or not to assume UNICODE when an error is encountered in ignore_errors mode and returns the current setting.
This is helpful if you have records that contain both MARC8 and UNICODE characters.
my $setting = MARC::Charset->assume_unicode();
MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
assume_encoding()
Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode and returns the current
setting. This is helpful if you have records that contain both MARC8 and other characters.
my $setting = MARC::Charset->assume_encoding();
MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
marc8_to_utf8()
Converts a MARC-8 encoded string to UTF-8.
my $utf8 = marc8_to_utf8($marc8);
If you'd like to ignore errors pass in a true value as the 2nd parameter or call MARC::Charset->ignore_errors() with a true value:
my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
or
MARC::Charset->ignore_errors(1);
my $utf8 = marc8_to_utf8($marc8);
utf8_to_marc8()
Will attempt to translate utf8 into marc8.
my $marc8 = utf8_to_marc8($utf8);
If you'd like to ignore errors, or characters that can't be converted to marc8 then pass in a true value as the second parameter:
my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
or
MARC::Charset->ignore_errors(1);
my $marc8 = utf8_to_marc8($utf8);
DEFAULT CHARACTER SETS
If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the
appropriate character set code:
use MARC::Charset::Constants qw(:all);
$MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
$MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
SEE ALSO
o MARC::Charset::Constants
o MARC::Charset::Table
o MARC::Charset::Code
o MARC::Charset::Compiler
o MARC::Record
o MARC::XML
AUTHOR
Ed Summers (ehs@pobox.com)
perl v5.12.4 2011-08-05 MARC::Charset(3pm)