10-13-2006
If you're saving it as UTF-16, you're saving it as 16-bit Unicode, and my guess was right on the money. You can't grep UTF-16 with an ordinary pattern. UTF-16 uses 16-bit code units instead of the usual 8-bit bytes, so every plain ASCII character in the file is stored with an extra zero byte next to it. Grep ends up comparing half a character of UTF-16 to one full character of some other set. Naturally it won't work.
Can you save it as UTF-8 instead? It can represent all Unicode characters without breaking grep.
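To make the point concrete, here is a minimal Python sketch (not grep itself) of what a byte-oriented search sees. The sample text and the helper name `utf16_to_utf8` are made up for illustration.

```python
# Why a byte-oriented search misses text stored as UTF-16, and why
# transcoding to UTF-8 fixes it. The sample text is hypothetical.
text = "<EAN>12345</EAN>\n"
pattern = b"EAN"                      # the bytes a byte-oriented tool looks for

utf16_bytes = text.encode("utf-16")   # BOM, then 2 bytes per character here
utf8_bytes = text.encode("utf-8")     # 1 byte per ASCII character

found_in_utf16 = pattern in utf16_bytes   # zero bytes sit between E, A and N
found_in_utf8 = pattern in utf8_bytes


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 (with BOM) to UTF-8."""
    return data.decode("utf-16").encode("utf-8")


found_after_conversion = pattern in utf16_to_utf8(utf16_bytes)
print(found_in_utf16, found_in_utf8, found_after_conversion)
# prints: False True True
```

On the command line, the same transcoding is usually done with something like `iconv -f UTF-16 -t UTF-8 file | grep pattern`, assuming iconv is installed.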
Encode::IMAPUTF7(3pm) User Contributed Perl Documentation Encode::IMAPUTF7(3pm)
NAME
Encode::IMAPUTF7 - modification of UTF-7 encoding for IMAP
SYNOPSIS
use Encode qw/encode decode/;
use Encode::IMAPUTF7;
print encode('IMAP-UTF-7', 'Répertoire');
print decode('IMAP-UTF-7', 'R&AOk-pertoire');
ABSTRACT
IMAP mailbox names are encoded in a modified UTF-7 when they contain international characters outside the printable ASCII range. The
modified UTF-7 encoding is defined in RFC 2060 (section 5.1.3).
There is another CPAN module with the same purpose, Unicode::IMAPUtf7. However, it works correctly only with strings whose encoded form does
not contain a plus sign. For example, the Cyrillic string \x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433} is represented in UTF-7 as
+BD8EQAQ1BDQEOwQ+BDM- (note the second plus sign, 4 characters before the end). Unicode::IMAPUtf7 encodes the above string as
+BD8EQAQ1BDQEOwQ&BDM-, which is not valid modified UTF-7 (the ampersand and the plus are swapped). The problem is solved by the current
module, which is a slightly modified Encode::Unicode::UTF7 and has nothing in common with Unicode::IMAPUtf7.
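The plus sign in the example above comes from the base64 payload itself, not from a shift character. A minimal Python sketch, using only the standard `base64` module (not the Perl modules discussed here), reproduces the payload and shows where the "+" falls; the variable names are illustrative.

```python
import base64

# The Cyrillic example string from the text above.
word = "\u043f\u0440\u0435\u0434\u043b\u043e\u0433"

# UTF-7 payload: base64 of the UTF-16BE bytes, with '=' padding dropped.
payload = base64.b64encode(word.encode("utf-16-be")).decode("ascii").rstrip("=")

utf7 = "+" + payload + "-"                          # plain UTF-7 shifts with "+"
imap_utf7 = "&" + payload.replace("/", ",") + "-"   # modified UTF-7 shifts with "&"

print(utf7)                # +BD8EQAQ1BDQEOwQ+BDM-
print(imap_utf7)           # &BD8EQAQ1BDQEOwQ+BDM-
print(payload.index("+"))  # 15
```

Only the leading shift character differs between the two forms; the "+" inside the payload is ordinary base64 data and has to be left alone.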
RFC2060 - section 5.1.3 - Mailbox International Naming Convention
By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose
of these modifications is to correct the following problems with UTF-7:
1) UTF-7 uses the "+" character for shifting; this conflicts with
the common use of "+" in mailbox names, in particular USENET
newsgroup names.
2) UTF-7's encoding is BASE64 which uses the "/" character; this
conflicts with the use of "/" as a popular hierarchy delimiter.
3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
the use of "\" as a popular hierarchy delimiter.
4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
the use of "~" in some servers as a home directory indicator.
5) UTF-7 permits multiple alternate forms to represent the same
string; in particular, printable US-ASCII characters can be
represented in encoded form.
In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and
0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".
All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further
modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character
which can represent itself.
"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that
is, a name that ends with a Unicode 16-bit octet MUST end with a "-").
For example, here is a mailbox name which mixes English, Japanese, and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
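The rules quoted above can be sketched in a few lines of Python. This is an illustrative reimplementation built on the standard `base64` and `re` modules, not the code of Encode::IMAPUTF7; the function names `imap_utf7_encode` and `imap_utf7_decode` are hypothetical, and the decoder assumes well-formed input.

```python
import base64
import re


def _b64_encode(chunk: str) -> str:
    """Encode a run of non-ASCII characters as &<modified base64>-."""
    raw = base64.b64encode(chunk.encode("utf-16-be")).decode("ascii")
    # Modified BASE64: no '=' padding, and ',' is used instead of '/'.
    return "&" + raw.rstrip("=").replace("/", ",") + "-"


def _b64_decode(payload: str) -> str:
    """Decode the text between '&' and '-' back to characters."""
    raw = payload.replace(",", "/")
    raw += "=" * (-len(raw) % 4)          # restore the dropped padding
    return base64.b64decode(raw).decode("utf-16-be")


def imap_utf7_encode(name: str) -> str:
    out = []
    pending = []                          # characters waiting to be base64-encoded
    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:       # printable US-ASCII represents itself
            if pending:
                out.append(_b64_encode("".join(pending)))
                pending = []
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    if pending:                           # a name must end back in US-ASCII
        out.append(_b64_encode("".join(pending)))
    return "".join(out)


def imap_utf7_decode(encoded: str) -> str:
    def repl(match):
        payload = match.group(1)
        return "&" if payload == "" else _b64_decode(payload)
    return re.sub(r"&([^-]*)-", repl, encoded)


print(imap_utf7_encode("R\u00e9pertoire"))   # R&AOk-pertoire
print(imap_utf7_decode("~peter/mail/&ZeVnLIqe-/&U,BTFw-"))
```

Runs of consecutive non-ASCII characters are collected and encoded together, so each run produces a single "&...-" section rather than one section per character.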
REQUESTS & BUGS
Please report any requests, suggestions or bugs via the RT bug-tracking system at http://rt.cpan.org/ or email to
bug-Encode-IMAPUTF7@rt.cpan.org.
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode-IMAPUTF7 is the RT queue for Encode::IMAPUTF7. Please check to see if your bug has already
been reported.
COPYRIGHT
Copyright 2005 Sava Chankov
Sava Chankov, sava@cpan.org
This software may be freely copied and distributed under the same terms and conditions as Perl.
AUTHORS
Peter Makholm <peter@makholm.net>, current maintainer
Sava Chankov <sava@cpan.org>, original author
SEE ALSO
perl(1), Encode.
perl v5.12.4 2011-09-25 Encode::IMAPUTF7(3pm)