As jim mentioned above, every Assamese character begins with a lead byte that never appears in plain ASCII English text, so telling the two apart is easy enough.
From what I gathered, the Assamese script uses characters that are also used for Bengali, and their Unicode range is 0x0980-0x09FF. Encoded as UTF-8, that is a range of three-byte sequences: 11100000 10100110 10000000 - 11100000 10100111 10111111, i.e. 0xE0 0xA6 0x80 through 0xE0 0xA7 0xBF (hopefully I didn't mess that up).
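To double-check the byte values, here is a minimal sketch that encodes both end points of the range by hand, using the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx). The variable names are only illustrative.

```shell
# Encode the two end points of the range by hand; 2432 and 2559 are
# 0x0980 and 0x09FF in decimal (hex literals are not portable across awks).
encoded=$(awk 'BEGIN {
    n = split("2432 2559", cp, " ")
    for (i = 1; i <= n; i++) {
        c  = cp[i]
        b1 = 224 + int(c / 4096)       # 1110xxxx: top 4 bits
        b2 = 128 + int(c / 64) % 64    # 10xxxxxx: middle 6 bits
        b3 = 128 + c % 64              # 10xxxxxx: low 6 bits
        printf "U+%04X -> %02X %02X %02X\n", c, b1, b2, b3
    }
}')
printf '%s\n' "$encoded"
```

This should print E0 A6 80 for U+0980 and E0 A7 BF for U+09FF, which agrees with the binary above.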
The leading byte's value in that range is always 0xe0 (0340 octal, 224 decimal). We can test for its presence with AWK:
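The command itself is missing from this copy of the post, so what follows is only a sketch of such a test, not necessarily what was originally posted. It assumes an awk that accepts octal escapes in regular expressions and that the input is UTF-8; the two sample lines are made up for illustration.

```shell
# Two sample lines: plain ASCII, then the letters "অসম" written out as
# raw UTF-8 bytes in octal (E0 A6 85, E0 A6 B8, E0 A6 AE).
# LC_ALL=C makes awk treat the input as single bytes, so /\340/ matches
# any line containing the byte 0xE0.
matches=$(
    printf 'plain English line\n\340\246\205\340\246\270\340\246\256\n' |
    LC_ALL=C awk '/\340/ { print NR }'
)
echo "lines containing byte 0xE0: $matches"
```

Only the second line is reported. Note that 0xE0 is the lead byte for every code point from U+0800 to U+0FFF, so this also flags other scripts such as Devanagari; additionally checking that the second byte is 0xA6 or 0xA7 would narrow the match to the 0x0980-0x09FF block.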
Regards,
Alister
UTF-8(7) Linux Programmer's Manual UTF-8(7)
NAME
UTF-8 - an ASCII compatible multi-byte Unicode encoding
DESCRIPTION
The Unicode 3.0 character set occupies a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of
16-bit words. Such strings can contain as parts of many 16-bit characters bytes like '