Shell Programming and Scripting: script to separate bilingual text file
Post 302414885 by jim mcnamara on Wednesday 21st of April 2010 07:58:15 AM
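That means the leading byte of each Assamese character, which UTF-8 encodes as a three-byte sequence, has the following bit pattern:

1110---- (highest-order bit first; each - is a bit that carries part of the character's code point, not a bit that identifies the byte type)

Each of the two bytes that follow the leading byte has the pattern 10------.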

This is the pattern for ASCII, where every character is a single byte:
0-------
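To make those bit tests concrete, here is a small C sketch, assuming the text is UTF-8. The names utf8_seq_len and utf8_decode3 are hypothetical. The bytes E0 A6 85 used as an example are the UTF-8 encoding of U+0985 (অ, BENGALI LETTER A, which Assamese also uses); decoding them shows how the 4 + 6 + 6 data bits combine into the code point.

```c
#include <assert.h>
#include <stdint.h>

/* Return how many bytes the UTF-8 sequence starting with `lead` occupies,
 * judged only by its high-order bits, or 0 if `lead` cannot start a
 * sequence (for example a 10xxxxxx continuation byte). */
int utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; /* 0xxxxxxx: ASCII              */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx                     */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx: e.g. Assamese      */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx                     */
    return 0;                            /* 10xxxxxx or an invalid byte  */
}

/* Rebuild the code point from a three-byte sequence: the low 4 bits of the
 * lead byte, then the low 6 bits of each 10xxxxxx continuation byte. */
uint32_t utf8_decode3(const unsigned char s[3])
{
    assert(utf8_seq_len(s[0]) == 3);
    assert((s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80);
    return ((uint32_t)(s[0] & 0x0F) << 12)
         | ((uint32_t)(s[1] & 0x3F) << 6)
         |  (uint32_t)(s[2] & 0x3F);
}
```

For example, utf8_decode3 applied to {0xE0, 0xA6, 0x85} yields 0x0985, and utf8_seq_len reports 1 for 'A', 3 for 0xE0 and 0 for the continuation byte 0xA6.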

Spaces are a problem because the space character appears in both the ASCII and the Assamese text, so a space on its own does not tell you which output it belongs to.

You need a bit of C code to test the high-order bits of each byte, then write either one byte to the output stream for ASCII or three bytes (the leading byte and its two continuation bytes) to the output stream opened for Assamese text. Since spaces are common to both, I would write each space as a newline to both output streams and worry about them later.
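A minimal sketch of that loop in C, assuming UTF-8 input that contains only ASCII and three-byte characters. The function name split_bilingual is hypothetical, and returning -1 for any other byte pattern is a choice made for this sketch, not something from the post; it also does not reject overlong encodings.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read UTF-8 text from `in`, copy one-byte ASCII
 * characters to `ascii_out` and three-byte sequences (such as Assamese
 * characters) to `assamese_out`. Each space is written as '\n' to BOTH
 * outputs. Returns 0 on success, -1 on a byte pattern it does not handle. */
int split_bilingual(FILE *in, FILE *ascii_out, FILE *assamese_out)
{
    assert(in != NULL && ascii_out != NULL && assamese_out != NULL);
    int c;
    while ((c = getc(in)) != EOF) {
        unsigned char b = (unsigned char)c;
        if (b == ' ') {
            /* Shared by both scripts: emit a newline placeholder to each. */
            putc('\n', ascii_out);
            putc('\n', assamese_out);
        } else if ((b & 0x80) == 0x00) {
            /* 0xxxxxxx: a one-byte ASCII character. */
            putc(b, ascii_out);
        } else if ((b & 0xF0) == 0xE0) {
            /* 1110xxxx: copy the lead byte and its two 10xxxxxx bytes. */
            putc(b, assamese_out);
            for (int i = 0; i < 2; i++) {
                int cont = getc(in);
                if (cont == EOF || (cont & 0xC0) != 0x80)
                    return -1; /* truncated or malformed sequence */
                putc(cont, assamese_out);
            }
        } else {
            return -1; /* 2- or 4-byte sequences are out of scope here */
        }
    }
    return 0;
}
```

Fed the bytes of "Hi অ ok", it would write "Hi\n\nok" to the ASCII stream and a newline, the three bytes of অ, and another newline to the Assamese stream.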

Lastly, convert the newlines in the two output files back into space characters after you are done.
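That last pass can be a plain byte-for-byte replacement, which is safe on UTF-8 because every byte of a multibyte sequence is 0x80 or higher and so can never be mistaken for '\n' (0x0A). The sketch below (newlines_to_spaces is a hypothetical name) works on an in-memory, NUL-terminated buffer. One caveat: any newline that was already in the original input also ends up in the ASCII file and is turned into a space, so multi-line input would need a placeholder byte that does not otherwise occur.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every '\n' in the NUL-terminated buffer `s` with ' ', in place.
 * Multibyte UTF-8 bytes are all >= 0x80, so they are left untouched. */
void newlines_to_spaces(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '\n')
            *s = ' ';
    }
}
```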
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Separate a portion of text file into another file

Hi, I have my input as follows : I have given two entries- From system Mon Aug 1 23:52:47 2005 Source !100000006!: Impact !100000005!: High Status ! 7!: New Last Name+!100000001!: First Name+ !100000003!: ... (4 Replies)
Discussion started by: srikanth_ksv

2. Shell Programming and Scripting

Separate lines from text file

I have a text file with lot of rows like.. Action & Adventure|2012: Supernova NR|2009-11-01 00:01:00|2010-05-01 23:59:00|Active|3 Action & Adventure|50 Dead Men Walking|2010-01-05 00:01:00|2010-06-30 23:59:00|Active|3 Action & Adventure|Afterwards|2009-11-26 00:01:00|2010-03-26... (3 Replies)
Discussion started by: ramse8pc

3. Shell Programming and Scripting

Splitting text file into 2 separate files ??

Hi All, I am new to this forum as well as to UNIX. I have basic knowledge of UNIX, which I studied some years ago. Now I have to do some shell scripting to load data into an Oracle database using the sqlldr utility, which I am able to do. I have a requirement where I need to do the following operation. I... (10 Replies)
Discussion started by: shekharjchandra

4. UNIX for Advanced & Expert Users

shell script to send separate mails to different users from a text file

Hi Friends, Could you guys help me out of this problem... I need to send an email to all the users and the email has to be picked from the text file. text file contains the no. of records like: Code: giridhar 224285 847333 giridhar276@gmail.com ramana 84849 33884... (0 Replies)
Discussion started by: giridhar276

5. Shell Programming and Scripting

awk print header as text from separate file with getline

I would like to print the output beginning with a header from a separate file like this: awk 'BEGIN{FS="_";print ((getline < "header.txt")>0)} { if (! ($0 ~ /EL/ ) print }" input.txt What am I doing wrong? (4 Replies)
Discussion started by: sdf

6. Shell Programming and Scripting

Separate Text File into Two Lists Using Python

Hello, I have a pretty simple question, but I am new to Python and am trying to write a simple program. Put simply, I want to take a text file that looks like this: 11111 22222 33333 44444 55555 66666 77777 88888 and produce two lists, one containing the contents of the left column, one the... (0 Replies)
Discussion started by: Tyler_92

7. Shell Programming and Scripting

How to grep a log file for words listed in separate text file?

Hello, I want to grep a log ("server.log") for words in a separate file ("white-list.txt") and generate a separate log file containing each line that uses a word from the "white-list.txt" file. Putting that in bullet points: Search through "server.log" for lines that contain any word... (15 Replies)
Discussion started by: nbsparks

8. Programming

Read text from file and print each character in separate line

performing this code to read from file and print each character in separate line works well with ASCII encoded text void preprocess_file (FILE *fp) { int cc; for (;;) { cc = getc (fp); if (cc == EOF) break; printf ("%c\n", cc); } } int main(int... (1 Reply)
Discussion started by: khaled79

9. UNIX for Beginners Questions & Answers

Ls to text file on separate lines

hi, I'm trying to print out the contents of a folder into a .txt file. The code I'm trying amongst variations is: ls -1 > filenames.txt but it prints them all on the same line ie. image102.bmpimage103.bmpimage104.bmpimage105.bmpimage106.bmp how can I change this? Please... (2 Replies)
Discussion started by: newbie100

10. UNIX for Beginners Questions & Answers

Script to separate file

Hi, Could anyone help me with this please. Input file -- ant 1 2 3 4 2 3 4 56 7 dog 8 9 56 ant 2 3 4 5 cvh 6 7 8 ant 1 3 45 78 0 - Would like to split the file as soon as it encounters the word "ant" very first time. First Output file-- ant 1 2 3 4 2 3 4 56 7 dog 8 9 56 ... (2 Replies)
Discussion started by: Indra2011
encoding(n)						       Tcl Built-In Commands						       encoding(n)


NAME
       encoding - Manipulate encodings

SYNOPSIS
       encoding option ?arg arg ...?

INTRODUCTION
       Strings in Tcl are encoded using 16-bit Unicode characters. Different operating system interfaces or applications may generate strings in other encodings such as Shift-JIS. The encoding command helps to bridge the gap between Unicode and these other formats.

DESCRIPTION
       Performs one of several encoding-related operations, depending on option. The legal options are:

       encoding convertfrom ?encoding? data
              Convert data to Unicode from the specified encoding. The characters in data are treated as binary data where the lower 8 bits of each character are taken as a single byte. The resulting sequence of bytes is treated as a string in the specified encoding. If encoding is not specified, the current system encoding is used.

       encoding convertto ?encoding? string
              Convert string from Unicode to the specified encoding. The result is a sequence of bytes that represents the converted string. Each byte is stored in the lower 8 bits of a Unicode character. If encoding is not specified, the current system encoding is used.

       encoding dirs ?directoryList?
              Tcl can load encoding data files from the file system that describe additional encodings for it to work with. This command sets the search path for *.enc encoding data files to the list of directories directoryList. If directoryList is omitted, the command returns the current list of directories that make up the search path. It is an error for directoryList not to be a valid list. If, when a search for an encoding data file is happening, an element in directoryList does not refer to a readable, searchable directory, that element is ignored.

       encoding names
              Returns a list containing the names of all of the encodings that are currently available.

       encoding system ?encoding?
              Set the system encoding to encoding. If encoding is omitted, the command returns the current system encoding. The system encoding is used whenever Tcl passes strings to system calls.

EXAMPLE
       It is common practice to write script files using a text editor that produces output in the euc-jp encoding, which represents the ASCII characters as single bytes and Japanese characters as two bytes. This makes it easy to embed literal strings that correspond to non-ASCII characters by simply typing the strings in place in the script. However, because the source command always reads files using the current system encoding, Tcl will only source such files correctly when the encoding used to write the file is the same as the system encoding. This tends not to be true in an internationalized setting. For example, if such a file was sourced in North America (where ISO8859-1 is normally used), each byte in the file would be treated as a separate character that maps to the 00 page in Unicode. The resulting Tcl strings will not contain the expected Japanese characters. Instead, they will contain a sequence of Latin-1 characters that correspond to the bytes of the original string. The encoding command can be used to convert this string to the expected Japanese Unicode characters. For example,

              set s [encoding convertfrom euc-jp "\xA4\xCF"]

       would return the Unicode string "\u306F", which is the Hiragana letter HA.

SEE ALSO
       Tcl_GetEncoding(3)

KEYWORDS
       encoding

Tcl                                   8.1                              encoding(n)
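Outside Tcl, the EUC-JP example in the man page can be approximated in C with POSIX iconv(3). This is a swapped-in technique, not part of Tcl: a hedged sketch that assumes the platform's iconv recognizes the names "EUC-JP" and "UTF-8" (the set of supported names is implementation-dependent), with eucjp_to_utf8 as a hypothetical helper name. Unlike encoding convertfrom, which yields Tcl Unicode characters, it writes UTF-8 bytes, so the Hiragana letter HA (U+306F) comes out as the three bytes E3 81 AF.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes of EUC-JP text in `in` to UTF-8
 * in `out` (capacity `outlen`). Returns the number of bytes written, or
 * (size_t)-1 if the conversion is unsupported or fails. */
size_t eucjp_to_utf8(const char *in, size_t inlen, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);
    iconv_t cd = iconv_open("UTF-8", "EUC-JP"); /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1; /* this iconv does not know one of the names */

    char *inp = (char *)in; /* iconv takes char ** but does not modify input */
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1; /* invalid input or output buffer too small */
    return outlen - outleft;
}
```

Calling eucjp_to_utf8("\xA4\xCF", 2, buf, sizeof buf) should then fill buf with E3 81 AF and return 3, mirroring set s [encoding convertfrom euc-jp "\xA4\xCF"].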
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.