script to separate bilingual text file
Post 302414885 by jim mcnamara on Wednesday 21st of April 2010 07:58:15 AM
That means the leading byte of each Assamese character has the following bit pattern:

1110---- (highest-order bit first). The dashes are bits used for data, not for identifying the type of character.

This is the pattern for ASCII:
0-------

The spaces present a problem because they exist in each character set.

You need a bit of C code to test those bits, then write either one byte to an output stream opened for ASCII or three bytes to an output stream opened for Assamese text. Since the spaces are common to both, I would write them as a newline and worry about them later.

Lastly, convert the newlines in the two output files back into space characters after you are done.
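
A minimal sketch of the bit tests described above, assuming the input is UTF-8 and that every Assamese character is a three-byte sequence. The function name `split_bilingual` and its helpers are hypothetical. Writing the newline for a space to both streams, and sending any other non-ASCII byte to the Assamese stream unchanged, are assumptions made here, not something the post specifies.

```c
#include <assert.h>
#include <stdio.h>

/* Bit tests from the post: 0xxxxxxx is ASCII, 1110xxxx leads three bytes. */
static int is_ascii(int b)      { return (b & 0x80) == 0x00; }
static int is_three_lead(int b) { return (b & 0xF0) == 0xE0; }

/*
 * Copy `in` to two streams. ASCII bytes go to `ascii_out`; three-byte
 * sequences go to `wide_out`. A space is written as '\n' to BOTH
 * streams (assumption). Any other byte is copied to `wide_out`
 * unchanged (assumption).
 * Returns the number of three-byte sequences written, or -1 if the
 * input ends in the middle of one.
 */
long split_bilingual(FILE *in, FILE *ascii_out, FILE *wide_out)
{
    long sequences = 0;
    int b;

    assert(in != NULL && ascii_out != NULL && wide_out != NULL);

    while ((b = getc(in)) != EOF) {
        if (b == ' ') {
            putc('\n', ascii_out);
            putc('\n', wide_out);
        } else if (is_ascii(b)) {
            putc(b, ascii_out);            /* one byte for ASCII */
        } else if (is_three_lead(b)) {
            int b2 = getc(in);
            int b3 = getc(in);
            if (b2 == EOF || b3 == EOF)
                return -1;                 /* truncated sequence */
            putc(b, wide_out);             /* three bytes for Assamese */
            putc(b2, wide_out);
            putc(b3, wide_out);
            sequences++;
        } else {
            putc(b, wide_out);
        }
    }
    return sequences;
}
```

Call it with the input opened for reading and the two outputs opened for writing, for example `split_bilingual(stdin, ascii_fp, assamese_fp)`. Newlines already present in the input land in the ASCII stream as-is, so the final newline-to-space conversion cannot tell them apart from converted spaces.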
 

UUENCODE(5)						      BSD File Formats Manual						       UUENCODE(5)

NAME
uuencode -- format of an encoded uuencode file

DESCRIPTION
Files output by uuencode(1) consist of a header line, followed by a number of body lines, and a trailer line. The uudecode(1) command will ignore any lines preceding the header or following the trailer. Lines preceding a header must not, of course, look like a header.

The header line starts with the word ``begin'', a space, a file mode (in octal), a space, and finally a string which names the file being encoded.

The central engine of uuencode(1) is a six-bit encoding function which outputs an ASCII character. The six bits to be encoded are treated as a small integer and added with the ASCII value for the space character (octal 40). The result is a printable ASCII character. In the case where all six bits to be encoded are zero, the ASCII backquote character ` (octal 140) is emitted instead of what would normally be a space.

The body of an encoded file consists of one or more lines, each of which may be a maximum of 86 characters long (including the trailing newline). Each line represents an encoded chunk of data from the input file and begins with a byte count, followed by encoded bytes, followed by a newline.

The byte count is a six-bit integer encoded with the above function, representing the number of bytes encoded in the rest of the line. The method used to encode the data expands its size to 133% of the original (described below). Therefore it is important to note that the byte count describes the size of the chunk of data before it is encoded, not afterwards. The six bit size of this number effectively limits the number of bytes that can be encoded in each line to a maximum of 63. While uuencode(1) will not encode more than 45 bytes per line, uudecode(1) will tolerate the maximum line size.

The remaining characters in the line represent the data of the input file encoded as follows. Input data are broken into groups of three eight-bit bytes, which are then interpreted together as a 24-bit block. The first bit of the block is the highest order bit of the first character, and the last is the lowest order bit of the third character. This block is then broken into four six-bit integers which are encoded one by one starting from the first bit of the block. The result is a four character ASCII string for every three bytes of input data.

Encoded lines of data continue in this manner until the input file is exhausted. The end of the body is signaled by an encoded line with a byte count of zero (the ASCII backquote character `).

Obviously, not every input file will be a multiple of three bytes in size. In these cases, uuencode(1) will pad the remaining one or two bytes of data with garbage bytes until a three byte group is created. The byte count in a line containing garbage padding will reflect the actual number of bytes encoded, making it possible to convey how many bytes are garbage.

The trailer line consists of ``end'' on a line by itself.

SEE ALSO
mail(1), uucp(1), uudecode(1), uuencode(1), ascii(7)

HISTORY
The uuencode file format appeared in 4.0BSD.

BUGS
The interpretation of the uuencode format relies on properties of the ASCII character set and may not work correctly on non-ASCII systems.

BSD				 April 9, 1997				   BSD
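
The six-bit encoding function and the three-byte grouping from the DESCRIPTION section can be sketched as follows. This is an illustration only, not the uuencode(1) source: the names `uu_enc6` and `uu_encode_line` are hypothetical, and a short final group is padded with zero bytes here as an assumption (the format permits arbitrary garbage).

```c
#include <assert.h>
#include <stddef.h>

/* Six-bit encoding function: add the value to ASCII space (octal 40);
 * zero is emitted as a backquote (octal 140) instead of a space. */
static char uu_enc6(unsigned v)
{
    v &= 077;
    return v ? (char)(v + 040) : '`';
}

/*
 * Encode one body line from `n` input bytes (at most 45).
 * `out` must hold at least 63 bytes: 1 count character, up to 60
 * data characters, a newline, and a terminating NUL.
 * Returns the number of characters written, excluding the NUL.
 */
size_t uu_encode_line(const unsigned char *in, size_t n, char *out)
{
    size_t o = 0;

    assert((in != NULL || n == 0) && out != NULL && n <= 45);

    out[o++] = uu_enc6((unsigned)n);       /* byte count before encoding */

    for (size_t i = 0; i < n; i += 3) {
        /* Build the 24-bit block, first byte in the highest bits.
         * Missing bytes are left as zero padding (assumption). */
        unsigned long block = (unsigned long)in[i] << 16;
        if (i + 1 < n) block |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < n) block |= (unsigned long)in[i + 2];

        /* Four six-bit integers, starting from the first bit. */
        out[o++] = uu_enc6((unsigned)(block >> 18));
        out[o++] = uu_enc6((unsigned)(block >> 12));
        out[o++] = uu_enc6((unsigned)(block >> 6));
        out[o++] = uu_enc6((unsigned)block);
    }

    out[o++] = '\n';
    out[o] = '\0';
    return o;
}
```

For example, encoding the three bytes `Cat` with this sketch produces the line `#0V%T`: the count 3 becomes `#`, and the 24-bit block splits into the values 16, 54, 5 and 52. Calling it with a count of zero produces the backquote line that ends the body.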