06-09-2013
Quote:
Originally Posted by
Don Cragun
Nicely done Alister. But even this approach has an exception. This code will remove all characters in a file after the last <newline> character as long as the file is encoded using a codeset that is a superset of ASCII.
If you want to use this on IBM's AIX on a mainframe computer (or any other OS that uses EBCDIC to encode text), you'll need to change the $0=="10" in the awk command to $0=="37".
Quite right. Parameterizing the byte value and obtaining it with
echo | od -An -tu1
yields a solution for any character encoding with a single-byte newline (a multibyte sequence can be matched, but not without complicating the code).
Regards,
Alister
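The command being discussed is not reproduced in this post, so what follows is only a minimal sketch of the parameterized idea, not the original code. It assumes a POSIX-style shell, od, tr, awk, and a head that accepts -c; the function names newline_byte and strip_after_last_newline are hypothetical.

```shell
# Hypothetical helper: print the decimal value of the newline byte in the
# current encoding (10 on ASCII-based systems).
newline_byte() {
    echo | od -An -tu1 | tr -d '[:space:]'
}

# Hypothetical helper: write FILE to stdout, dropping any bytes that follow
# the last newline. Assumes the newline is a single byte.
strip_after_last_newline() {
    file=$1
    nl=$(newline_byte)
    # od prints several byte values per line; tr puts one value on each line
    # so awk can count bytes and remember the position of the last newline.
    count=$(od -An -v -tu1 "$file" | tr -s ' \t' '\n\n' |
        awk -v nl="$nl" '
            $0 == "" { next }       # skip the blank lines left by tr
            { n++ }                 # n is the number of bytes seen so far
            $0 == nl { last = n }   # offset just past the latest newline
            END { print last + 0 }')
    # Some head implementations reject a byte count of 0, so skip that case.
    if [ "$count" -gt 0 ]; then
        head -c "$count" "$file"
    fi
}
```

A call such as strip_after_last_newline input.txt > output.txt (file names are placeholders) would then copy everything up to and including the last newline. Because the comparison value comes from newline_byte rather than a hard-coded "10", nothing in the awk program depends on the codeset.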
EUC(5) BSD File Formats Manual EUC(5)
NAME
euc -- EUC encoding of wide characters
SYNOPSIS
ENCODING "EUC"
VARIABLE len1 mask1 len2 mask2 len3 mask3 len4 mask4 mask
DESCRIPTION
EUC implements a system of 4 multibyte codesets. A multibyte character in the first codeset consists of len1 bytes starting with a byte in
the range of 0x00 to 0x7f. To allow use of ASCII, len1 is always 1. A multibyte character in the second codeset consists of len2 bytes
starting with a byte in the range of 0x80-0xff excluding 0x8e and 0x8f. A multibyte character in the third codeset consists of len3 bytes
starting with the byte 0x8e. A multibyte character in the fourth codeset consists of len4 bytes starting with the byte 0x8f.
The wchar_t encoding of EUC multibyte characters is dependent on the len and mask arguments. First, the bytes are moved into a wchar_t as
follows:
byte0 << ((lenN-1) * 8) | byte1 << ((lenN-2) * 8) | ... | bytelenN-1
The result is then ANDed with ~mask and ORed with maskN. Codesets 3 and 4 are special in that the leading byte (0x8e or 0x8f) is first
removed and the lenN argument is reduced by 1.
For example, the ja_JP.eucJP locale has the following VARIABLE line:
VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080
Codeset 1 consists of the values 0x0000 - 0x007f.
Codeset 2 consists of the values that have the bits 0x8080 set.
Codeset 3 consists of the values 0x0080 - 0x00ff.
Codeset 4 consists of the values 0x8000 - 0xff7f excluding the values which have the 0x0080 bit set.
Notice that the global mask is set to 0x8080; this implies that the codeset can be determined from those 2 bits.
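
The packing and masking steps described above can be tried out with shell arithmetic. This is only an illustrative sketch: the helper name euc_wchar and the sample byte sequences are assumptions chosen for the demonstration, and the masks are taken from the ja_JP.eucJP VARIABLE line shown above.

```shell
# Hypothetical helper: euc_wchar MASK MASKN BYTE...
# Packs the bytes (after any 0x8e/0x8f leading byte has been removed) into
# one value, most significant byte first, then ANDs with ~MASK and ORs
# with MASKN.
euc_wchar() {
    mask=$1
    maskn=$2
    shift 2
    w=0
    for b in "$@"; do
        w=$(( (w << 8) | $b ))
    done
    printf '0x%04x\n' $(( (w & ~$mask) | $maskn ))
}

euc_wchar 0x8080 0x8080 0xa4 0xa2   # codeset 2, two bytes: prints 0xa4a2
euc_wchar 0x8080 0x0080 0xb1        # codeset 3, after removing 0x8e: prints 0x00b1
euc_wchar 0x8080 0x8000 0xb0 0xa1   # codeset 4, after removing 0x8f: prints 0xb021
```

Each result falls in the range listed for its codeset: 0xa4a2 has both 0x8080 bits set, 0x00b1 lies in 0x0080 - 0x00ff, and 0xb021 lies in 0x8000 - 0xff7f with the 0x0080 bit clear.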
SEE ALSO
mklocale(1), setlocale(3)
BSD
November 8, 2003 BSD