In a UTF-8 locale, the two-byte sequence specified by the C string "\347\03" is not a valid character, and it looks like gawk is discarding the contents of that line (or at least the invalid byte sequence and everything following it up to the end of the line) without printing a diagnostic. The standards only specify the behavior of awk when the files it reads are text files, so it is allowed to do anything it wants in this case. (By definition, file1 is not a text file since it contains byte sequences that are not valid characters in the current locale.) As RudiC suggested, using the C locale (which uses a single-byte character set in which every byte value is a valid character) instead of the en_US.UTF-8 locale (which uses a variable number of bytes to encode a character, and in which some byte sequences do not form valid characters) should do what you want in this case. (Note, however, that even in the C locale, NUL bytes are not valid in a text file; if the file is not empty, its last byte must be a <newline> character; and no line in the file can be longer than LINE_MAX bytes. On most systems, LINE_MAX is 2048, and the standards don't allow it to be less than 2048. The command getconf LINE_MAX will give you the limit on your system.) From what we have seen so far (i.e., the first three lines), file1 appears to be a valid text file in the C locale.
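For anyone following along, here is a small sketch of how the line-length limit and the raw bytes of a line can be inspected. The sample line is made up for illustration (the real file1 is not reproduced in this thread); only getconf, printf, od, and tr are used.

```shell
# Print the longest text line (in bytes) that utilities are required to handle.
getconf LINE_MAX

# Dump a made-up sample line byte by byte in hex. The bytes e7 03 come from
# the octal escapes \347 and \003; together they are not a valid UTF-8 character.
printf '\347\003abc\n' | od -An -tx1
```

Looking at bytes with od sidesteps the locale question entirely, since od does not try to interpret the input as characters.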
Most of it went over my head. The only thing I understood is that awk is not recognizing file1 as a text file, and so it isn't working. OK, so I set the variable LC_ALL=C. After that, the locale command was still showing LC_ALL= with an empty value, but echo $LC_ALL was showing the value C, so that should be fine.
After that I ran the sed command, and it didn't work: it just wrote an identical file to file2 with all the junk characters left as they were. The resulting file2 is the same size as file1.
I tried the cut command and it works for all lines except the first. On the first line it leaves a ^C character; on all other lines it removed the junk characters as expected. Let me know if this gives you any clue why cut removes all junk characters except on the first line.
We JUST want you to set LC_ALL=C in the environment of the awk, sed, or cut command you're trying to run. We aren't trying to get you to change the environment in your shell for other commands that you might run later. We NEVER suggested that you issue the command LC_ALL=C by itself.
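The difference between the two forms can be seen with a short experiment. This is only a sketch, run inside a subshell so it does not disturb the calling shell; it assumes LC_ALL was not already exported with `set -a` in effect. It also accounts for the locale command reporting an empty LC_ALL while echo $LC_ALL printed C: a plain assignment creates a shell variable that child processes never see.

```shell
(
    unset LC_ALL
    LC_ALL=C                        # plain assignment: shell variable only, not exported
    echo "shell sees: $LC_ALL"
    # A child process does not inherit the unexported variable.
    sh -c 'echo "child sees: ${LC_ALL:-<unset>}"'
    # Prefixing the assignment puts it in that one command's environment.
    LC_ALL=C sh -c 'echo "child sees: ${LC_ALL:-<unset>}"'
)
```

The second line of output shows `<unset>` and the third shows `C`, which is the behavior the prefixed form relies on.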
If cut -c2- is giving you what you want with the locale still set to en_US.UTF-8, cut is acting strangely (or at least inconsistently) when compared to awk with the same input. The cut -c option works on characters (just like awk substr()), and the two-byte sequence at the start of each line you have shown us in file1 is NOT a character in any locale that uses UTF-8 as its underlying codeset.
You could use:
to throw away the first two bytes (not characters) of every line from file1 independent of locale.
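The exact command did not survive in this copy of the post. As a sketch of one way to do what is described, assuming cut's -b (byte) option and made-up sample data in place of file1:

```shell
# strip_two_bytes is a hypothetical helper name used only for this example.
# cut -b selects by byte position, so it does not care whether the bytes
# form valid characters in the current locale.
strip_two_bytes() {
    cut -b3-
}

# Made-up input: each line starts with the bytes \347 \003.
printf '\347\003abc\n\347\003def\n' | strip_two_bytes
```

Applied to the real data this would look something like `cut -b3- file1 > file2`, with the file names taken from earlier in the thread.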
Or you could use the suggestion RudiC provided:
or, using the same syntax, with any of the awk suggestions we've provided, such as:
to set the locale for sed or awk only without affecting the locale that would be used by any other utility you (or anyone else) would run later in that login session or in any other login sessions currently active on your system.
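The original commands are missing from this copy, so the following is only an illustration of the syntax being described, not the exact suggestions from earlier in the thread. The sed and awk scripts are assumptions chosen to drop the first two bytes of each line, and the input is made-up sample data.

```shell
# LC_ALL=C is placed in the environment of this one sed invocation only.
# In the C locale "." matches any single byte, so "^.." is the first two bytes.
printf '\347\003abc\n' | LC_ALL=C sed 's/^..//'

# Same prefix syntax with awk; in the C locale substr() counts bytes.
printf '\347\003def\n' | LC_ALL=C awk '{ print substr($0, 3) }'
```

Because the assignment is a prefix to the command rather than a separate statement, the shell's own locale settings are the same before and after each line.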
All of these commands worked like a charm.
Thank you so much, "Don Cragun" and "RudiC".
Last edited by later_troy; 04-25-2016 at 11:19 AM..