awk command not working as expected


 
# 29  
04-06-2016
In a UTF-8 locale, the two-byte sequence specified by the C string "\347\03" is not a valid character, and it looks like gawk is silently discarding the contents of that line (or at least the characters from the invalid byte sequence up to the end of the line) without printing a diagnostic. The standards only specify the behavior of awk when the files it reads are text files, so it is allowed to do anything it wants in this case. (By definition, file1 is not a text file, since it contains byte sequences that are not valid characters in the current locale.) As RudiC suggested, using the C locale (which uses a single-byte character set in which every byte is a valid character) instead of the en_US.UTF-8 locale (which encodes a character in a variable number of bytes, and in which some byte sequences do not form valid characters) should do what you want in this case. (Note, however, that even in the C locale, NUL bytes are not valid in a text file; if the file is not empty, its last byte must be a <newline> character; and no line in the file can be longer than LINE_MAX bytes. On most systems, LINE_MAX is 2048, and the standards don't allow it to be less than 2048. The command:
Code:
getconf LINE_MAX

will give you the limit on your system.) From what we have seen so far (i.e., the first three lines), file1 appears to be a valid text file in the C locale.
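The byte-level rules above (no NUL bytes, a trailing <newline>, no line longer than LINE_MAX bytes), plus the question of whether a file is valid UTF-8, can be checked roughly from the shell. This is a sketch under assumptions: `is_text_file` and `is_valid_utf8` are hypothetical helper names, not standard utilities, and the UTF-8 test assumes your `iconv` exits with a non-zero status on an illegal input sequence, as glibc's iconv does.

```shell
#!/bin/sh
# Hypothetical helpers (not standard utilities) for the text-file rules
# discussed above. A rough sketch, not a complete validator.

# is_valid_utf8 FILE: succeed if iconv can read FILE as UTF-8.
# Assumes iconv exits non-zero on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# is_text_file FILE: check the byte-level rules that apply even in the C locale.
is_text_file() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: not a regular file" >&2
        return 1
    fi
    # An empty file is a text file.
    [ -s "$f" ] || return 0

    # Rule 1: no NUL bytes. tr deletes everything except NULs; count the rest.
    nuls=$(( $(tr -cd '\000' < "$f" | wc -c) ))
    if [ "$nuls" -ne 0 ]; then
        echo "$f: contains $nuls NUL byte(s)" >&2
        return 1
    fi

    # Rule 2: the last byte must be a newline. Command substitution strips
    # a trailing newline, so a correct file yields an empty string here.
    if [ -n "$(tail -c 1 "$f")" ]; then
        echo "$f: last byte is not a newline" >&2
        return 1
    fi

    # Rule 3: no line longer than LINE_MAX bytes, counting the newline.
    # In the C locale, awk's length() counts bytes.
    max=$(getconf LINE_MAX)
    if ! LC_ALL=C awk -v max="$max" '
            length($0) + 1 > max { bad = 1; exit }
            END { exit bad }' "$f"; then
        echo "$f: a line is longer than $max bytes" >&2
        return 1
    fi
    return 0
}
```

For example, after `printf '\347\003abc\n' > sample1` (a made-up file name), `is_valid_utf8 sample1` should fail while `is_text_file sample1` succeeds, which matches the diagnosis above: the file is a text file in the C locale but not in a UTF-8 locale.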
# 30  
04-07-2016
Most of it went over my head. The only thing I understood is that awk is not recognizing file1 as a text file, so it isn't working. OK, so I tried setting the variable LC_ALL=C. After that, the locale command was still showing LC_ALL= with an empty value, but when I ran echo $LC_ALL it showed C. So that should be fine.

After that I ran this sed command:
Code:
sed 's/^..//' file1 > file2

and it didn't work; it just wrote an identical file2 with all the junk characters intact. The resulting file2 is the same size as file1, so the sed command didn't work.

I tried the cut command, and it works for all lines except the first one. On the first line it leaves a ^C character; on all the other lines it removed the junk characters as expected.
Code:
cut -c 2- file1 > file2

Let me know if this gives you any clue why the cut command removes all the junk characters except on the first line.
# 31  
04-07-2016
We JUST want you to set LC_ALL=C in the environment of the awk, sed, or cut command you're trying to run. We aren't trying to get you to change the environment in your shell for other commands that you might run later. We NEVER suggested that you issue the command:
Code:
LC_ALL=C

by itself.
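The difference between setting a shell variable and putting a variable in one command's environment can be seen with a short experiment. This is probably why locale reported an empty LC_ALL= while echo $LC_ALL printed C: the variable was set in the shell but never exported to child processes. The sketch assumes a POSIX shell; the child `sh -c` stands in for awk, sed, or locale, and `<not set>` is just a placeholder string.

```shell
#!/bin/sh
# Start from a known state: remove LC_ALL (and its export attribute).
unset LC_ALL

# 1. Plain assignment: creates a shell variable only. Because it is not
#    exported, child processes such as awk, sed, or locale don't see it.
LC_ALL=C
echo "shell sees: $LC_ALL"                               # prints: shell sees: C
sh -c 'echo "child sees: ${LC_ALL-<not set>}"'           # prints: child sees: <not set>

# 2. Per-command assignment: LC_ALL=C goes into the environment of that
#    one command only; the shell itself is unaffected afterwards.
unset LC_ALL
LC_ALL=C sh -c 'echo "child sees: ${LC_ALL-<not set>}"'  # prints: child sees: C
echo "shell afterwards: ${LC_ALL-<not set>}"             # prints: shell afterwards: <not set>
```

Form 2 is what "LC_ALL=C sed ..." and "LC_ALL=C awk ..." use.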

If cut -c2- is giving you what you want with the locale still set to en_US.UTF-8, cut is acting strangely (or at least inconsistently) compared to awk with the same input. The cut -c option works on characters (just like awk substr()), and the two-byte sequence at the start of each line you have shown us in file1 is NOT a character in any locale that uses UTF-8 as its underlying codeset.

You could use:
Code:
cut -b3- file1 > file2

to throw away the first two bytes (not characters) of every line from file1 independent of locale.
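To see what counting bytes does here, the sketch below builds a hypothetical one-line sample with the same two leading bytes and inspects the result with od (`od -An -tx1` prints each byte in hex). The file name sample.txt is made up. It also suggests one possible source of the stray ^C reported above: \003 is the byte a terminal shows as ^C, so anything that removes only the first byte of such a line leaves it behind.

```shell
#!/bin/sh
# sample.txt is a made-up file: one line starting with the two bytes
# \347 \003 (octal), like the lines shown from file1, then ordinary text.
printf '\347\003hello\n' > sample.txt

# -b selects bytes, so this gives the same result in any locale.
cut -b3- sample.txt                    # prints: hello

# Show the remaining bytes in hex: only "hello" and the newline are left.
cut -b3- sample.txt | od -An -tx1

# Dropping only one byte keeps \003 (hex 03), which a terminal
# may display as ^C.
cut -b2- sample.txt | od -An -tx1
```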

Or you could use the suggestion RudiC provided:
Code:
LC_ALL=C sed 's/^..//' file1 > file2

or, using the same syntax, with any of the awk suggestions we've provided, such as:
Code:
LC_ALL=C awk '{print substr($0, 3)}' file1 > file2

to set the locale for sed or awk only without affecting the locale that would be used by any other utility you (or anyone else) would run later in that login session or in any other login sessions currently active on your system.
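To check that the byte-oriented cut and the C-locale sed and awk commands really do the same thing, you can run all three on one input and compare the results with cmp. The input file sample.in and the output names out.cut, out.sed, and out.awk are hypothetical; the sample lines only mimic two junk bytes at the start of each line.

```shell
#!/bin/sh
# Hypothetical sample input: each line starts with two junk bytes.
printf '\347\003first line\n\342\001second line\n' > sample.in

# Three ways to drop the first two bytes of every line.
cut -b3- sample.in                             > out.cut
LC_ALL=C sed 's/^..//' sample.in               > out.sed
LC_ALL=C awk '{print substr($0, 3)}' sample.in > out.awk

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s out.cut out.sed && cmp -s out.cut out.awk; then
    echo "all three outputs match"
else
    echo "outputs differ" >&2
fi
```

In the C locale every byte is a character, so sed's "." and awk's substr() both count bytes, which is why they agree with cut -b here.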
# 32  
04-07-2016
All of these commands worked like a charm:

Code:
cut -b3- file1 > file2
LC_ALL=C awk '{print substr($0, 3)}' file1 > file2
LC_ALL=C sed 's/^..//' file1 > file2

Thank you so much "Don Cragun" and "RudiC".

Last edited by later_troy; 04-25-2016 at 11:19 AM..