In a UTF-8 locale, the two-byte sequence specified by the C string "\347\03" is not a valid character, and it looks like gawk is silently discarding the contents of that line (or at least the invalid byte sequence and everything after it up to the end of the line) without printing a diagnostic. The standards only specify the behavior of awk when the files it reads are text files, so awk is allowed to do anything it wants in this case. (By definition, file1 is not a text file, since it contains byte sequences that are not valid characters in the current locale.) As RudiC suggested, use the C locale instead of the en_US.UTF-8 locale; that should do what you want in this case. The C locale uses a single-byte character set in which every byte is a valid character. The en_US.UTF-8 locale encodes each character in a variable number of bytes, so some byte sequences do not form valid characters. (Note, however, that even in the C locale a text file may not contain NUL bytes; if the file is not empty, its last byte must be a <newline> character; and no line in the file can be longer than LINE_MAX bytes, including its <newline>. On most systems, LINE_MAX is 2048, and the standards don't allow LINE_MAX to be less than 2048. The command:
will give you the limit on your system.) From what we have seen so far (i.e., the first three lines), file1 appears to be a valid text file in the C locale.
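Below is a minimal sketch of the points above. It assumes a POSIX shell with the standard `awk`, `getconf`, `tail`, `tr`, and `wc` utilities. The sample file reuses the name `file1`, but its contents are made up, and `is_posix_text_file` is a hypothetical helper. The sketch reads a line containing the bytes `\347\003` with `LC_ALL=C`. It then checks the three rules that still apply in the C locale, querying the limit with `getconf LINE_MAX`.

```shell
#!/bin/sh
# is_posix_text_file FILE (hypothetical helper): succeed only if FILE
# satisfies the C-locale text-file rules described above.
is_posix_text_file() {
    f=$1
    # An empty file is a valid text file.
    [ -s "$f" ] || return 0
    # Rule 1: no NUL bytes (deleting NULs must not change the byte count).
    total=$(wc -c < "$f")
    non_nul=$(tr -d '\000' < "$f" | wc -c)
    [ "$total" = "$non_nul" ] || return 1
    # Rule 2: the last byte must be a newline. Command substitution strips
    # a trailing newline, so the result is empty exactly in that case.
    [ -z "$(tail -c 1 "$f")" ] || return 1
    # Rule 3: no line, counting its newline, may exceed LINE_MAX bytes.
    max=$(getconf LINE_MAX)
    LC_ALL=C awk -v max="$max" 'length($0) + 1 > max { bad = 1 } END { exit bad }' "$f"
}

# Made-up sample: line 2 starts with \347\003, which is not valid UTF-8.
printf 'abc\n\347\003def\nxyz\n' > file1

# In the C locale every byte is a character, so awk sees each line in full
# and length() counts bytes.
LC_ALL=C awk '{ print NR ": " length($0) " bytes" }' file1

if is_posix_text_file file1; then
    echo "file1 is a text file in the C locale"
fi
```

In the C locale, the awk command reports 3, 5, and 3 bytes for the three lines. The invalid sequence is simply treated as two more single-byte characters.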
I am facing a strange problem.
I know there is only one record in the file 'test.txt' that starts with 'X'.
I verified that with the following command:
awk /^X/ test.txt | wc -l
This gives me the output '1'.
Now I take this record out of the file as follows:
awk /^X/ test.txt >... (1 Reply)
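The rest of this post is truncated, so the poster's actual problem isn't visible. What follows is a hedged sketch only. It assumes a file named `test.txt` with made-up sample contents, and the name `x_records.txt` is hypothetical. It quotes the awk program so the shell passes it through unchanged. It also writes the filtered output to a new file rather than redirecting it back onto `test.txt`, because the shell truncates a redirection target before awk reads it.

```shell
#!/bin/sh
# Made-up sample data: exactly one record starts with X.
printf 'A1\nX2\nB3\n' > test.txt

# Quote the awk program so the shell never touches its characters.
awk '/^X/' test.txt | wc -l

# Save the X record, then write every other record to a temporary file and
# replace test.txt only if awk succeeded. Redirecting straight back onto
# test.txt would truncate it before awk had read anything.
awk '/^X/' test.txt > x_records.txt
awk '!/^X/' test.txt > test.txt.tmp && mv test.txt.tmp test.txt
```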
I have a script with a find command using xargs to copy the files found to another directory. The find command is finding the appropriate file, but it's not copying. I've checked permissions, and those are all O.K., so I'm not sure what I'm missing. Any help is greatly appreciated.
This is... (2 Replies)
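The poster's script isn't shown, so this is not a diagnosis. A frequent pitfall with `find | xargs` is that plain `xargs` splits its input on blanks and newlines and treats quotes specially. As a result, file names containing spaces get mangled. Here is a minimal sketch that sidesteps `xargs` by using find's own `-exec`, assuming hypothetical directories `src_dir` and `dest_dir`:

```shell
#!/bin/sh
# Hypothetical source and destination directories for the demo.
src=./src_dir
dest=./dest_dir
mkdir -p "$src" "$dest"
printf 'data\n' > "$src/report one.txt"   # a name containing a space

# -exec passes each name to cp as a single argument, so spaces are safe;
# "\;" ends the command, which runs once per file.
find "$src" -type f -name '*.txt' -exec cp -p {} "$dest"/ \;

# Batched variant: "{} +" must come last, but cp needs the destination last,
# so a tiny sh -c script receives the destination as $0 and the files as $@.
find "$src" -type f -name '*.txt' -exec sh -c 'cp -p "$@" "$0"' "$dest" {} +
```

Both `-exec` forms are POSIX. The second one starts fewer `cp` processes when many files match.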
I've been trying to figure this out since last night, and I'm just stumped. The last time I did any shell scripting was 8 years ago on a Unix box, and it was never my strong suit. I'm on a Mac running Leopard now. Here's my dilemma - hopefully someone can point me in the right direction.
I'm... (10 Replies)
Hi!
Been working on a script and I've been having a problem. I've finally narrowed it down to this variable I'm setting:
servername=$(awk -v FS=\/ '{ print $7 } blah.txt | sed 's\/./-/g' | awk -v FS=\- '{print $1}')"
This will essentially pare down a line like this:
... (7 Replies)
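As posted, the command has unbalanced quotes. The first awk program (`'{ print $7 }`) is never closed, and a stray double quote follows the final parenthesis. Below is a hedged sketch of a balanced version. It assumes the intent is to take the seventh `/`-separated field, turn dots into dashes, and keep the text before the first dash; the sample line in `blah.txt` is made up. Note that in sed an unescaped `.` matches any character, so `s/./-/g` would replace every character with a dash.

```shell
#!/bin/sh
# Made-up sample line: split on "/", field 7 is "web01.example.com".
printf '/a/b/c/d/e/web01.example.com\n' > blah.txt

# Each awk program is quoted on its own, the whole pipeline sits inside one
# $( ... ), and the dot in the sed pattern is escaped so it matches only ".".
# -F/ is equivalent to -v FS=/.
servername=$(awk -F/ '{ print $7 }' blah.txt | sed 's/\./-/g' | awk -F- '{ print $1 }')
echo "$servername"
```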
I was trying to write a simple script which will read a text file and count the number of vowels in the file. My code is given below -
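#!/bin/bash
file=$1
v=0
# Exactly one argument (the file name) is required.
if [ $# -ne 1 ]
then
    echo "$0 filename"
    exit 1
fi
# The argument must name a regular file.
if [ ! -f "$file" ]
then
    echo "$file not a file"
    exit 2
fi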
while read -n... (14 Replies)
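The `while read -n` loop is cut off above, so it can't be checked here. One hedged alternative avoids reading the file a character at a time. It assumes only the ASCII vowels a, e, i, o, and u (either case) should be counted; `count_vowels` and `sample.txt` are hypothetical names.

```shell
#!/bin/bash
# count_vowels FILE (hypothetical helper): print the number of ASCII vowels.
count_vowels() {
    # tr -cd deletes every byte that is NOT in the set, leaving only the
    # vowels; wc -c counts them, and tr -d ' ' strips any padding wc adds.
    tr -cd 'aeiouAEIOU' < "$1" | wc -c | tr -d ' '
}

printf 'Hello World\n' > sample.txt
count_vowels sample.txt   # prints 3
```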
Dear all,
I have a script that used to work, but recently it has stopped working as expected.
I have a command line in my shell script to choose the following format from the output_elog and perform some task afterwards on
As you can see, I want all numbers in the format following the RED mark except for... (12 Replies)
Hi, I'm having trouble with a simple copy command in a script on HPUX.
I am trying to copy a file and append date & time.
The echo command prints out what I am expecting.
echo "Backing up $file to $file.$DATE.$FIXNUM" | tee -a $LOGFILE
+ echo 'Backing up... (4 Replies)
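The rest of the post is truncated, so the actual error isn't visible. Here is a minimal POSIX sketch of a date-stamped copy, assuming a hypothetical file `config.txt`. It leaves out `$FIXNUM` and `$LOGFILE`, whose values aren't shown.

```shell
#!/bin/sh
# Made-up sample file for the demo.
file=config.txt
printf 'setting=1\n' > "$file"

# This format has no spaces or colons, which are awkward in file names.
DATE=$(date +%Y%m%d_%H%M%S)
backup="$file.$DATE"

# Quote both names and check cp's status instead of assuming success.
if cp -p "$file" "$backup"; then
    echo "Backed up $file to $backup"
else
    echo "Backup of $file failed" >&2
fi
```

Quoting `"$file"` and `"$backup"` matters whenever the name or the date string can contain spaces, as it would if `DATE` came from plain `date` output.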
Hi Experts,
Need your kind help with gsub awk.
Below is my pattern:"exec=1_host_cnt=100_dup=4_NameTag=targetSrv_500.csv","'20171122112948"," 100"," 1"," 1"," 4","400","","",
" aac sample exec ""hostname=XXXXX commandline='timeout 10 openssl speed -multi 2 ; exit 0'"" ","-1","-1","1","... (6 Replies)
This is my ubuntu version:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
$ /bin/awk -V | head -n1
bash: /bin/awk: No such file or directory
I have gotten a script that helps me to parse,... (14 Replies)
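On Ubuntu 16.04, awk is normally installed as `/usr/bin/awk` (managed by the alternatives system), not `/bin/awk`, which matches the error above. Below is a hedged sketch that asks the shell where awk actually is. The `-W version` probe is an assumption: it works for gawk and mawk but may not work for every awk. For that reason its errors are discarded, and stdin comes from `/dev/null` so the command can't sit waiting for input.

```shell
#!/bin/sh
# Ask the shell where awk is instead of assuming /bin/awk.
if awk_path=$(command -v awk); then
    echo "awk found at: $awk_path"
    # Assumption: gawk and mawk accept "-W version"; others may not.
    "$awk_path" -W version </dev/null 2>/dev/null | head -n 1
else
    echo "awk not found in PATH" >&2
fi
```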
Discussion started by: delbroooks
LEARN ABOUT OSF1
cmp
cmp(1)                    General Commands Manual                    cmp(1)
NAME
cmp - Compares two files
SYNOPSIS
cmp [-l | -s] file1 file2
STANDARDS
Interfaces documented on this reference page conform to industry standards as follows:
cmp:XCU5.0
Refer to the standards(5) reference page for more information about industry standards and associated tags.
OPTIONS
  -l  Prints the byte number (decimal) and the differing bytes (octal) for each difference.
  -s  Does not print data for differing files; returns only an exit value.
OPERANDS
  file1  The path name of a file to be compared.
  file2  The path name of a file to be compared.
DESCRIPTION
The cmp command compares two files.
If file1 or file2 is - (dash), standard input is used for that file. It is an error to specify - for both files.
By default, the cmp command prints no information if the files are the same. If the files differ, cmp prints the byte and line number
where the difference occurred.
The cmp command also specifies whether one file is an initial subsequence of the other (that is, if the cmp command reads an End-of-File
character in one file before finding any differences). Usually, you use the cmp command to compare nontext files and the diff command to
compare text files.
Note that bytes and lines reported by cmp are numbered from 1.
EXIT STATUS
The following exit values are returned:
  0   The files are identical.
  1   The files differ. This includes files of different lengths that are identical in the first part of both files.
  >1  An error occurred.
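Because -s suppresses all output, scripts usually branch on the exit status of cmp alone. Here is a minimal sketch; `compare_files` and the sample file names are hypothetical. `c.txt` is built so that `a.txt` is an initial subsequence of it, which still counts as differing (exit status 1).

```shell
#!/bin/sh
# compare_files A B (hypothetical helper): classify the result of cmp -s
# using only its exit status, since -s prints nothing.
compare_files() {
    status=0
    cmp -s "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "differ" ;;
        *) echo "error (exit status $status)" ;;
    esac
}

# Made-up sample files; a.txt is an initial subsequence of c.txt.
printf 'abc\n' > a.txt
printf 'abc\n' > b.txt
printf 'abc\nmore\n' > c.txt

compare_files a.txt b.txt        # identical
compare_files a.txt c.txt        # differ
compare_files a.txt missing.txt  # error line: cmp exits with a status above 1
```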
EXAMPLES
To determine whether two files are identical, enter:
      cmp prog.o.bak prog.o
The preceding command compares the files prog.o.bak and prog.o. If the files are identical, no message is displayed. If the files differ, the location of the first difference is displayed. For instance:
      prog.o.bak prog.o differ: byte 5, line 1
If the message cmp: EOF on prog.o.bak is displayed, then the first part of prog.o is identical to prog.o.bak, but there is additional data in prog.o.
If the message cmp: EOF on prog.o is displayed, then the first part of prog.o.bak is identical to prog.o, but there is additional data in prog.o.bak.
To display each pair of bytes that differ, enter:
      cmp -l prog.o.bak prog.o
This compares the files and then displays the byte number (in decimal) and the differing bytes (in octal) for each difference. For example, if the fifth byte is octal 101 in prog.o.bak and 141 in prog.o, then the cmp command displays:
      5 101 141
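The -l example above can be reproduced with two small sample files (hypothetical contents) whose only difference is byte 5: 'A' (octal 101) versus 'a' (octal 141). Some implementations pad the byte-number column, so the exact spacing may vary.

```shell
#!/bin/sh
# Made-up sample files that differ only in byte 5: 'A' (101) vs 'a' (141).
printf 'XXXXAXX\n' > prog.o.bak
printf 'XXXXaXX\n' > prog.o

# -l lists each differing byte as: offset (decimal), byte in the first
# file (octal), byte in the second file (octal). cmp then exits with 1.
cmp -l prog.o.bak prog.o || echo "cmp exit status: $?"
```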
.
.
.
ENVIRONMENT VARIABLES
The following environment variables affect the execution of cmp:
LANG         Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.
LC_ALL       If set to a non-empty string value, overrides the values of all the other internationalization variables.
LC_CTYPE     Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments).
LC_MESSAGES  Determines the locale for the format and contents of diagnostic messages written to standard error.
NLSPATH      Determines the location of message catalogues for the processing of LC_MESSAGES.
SEE ALSO
Commands: comm(1), bdiff(1), diff(1), diff3(1), sdiff(1)
Standards: standards(5)