Confusion with the concept of wc -c and wc -m


 
# 1  
09-16-2014

Are wc -c and wc -m the same?

Code:
Shellscript::cat file1 
hello
Shellscript::cat file1 | wc -c
6
Shellscript::cat file1 | wc -m
6
Shellscript::file file1
file1: ASCII text
Shellscript::uname -a
Linux was85host 2.6.27.45-0.1-vmi #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux

At least for wc -m, I was expecting the result to be 5, as it counts characters, not bytes.

wc -c shows 6. Isn't 1 character 1 byte in an ASCII system?

Thanks,
# 2  
09-16-2014
I don't know about your system (Linux 2.6.27.45-0.1).

But on Solaris 10:
-c counts bytes
-C is the same as -m: it counts characters

So I suggest checking the man page on your own system:
Code:
man wc

# 3  
09-16-2014
IMHO you are overlooking something in your character count:

Code:
# cat file
hello

# wc -c file 
       6 file

# od -ax file
0000000    h   e   l   l   o  lf
            6865    6c6c    6f0a
0000006

You have a "lf" (line feed) character at the end of the line. This is the case in every well-formed ASCII text file, that the last line is terminated by a line-feed-character.
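A quick way to see this on any POSIX system is to generate the same word with and without the trailing line feed. This is a minimal sketch; it uses `printf` rather than `echo` because `printf` never appends a newline unless asked to. Some `wc` implementations pad the number with leading spaces, so the comments describe the counts rather than the exact output.

```shell
#!/bin/sh
# "hello" plus the line feed that ends the line: 5 + 1 = 6 bytes
printf 'hello\n' | wc -c

# wc -m gives the same 6 here: every ASCII character is one byte
printf 'hello\n' | wc -m

# The same word without the line feed: 5 bytes
printf 'hello' | wc -c

# od -c prints the line feed as \n, making the invisible sixth byte visible
printf 'hello\n' | od -c
```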

I hope this helps.

bakunin

PS: A difference between "bytes" and "characters" would mean that there are multi-byte characters in your file (for example, text in a Unicode encoding such as UTF-8).
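To see bytes and characters diverge, the file needs a character that takes more than one byte. A minimal sketch, assuming a UTF-8 locale named `C.UTF-8` is installed (run `locale -a` to check; many Linux distributions ship one). The file name `sample.txt` is only for illustration.

```shell
#!/bin/sh
# Write "héllo" plus a line feed. The é is given as its two UTF-8 bytes
# (octal 303 251 = 0xC3 0xA9), so this works whatever the current locale is.
printf 'h\303\251llo\n' > sample.txt

# Bytes: h(1) + é(2) + llo(3) + line feed(1) = 7
wc -c < sample.txt

# Characters in a UTF-8 locale: é counts once, so 6.
# Assumption: C.UTF-8 exists; if it does not, wc typically falls back
# to the C locale and reports 7 here as well.
LC_ALL=C.UTF-8 wc -m < sample.txt

# Characters in the C locale: each byte is typically one character, so 7
LC_ALL=C wc -m < sample.txt
```

The point is that `wc -m` asks the locale how to decode bytes into characters, so the same file can give different character counts under different `LC_ALL`, `LC_CTYPE`, or `LANG` settings.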
# 4  
09-16-2014
ASCII vs ISO

Correct: the (old) ASCII code is a 7-bit code, so each character fits in one byte (1 byte = 8 bits).
More information: https://www.cs.tut.fi/~jkorpela/chars.html

But it depends on the character settings of your system.

Quote:
Specifying a Character Encoding on Linux, UNIX, and Mac

The character encoding used on a Linux or UNIX system depends on the setting of the LC_ALL, LC_CTYPE or LANG environment variables. (At this writing, setting a basis.java.args=-Dfile.encoding= line in the BBj.properties file has no effect.) These three environment variables accept the name of a locale as their value. It is not typically necessary to explicitly set all three variables, but it is important to understand their hierarchy and what they actually do.
The LC_* environment variables, such as LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, etc, each control a specific aspect of software behavior in a given country-specific locale. They exist to provide a finer degree of control over these behaviors. The LC_CTYPE variable controls character encoding. The LANG environment variable provides a setting for each of these various aspects, but individual behavior categories can be overridden by the LC_* variables. The LC_ALL environment variable supersedes both the LC_* and the LANG variables. When BBj is started on a Linux or UNIX machine, the system sets the character encoding by checking the LC_ALL, LC_CTYPE and LANG environment variables in that order. The first one of these variables that contains a valid setting will be used, and the others will be ignored. This means if a particular setting for LANG is ineffective, check for the presence of LC_CTYPE or LC_ALL variables which could be overriding it. If none of these environment variables are set, the system will use the default locale found in the /usr/lib/locale directory.
Configuring environment variables for BBj can be done in the <bbj install dir>/bin/.envsetup file. Following are some examples:

LC_ALL=en_US.ISO8859-15
In this example a code set modifier, .ISO8859-15, has been appended to the locale name to specify the ISO-8859-15 character set.

LANG=de_DE@euro
The @euro causes the ISO-8859-15 character set to be used, which has a character mapping for the Euro character at $A4$. Setting the de_DE locale without @euro would get the ISO-8859-1 character set, which has no Euro character.

LANG=de_DE.cp1252
Here the code set modifier .cp1252 has been appended to the de_DE locale, specifying that Microsoft's modified version of the ISO-8859-1 character set should be used. This setting for LANG provides $80$ as a mapping for the Euro character, which would solve the difficulty with the accounting system described above. Unfortunately, most Linux systems do not have locales equipped to use the CP1252 character set. Such a new locale would have to be defined before the LANG variable could be assigned to it. See "Defining a New Locale" below for instructions on how to do this.
Source : documentation.basis.com/BASISHelp/WebHelp/inst/character_encoding.htm
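The lookup order described in the quote (LC_ALL, then LC_CTYPE, then LANG) can be checked with the `locale` utility itself. A minimal sketch, again assuming the `C.UTF-8` locale is installed; the codeset names printed (for example `UTF-8`, or `ANSI_X3.4-1968` for the C locale on glibc) vary between systems.

```shell
#!/bin/sh
# Start from a clean slate so variables inherited from the login
# environment do not override the examples below.
unset LC_ALL LC_CTYPE

# Only LANG is set, so it decides the codeset
LANG=C.UTF-8 locale charmap

# LC_CTYPE is more specific than LANG, so the C codeset wins here
LANG=C.UTF-8 LC_CTYPE=C locale charmap

# LC_ALL overrides both LC_CTYPE and LANG, so C.UTF-8 wins here
LANG=C LC_CTYPE=C LC_ALL=C.UTF-8 locale charmap
```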



Code:
locale -a

displays a list of all the available locale definitions.


Code:
locale -m

displays a list of all the available character sets on the machine.

Code:
locale charmap

shows which character set is currently in use.
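Putting these commands together with `wc`: a small helper that prints the active codeset next to the byte and character counts of a file. This is a hypothetical function (`count_check` is not a standard command), written as a sketch. When the two counts match, as with the `hello` file from the first post, every character in the file is a single byte under the current locale.

```shell
#!/bin/sh
# count_check FILE: hypothetical helper, not a standard utility.
# Shows the current codeset and compares wc -c (bytes) with wc -m (characters).
count_check() {
    f=$1
    printf 'charmap: %s\n' "$(locale charmap)"
    # tr strips the leading blanks some wc implementations print
    bytes=$(wc -c < "$f" | tr -d ' ')
    chars=$(wc -m < "$f" | tr -d ' ')
    printf 'bytes=%s chars=%s\n' "$bytes" "$chars"
    if [ "$bytes" -ne "$chars" ]; then
        echo "multi-byte characters present"
    else
        echo "one byte per character"
    fi
}
```

Usage: `printf 'hello\n' > file1 && count_check file1` prints the charmap line followed by `bytes=6 chars=6` and `one byte per character`.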


Quote:
Originally Posted by shellscripting
wc -c shows 6. Isn't 1 character 1 byte in an ASCII system?
So in my opinion it is normal, as long as your character set uses 1 byte per character.
# 5  
09-16-2014
Quote:
Originally Posted by bakunin
This is the case in every well-formed ASCII text file, that the last line is terminated by a line-feed-character.
That'd be correct for all *nix-based systems... others, like classic (pre-OS X) Mac OS, use a carriage return, while Windoze uses a CR+LF combo to separate an ASCII text stream into lines...
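Line endings show up in `wc -c` as well, because every CR and LF is a byte. A minimal sketch comparing Unix (LF), Windows (CR+LF) and classic Mac OS (CR) endings for the same word; `tr -d '\r'` is one common way to strip the carriage returns from a Windows file.

```shell
#!/bin/sh
printf 'hello\n'   | wc -c   # Unix LF: 6 bytes
printf 'hello\r\n' | wc -c   # Windows CR+LF: 7 bytes
printf 'hello\r'   | wc -c   # classic Mac OS CR: 6 bytes

# wc -l counts LF characters only, so a CR-only "line" reports 0 lines
printf 'hello\r' | wc -l

# Deleting the CR turns the CR+LF line back into 6 bytes
printf 'hello\r\n' | tr -d '\r' | wc -c
```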
# 6  
09-16-2014
Quote:
Originally Posted by shamrock
That'd be correct for all *nix-based systems... others, like classic (pre-OS X) Mac OS, use a carriage return, while Windoze uses a CR+LF combo to separate an ASCII text stream into lines...
Sorry, you are right, of course. Since the thread was about the behavior of the wc command, I only took Unix-like systems into account.

bakunin
 