UTF-8,16,32 character lengths using awk
Post 302961482 by RudiC on Monday 30th of November 2015 11:19:23 AM
What exactly is going wrong? Do you want byte counts or character counts? As for awk, my version reports the same byte count as wc -c does:
Code:
wc -c <file
57
awk '{print length}' file
43
12

(You'll have to add in the two <NL> chars, one per line, which awk's length leaves out: 43 + 12 + 2 = 57.)
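
The comparison above can also be made by letting awk add the newlines itself. This is only a sketch: the function names total_bytes_awk and total_bytes_wc are made up for illustration, and LC_ALL=C is set on the assumption that you want bytes, since some awk versions count characters rather than bytes in a UTF-8 locale.

Code:
```shell
# total_bytes_awk (hypothetical helper): sum each line's length plus one
# byte for its newline. LC_ALL=C makes length() count bytes.
# Assumes every line, including the last, ends in a newline.
total_bytes_awk() {
    LC_ALL=C awk '{ n += length($0) + 1 } END { print n + 0 }' "$1"
}

# total_bytes_wc (hypothetical helper): the wc -c figure for the same file,
# with any padding blanks removed so the two results compare cleanly.
total_bytes_wc() {
    wc -c < "$1" | tr -d ' '
}
```

Called as total_bytes_awk file and total_bytes_wc file, both should print the same number for a file whose last line ends in a newline. If the last line has no trailing newline, the awk total will be one higher.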

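As to your awk function: wc counts the contents of a file or of its standard input, not of a string it is handed, so it won't count the stringtocheck variable unless the value is actually piped into it.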
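
One way to measure a string rather than a file is sketched below. The helper names byte_len, char_len and awk_len and the sample value are assumptions for illustration, not the original poster's function; only the variable name stringtocheck comes from the thread.

Code:
```shell
# Hypothetical sample value; stringtocheck is the variable name from the thread.
stringtocheck='hello'

# byte_len: bytes in $1, obtained by piping the value to wc on standard input.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# char_len: characters in $1; the result depends on the current locale (LC_CTYPE).
char_len() {
    printf '%s' "$1" | wc -m | tr -d ' '
}

# awk_len: let awk measure its first argument directly; no input file is read.
# In a UTF-8 locale some awks (gawk, for one) count characters, others count bytes.
awk_len() {
    awk 'BEGIN { print length(ARGV[1]) }' "$1"
}

byte_len "$stringtocheck"
```

For plain ASCII input all three helpers agree. They can differ once the string holds multibyte characters, which is where the bytes-or-characters question matters.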
 

wc(1)                        General Commands Manual                        wc(1)

NAME
       wc - Counts the lines, words, characters, and bytes in a file

SYNOPSIS
       wc [-c | -m] [-lw] [file...]

       The wc command counts the lines, words, and characters or bytes in a file, or in the standard input if you do not specify any files, and writes the results to standard output. It also keeps a total count for all named files.

STANDARDS
       Interfaces documented on this reference page conform to industry standards as follows:

       wc: XCU5.0

       Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS
       -c     Counts bytes in the input.

       -l     Counts lines in the input.

       -m     Counts characters in the input.

       -w     Counts words in the input.

OPERANDS
       file   Specifies the pathname of the input file. If this operand is omitted, standard input is used.

DESCRIPTION
       A word is defined as a string of characters delimited by white space as defined in the X/Open Base Definitions for XCU4.

       The wc command counts lines, words, and bytes by default. Use the appropriate options to limit wc output. Specifying wc without options is the equivalent of specifying wc -lwc. If any options are specified, only the requested information is output.

       The order in which counts appear in the output line is lines, words, bytes. If an option is omitted, then the corresponding field in the output is omitted. If the -m option is used, then character counts replace byte counts.

       When you specify one or more files, wc displays the names of the files along with the counts. If standard input is used, then no file name is displayed.

EXIT STATUS
       The following exit values are returned:

       0      Successful completion.

       >0     An error occurred.

EXAMPLES
       To display the number of lines, words, and bytes in the file text, enter:

              wc text

       This results in the following output:

              27 185 722 text

       The numbers 27, 185, and 722 are the number of lines, words, and bytes, respectively, in the file text.

       To display only one or two of the three counts include the appropriate options. For example, the following command displays only line and byte counts:

              wc -cl text
              27 722 text

       To count lines, words, and bytes in more than one file, use wc with more than one input file or with a file name pattern. For example, the following command can be issued in a directory containing the files text, text1, and text2:

              wc -l text*
              27 text
              112 text1
              5 text2
              144 total

       The numbers 27, 112, and 5 are the numbers of lines in the files text, text1, and text2, respectively, and 144 is the total number of lines in the three files.

       The file name is always appended to the output. To obtain a pure number for things like reporting purposes, pipe all input to the wc command using cat. For example, the following command will report the total count of characters in all files in a directory.

              echo There are `cat *.c | wc -c` characters in *.c files
              There are 1869 characters in *.c files

ENVIRONMENT VARIABLES
       The following environment variables affect the execution of wc:

       LANG   Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files) and which characters are defined as white space characters.

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic messages written to standard error and informative messages written to standard output.

       NLSPATH
              Determines the location of message catalogues for the processing of LC_MESSAGES.

SEE ALSO
       Commands: cksum(1), ls(1)

       Standards: standards(5)

                                                                             wc(1)