unihist(1) [debian man page]

unihist(1)						      General Commands Manual							unihist(1)

NAME
unihist - Generate a histogram of the characters in a Unicode file

SYNOPSIS
unihist ([option flags])

DESCRIPTION
unihist generates a histogram of the characters in its input, which must be encoded in UTF-8 Unicode. By default, for each character it prints the frequency of the character as a percentage of the total, the absolute number of tokens in the input, the UTF-32 code in hexadecimal, and, if the character is displayable, the glyph itself as UTF-8 Unicode. Command line flags allow unwanted information to be suppressed. In particular, note that by suppressing the percentages and counts it is possible to generate a list of the unique characters in the input.

Output is produced ordered by character code. To sort it in descending order of frequency, pipe the output into the command:

    sort -k1 -n -r

By default, unihist handles all of Unicode. To reduce memory usage and increase speed, it may be compiled so as to handle only the Basic Multilingual Plane (plane 0) by defining BMPONLY.

COMMAND LINE FLAGS
-c    Suppress printing of counts and percentages.
-g    Suppress printing of glyphs.
-h    Print usage information.
-u    Suppress printing of the Unicode code as text.
-v    Print version information.

SEE ALSO
uniname(1)

REFERENCES
Unicode Standard, version 5.0

AUTHOR
Bill Poser billposer@alum.mit.edu

LICENSE
GNU General Public License

May, 2008								unihist(1)
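
The tally that unihist(1) describes can be illustrated with a minimal Python sketch. This is not unihist's implementation and does not claim to reproduce its exact output format; the function names char_histogram and format_rows and the column widths are assumptions made for illustration only.

```python
from collections import Counter


def char_histogram(data: bytes) -> Counter:
    """Count each code point in UTF-8 encoded input (hypothetical helper)."""
    # Decoding raises UnicodeDecodeError if the input is not valid UTF-8.
    return Counter(data.decode("utf-8"))


def format_rows(hist: Counter, by_frequency: bool = False) -> list[str]:
    """Return one row per character: percentage, count, hex code, glyph."""
    total = sum(hist.values())
    # Default order is by character code, as the description states.
    items = sorted(hist.items(), key=lambda kv: ord(kv[0]))
    if by_frequency:
        # Roughly what a descending numeric sort on the first column gives.
        items.sort(key=lambda kv: kv[1], reverse=True)
    rows = []
    for ch, count in items:
        # Only show the glyph when the character is printable.
        glyph = ch if ch.isprintable() else ""
        percent = 100 * count / total
        rows.append(f"{percent:7.3f} {count:8d} 0x{ord(ch):06X} {glyph}")
    return rows
```

Calling format_rows(char_histogram(data)) on the bytes of a UTF-8 file yields rows ordered by character code; passing by_frequency=True orders them by descending count instead.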

unifuzz(1)						      General Commands Manual							unifuzz(1)

NAME
unifuzz - Emit strings designed to test Unicode handling

SYNOPSIS
unifuzz ([option flags])

DESCRIPTION
unifuzz emits strings designed to test the ability of programs intended to accept Unicode input to handle unexpected input. These include: characters from all Unicode ranges, Private Use characters, surrogates, undefined characters, non-characters, control characters, exotic space characters, sequences violating normalization rules, unexpected sequences (e.g. a base character from one range followed by a combining character from another range), and long sequences of combining characters. It can also generate very long lines, strings containing embedded nulls, and ill-formed UTF-8.

COMMAND LINE FLAGS
-b            Restrict the output to the Basic Multilingual Plane (Plane 0).
-g            Do not emit specific characters.
-h            Print usage information.
-l            Emit very long lines.
-n            Emit string with embedded nulls.
-q            Be quiet. Omit commentary.
-r <number>   Set the number of random characters to emit.
-S            Scan ranges - emit a character from each range.
-s <seed>     Set the seed for the random number generator.
-u            Emit ill-formed UTF-8.
-v            Print version information.

The sequence of random characters is determined by a pseudorandom number generator, so the same sequence can be obtained by setting the seed to the same value. If not set on the command line, a seed is chosen based on the time of execution. The seed used is included in the output in a line of the form "Seed = NNNNNN" immediately preceding the random character sequence. Note that in order to obtain the same sequence it is necessary to keep the same setting for restriction of output to the BMP.

REFERENCES
Unicode Standard, version 5.0

AUTHOR
Bill Poser billposer@alum.mit.edu

LICENSE
GNU General Public License

April, 2008								unifuzz(1)
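
The seed behaviour described for unifuzz(1) can be sketched in Python as follows. This is an illustration of seeded reproducibility in general, not unifuzz's actual generator; the function name random_code_points, the constants, and the use of Python's random module are all assumptions.

```python
import random

BMP_MAX = 0xFFFF        # last code point of the Basic Multilingual Plane
UNICODE_MAX = 0x10FFFF  # last code point of Unicode


def random_code_points(n: int, seed: int, bmp_only: bool = False) -> list[int]:
    """Return n pseudorandom code points (hypothetical helper).

    The same seed with the same range setting gives the same list.
    """
    # A private generator keeps the result independent of global state.
    rng = random.Random(seed)
    upper = BMP_MAX if bmp_only else UNICODE_MAX
    # The range deliberately includes surrogates and unassigned code points.
    return [rng.randint(0, upper) for _ in range(n)]


if __name__ == "__main__":
    import time

    seed = int(time.time())  # time-based default, as the man page describes
    print(f"Seed = {seed}")
    print(" ".join(f"U+{cp:04X}" for cp in random_code_points(8, seed)))
```

Because the upper bound changes how the generator's output is mapped to code points, rerunning with the same seed but a different bmp_only setting should not be expected to give the same sequence, which mirrors the note above about keeping the BMP restriction unchanged.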