Count occurrences of the word without it repeating


 
Post #8, 06-15-2019
Note: If you do not specify a field separator (FS) in awk, it uses the default of a single space (" "), which has a special meaning:

    "If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters."
    (The Open Group Base Specifications Issue 7, 2018 edition)
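
The special meaning of the default FS is easy to see by comparing field counts. The following is a minimal sketch, assuming a POSIX sh and awk. The sample text and the function names (sample, fields_default, fields_one_space, word_counts) are hypothetical, made up for illustration. It compares NF under the default FS with FS set to the regular expression "[ ]", which matches exactly one space. It then uses the default FS to list each distinct word once with its count.

```shell
#!/bin/sh
# Hypothetical sample input (not from the thread): leading blanks,
# a tab, runs of spaces, and trailing blanks.
sample() {
  printf '  apple  banana\tapple\nbanana   cherry  \n'
}

# Default FS (a single space): leading/trailing blanks are skipped and
# any run of blanks (spaces or tabs) acts as one separator.
fields_default() {
  sample | awk '{ print NF }'
}

# FS="[ ]" is an ERE matching exactly one space: every space separates
# fields (so empty fields appear) and the tab is no longer a separator.
fields_one_space() {
  sample | awk -F'[ ]' '{ print NF }'
}

# Count each distinct word, printing every word only once. This relies
# on the default FS, so stray blanks never produce an empty "word".
word_counts() {
  sample | awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
                END { for (w in count) print w, count[w] }' | LC_ALL=C sort
}

fields_default     # 3, then 2
fields_one_space   # 5, then 6
word_counts        # apple 2 / banana 2 / cherry 1
```

With "[ ]" the first line yields two leading empty fields and one empty field between the double space, and the second line yields empty fields for the extra and trailing spaces. That is why the default FS is usually the right choice for counting words.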
Text::Unidecode(3)					User Contributed Perl Documentation					Text::Unidecode(3)

NAME
Text::Unidecode -- US-ASCII transliterations of Unicode text

SYNOPSIS
    use utf8;
    use Text::Unidecode;
    print unidecode(
        "\x{5317}\x{4EB0}\n"
        # those are the Chinese characters for Beijing
    );

    # That prints: Bei Jing

DESCRIPTION
It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible. You could represent the Unicode characters as "???????" or "\15BA\15A0\1610...", but that's nearly useless to the user who actually wants to read what the text says.

What Text::Unidecode provides is a function, "unidecode(...)", that takes Unicode data and tries to represent it in US-ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at transliteration -- i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system. (See the example in the synopsis.)

Unidecode's ability to transliterate is limited by two factors:

o The amount and quality of data in the original

  So if you have Hebrew data that has no vowel points in it, then Unidecode cannot guess what vowels should appear in a pronunciation. S f y hv n vwls n th npt, y wn't gt ny vwls n th tpt. (This is a specific application of the general principle of "Garbage In, Garbage Out".)

o Basic limitations in the Unidecode design

  Writing a real and clever transliteration algorithm for any single language usually requires a lot of time, and at least a passable knowledge of the language involved. But Unicode text can convey more languages than I could possibly learn (much less create a transliterator for) in the entire rest of my lifetime. So I put a cap on how intelligent Unidecode could be, by insisting that it support only context-insensitive transliteration. That means missing the finer details of any given writing system, while still hopefully being useful.

Unidecode, in other words, is quick and dirty.
Sometimes the output is not so dirty at all: Russian and Greek seem to work passably; and while Thaana (Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up a mapping from it to Roman letters seems to work pretty well. But sometimes the output is very dirty: Unidecode does quite badly on Japanese and Thai.

If you want a smarter transliteration for a particular language than Unidecode provides, then you should look for (or write) a transliteration algorithm specific to that language, and apply it instead of (or at least before) applying Unidecode.

In other words, Unidecode's approach is broad (knowing about dozens of writing systems), but shallow (not being meticulous about any of them).

FUNCTIONS
Text::Unidecode provides one function, "unidecode(...)", which is exported by default. It can be used in a variety of calling contexts:

"$out = unidecode($in);" # scalar context
    This returns a copy of $in, transliterated.

"$out = unidecode(@in);" # scalar context
    This is the same as "$out = unidecode(join '', @in);"

"@out = unidecode(@in);" # list context
    This returns a list consisting of copies of @in, each transliterated. This is the same as "@out = map scalar(unidecode($_)), @in;"

"unidecode(@items);" # void context
"unidecode(@bar, $foo, @baz);" # void context
    Each item on input is replaced with its transliteration. This is the same as "for(@bar, $foo, @baz) { $_ = unidecode($_) }"

You should make a minimum of assumptions about the output of "unidecode(...)". For example, if you assume an all-alphabetic (Unicode) string passed to "unidecode(...)" will return an all-alphabetic string, you're wrong -- some alphabetic Unicode characters are transliterated as strings containing punctuation (e.g., the Armenian letter at 0x0539 currently transliterates as "T`").

However, these are the assumptions you can make:

o Each character 0x0000 - 0x007F transliterates as itself. That is, "unidecode(...)" is 7-bit pure.

o The output of "unidecode(...)" always consists entirely of US-ASCII characters -- i.e., characters 0x0000 - 0x007F.

o All Unicode characters translate to a sequence of (any number of) characters that are newline ("\n") or in the range 0x0020-0x007E. That is, no Unicode character translates to "\x01", for example. (Although if you have a "\x01" on input, you'll get a "\x01" in output.)

o Yes, some transliterations produce a "\n" -- but just a few, and only with good reason. Note that the value of newline ("\n") varies from platform to platform -- see perlport.

o Some Unicode characters may transliterate to nothing (i.e., empty string).

o Very many Unicode characters transliterate to multi-character sequences. E.g., Han character 0x5317 transliterates as the four-character string "Bei ".

o Within these constraints, I may change the transliteration of characters in future versions. For example, if someone convinces me that the Armenian letter at 0x0539, currently transliterated as "T`", would be better transliterated as "D", I may well make that change.

DESIGN GOALS AND CONSTRAINTS
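The context-insensitive substitution that this section argues for can be illustrated without Perl. The following is a toy sketch in POSIX sh and sed, not Text::Unidecode itself. Its "table" is a hypothetical two-entry mapping built from the SYNOPSIS example: 0x5317 maps to "Bei ", as documented above, and 0x4EB0 maps to "Jing ", an assumption chosen to match the synopsis output. The names toy_unidecode and non_ascii_bytes are made up. Every occurrence of a character gets the same replacement regardless of its neighbours. A byte count then checks that the result stays within 0x00-0x7F, the property the assumptions listed under FUNCTIONS promise for "unidecode(...)".

```shell
#!/bin/sh
# Toy context-insensitive transliterator (hypothetical, NOT the module).
# Its table covers only two characters, written here as UTF-8 bytes.
BEI=$(printf '\345\214\227')   # U+5317 encoded as E5 8C 97
JING=$(printf '\344\272\260')  # U+4EB0 encoded as E4 BA B0

# Replace every occurrence of each table entry with the same ASCII
# string, whatever surrounds it. LC_ALL=C makes sed match the byte
# sequences literally; UTF-8 is self-synchronizing, so the bytes of one
# character can never match in the middle of another character.
toy_unidecode() {
  LC_ALL=C sed -e "s/$BEI/Bei /g" -e "s/$JING/Jing /g"
}

# Count bytes outside 0x00-0x7F on stdin; 0 means pure US-ASCII.
non_ascii_bytes() {
  LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf '%s%s\n' "$BEI" "$JING" | toy_unidecode                     # prints "Bei Jing "
printf '%s%s\n' "$BEI" "$JING" | toy_unidecode | non_ascii_bytes   # prints 0
```

A real context-insensitive transliterator needs an entry for every character it supports. This toy has two, which is enough to show that each lookup ignores the surrounding text.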
Text::Unidecode is meant to be a transliterator-of-last-resort, to be used once you've decided that you can't just display the Unicode data as is, and once you've decided you don't have a more clever, language-specific transliterator available. It transliterates context-insensitively -- that is, a given character is replaced with the same US-ASCII (7-bit ASCII) character or characters, no matter what the surrounding characters are.

The main reason I'm making Text::Unidecode work with only context-insensitive substitution is that it's fast, dumb, and straightforward enough to be feasible. It doesn't tax my (quite limited) knowledge of world languages. It doesn't require me to write a hundred lines of code to get the Thai syllabification right (and never know whether I've gotten it wrong, because I don't know Thai), or to spend a year trying to get Text::Unidecode to use the ChaSen algorithm for Japanese, or to write heuristics for telling the difference between Japanese, Chinese, or Korean, so it knows how to transliterate any given Uni-Han glyph. And moreover, context-insensitive substitution is still mostly useful, but still clearly couldn't be mistaken for authoritative.

Text::Unidecode is an example of the 80/20 rule in action -- you get 80% of the usefulness using just 20% of a "real" solution.

A "real" approach to transliteration for any given language can involve such increasingly tricky contextual factors as these:

The previous / preceding character(s)
    What a given symbol "X" means could depend on whether it's followed by a consonant, or by a vowel, or by some diacritic character.

Syllables
    A character "X" at the end of a syllable could mean something different from when it's at the start -- which is especially problematic when the language involved doesn't explicitly mark where one syllable stops and the next starts.

Parts of speech
    What "X" sounds like at the end of a word depends on whether that word is a noun, or a verb, or what.

Meaning
    By semantic context, you can tell that this ideogram "X" means "shoe" (pronounced one way) and not "time" (pronounced another), and that's how you know to transliterate it one way instead of the other.

Origin of the word
    "X" means one thing in loanwords and/or placenames (and derivatives thereof), and another in native words.

"It's just that way"
    "X" normally makes the /X/ sound, except for this list of seventy exceptions (and words based on them, sometimes indirectly). Or: you never can tell which of the three ways to pronounce "X" this word actually uses; you just have to know which it is, so keep a dictionary on hand!

Language
    The character "X" is actually used in several different languages, and you have to figure out which you're looking at before you can determine how to transliterate it.

Out of a desire to avoid being mired in any of these kinds of contextual factors, I chose to exclude all of them and just stick with context-insensitive replacement.

TODO
Things that need tending to are detailed in the TODO.txt file, included in this distribution. Normal installs probably don't leave the TODO.txt lying around, but if nothing else, you can see it at http://search.cpan.org/search?dist=Text::Unidecode

MOTTO
The Text::Unidecode motto is:

    It's better than nothing!

...in both meanings: 1) seeing the output of "unidecode(...)" is better than just having all font-unavailable Unicode characters replaced with "?"'s, or rendered as gibberish; and 2) it's the worst, i.e., there's nothing that Text::Unidecode's algorithm is better than.

CAVEATS
If you get really implausible nonsense out of "unidecode(...)", make sure that the input data really is a utf8 string. See perlunicode.

THANKS
Thanks to Harald Tveit Alvestrand, Abhijit Menon-Sen, and Mark-Jason Dominus.

SEE ALSO
Unicode Consortium: http://www.unicode.org/

Geoffrey Sampson. 1990. Writing Systems: A Linguistic Introduction. ISBN: 0804717567

Randall K. Barry (editor). 1997. ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts. ISBN: 0844409405 [ALA is the American Library Association; LC is the Library of Congress.]

Rupert Snell. 2000. Beginner's Hindi Script (Teach Yourself Books). ISBN: 0658009109

COPYRIGHT AND DISCLAIMERS
Copyright (c) 2001 Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

Much of Text::Unidecode's internal data is based on data from The Unicode Consortium, with which I am unaffiliated.

AUTHOR
Sean M. Burke "sburke@cpan.org"

perl v5.16.3                    2001-07-14                    Text::Unidecode(3)