Sorting on length with identification of number of characters Post: 302758507

10 More Discussions You Might Find Interesting

1. AIX

Is the Length of User ID for AIX Limit to 8 Characters?

Hi, I'm currently using AIX version 5.3. I'm trying to create a user ID, e.g. andyleong, but the system reports that the name is too long. 1. I would like to know whether the length of a user ID is limited to a maximum of 8 characters on AIX. 2. Does this apply to all versions of AIX? If no...

2. UNIX for Dummies Questions & Answers

Conditional sorting on fixed length flat file

I have a fixed length file that need to be sorted according to the following rule IF B=1 ORDER by A,B Else ORDER by A,C Input file is ABC 131 112 122 231 212 222 Output needed ABC 112 131 122 212 231 222

3. Shell Programming and Scripting

Sorting with non- and alphanumeric characters

Hi guys, I'm new to this forum and I'm not a UNIX expert. I can't figure out this certain problem i'm having: I need to sort some words, some of the words are annotations (enclosed within < and >). I need to have them sorted alphabetically with all non-alphanumeric characters up front. For...

4. Shell Programming and Scripting

Search and replace particular characters in fixed length-file

Masters, I have fixed length input file like FHEAD0000000001XXXX20090901 0000009000Y1000XXX2 THEAD000000000220090901 ITM0000109393813 430143504352N22SP 000000000000RN000000010000EA P0000000000000014390020090901 TTAIL0000000003000000 FTAIL00000000040000000002 Note...

5. UNIX for Dummies Questions & Answers

Sorting words based on length

I need to write a bash script that receives a list of variables: kaka pele ronaldo beckham zidane messi rivaldo gerrard platini. I need the program to print the longest word of the list. Each word in the output appears on a separate line, and the words are printed in lexicographic order....

6. Shell Programming and Scripting

Remove characters from fixed length file

Hello I've question on the requirement I am working on. We are getting a fixed length file with "33" characters long. We are processing that file loading into DB. Now some times we are getting a file with "35" characters long. In this case I have to remove two characters (in 22,23...

7. Shell Programming and Scripting

Need to find lines where the length is less than 50 characters

Hi, I have a big file say abc.csv. And in that file, I need to find lines whose length is less than 50 characters. How can it be achieved? Thanks in advance. Thanks

8. Shell Programming and Scripting

Sorting by length

Hello, I have a very large file: a dictionary of headwords of around 40000 and would like to have the dictionary sorted by its length i.e. the largest string first and the smallest at the end. I have hunted for a perl or awk script on the forum which can do the job but there is none available. I...

9. Shell Programming and Scripting

Sorting a file with frequency on length

Hello, I have a file which has the following structure word space Frequency The file is around 30,000 headwords each along with its frequency. The words have different lengths. What I need is a PERL or AWK script which can sort the file on length of the headword and once the file is sorted on...

10. Shell Programming and Scripting

Checking the user input in perl for characters and length

My question is basically as the title says. How can I check a user inputted string is only certain characters long (for example, 3 characters long) and how do I check a user inputted string only contains certain characters (for example, it should only contain the characters 'u', 'a', 'g', and 'c')...


Lingua::Stem::Snowball(3pm)				User Contributed Perl Documentation			       Lingua::Stem::Snowball(3pm)

NAME

       Lingua::Stem::Snowball - Perl interface to Snowball stemmers.

SYNOPSIS

           use Lingua::Stem::Snowball qw( stem );

           my @words = qw( horse hooves );

           # OO interface (the words are passed as an array reference):
           my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
           $stemmer->stem_in_place( \@words ); # qw( hors hoov )

           # Functional interface:
           my @stems = stem( 'en', \@words );

DESCRIPTION

       Stemming reduces related words to a common root form -- for instance, "horse", "horses", and "horsing" all become "hors".  Most commonly,
       stemming is deployed as part of a search application, allowing searches for a given term to match documents which contain other forms of
       that term.

       This module is very similar to Lingua::Stem. However, Lingua::Stem is pure Perl, while Lingua::Stem::Snowball is an XS module which
       provides a Perl interface to the C version of the Snowball stemmers (<http://snowball.tartarus.org>).

   Supported Languages
       The following stemmers are available (as of Lingua::Stem::Snowball 0.95):

	   |-----------------------------------------------------------|
	   | Language	| ISO code | default encoding | also available |
	   |-----------------------------------------------------------|
	   | Danish	| da	   | ISO-8859-1       | UTF-8	       |
	   | Dutch	| nl	   | ISO-8859-1       | UTF-8	       |
	   | English	| en	   | ISO-8859-1       | UTF-8	       |
	   | Finnish	| fi	   | ISO-8859-1       | UTF-8	       |
	   | French	| fr	   | ISO-8859-1       | UTF-8	       |
	   | German	| de	   | ISO-8859-1       | UTF-8	       |
	   | Hungarian	| hu	   | ISO-8859-1       | UTF-8	       |
	   | Italian	| it	   | ISO-8859-1       | UTF-8	       |
	   | Norwegian	| no	   | ISO-8859-1       | UTF-8	       |
	   | Portuguese | pt	   | ISO-8859-1       | UTF-8	       |
	   | Romanian	| ro	   | ISO-8859-2       | UTF-8	       |
	   | Russian	| ru	   | KOI8-R	      | UTF-8	       |
	   | Spanish	| es	   | ISO-8859-1       | UTF-8	       |
	   | Swedish	| sv	   | ISO-8859-1       | UTF-8	       |
	   | Turkish	| tr	   | UTF-8	      | 	       |
	   |-----------------------------------------------------------|

   Benchmarks
       Here is a comparison of Lingua::Stem::Snowball and Lingua::Stem, using The Works of Edgar Allan Poe, volumes 1-5 (via Project Gutenberg) as
       source material.  It was produced on a 3.2GHz Pentium 4 running FreeBSD 5.3 and Perl 5.8.7.  (The benchmarking script is included in this
       distribution: devel/benchmark_stemmers.plx.)

           |--------------------------------------------------------------------|
           | total words: 454285 | unique words: 22748                          |
           |--------------------------------------------------------------------|
           | module                      | config        | avg secs | words/sec |
           |--------------------------------------------------------------------|
           | Lingua::Stem 0.81           | no cache      | 2.029    | 223881    |
           | Lingua::Stem 0.81           | cache level 2 | 1.280    | 355025    |
           | Lingua::Stem::Snowball 0.94 | stem          | 1.426    | 318636    |
           | Lingua::Stem::Snowball 0.94 | stem_in_place | 0.641    | 708495    |
           |--------------------------------------------------------------------|

METHODS / FUNCTIONS
   new
	   my $stemmer = Lingua::Stem::Snowball->new(
	       lang	=> 'es',
	       encoding => 'UTF-8',
	   );
	   die $@ if $@;

       Create a Lingua::Stem::Snowball object.	new() accepts the following hash style parameters:

       o   lang: An ISO code taken from the table of supported languages, above.

       o   encoding: A supported character encoding.

       Be careful with the values you supply to new(). If "lang" is invalid, Lingua::Stem::Snowball does not throw an exception, but instead sets
       $@.  Also, if you supply an invalid combination of values for "lang" and "encoding", Lingua::Stem::Snowball will not warn you, but the
       behavior will change: stem() will always return undef, and stem_in_place() will be a no-op.
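
       Because new() reports a bad "lang" through $@ rather than by dying, a caller has to inspect $@ explicitly.  The following is a minimal
       sketch of that check, not part of the module's own documentation.  The code 'xx' is a made-up invalid language, and clearing $@ before
       each call is a defensive assumption in case an earlier error is still stored there.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# Clear $@ first so a stale error from earlier code is not mistaken
# for a failure of this call (assumption: new() may not reset it).
$@ = '';

# 'xx' is a deliberately invalid, hypothetical language code.
my $bad   = Lingua::Stem::Snowball->new( lang => 'xx' );
my $error = $@;
print "new() reported: $error\n" if $error;

# A valid combination taken from the table of supported languages.
$@ = '';
my $stemmer = Lingua::Stem::Snowball->new(
    lang     => 'es',
    encoding => 'UTF-8',
);
die $@ if $@;
print "Spanish stemmer ready\n";
```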

   stem
	   @stemmed = $stemmer->stem( WORDS, [IS_STEMMED] );
	   @stemmed = stem( ISO_CODE, WORDS, [LOCALE], [IS_STEMMED] );

       Return lowercased and stemmed output.  WORDS may be either a reference to an array of words or a single scalar word.

       In a scalar context, stem() returns the first item in the array of stems:

	   $stem       = $stemmer->stem($word);
           $first_stem = $stemmer->stem(\@words); # probably wrong

       LOCALE has no effect; it is only there as a placeholder for backwards compatibility (see Changes).  IS_STEMMED must be a reference to a
       scalar; if it is supplied, it will be set to 1 if the output differs from the input in some way, 0 otherwise.
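
       As an illustration of IS_STEMMED, the sketch below passes a reference to a scalar and reads it back after the call.  The variable
       names are hypothetical; the stem "hors" for "horses" is the one given in the DESCRIPTION above.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

$@ = '';
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
die $@ if $@;

# IS_STEMMED is passed as a reference to a scalar, which stem() fills in:
# 1 if the output differs from the input, 0 otherwise.
my $changed;
my $stem = $stemmer->stem( 'horses', \$changed );
print "horses -> $stem (changed: $changed)\n";
```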

   stem_in_place
           $stemmer->stem_in_place(\@words);

       This is a high-performance, streamlined version of stem() (in fact, stem() calls stem_in_place() internally). It has no return value,
       instead modifying each item in an existing array of words.  The words must already be in lower case.
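
       Since stem_in_place() expects lowercased input, mixed-case words need to be lowercased by the caller first.  A minimal sketch,
       assuming the words are handed over as a reference to the array that is to be modified:

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

$@ = '';
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
die $@ if $@;

# stem_in_place() does not lowercase its input, so do that up front.
my @words = map { lc } qw( Horse Hooves );

# Assumption: the words are passed as an array reference; the array
# itself is modified and nothing is returned.
$stemmer->stem_in_place( \@words );
print join( ' ', @words ), "\n";
```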

   lang
	   my $lang = $stemmer->lang;
	   $stemmer->lang($iso_language_code);

       Accessor/mutator for the lang parameter. If there is no stemmer for the supplied ISO code, the language is not changed (but $@ is set).

   encoding
	   my $encoding = $stemmer->encoding;
	   $stemmer->encoding($encoding);

       Accessor/mutator for the encoding parameter.

   stemmers
	   my @iso_codes = stemmers();
	   my @iso_codes = $stemmer->stemmers();

       Returns a list of all valid language codes.
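
       One possible use of stemmers() is to validate a language code before constructing a stemmer, which avoids having to check $@
       afterwards.  This is only a sketch: it assumes stemmers() can be imported by name, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball qw( stemmers );

# Build a lookup table of the ISO codes the installed module supports.
my %supported = map { $_ => 1 } stemmers();

# 'tr' (Turkish) is taken from the table of supported languages.
my $wanted = 'tr';
if ( $supported{$wanted} ) {
    my $stemmer = Lingua::Stem::Snowball->new( lang => $wanted );
    print "Created a stemmer for '$wanted'\n";
}
else {
    print "No stemmer available for '$wanted'\n";
}
```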

REQUESTS & BUGS
       Please report any requests, suggestions or bugs via the RT bug-tracking system at http://rt.cpan.org/ or email to
       bug-Lingua-Stem-Snowball@rt.cpan.org.

       http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Stem-Snowball is the RT queue for Lingua::Stem::Snowball.  Please check to see if your bug
       has already been reported.

AUTHORS

       Lingua::Stem::Snowball was originally developed to provide access to stemming algorithms for the OpenFTS (full text search engine) project
       (<http://openfts.sourceforge.net>), by Oleg Bartunov, <oleg at sai dot msu dot su> and Teodor Sigaev, <teodor at stack dot net>.

       Currently maintained by Marvin Humphrey <marvin at rectangular dot com>.  Previously maintained by Fabien Potencier <fabpot at cpan dot
       org>.

COPYRIGHT AND LICENSE

       Perl bindings copyright 2004-2008 by Marvin Humphrey, Fabien Potencier, Oleg Bartunov and Teodor Sigaev.

       This software may be freely copied and distributed under the same terms and conditions as Perl.

       Snowball files and stemmers are covered by the BSD license.

SEE ALSO

       <http://snowball.tartarus.org>, Lingua::Stem.

perl v5.14.2							    2011-11-15					       Lingua::Stem::Snowball(3pm)
