Replacing stopwords based on a list Post: 302939587

Lingua::StopWords(3pm)					User Contributed Perl Documentation				    Lingua::StopWords(3pm)

NAME

       Lingua::StopWords - Stop words for several languages.

SYNOPSIS

	   use Lingua::StopWords qw( getStopWords );
	   my $stopwords = getStopWords('en');

	   my @words = qw( i am the walrus goo goo g'joob );

	   # prints "walrus goo goo g'joob"
	   print join ' ', grep { !$stopwords->{$_} } @words;

DESCRIPTION

       In keyword search, it is common practice to suppress a collection of "stopwords": words such as "the", "and", "maybe", etc. which appear in
       a large number of documents and do not tell you anything important about any document which contains them.  This module provides such
       "stoplists" in several languages.

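       As a rough illustration of the idea, the sketch below strips stopwords from a line of text.  The tiny hand-written %stop hash and the
       strip_stopwords helper are assumptions made for this example and are not part of the module; in real use the hash would come from
       getStopWords.

```perl
use strict;
use warnings;

# Hypothetical miniature stoplist; a real one would come from getStopWords().
my %stop = map { $_ => 1 } qw( the and maybe of a in );

# Hypothetical helper: drop every whitespace-separated word whose
# lowercased form is in the stoplist, keeping the rest in order.
sub strip_stopwords {
    my ($text, $stoplist) = @_;
    return join ' ', grep { !$stoplist->{ lc($_) } } split ' ', $text;
}

print strip_stopwords('The history of the walrus and maybe the carpenter', \%stop), "\n";
```

       With the stand-in list above this prints "history walrus carpenter".  The helper splits on whitespace only, so punctuation attached to
       a word (for example "the,") would need to be stripped first.
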
   Supported Languages
           |-----------------------------------------------------------|
           | Language   | ISO code | default encoding | also available |
           |-----------------------------------------------------------|
           | Danish     | da       | ISO-8859-1       | UTF-8          |
           | Dutch      | nl       | ISO-8859-1       | UTF-8          |
           | English    | en       | ISO-8859-1       | UTF-8          |
           | Finnish    | fi       | ISO-8859-1       | UTF-8          |
           | French     | fr       | ISO-8859-1       | UTF-8          |
           | German     | de       | ISO-8859-1       | UTF-8          |
           | Hungarian  | hu       | ISO-8859-1       | UTF-8          |
           | Italian    | it       | ISO-8859-1       | UTF-8          |
           | Norwegian  | no       | ISO-8859-1       | UTF-8          |
           | Portuguese | pt       | ISO-8859-1       | UTF-8          |
           | Spanish    | es       | ISO-8859-1       | UTF-8          |
           | Swedish    | sv       | ISO-8859-1       | UTF-8          |
           | Russian    | ru       | KOI8-R           | UTF-8          |
           |-----------------------------------------------------------|

FUNCTIONS

   getStopWords
	   my $stoplist      = getStopWords('en');
	   my $utf8_stoplist = getStopWords('en', 'UTF-8');

       Retrieve a stoplist in the form of a hashref where the keys are all stopwords and the values are all 1.

	   $stoplist = {
	       and => 1,
	       if  => 1,
	       # ...
	   };
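
       Because the stoplist is a plain hashref, it can be copied and adjusted with ordinary hash operations.  The following is a minimal
       sketch; the small $stoplist stand-in and the added and removed words are hypothetical choices made so the example runs without the
       module installed.

```perl
use strict;
use warnings;

# Stand-in for the hashref returned by getStopWords('en'): same shape,
# far fewer entries (an assumption to keep the example self-contained).
my $stoplist = { 'and' => 1, 'if' => 1, 'the' => 1, 'not' => 1 };

# Copy the list so the original hashref is left untouched.
my %custom = %$stoplist;

# Add domain-specific stopwords (hypothetical choices).
$custom{$_} = 1 for qw( unix linux );

# Remove a word that should stay searchable.
delete $custom{'not'};

my @words = qw( the unix shell is not linux );
print join(' ', grep { !$custom{$_} } @words), "\n";   # prints "shell is not"
```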

       getStopWords() expects 1-2 arguments.  The first, which is required, is an ISO code representing a supported language.  If the ISO code
       cannot be found, getStopWords returns undef.
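
       Since an unknown code yields undef rather than an exception, it can be worth checking the return value before dereferencing it.  The
       sketch below shows one way to do that.  The stoplist_or_die helper and the $fake_getter stand-in are hypothetical names introduced
       for this example; with the module loaded via "use Lingua::StopWords qw( getStopWords );", \&getStopWords would be passed instead.

```perl
use strict;
use warnings;

# Hypothetical helper: call a stoplist getter and die with a clear message
# if it returns undef, as getStopWords() does for an unknown ISO code.
sub stoplist_or_die {
    my ($getter, $lang, @rest) = @_;
    my $stoplist = $getter->($lang, @rest);
    die "No stoplist for language code '$lang'\n" unless defined $stoplist;
    return $stoplist;
}

# Stand-in getter so the example runs without the module installed:
# it knows 'en' only and returns undef for everything else.
my $fake_getter = sub {
    my ($lang) = @_;
    my %known = ( en => { 'the' => 1, 'and' => 1 } );
    return $known{$lang};
};

my $en = stoplist_or_die($fake_getter, 'en');
print scalar(keys %$en), " stopwords loaded\n";

# An unsupported code is caught here instead of failing later on a dereference.
my $ok = eval { stoplist_or_die($fake_getter, 'xx'); 1 };
print "caught: $@" unless $ok;
```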

       The second argument should be 'UTF-8' if you want the stopwords encoded in UTF-8.  The UTF-8 flag will be turned on, so make sure you
       understand all the implications of that.
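
       One practical implication, sketched below, is that text read as raw UTF-8 bytes will not match character-string keys until it has been
       decoded.  The example assumes that the UTF-8 stoplist keys are decoded character strings, which is how "the UTF-8 flag will be turned
       on" is read here; the single-entry $stoplist is a hypothetical stand-in, not output of the module.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical stand-in for a UTF-8 stoplist: the key is a 3-character
# string (f, u-umlaut, r), as a decoded stopword would be.
my $stoplist = { "f\x{fc}r" => 1 };

# The same word as it might arrive from a UTF-8 encoded file: raw bytes.
my $bytes = "f\xc3\xbcr";

# The raw bytes form a different, 4-character string, so the lookup misses.
print "bytes match:   ", (exists $stoplist->{$bytes} ? "yes" : "no"), "\n";

# Decoding turns the bytes into the 3-character string, which matches.
my $chars = decode('UTF-8', $bytes);
print "decoded match: ", (exists $stoplist->{$chars} ? "yes" : "no"), "\n";
```

       In practice the decoding is often done once at the input layer, for example by opening files with an ":encoding(UTF-8)" layer, rather
       than word by word.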

SEE ALSO

       The stoplists supplied by this module were created as part of the Snowball project (see <http://snowball.tartarus.org>,
       Lingua::Stem::Snowball).

       Lingua::EN::StopWords provides a different stoplist for English.

AUTHOR

       Maintained by Marvin Humphrey <marvin at rectangular dot com>.  Original author Fabien Potencier, <fabpot at cpan dot org>.

COPYRIGHT AND LICENSE

       Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or,
       at your option, any later version of Perl 5 you may have available.

perl v5.10.0							    2009-02-23						    Lingua::StopWords(3pm)