Extremely Fast Text Feature Extraction for Classification and Indexing


 
Posted 08-22-2008

HPL-2008-91R1: Extremely Fast Text Feature Extraction for Classification and Indexing, by George Forman and Evan Kirshenbaum
Keyword(s): text mining, text indexing, bag-of-words, feature engineering, feature extraction, document categorization, text tokenization
Abstract: Most research in speeding up text mining involves algorithmic improvements to induction algorithms, and yet for many large scale applications, such as classifying or indexing large document repositories, the time spent extracting word features from texts can itself greatly exceed the initial trainin ...
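The abstract concerns the cost of turning raw text into word features. As a rough illustration of what that extraction step involves, here is a minimal Perl sketch of a bag-of-words counter and a hashed, fixed-length variant. It is not the authors' method from HPL-2008-91R1: the tokenizer (runs of ASCII letters and digits, lowercased), the helper names bag_of_words and hashed_features, and the use of MD5 are all hypothetical choices made for portability, not speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: count word tokens in a text (bag of words).
# A token here is a maximal run of ASCII letters or digits, lowercased.
sub bag_of_words {
    my ($text) = @_;
    my %counts;
    $counts{$_}++ for lc($text) =~ /([a-z0-9]+)/g;
    return \%counts;
}

# Hypothetical variant using the "hashing trick": each token is mapped to
# one of $n_buckets feature indices, so no vocabulary table is needed.
# MD5 is used only because Digest::MD5 ships with core Perl; a
# high-throughput extractor would use a much cheaper hash function.
sub hashed_features {
    my ( $text, $n_buckets ) = @_;
    my @vec = (0) x $n_buckets;
    for my $token ( lc($text) =~ /([a-z0-9]+)/g ) {
        my $h = unpack 'N', md5($token);    # first 32 bits of the digest
        $vec[ $h % $n_buckets ]++;
    }
    return \@vec;
}

my $bow = bag_of_words('The cat saw the other cat.');
print join( ', ', map { $_ . '=' . $bow->{$_} } sort keys %$bow ), "\n";
# prints: cat=2, other=1, saw=1, the=2
```

The hashed variant accepts occasional collisions between distinct tokens in exchange for a fixed vector length and no dictionary lookups.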

6 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Text extraction

Dear All, I am trying to extract text from a file containing cron entries. cat /var/tmp/cron_backups/debmed_tmp < * * * * * /bell > * * * * * /belly What I am trying to do is create two text files: one containing all entries that begin with < and another containing entries that begin with > .... (4 Replies)
Discussion started by: Junaid Subhani

2. Shell Programming and Scripting

sed text extraction between 2 patterns using variables

Hi everyone! I'm writing a function in .bashrc to extract some text from a file. The file looks like this: " random text Begin CG step 1 random text Begin CG step 2 ... Begin CG step 100 random text" For a given number, let's say 70, I want all the text between "Begin CG... (4 Replies)
Discussion started by: radudownload

3. UNIX for Dummies Questions & Answers

fast sequence extraction

Hi everyone, I have a large text file containing DNA sequences in FASTA format as follows: >someseq GAACTTGAGATCCGGGGAGCAGTGGATCTC CACCAGCGGCCAGAACTGGTGCACCTCCAG GCCAGCCTCGTCCTGCGTGTC >another seq GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT GACATTTTCATTACTACCATTTTGGAGTACA >seq3450... (4 Replies)
Discussion started by: Fahmida

4. Programming

Fast string removal from large text collection

Hi All, I don't want any code for this problem, just suggestions: I have a huge collection of text files (around 300,000) which look like this: 1.fil orange apple dskjdsk computer skjks The entire text collection (referenced above) has about 1 billion words. I have created... (1 Reply)
Discussion started by: shoaibjameel123

5. UNIX for Dummies Questions & Answers

String extraction from a text file

The following script code works great for extracting 'postmaster' from a line of text stored in a variable named string: string="PenaltyError:=554 5.7.1 Error, send your mail to postmaster@LOCALDOMAIN" stuff=$( echo $string | cut -d@ -f1 | awk '{ print $NF }' ) echo $stuff However, I need to be... (9 Replies)
Discussion started by: cleanden

6. Shell Programming and Scripting

extraction of perfect text from file.

Hi All, I have a file of the following format. <?xml version='1.0' encoding='utf-8'?> <tomcat-users> <role rolename="tomcat"/> <role rolename="role1"/> <role rolename="manager"/> <role rolename="admin"/> <user username="tomcat" password="tomcat" roles="tomcat"/> <user... (5 Replies)
Discussion started by: nua7
MicroMason::QuickTemplate(3pm)				User Contributed Perl Documentation			    MicroMason::QuickTemplate(3pm)

NAME
    Text::MicroMason::QuickTemplate - Alternate Syntax like Text::QuickTemplate

SYNOPSIS
    Instead of using this class directly, pass its name to be mixed in:

        use Text::MicroMason;
        my $mason = Text::MicroMason::Base->new( -QuickTemplate );

    Use the standard compile and execute methods to parse and evaluate templates:

        print $mason->compile( text=>$template )->( %args );
        print $mason->execute( text=>$template, @args );

    Or use Text::QuickTemplate's calling conventions:

        $template = Text::MicroMason->new( -QuickTemplate, text=>'simple.tmpl' );
        print $template->fill( %arguments );

    Text::QuickTemplate provides a syntax to embed values into a text template:

        Good {{timeofday}}, {{name}}!

DESCRIPTION
    This mixin class overrides several methods to allow MicroMason to emulate the template syntax and some of the other features of Text::QuickTemplate.

    This class automatically includes the following other mixins: TemplateDir, HasParams, and StoreOne.

  Compatibility with Text::QuickTemplate
    This is not a drop-in replacement for Text::QuickTemplate, as the implementation is quite different, but it should be able to process most existing templates without major changes.

    The following features of Text::QuickTemplate syntax are supported:

    o   Curly-bracketed tags with parameter names.
    o   Arrays of parameter hashes.
    o   Special $DONTSET variable.

SEE ALSO
    The interface being emulated is described in Text::QuickTemplate.

    For an overview of this templating framework, see Text::MicroMason.

    This is a mixin class intended for use with Text::MicroMason::Base.

    For distribution, installation, support, copyright and license information, see Text::MicroMason::Docs::ReadMe.

perl v5.10.1                      2007-01-29                      MicroMason::QuickTemplate(3pm)
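To make the {{name}} tag syntax from the synopsis concrete, here is a minimal stand-alone Perl sketch of what filling such a template does. The function fill_template is hypothetical and is not how Text::MicroMason or Text::QuickTemplate implement it; it ignores arrays of parameter hashes and $DONTSET, and leaving unmatched tags untouched is an assumption of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::MicroMason or Text::QuickTemplate:
# replaces each {{name}} tag with $args{name}.
# Tags with no matching argument are left in place unchanged.
sub fill_template {
    my ( $template, %args ) = @_;
    ( my $out = $template ) =~
        s/\{\{(\w+)\}\}/exists $args{$1} ? $args{$1} : "{{$1}}"/ge;
    return $out;
}

print fill_template( 'Good {{timeofday}}, {{name}}!',
    timeofday => 'morning', name => 'Ann' ), "\n";
# prints: Good morning, Ann!
```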