06-25-2015
Sorry, I have to explain it again.
This is not an assignment at all. It is for my personal interest in processing text with different coding.
I have tried this in C instead of scripts, but I would prefer a script because scripts are comparatively faster and more convenient to put in a pipeline. Moreover, Unix scripts are convenient for processing text. I use the Bash shell.
There is no actual sample word list or token list.
To clarify, I mean tokens rather than words, and a sentence is a set of tokens. I use Ubuntu.
Please let me know if you need more details.
Thanks in advance.
Path::Dispatcher::Rule::Tokens(3pm) User Contributed Perl Documentation Path::Dispatcher::Rule::Tokens(3pm)
NAME
Path::Dispatcher::Rule::Tokens - predicate is a list of tokens
SYNOPSIS
    my $rule = Path::Dispatcher::Rule::Tokens->new(
        tokens    => [ "comment", "show", qr/^\d+$/ ],
        delimiter => '/',
        block     => sub { display_comment(shift->pos(3)) },
    );

    $rule->match("/comment/show/25");
DESCRIPTION
Rules of this class use a list of tokens to match the path.
ATTRIBUTES
tokens
Each token can be a literal string, a regular expression, or a list of either (which are taken to mean alternations). For example, the
tokens:
    [ 'ticket', [ 'show', 'display' ], [ qr/^\d+$/, qr/^#\w{3}/ ] ]
first matches "ticket". Then, the next token must be "show" or "display". The final token must be a number or a pound sign followed by
three word characters.
The results are the tokens in the original string, as they were matched. If you have three tokens, then "match->pos(1)" will be the
string's first token ("ticket"), "match->pos(2)" its second ("display"), and "match->pos(3)" its third ("#AAA").
Capture groups inside a regex token are completely ignored.
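
For readers who work in the shell rather than Perl, the matching idea described above can be sketched roughly as follows. This is not the module's implementation: match_tokens is a hypothetical helper, alternations are written with '|' inside an extended regular expression, each pattern is anchored to the whole token, and the sketch assumes that a match must consume every token in the path.

```shell
#!/bin/sh
# Hypothetical helper, not part of Path::Dispatcher:
#   match_tokens DELIM PATH PATTERN...
# Each PATTERN is an extended regex that must match one whole token.
# Variables are prefixed with mt_ because POSIX sh has no "local".
match_tokens() {
    mt_delim=$1
    mt_path=$2
    shift 2
    for mt_pattern in "$@"; do
        # Drop empty tokens, e.g. the one before a leading "/".
        while :; do
            case $mt_path in
                "$mt_delim"*) mt_path=${mt_path#"$mt_delim"} ;;
                *) break ;;
            esac
        done
        [ -n "$mt_path" ] || return 1            # ran out of tokens
        mt_token=${mt_path%%"$mt_delim"*}        # text before the next delimiter
        case $mt_path in
            *"$mt_delim"*) mt_path=${mt_path#*"$mt_delim"} ;;
            *)             mt_path= ;;
        esac
        printf '%s\n' "$mt_token" | grep -Eq -- "^($mt_pattern)\$" || return 1
    done
    # Assumption of this sketch: only delimiters may remain after the last pattern.
    while :; do
        case $mt_path in
            "$mt_delim"*) mt_path=${mt_path#"$mt_delim"} ;;
            *) break ;;
        esac
    done
    [ -z "$mt_path" ]
}

# The 'ticket' example from the text, with '|' standing in for the nested lists.
match_tokens / "/ticket/display/#AAA" \
    'ticket' 'show|display' '[0-9]+|#[[:alnum:]_]{3}' && echo "matched"
```

The sketch only reports success or failure; it does not return the matched tokens the way "match->pos(N)" does.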
delimiter
A string that is used to tokenize the path. The delimiter must be a string because prefix matches use "join" on unmatched tokens to return
the leftover path. In the future this may be extended to support having a regex delimiter.
The default is a space, but if you're matching URLs you probably want to change this to a slash.
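
The reason a plain string delimiter makes the leftover path easy to rebuild can be illustrated with a small shell sketch. It is only an illustration of the split-and-join idea, not the module's code: leftover_path is a hypothetical helper that splits a path on the delimiter, skips a given number of leading tokens, and joins the rest with the same delimiter.

```shell
#!/bin/sh
# Hypothetical helper, not part of Path::Dispatcher:
#   leftover_path DELIM PATH COUNT
# Prints the tokens left after the first COUNT tokens, rejoined with DELIM.
leftover_path() {
    lp_delim=$1
    lp_rest=$2
    lp_skip=$3
    lp_out=
    while [ -n "$lp_rest" ]; do
        # Skip empty tokens produced by leading or doubled delimiters.
        case $lp_rest in
            "$lp_delim"*) lp_rest=${lp_rest#"$lp_delim"}; continue ;;
        esac
        lp_token=${lp_rest%%"$lp_delim"*}
        case $lp_rest in
            *"$lp_delim"*) lp_rest=${lp_rest#*"$lp_delim"} ;;
            *)             lp_rest= ;;
        esac
        if [ "$lp_skip" -gt 0 ]; then
            lp_skip=$((lp_skip - 1))             # token consumed by the prefix
        elif [ -z "$lp_out" ]; then
            lp_out=$lp_token
        else
            lp_out=$lp_out$lp_delim$lp_token     # join with the same delimiter
        fi
    done
    printf '%s\n' "$lp_out"
}

# Two tokens consumed by a prefix rule; the rest is rejoined with "/".
leftover_path / "/comment/show/25/edit" 2
```

With a regex delimiter there would be no single string to join the leftover tokens with, which is the limitation the text describes.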
case_sensitive
Decide whether the rule matching is case sensitive. Default is 1, case sensitive matching.
perl v5.12.4 2011-08-30 Path::Dispatcher::Rule::Tokens(3pm)