Post 302938931 by gimley on Friday 20th of March 2015 06:05:34 AM
awk or perl script for preposition splitter

Hello,
I am writing a natural language parser, and one of the tools I need must split a tagged sentence wherever a prepositional phrase begins with a preposition. I have a long list of such preposition markers (a sample is given below) and am looking for a script in awk or perl that reads a look-up file containing these prepositions and splits the text at each one.
A sample is given below. The text has been tagged by a language parser:
Code:
[ There_EX could_MD be_VB more_RBR  casualties_NNS in_IN the_DT mishap_NN ,_, ''_null]

The expected output would be
Code:
[ There_EX could_MD be_VB more_RBR  casualties_NNS]
[ in_IN the_DT mishap_NN ,_, ''_null]

The preposition at which the split occurs will always be preceded by one of the following tags:
Code:
NN
NNS
NNP
followed by space

as in the example above.
A sample list of the preposition markers is given below:
Code:
to_IN
in_IN
towards_IN
across_IN
for_IN
into_IN
up to _IN

Many thanks in advance for any help. Commented code would help even more, so that I can see how to read from a list and insert a new line when the condition is met.
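A minimal sketch of one way to do this in awk (wrapped in a shell function), offered as a starting point rather than a definitive solution. It assumes the look-up file holds one single-token marker per line (so a multi-word entry such as `up to _IN` is not handled), that tokens are separated by spaces or tabs, and that a tag is whatever follows the last underscore in a token. The function name `split_prepositions` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical usage: split_prepositions prepositions.txt tagged.txt
# With no tagged file given, the tagged text is read from standard input.
split_prepositions() {
    lookup=$1
    shift
    awk -v lookup="$lookup" '
    BEGIN {
        # Load the look-up list: one marker per line, e.g. in_IN
        while ((getline line < lookup) > 0) {
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding blanks
            if (line != "") prep[line] = 1
        }
        close(lookup)
    }
    {
        out = ""; prevtag = ""; rest = $0
        # Walk the line token by token, keeping the original spacing
        while (match(rest, /[^ \t]+/)) {
            sep  = substr(rest, 1, RSTART - 1)     # blanks before the token
            tok  = substr(rest, RSTART, RLENGTH)   # the token itself
            rest = substr(rest, RSTART + RLENGTH)  # what is left to scan
            if ((tok in prep) && prevtag ~ /^(NN|NNS|NNP)$/)
                out = out "]\n[ " tok   # close the chunk, open a new one
            else
                out = out sep tok       # copy the token unchanged
            # Remember the tag of this token: the text after the last "_"
            n = split(tok, part, "_")
            prevtag = (n > 1) ? part[n] : ""
        }
        print out rest
    }
    ' "$@"
}
```

Walking the line with `match()` instead of relying on awk's field splitting keeps any runs of spaces in the input intact. A listed preposition that follows any tag other than NN, NNS or NNP is left where it is.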

Last edited by zaxxon; 03-20-2015 at 07:13 AM. Reason: code tag mismatch
 
