awk or perl script for preposition splitter
Post 302938931 by gimley on Friday 20th of March 2015 06:05:34 AM

Hello,
I am writing a natural language parser, and one of the tools I need must split a tagged sentence at prepositional phrase markers, i.e. at the preposition that begins a prepositional phrase. I have a long list of such markers (sample given below) and am looking for a script in awk or perl that reads a look-up file containing these prepositions and splits the text at them.
A sample is given below. The text is the tagged output of a language parser:
Code:
[ There_EX could_MD be_VB more_RBR  casualties_NNS in_IN the_DT mishap_NN ,_, ''_null]

The expected output would be
Code:
[ There_EX could_MD be_VB more_RBR  casualties_NNS]
[ in_IN the_DT mishap_NN ,_, ''_null]

A split should happen only when the preposition is immediately preceded by a token carrying one of the tags below, followed by a space, as in the example above:
Code:
NN
NNS
NNP

A sample list of the preposition markers is given below:
Code:
to_IN
in_IN
towards_IN
across_IN
for_IN
into_IN
up to _IN

Many thanks in advance for any help. Commented code would help even more, so that I can follow how the script reads from the list and inserts a new line when the condition is met.
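
For illustration, here is a minimal sketch of one way the request could be met with awk, wrapped in a shell function. It is not a tested solution from the thread. The function name `split_pp`, the temporary look-up file, and the choice to keep the original spacing are all assumptions made for this example. The look-up file is assumed to hold one marker per line, so a multi-word entry such as the last one in the sample list would not match any single token here.

```shell
#!/bin/sh
# split_pp LOOKUP_FILE [TAGGED_FILE...]
# Hypothetical helper: starts a new bracketed line before every token that is
# listed in LOOKUP_FILE and whose previous token is tagged NN, NNS or NNP.
# Reads standard input when no TAGGED_FILE is given.
split_pp() {
    pp_lookup=$1
    shift
    awk -v lookup="$pp_lookup" '
    BEGIN {
        # Load the preposition markers, one per line, into an array.
        while ((getline line < lookup) > 0) {
            sub(/^[ \t]+/, "", line)      # trim leading blanks
            sub(/[ \t]+$/, "", line)      # trim trailing blanks
            if (line != "")
                markers[line] = 1
        }
        close(lookup)
    }
    {
        rest = $0      # part of the line not yet scanned
        out = ""       # text built so far
        prev = ""      # previous token, used for the noun check
        # Walk the line token by token, keeping the original spacing.
        while (match(rest, /[^ \t]+/)) {
            gap = substr(rest, 1, RSTART - 1)
            tok = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
            if ((tok in markers) && prev ~ /_(NN|NNS|NNP)$/)
                out = out "]\n[ " tok     # close the chunk, open a new one
            else
                out = out gap tok
            prev = tok
        }
        print out rest
    }' "$@"
}

# Usage example with a few markers from the sample list.
markers_file=$(mktemp)
printf '%s\n' to_IN in_IN towards_IN across_IN for_IN into_IN > "$markers_file"
printf '%s\n' "[ There_EX could_MD be_VB more_RBR  casualties_NNS in_IN the_DT mishap_NN ,_, ''_null]" |
    split_pp "$markers_file"
rm -f "$markers_file"
```

On the sample line this prints the two bracketed lines shown under "The expected output would be". The line is scanned with `match()` rather than awk's default field splitting so that runs of spaces inside the text survive unchanged. A preposition that is not preceded by an NN, NNS or NNP token is left where it is.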

Last edited by zaxxon; 03-20-2015 at 07:13 AM. Reason: code tag mismatch