Post 302818861 by seemad on Sunday 9th of June 2013 03:54:36 PM
Extracting words and lines based on keywords

Hello!

I'm trying to process a text file and am stuck on two extractions. I'm hoping someone can help me here:

1. Given a line in a text file and a keyword, how can I extract the word preceding the keyword using a shell command/script?

For example: Given a keyword "world" in the line:
"This really beautiful world goes round and round".

How can I extract the word "beautiful"? It can occur either as two separate terms, as above ("Beautiful World"), or as a hyphenated word ("Beautiful-world"). I need to extract "Beautiful" in either case.
[I tried to get the position of the word "World" and then extract the word before it, but that gave me the entire string from the beginning of the line. I only need to extract the word "Beautiful".]
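
To make the first requirement concrete, here is a minimal sketch of one possible approach using awk. The function name `word_before` is hypothetical, and the sketch assumes a case-insensitive keyword match, input on stdin, and that every hyphen can be treated as a word separator:

```shell
# word_before KEYWORD
# Reads lines on stdin and prints the word directly before KEYWORD.
# Assumptions (not from the original post): matching ignores case, and
# every hyphen is treated as a word separator.
word_before() {
    awk -v kw="$1" '
    BEGIN { kw = tolower(kw) }
    {
        gsub(/-/, " ")                        # "Beautiful-world" -> "Beautiful world"
        for (i = 2; i <= NF; i++) {
            word = tolower($i)
            gsub(/[^A-Za-z0-9_]/, "", word)   # drop quotes, commas, periods
            if (word == kw) {
                prev = $(i - 1)
                gsub(/[^A-Za-z0-9_]/, "", prev)
                print prev
            }
        }
    }'
}

# Usage with the two forms described above:
echo 'This really beautiful world goes round and round' | word_before world   # prints: beautiful
echo 'A "Beautiful-world" example' | word_before world                        # prints: Beautiful
```

Splitting on fields rather than on character position returns only the neighbouring word instead of everything from the start of the line.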


2. Given a keyword, say "design", how can I extract the first paragraph under it?
For example, the text file contains the following:
Design:
"This is the design for the experiment to be conducted in July. The requirements have already been signed off by the business. Prior to starting the design phase, ensure that the current design document is available for all departments."

para 2

para 3

I need to extract only the paragraph immediately below "Design:". Is there a way I can do this using shell commands or a script?
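
To illustrate the second requirement, here is a rough awk sketch under some assumptions of mine: the heading sits on its own line as the keyword followed by a colon, matching ignores case, paragraphs are separated by blank lines, and input arrives on stdin. The function name `first_paragraph_under` is hypothetical:

```shell
# first_paragraph_under KEYWORD
# Reads text on stdin and prints the first paragraph after a heading line
# of the form "KEYWORD:".
# Assumptions (not from the original post): the heading is alone on its
# line, matching ignores case, and a blank line ends the paragraph.
first_paragraph_under() {
    awk -v kw="$1" '
    BEGIN { heading = tolower(kw) ":" }
    found && started && /^[ \t]*$/ { exit }            # blank line closes the paragraph
    found && !/^[ \t]*$/ { started = 1; print; next }  # paragraph text
    !found {
        line = tolower($0)
        gsub(/^[ \t]+|[ \t]+$/, "", line)              # trim surrounding blanks
        if (line == heading) found = 1
    }'
}

# Usage with a shortened version of the layout described above:
printf '%s\n' 'Design:' 'This is the design for the experiment.' '' 'para 2' '' 'para 3' |
    first_paragraph_under design
```

Blank lines between the heading and the paragraph are skipped, and awk stops reading at the first blank line after the paragraph, so "para 2" and "para 3" are never printed.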

Thanks so much in advance.

Regards,
seemad
 
