06-09-2013
Extracting words and lines based on keywords
Hello!
I'm trying to process a text file and am stuck on two extractions. I'm hoping someone can help me here:
1. Given a line in a text file and a keyword, how can I extract the word preceding the keyword using a shell command/script?
For example: Given a keyword "world" in the line:
"This really beautiful world goes round and round".
How can I extract the word "beautiful"? It can occur either as two separate terms, as above ("Beautiful World"), or as a hyphenated word ("Beautiful-world"). I need to extract "Beautiful" in either case.
[I tried to get the position of the word "World" and then extract the word before it, but that gave me the entire string from the beginning of the line. I only need the word "Beautiful".]
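One possible approach to this first extraction, offered as a sketch rather than a definitive answer: let grep print only the "word + separator + keyword" match, then strip the separator and keyword with sed. The function name `word_before` is made up for illustration. The sketch assumes a grep that supports `-o` (GNU and BSD grep do; it is not required by POSIX) and a keyword that contains no spaces, hyphens, or regex metacharacters.

```shell
#!/bin/sh
# word_before KEYWORD  (hypothetical helper name)
# Reads lines on stdin and prints the word directly before KEYWORD,
# whether the two are separated by spaces or by a hyphen.
# Matching is case-insensitive; the original capitalisation is kept.
# Assumption: grep supports -o (print only the matched part).
word_before() {
    grep -oiE "[[:alnum:]_]+[ -]+$1" |   # e.g. "beautiful world"
        sed 's/[ -][ -]*[^ -]*$//'       # drop separator and keyword
}

printf '%s\n' 'This really beautiful world goes round and round' | word_before world
printf '%s\n' 'A Beautiful-world example' | word_before world
```

Note that the keyword is matched as a prefix, so "worldwide" would also count as a match for "world"; a word-boundary check would need to be added if that matters.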
2. Given a keyword, say, "design", how can I extract the first paragraph under it?
For example, in the text file, I have the below:
Design:
"This is the design for the experiment to be conducted in July. The requirements have already been signed off by the business. Prior to starting the design phase, ensure that the current design document is available for all departments."
para 2
para 3
I need to extract only the paragraph immediately below design. Is there a way I can do this using Shell commands/script?
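For this second extraction, one way it might be done is with awk, sketched below under two assumptions that the sample above does not confirm: the keyword sits on a line of its own (optionally followed by a colon), and paragraphs are separated by blank lines. The function name `first_para_after` is hypothetical.

```shell
#!/bin/sh
# first_para_after KEYWORD  (hypothetical helper name)
# Reads text on stdin and prints the first paragraph that follows a
# heading line consisting of KEYWORD, optionally followed by ":".
# Assumptions: paragraphs are separated by blank lines, and the
# heading match is case-insensitive.
first_para_after() {
    awk -v kw="$1" '
        found && NF == 0 { if (printed) exit; next }  # blank line ends the paragraph
        found            { print; printed = 1; next } # paragraph text
        tolower($0) ~ ("^" tolower(kw) ":?[ \t]*$") { found = 1 }  # heading line
    '
}

printf '%s\n' 'Intro text' '' 'Design:' \
    'This is the design for the experiment.' 'It spans two lines.' \
    '' 'para 2' '' 'para 3' | first_para_after design
```

If every paragraph is instead a single line with no blank lines between them, this sketch would print everything after the heading, and printing only the one line after the match would be the simpler route.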
Thanks so much in advance.
Regards,
seemad
EXTRACT(1) General Commands Manual EXTRACT(1)
NAME
extract - determine meta-information about a file
SYNOPSIS
extract [ -bghLnvV ] [ -H hash-algorithm ] [ -i ] [ -l library ] [ -p type ] [ -x type ] file ...
DESCRIPTION
This manual page documents version 0.6.0 of the extract command.
extract tests each file specified in the argument list in an attempt to infer meta-information from it. Each file is subjected to the
meta-data extraction libraries from libextractor.
libextractor classifies meta-information (also referred to as keywords) into types. A list of all types can be obtained with the -L option.
OPTIONS
-b Display the output in BibTeX format.
-g Use grep-friendly output (all keywords on a single line for each file). Use the verbose option to print the filename first,
followed by the keywords. Use the verbose option twice to also display the keyword types. This option will not print keyword types
or non-textual metadata.
-h Print a brief summary of the options.
-i Run plugins in-process (for debugging). By default, each plugin is run in its own process.
-l libraries
Use the specified libraries to extract keywords. The general format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] where
LIBRARYNAME is a libextractor compatible library and typically of the form jpeg. The minus before the library name indicates that
this library should be removed from the existing list. To run only a few selected plugins, use -l in combination with -n.
-L Print a list of all known keyword types.
-n Do not use the default set of extractors (typically all standard extractors, currently mp3, ogg, jpg, gif, png, tiff, real, html,
pdf and mime-types); use only the extractors specified with the -l option.
-p type
Print only the keywords matching the specified type. By default, all keywords that are found and not removed as duplicates are
printed.
-v Print the version number and exit.
-V Be verbose. This option can be specified multiple times to increase verbosity further.
-x type
Exclude keywords of the specified type from the output. By default, all keywords that are found and not removed as duplicates are
printed.
SEE ALSO
libextractor(3) - description of the libextractor library
EXAMPLES
$ extract test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg
$ extract -V -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg
$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
$ extract -nV -l png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction
LEGAL NOTICE
libextractor and the extract tool are released under the GPL. libextractor is a GNU package.
BUGS
A couple of file-formats (on the order of 10^3) are not recognized...
AUTHORS
extract was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
<libextractor@gnu.org> to contact the current maintainer(s).
AVAILABILITY
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/
libextractor 0.6.0 Dec 20, 2009 EXTRACT(1)