Post 302344097 by DionDeVille on Friday 14th of August 2009 03:03:54 PM
Extract lines of text based on a specific keyword

I regularly extract lines of text from files based on the presence of a particular keyword, and I place the extracted lines into another text file. This takes about 2 hours to complete using the "sort" command followed by Kate's find & highlight facility.

I've been reading the forum & googling and can find scripts and shell commands which extract a particular string from a file but nothing that extracts a complete line based on a keyword/string within a line.

Here's an example of the lines of data I'm using:

Code:
<li><a href="http://some-website1.com/"><b>CategoryOne: </b>Description defgh</a></li>
<li><a href="http://some-website2.com/"><b>CategoryThree: </b>Description cdefg</a></li>
<li><a href="http://some-website3.com/"><b>CategoryTwo: </b>Description bcdef</a></li>
<li><a href="http://some-website3.com/"><b>CategoryOne: </b>Description abcde</a></li>
<li><a href="http://some-website2.com/"><b>CategoryOne: </b>Description zabcd</a></li>

The data is always a list item.

I need something that will find the line containing a specified category which will then extract the complete line and move it to a new text file (preferably named after that category). For example:

If I search for "<b>CategoryOne: </b>" then I need it to move every line containing "<b>CategoryOne: </b>" to the text file categoryone.txt
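
One possible approach for a single category, as a minimal sketch: grep prints whole lines that contain a keyword, so redirecting its output gives the category file, and a second pass with -v leaves only the non-matching lines behind (the "move"). The file name links.html is an assumption made for illustration; the sample data is recreated so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical input file name: links.html (not given in the post).
# Recreate the sample data from the post so the sketch is self-contained.
cat > links.html <<'EOF'
<li><a href="http://some-website1.com/"><b>CategoryOne: </b>Description defgh</a></li>
<li><a href="http://some-website2.com/"><b>CategoryThree: </b>Description cdefg</a></li>
<li><a href="http://some-website3.com/"><b>CategoryTwo: </b>Description bcdef</a></li>
<li><a href="http://some-website3.com/"><b>CategoryOne: </b>Description abcde</a></li>
<li><a href="http://some-website2.com/"><b>CategoryOne: </b>Description zabcd</a></li>
EOF

keyword='<b>CategoryOne: </b>'

# -F treats the keyword as a fixed string, so no characters need escaping.
# Every line containing the keyword is written, whole, to categoryone.txt.
grep -F -- "$keyword" links.html > categoryone.txt

# "Move" rather than copy: keep only the non-matching lines in the input.
# grep -v exits non-zero when no lines remain, hence the "|| true".
grep -F -v -- "$keyword" links.html > links.html.tmp || true
mv links.html.tmp links.html
```

If the lines should only be copied and the original left untouched, the last two commands can simply be dropped.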

Please help...
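
For the broader request (each category's lines in a file named after that category), a single awk pass might do it. This is only a sketch under assumptions: the input file name links.html is hypothetical, every category label is assumed to appear as `<b>Name: </b>` exactly as in the sample lines, and the lines are copied, not removed from the input.

```shell
#!/bin/sh
# Hypothetical input file name: links.html (not given in the post).
# Recreate the sample data from the post so the sketch is self-contained.
cat > links.html <<'EOF'
<li><a href="http://some-website1.com/"><b>CategoryOne: </b>Description defgh</a></li>
<li><a href="http://some-website2.com/"><b>CategoryThree: </b>Description cdefg</a></li>
<li><a href="http://some-website3.com/"><b>CategoryTwo: </b>Description bcdef</a></li>
<li><a href="http://some-website3.com/"><b>CategoryOne: </b>Description abcde</a></li>
<li><a href="http://some-website2.com/"><b>CategoryOne: </b>Description zabcd</a></li>
EOF

awk '
# Act only on lines that carry a label of the form <b>Name: </b>.
match($0, /<b>[^<:]+: <\/b>/) {
    # RSTART + 3 skips "<b>"; RLENGTH - 9 drops "<b>" (3) and ": </b>" (6).
    name = tolower(substr($0, RSTART + 3, RLENGTH - 9))
    out = name ".txt"
    # The first write to a file truncates it; later writes append.
    print > out
}
' links.html
```

With the sample data this creates categoryone.txt, categorytwo.txt and categorythree.txt. Each output file stays open until awk exits, so an input with a very large number of distinct categories may need a close(out) after each print, with >> in place of >.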
 
