Shell Programming and Scripting: Extract lines of text based on a specific keyword. Post 302344097 by DionDeVille on Friday 14th of August 2009 03:03:54 PM
08-14-2009
Extract lines of text based on a specific keyword

I regularly extract lines of text from files based on the presence of a particular keyword and place the extracted lines into another text file. Doing this by hand, with the "sort" command followed by Kate's find & highlight facility, takes about 2 hours.

I've been reading the forum and googling, and I can find scripts and shell commands that extract a particular string from a file, but nothing that extracts a complete line based on a keyword or string within that line.

Here's an example of the lines of data I'm using:

Code:
<li><a href="http://some-website1.com/"><b>CategoryOne: </b>Description defgh</a></li>
<li><a href="http://some-website2.com/"><b>CategoryThree: </b>Description cdefg</a></li>
<li><a href="http://some-website3.com/"><b>CategoryTwo: </b>Description bcdef</a></li>
<li><a href="http://some-website3.com/"><b>CategoryOne: </b>Description abcde</a></li>
<li><a href="http://some-website2.com/"><b>CategoryOne: </b>Description zabcd</a></li>

The data is always a list item.
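
Because each record sits on a single line and carries its category in a <b>...: </b> label, the categories present in a file can be listed with a plain pattern match. This is only a sketch that assumes every line follows the layout of the sample above; the function name list_categories and the file name links.txt are made up for illustration.

```shell
#!/bin/sh
# list_categories FILE  (hypothetical helper, not an existing command)
# Print each distinct category label found in FILE, one per line.
# Assumes every record looks like the sample: ...<b>Label: </b>...
list_categories() {
    # sed -n ... p prints only lines where the substitution succeeded;
    # \1 is the text between "<b>" and the following ": </b>".
    # LC_ALL=C keeps the sort order independent of the user's locale.
    sed -n 's/.*<b>\([^:<]*\): *<\/b>.*/\1/p' "$1" | LC_ALL=C sort -u
}

# Example call (links.txt is an assumed file name):
#   list_categories links.txt
```

The resulting list can then be used to decide which category to extract, or fed to a loop that handles one category at a time.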

I need something that finds every line containing a specified category, extracts the complete line, and moves it to a new text file (preferably named after that category). For example:

If I search for "<b>CategoryOne: </b>", then I need every line containing "<b>CategoryOne: </b>" to be moved to the text file categoryone.txt.
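
One possible way to do this is with grep, which prints whole lines that contain a pattern. The sketch below assumes the label appears exactly as in the sample lines (category name, colon, space, closing </b>); the function name move_category and the file name links.txt are hypothetical. It appends the matching lines to a lower-cased, category-named file and rewrites the input so that only the remaining lines are left.

```shell
#!/bin/sh
# move_category CATEGORY FILE  (hypothetical helper)
# Move every line of FILE that contains "<b>CATEGORY: </b>" into a file
# named after the category in lower case (e.g. categoryone.txt, created in
# the current directory). Lines that do not match stay in FILE.
move_category() {
    category=$1
    file=$2
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    pattern="<b>${category}: </b>"
    out="$(printf '%s' "$category" | tr '[:upper:]' '[:lower:]').txt"
    rest=$(mktemp) || return 1

    # -F treats the pattern as a fixed string, so the HTML needs no escaping.
    grep -F -e "$pattern" "$file" >> "$out"
    # -v inverts the match: keep everything that was NOT extracted.
    grep -v -F -e "$pattern" "$file" > "$rest"
    # Write the remainder back into the original file, then clean up.
    cat "$rest" > "$file" && rm -f "$rest"
}

# Example call (links.txt is an assumed file name):
#   move_category CategoryOne links.txt
```

With the five sample lines above, this call would leave the three CategoryOne lines in categoryone.txt and the other two lines in links.txt. If the lines should only be copied rather than moved, the first grep on its own is enough.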

Please help...
 

extract(1int)															     extract(1int)

Name
       extract - interactive string extract and replace

Syntax
       extract [ -i ignorefile ] [ -m prefix ] [ -n ] [ -p patternfile ] [ -s string ]
       [ -u ] source-program...

Description
       The command interactively extracts text strings from source programs.  The command replaces the strings it extracts with calls to the func-
       tion.	The command also writes the string it extracts to a source message catalog.  You use this command to replace  hard-coded  messages
       in your program source file with calls to the command and create a source message catalog.  At run time, the program reads the message text
       from the message catalog.  By storing messages in a message catalog, instead of in your program, you allow  the	text  of  messages  to	be
       translated to a new language or modified without the source program being changed.

       In  the source-program argument, you name one or more source programs from which you want messages extracted.  The command does not extract
       messages from source programs included using the directive.  Therefore, you might want to name a source program and all the source programs
       it includes on a single command line.

       You can create a patterns file (as specified by patternfile) to control how the command extracts and replaces text. The patterns file is
       divided into several sections, each of which is identified by a keyword.  The keyword must start at the beginning of a new  line,  and  its
       first  character must be a dollar sign ($).  Following the identifier, you specify a number of patterns.  Each pattern begins on a new line
       and follows the regular expression syntax you use in the routine. For more information on the patterns file, see the reference page.

       In addition to the patterns file, you can create a file that indicates strings that the command ignores. Each line in this ignore file
       contains a single string to be ignored that follows the syntax of the routine.

       When you invoke the command, it reads the patterns file and the file that contains strings it ignores.  You can specify a patterns file and
       an ignore file on the command line.  Otherwise, the command matches all strings and uses a default patterns file.

       When you run the command, it displays three windows on your terminal. The first window contains the program source code. The string that matches a
       string in the patterns file is displayed in reverse video.

       The second window displays the contents of the source message catalog that the command is creating.

       The  third  window contains a list of the commands that are available.  The command displays the current command in reverse video.  You can
       execute the current command by pressing the RETURN key.	Select another command by typing the first letter in the command name and pressing
       the RETURN key.	The command is not sensitive to the case of letters, so you can use uppercase or lowercase letters to issue commands.

       You can use the following commands to control how the command treats the string displayed in the first window:

       EXTRACT	      Extract the string into the catalog file and rewrite the source using the rewrite string in the patterns file.

       DUPLICATE      If the string has been encountered previously, rewrite the source program using the same message number as before.  The com-
		      mand need not add the message to the source message catalog again, so this command saves space in catalogs.

       IGNORE	      Ignore this and all subsequent occurrences of this string during this interactive session.  This command does  not  add  the
		      string to the ignore file.

       PASS	      Pass by (ignore) this occurrence of this particular string.

       ADD	      Ignore  this  and  all  subsequent occurrences of this string during this interactive session.  Add the string to the ignore
		      file.

       COMMENT        Add the comment you enter to the source message catalog. The command prompts you to be sure the comment you entered is
                      correct. You answer the prompt by typing ``y'', ``n'', or ``q'', without pressing the RETURN key.

       QUIT           Quit from the interactive session. The command prompts you to be sure you want to quit. Answer ``y'' or ``n'' to the
                      prompt, without pressing the RETURN key.

                      The output files that the command creates up to this point are not removed by this command. However, the files contain
                      only the result of the string extractions that occurred before you issued the QUIT command.

       HELP	      Display a description of all the commands.

       The command creates two files in your current working directory. The command creates a new version of the source program that contains
       calls to the function, instead of hard-coded messages.  The new version of the source program has the same name as the  input  source  pro-
       gram, with the prefix ``nl_''.  For example, if the input source program is named the output source program is named

       In  addition  to  a new source program, the command creates a source message catalog. The source message catalog contains the text for each
       message extracted from your input source program.  The command names the file by appending ``.msf'' to the name of the  input  source  pro-
       gram.   For example, the source message catalog for the source program is named You can use the source message catalog as input to the com-
       mand.

Options
       -i     Ignore text strings specified in ignorefile. By default, the command searches for ignorefile in the current working directory,
	      your home directory, and

	      If you omit the option, the command recognizes all strings specified in the patterns file.

       -m   Add  prefix  to  message  numbers in the output source program and source message catalog. You can use this prefix as a mnemonic.  You
	    must process source message catalogs that contain message number prefixes using the option.

       -n   Create a new source message catalog for each input source program. By default, if you specify more than one input  source  program	on
	    the command line, the command creates one source message catalog for all the input source programs.

       -p   Use  patternfile  to  match strings in the input source program.  By default, the command searches for the pattern file in the current
	    directory, your home directory and finally

	    If you omit the option, the command uses a default patterns file that is stored in

       -s   Write string at the top of the source message catalog. If you omit the option, the command uses the string specified in the section of
            the patterns file.

       -u   Use  a file produced by a previous run of This file contains details of all the strings which matched the pattern file along with file
	    offsets and line numbers.  By default is run and its output is used to drive

Restrictions
       Given the current syntax of the patterns file, you cannot cause the command to ignore strings in comments that are longer than one line.

       You can specify only one rewrite string for all classes of pattern matches.

       The command does not extract strings from files you include with the directive.	You must run the commands on these files separately.

       Your terminal screen must contain at least 80 columns and 24 lines for the command to display its three windows.

       The command does not recognize strings that extend beyond one line.

Examples
       The following example shows the commands you issue to run the command, create a message catalog from the source message catalog,  and  com-
       pile the output source program:
       % extract -i newignore -p c_patterns remove.c
       % gencat remove.cat remove.msf
       % vi nl_remove.c
       % cc nl_remove.c

       In this example, the command uses the file newignore to determine which strings to ignore and the file c_patterns to determine which
       strings to match. The input source program is named remove.c.

       In response to this command, the command creates the source message catalog remove.msf and the output source program nl_remove.c.

       You must edit nl_remove.c to include the appropriate and function calls.

       The gencat command creates a message catalog and the cc command creates an executable program.

See Also
       intro(3int), gencat(1int), strextract(1int), strmerge(1int), regex(3), catopen(3int), catgets(3int), patterns(5int)
       Guide to Developing International Software

																     extract(1int)