Issue when using egrep to extract strings (too many strings)
Dear all,
I have data like the sample below (400,000 rows) and I want to extract the rows that contain certain strings, using the code below. It works when there are not too many strings (for example, fewer than 5,000), but I have 90,000 strings to extract. If I use the egrep code below, I get an error:
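For a list this long, one option that avoids building a huge alternation is to put the strings in a file and let grep read them with -f, using -F so they are treated as fixed strings rather than regular expressions. A minimal sketch, assuming one string per line; the file names and sample rows are hypothetical, since the post's data is not shown:

```shell
# Hypothetical sample files; the real data and string list are not shown in the post.
workdir=$(mktemp -d)
printf '%s\n' 'id1 foo' 'id2 bar' 'id3 baz' > "$workdir/data.txt"
printf '%s\n' 'id1' 'id3' > "$workdir/strings.txt"

# -F: treat every line of strings.txt as a fixed string, not a regex.
# -f: read the strings from a file instead of the command line.
matched=$(grep -F -f "$workdir/strings.txt" "$workdir/data.txt")
printf '%s\n' "$matched"

rm -rf "$workdir"
```

Note that this matches substrings, so a string such as id1 would also select a row containing id10. Where grep supports it, adding -w restricts matches to whole words.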
Hello guys,
This should be a very easy question for you:
I need to delete strings in file1 based on the list of strings in file2.
like file2:
word1_word2_
word3_word5_
word3_word4_
word6_word7_
file1:
word1_word2_otherwords..,word3_word5_others... (7 Replies)
Hi,
I want to extract some text between two strings in a line. I am using the following command:
awk '/-string1/,/-string2/' filename
The contents of the file are:
line1
line2
aaa -bbb -ccc -string1 c,d,e -string2
line4
But it shows the complete line that contains the searched strings.
aaa... (19 Replies)
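An awk range pattern such as /-string1/,/-string2/ selects whole lines, which is why the full line is printed. To keep only the part between the two markers, a sed substitution with a capture group is one possibility. A minimal sketch, assuming the wanted output is the text between -string1 and -string2 on the same line (the post is cut off before it states this):

```shell
# Sample line taken from the post.
line='aaa -bbb -ccc -string1 c,d,e -string2'

# Capture whatever sits between "-string1 " and " -string2" and print only that.
# -n plus the p flag prints nothing for lines that do not match.
between=$(printf '%s\n' "$line" | sed -n 's/.*-string1 \(.*\) -string2.*/\1/p')
printf '%s\n' "$between"
```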
The question is not as simple as the title... I have a file, it looks like this
<string name="string1">RZ-LED</string>
<string name="string2">2.0</string>
<string name="string2">Version 2.0</string>
<string name="string3">BP</string>
I would like to check for duplicate entries of... (11 Replies)
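The post is truncated, so the exact goal is not known. Assuming it is to find name attributes that occur more than once, a minimal sketch is to pull out the attribute values and let sort and uniq -d report the repeats. The file name strings.xml is hypothetical:

```shell
# Hypothetical file holding the sample lines from the post.
workdir=$(mktemp -d)
cat > "$workdir/strings.xml" <<'EOF'
<string name="string1">RZ-LED</string>
<string name="string2">2.0</string>
<string name="string2">Version 2.0</string>
<string name="string3">BP</string>
EOF

# Extract the value of name="...", then print each value that appears more than once.
dupes=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$workdir/strings.xml" | sort | uniq -d)
printf '%s\n' "$dupes"

rm -rf "$workdir"
```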
test.txt:
appleboy
orangeletter
sweetdeal
catracer
conducivelot
I want to grep out only the lines that contain "appleboy" AND "sweetdeal". However, the closest thing to this that I can think of is this:
cat test.txt | egrep "appleboy|sweetdeal"
problem is this only searches for all... (9 Replies)
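The alternation in egrep is an OR. If the aim is lines that contain both words, one way is to combine two patterns with && in awk. A minimal sketch; the last line of the sample file is a hypothetical addition, because no line in the posted test.txt contains both words:

```shell
workdir=$(mktemp -d)
# Lines from the post, plus one hypothetical line that contains both words.
printf '%s\n' appleboy orangeletter sweetdeal catracer conducivelot \
    'appleboy sweetdeal' > "$workdir/test.txt"

# A line is printed only when both patterns match it.
both=$(awk '/appleboy/ && /sweetdeal/' "$workdir/test.txt")
printf '%s\n' "$both"

rm -rf "$workdir"
```

Piping one grep into another (grep appleboy test.txt | grep sweetdeal) gives the same result.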
I have the following lines in a log file. It would be great if someone could help me create a new file with just the entries in the format below.
66.150.161.195 HPSAC=Z05
66.150.161.196 HPSAC=A05
That is, just extract the IP address and the string HPSAC= with its value.
66.150.161.195 -... (1 Reply)
Hi,
I'm having some problems with this. I have loaded a file with HTML code. All the code is on a single line. I want to get everything between two given strings (including the strings themselves, and only the first occurrence).
Example:
File contains <html><body><a href='a.html'>abc</a><a... (5 Replies)
Hi
I have a text file and I would like to use egrep, without the -v option, to exclude the lines that match any of several strings.
Let's say I have some text in the file. The command should not fetch lines if they contain strings such as
CAT MAT DAT
The command should fetch me... (4 Replies)
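egrep has no built-in negation other than -v, so this sketch swaps in awk, whose ! operator negates a pattern match. It assumes that a line should be dropped if it contains any of CAT, MAT or DAT; the sample lines and the file name are hypothetical:

```shell
workdir=$(mktemp -d)
# Hypothetical sample text; the post does not show the real file.
printf '%s\n' 'CAT here' 'a dog' 'MAT there' 'plain line' > "$workdir/input.txt"

# Print only the lines that match none of the three strings.
kept=$(awk '!/CAT|MAT|DAT/' "$workdir/input.txt")
printf '%s\n' "$kept"

rm -rf "$workdir"
```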
Hello Everyone ,
I am a newbie to shell programming and I am reaching out to see if anyone can help with this:
I have two files
1) Insert.txt
2) partition_list.txt
insert.txt looks like this :-
insert into emp1 partition (partition_name)
(a1,
b2,
c4,
s6,
d8)
select
a1,
b2,
c4, (2 Replies)
Hello
I am stuck with this.
I have input that looks as follows:
/type/work /works/OL10627594W 3 2019-04-24T16:46:21.351549 {"created": {"type": "/type/datetime", "value": "2009-12-11T03:18:17.488715"}, "title": "Tog the dog", "covers": , "last_modified": {"type":... (3 Replies)
I am having the following output when executing a dig command :
dig @1.1.1.1 google.com +noall +answer +stats
; <<>> DiG 9.11.4-P1 <<>> @1.1.1.1 google.com +noall +answer +stats
; (1 server found)
;; global options: +cmd obodrm.prod.at.dmdsdp.com. 86154 IN A ... (1 Reply)
Discussion started by: liviusbr
LEARN ABOUT OSF1
strextract
strextract(1) General Commands Manual strextract(1)
NAME
strextract - batch string extraction
SYNOPSIS
strextract [-p patternfile] [-i ignorefile] [-d] [source-program...]
OPTIONS
-i ignorefile
    Ignore text strings specified in ignorefile. By default, the strextract command searches for ignorefile in the current working directory, your home directory, and /usr/lib/nls.
    If you omit the -i option, strextract recognizes all strings specified in the patterns file.
-p patternfile
    Use patternfile to match strings in the input source program. By default, the command searches for the pattern file in the current working directory, your home directory, and finally /usr/lib/nls.
    If you omit the -p option, the strextract command uses a default patterns file that is stored in /usr/lib/nls/patterns.
-d
    Disables warnings of duplicate strings. If you omit the -d option, strextract prints warnings of duplicate strings in your source program.
DESCRIPTION
The strextract command extracts text strings from source programs. This command also writes the string it extracts to a message text file.
The message text file contains the text for each message extracted from your input source program. The strextract command names the file by
appending .str to the name of the input source program.
In the source-program argument, you name one or more source programs from which you want messages extracted. The strextract command does
not extract messages from source programs included using the #include directive. Therefore, you might want a source program and all the
source programs it includes on a single strextract command line.
You can create a patterns file (as specified by patternfile) to control how the strextract command extracts text. The patterns file is
divided into several sections, each of which is identified by a keyword. The keyword must start at the beginning of a new line, and its
first character must be a dollar sign ($). Following the identifier, you specify a number of patterns. Each pattern begins on a new line
and follows the regular expression syntax you use in the regexp(3) routine. For more information on the patterns file, see the patterns(4)
reference page.
In addition to the patterns file, you can create a file that indicates strings that strextract ignores. Each line in this ignore file
contains a single string to be ignored that follows the syntax of the regexp(3) routine.
When you invoke the strextract command, it reads the patterns file and the file that contains strings it ignores. You can specify a
patterns file and an ignore file on the strextract command line. Otherwise, the strextract command matches all strings and uses the default
patterns file.
If strextract finds strings that match the ERROR directive in the patterns file, it reports the strings to standard error (stderr) but
does not write them to the message file.
After running strextract, you can edit the message text file to remove text strings which do not need translating before running strmerge.
It is recommended that you use the extract command as a visual front end to the strextract command rather than running strextract directly.
RESTRICTIONS
Given the default pattern file, you cannot cause strextract to ignore strings in comments that are longer than one line.
You can specify only one rewrite string for all classes of pattern matches.
The strextract command does not extract strings from files included with the #include directive. You must run the strextract command on these
files separately.
EXAMPLES
% strextract -p c_patterns prog.c prog2.c
% vi prog.str
% strmerge -p c_patterns prog.c prog2.c
% gencat prog.cat prog.msg prog2.msg
% vi nl_prog.c
% vi nl_prog2.c
% cc nl_prog.c nl_prog2.c
In this example, the strextract command uses the c_patterns file to determine which strings to match. The input source programs are named
prog.c and prog2.c.
If you need to remove any of the messages or extract one of the created strings, edit the resulting message file, prog.str. Under no
conditions should you add to this file. Doing so could result in unpredictable behavior.
You issue the strmerge command to replace the extracted strings with calls to the message catalog. In response to this command, strmerge
creates the source message catalogs, prog.msg and prog2.msg, and the output source programs, nl_prog.c and nl_prog2.c.
You must edit nl_prog.c and nl_prog2.c to include the appropriate catopen and catclose function calls.
The gencat command creates a message catalog and the cc command creates an executable program.
SEE ALSO
gencat(1), extract(1), strmerge(1), regexp(3), catopen(3), patterns(4)
Writing Software for the International Market
strextract(1)