07-27-2009
extract unique pattern from large text file
Hi All,
I am trying to extract data from a large text file. I want to extract the lines that contain a five-digit number followed by a hyphen, like
12345-. I tried egrep, e.g.: egrep "[0-9]+[-]" text.txt
but it returns every line that contains any number of digits followed by a hyphen, e.g. 1-, 123-, 12345-.
How can I modify this to accurately extract only the lines that start with 5 digits followed by a hyphen?
Any suggestions in this regard are highly appreciated.
Thanks
Shiju V.Joseph
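The egrep attempt matches too much because + means "one or more", so any run of digits before a hyphen qualifies. The following is a minimal sketch. It assumes GNU or BSD grep, because the -o option is not in POSIX, and it uses a hypothetical sample file in place of the real text.txt. It shows three things: an anchored match, an unanchored match with a digit boundary, and extraction of the distinct five-digit codes (the "unique" part of the title).

```shell
#!/bin/sh
# Hypothetical sample standing in for the poster's text.txt
cat > text.txt <<'EOF'
12345- five digits at the start of the line
id 1- too short
id 123- too short
ref 67890-A five digits in the middle of the line
123456- six digits, must not match
EOF

# 1) Lines that START with exactly five digits followed by a hyphen.
#    [0-9]{5} means "exactly five digits"; [0-9]+ meant "one or more".
grep -E '^[0-9]{5}-' text.txt

# 2) Lines containing a run of exactly five digits plus a hyphen anywhere.
#    The digits must follow the start of the line or a non-digit;
#    otherwise "123456-" would match through its "23456-" tail.
grep -E '(^|[^0-9])[0-9]{5}-' text.txt

# 3) The distinct five-digit codes themselves: print every digit run that
#    ends in a hyphen, keep only the five-digit ones, then de-duplicate.
grep -oE '[0-9]+-' text.txt | grep -E '^[0-9]{5}-$' | sort -u
```

Some older egrep implementations do not accept the {5} interval. On those systems, spelling the digits out as egrep '^[0-9][0-9][0-9][0-9][0-9]-' text.txt expresses the same anchored match.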
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi,
the text line looks like this:
"test1" " " "test2" "test3" "test4" "10" "test 10 12" "00:05:58" "filename.bin" "3.3MB" "/dir/name" "18459"
What's the best way to select any one of these fields, so that I can, for example, get only the time or the size?
I was trying awk -F""" '{print $N}' but... (3 Replies)
Discussion started by: TehOne
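The awk -F""" attempt fails before awk even runs. The shell reads "" as an empty string and treats the third " as the start of a new quoted string, so awk never receives a quote character as its separator. The sketch below single-quotes the separator instead and wraps the field arithmetic in a hypothetical helper named quoted_field.

```shell
#!/bin/sh
# The sample line from the question.
line='"test1" " " "test2" "test3" "test4" "10" "test 10 12" "00:05:58" "filename.bin" "3.3MB" "/dir/name" "18459"'

# With a double quote as the field separator, awk field 1 is the empty
# text before the first quote, field 2 is the first quoted value, field 3
# is the space between values, and so on: the k-th quoted value is $(2*k).
# quoted_field is a hypothetical helper name, not a standard command.
quoted_field() {
  awk -F'"' -v k="$1" '{ print $(2 * k) }'
}

printf '%s\n' "$line" | quoted_field 8    # the time: 00:05:58
printf '%s\n' "$line" | quoted_field 10   # the size: 3.3MB
```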
2. Shell Programming and Scripting
Hi All!!
I have a large file containing millions of records. My goal is to extract the text '19' together with the 7 characters immediately after it and save the result in a new file.
So my output would be as follows:
191234561
194567894
192789005
198839408
and so on.....
... (7 Replies)
Discussion started by: parshant_bvcoe
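Here is a minimal sketch. It assumes GNU or BSD grep for -o, and it uses hypothetical sample records because the real input format is not shown.

```shell
#!/bin/sh
# Hypothetical sample records; the real file has millions of lines.
printf '%s\n' 'abc191234561xyz' 'id=194567894;' 'no match here' > records.txt

# '19.{7}' is the literal text 19 followed by exactly seven more characters
# of any kind (use '19[0-9]{7}' if they must be digits).
# -o prints only the matched text, one match per output line.
grep -oE '19.{7}' records.txt > result.txt
cat result.txt
```

If a line contains several non-overlapping matches, -o prints each one on its own line.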
3. Shell Programming and Scripting
This is my first post, so please be nice. I have tried Google and read different tutorials.
The task at hand is:
Input file input.txt (example)
abc123defhij-E-1234jslo
456ujs-W-abXjklp
From this file the task is to grep the -E- and -W- strings that are unique and write a new file... (5 Replies)
Discussion started by: TestTomas
4. UNIX for Dummies Questions & Answers
Hello all,
I have a file with the following sample data:
2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0.
2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0.
2009-08-26... (5 Replies)
Discussion started by: simonsimon
5. UNIX for Dummies Questions & Answers
Hi Gurus,
I have 100 tab-delimited text files, each with 21 columns. I want to extract only the 2nd and 5th columns from each text file. The 2nd and 5th columns each contain duplicate values, but the combination of the two values in a row is never duplicated. I want to extract only those... (3 Replies)
Discussion started by: Unilearn
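The column extraction can be sketched with cut, which splits on tabs by default. The file names and the .cols output naming below are hypothetical. The selection rule in the truncated last sentence is not attempted here.

```shell
#!/bin/sh
# Two small hypothetical tab-delimited files stand in for the real 100;
# only five of the 21 columns are shown.
printf 'r1\tk1\tx\ty\tv1\nr2\tk1\tx\ty\tv2\nr3\tk2\tx\ty\tv1\n' > part1.tsv
printf 'r4\tk2\tx\ty\tv2\n' > part2.tsv

# cut uses TAB as its default delimiter; -f2,5 keeps columns 2 and 5.
# Each input file gets its own output file (hypothetical .cols suffix).
for f in part1.tsv part2.tsv; do
  cut -f2,5 "$f" > "${f%.tsv}.cols"
done
cat part1.cols part2.cols
```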
6. Shell Programming and Scripting
Hi,
I have a 20 GB pipe-delimited file with many duplicate records.
I need an awk script to extract the unique records from the file and put it into another file.
Kindly help.
Thanks,
Arun (1 Reply)
Discussion started by: Arun Mishra
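The sketch below shows two common approaches on a hypothetical sample. It assumes a "duplicate" means an identical whole line.

```shell
#!/bin/sh
# Hypothetical pipe-delimited sample with repeated records.
printf '%s\n' 'a|1|x' 'b|2|y' 'a|1|x' 'c|3|z' 'b|2|y' > records.psv

# awk: print a line only the first time it is seen. This keeps the input
# order, but the seen[] array holds every distinct line in memory.
awk '!seen[$0]++' records.psv > unique_awk.psv

# sort -u: output comes out sorted rather than in input order, but sort can
# spill to temporary files, which usually suits a 20 GB input better.
LC_ALL=C sort -u records.psv > unique_sorted.psv
```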
7. Shell Programming and Scripting
Hi
This is my first post and I'm just a beginner. So please be nice to me.
I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file?
I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
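This sketch makes two assumptions: the URL sits on line 241 of each file, as the sed -n 241,241p attempt suggests, and it appears inside a double-quoted attribute. The generated page1.html and page2.html are hypothetical stand-ins.

```shell
#!/bin/sh
# Build two hypothetical 300-line HTML files with the URL on line 241.
make_page() {
  awk -v seg="$2" 'BEGIN {
    for (i = 1; i <= 300; i++) {
      if (i == 241) {
        printf "<a href=\"http://www.site.com/%s/resource.dat\">link</a>\n", seg
      } else {
        print "filler line " i
      }
    }
  }' > "$1"
}
make_page page1.html one
make_page page2.html two

# FNR restarts at 1 for every input file, so FNR == 241 selects line 241
# of EACH file. sed -n 241p given several files counts lines continuously
# across them (unless GNU sed's -s option is used).
# grep -o then keeps only the URL itself.
awk 'FNR == 241' page1.html page2.html |
  grep -oE 'http://www\.site\.com[^"]*/resource\.dat' > urls.txt
cat urls.txt
```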
8. Shell Programming and Scripting
Hi
I have a big text file. I want to extract every sentence in which at least 70% (seventy percent) of the words match a word list called A.
Say the format of the text file is as given below:
This is the first sentence which consists of fifteen words... (4 Replies)
Discussion started by: my_Perl
9. Shell Programming and Scripting
Hi all,
I have a txt file and need to extract all occurrences of D 8888 44 and D 8888 43 plus the next field:
=",g("en")];f._sn&&(f._sn= "og."+f._sn);for(var n in f)l.push("&"),l.push(g(n)),l.push("="),l.push(g(f));l.push("&emsg=");l.push(g(d.name+":"+d.message));var m=l.join("");Ea(m)&&(m=m.substr(0,2E3));c=m;var... (5 Replies)
Discussion started by: stinkefisch
10. UNIX for Beginners Questions & Answers
Dear Users,
I would appreciate help splitting a large file (more than 1 million lines) with sed or awk. Below is the text in the file:
input file.txt
scaffold1 928 929 C/T +
scaffold1 942 943 G/C +
scaffold1 959 960 C/T +... (6 Replies)
Discussion started by: kapr0001
LEARN ABOUT CENTOS
zipgrep
ZIPGREP(1L) ZIPGREP(1L)
NAME
zipgrep - search files in a ZIP archive for lines matching a pattern
SYNOPSIS
zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]
DESCRIPTION
zipgrep will search files within a ZIP archive for lines matching the given string or pattern. zipgrep is a shell script and requires
egrep(1) and unzip(1L) to function. Its output is identical to that of egrep(1).
ARGUMENTS
pattern
The pattern to be located within a ZIP archive. Any string or regular expression accepted by egrep(1) may be used.
file[.zip]
Path of the ZIP archive. (Wildcard expressions for the ZIP archive name are not supported.) If the literal filename is not found, the suffix .zip is appended. Note that self-extracting ZIP files are supported, as with any other ZIP archive; just specify the .exe suffix (if any) explicitly.
[file(s)]
An optional list of archive members to be processed, separated by spaces. If no member files are specified, all members of the ZIP
archive are searched. Regular expressions (wildcards) may be used to match multiple members:
* matches a sequence of 0 or more characters
? matches exactly 1 character
[...] matches any single character found inside the brackets; ranges are specified by a beginning character, a hyphen, and an ending character. If an exclamation point or a caret ('!' or '^') follows the left bracket, then the range of characters within the brackets is complemented (that is, anything except the characters inside the brackets is considered a match).
(Be sure to quote any character that might otherwise be interpreted or modified by the operating system.)
[-x xfile(s)]
An optional list of archive members to be excluded from processing. Since wildcard characters match directory separators ('/'), this option may be used to exclude any files that are in subdirectories. For example, "zipgrep grumpy foo *.[ch] -x */*" would search for the string "grumpy" in all C source files in the main directory of the "foo" archive, but none in any subdirectories.
Without the -x option, all C source files in all directories within the zipfile would be searched.
OPTIONS
All options prior to the ZIP archive filename are passed to egrep(1).
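The sketch below illustrates the arguments described above. It builds a small hypothetical archive and searches it for the five-digit pattern, first across all members and then with -x excluding subdirectories. It assumes the Info-ZIP zip and zipgrep commands are installed, and it skips the search if they are missing.

```shell
#!/bin/sh
# Hypothetical archive contents: one top-level member, one in a subdirectory.
mkdir -p sub
printf '%s\n' '12345- top-level match' '123456- six digits, no match' > notes.txt
printf '%s\n' '67890- match inside sub/' > sub/more.txt
rm -f sample.zip

if command -v zip >/dev/null 2>&1 && command -v zipgrep >/dev/null 2>&1; then
  zip -q sample.zip notes.txt sub/more.txt

  # The pattern is passed to egrep, so ERE syntax such as {5} applies.
  # All members are searched; hits are normally prefixed with the member name.
  zipgrep '^[0-9]{5}-' sample.zip

  # -x '*/*' excludes every member inside a subdirectory.
  zipgrep '^[0-9]{5}-' sample.zip -x '*/*'
else
  echo "zip and/or zipgrep not installed; skipping" >&2
fi
```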
SEE ALSO
egrep(1), unzip(1L), zip(1L), funzip(1L), zipcloak(1L), zipinfo(1L), zipnote(1L), zipsplit(1L)
URL
The Info-ZIP home page is currently at
http://www.info-zip.org/pub/infozip/
or
ftp://ftp.info-zip.org/pub/infozip/ .
AUTHORS
zipgrep was written by Jean-loup Gailly.
Info-ZIP 20 April 2009 ZIPGREP(1L)