extract unique pattern from large text file


 
# 1  
Old 07-27-2009

Hi All,

I am trying to extract data from a large text file. I want to extract the lines that contain a five-digit number followed by a hyphen, like

12345-

I tried egrep, e.g.: egrep "[0-9]+[-]" text.txt

but that returns all the lines that contain any number of digits followed by a hyphen, e.g. 1- , 123- , 12345-

How can I modify this to extract only the lines that start with exactly 5 digits followed by a hyphen?

Any suggestions in this regard are highly appreciated.

Thanks
Shiju V.Joseph
# 2  
Old 07-27-2009
try this
Code:
'[0-9]{5}-'

# 3  
Old 07-27-2009
Shiju, you can use what John suggested, but you should probably anchor it to the start of the line like this:

'^[0-9]{5}-'

Otherwise it matches five or more digits before the '-' symbol, and you might end up extracting data like
123456-
1234567-
...
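The difference between the two patterns can be checked on a few made-up lines. This is only a minimal sketch: the sample lines and the temporary file are assumptions for illustration, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample data: only the first line starts with exactly five digits and a hyphen.
sample=$(mktemp)
printf '%s\n' '12345-abc' '123456-abc' '1234-abc' 'x 1234567-abc' > "$sample"

# Unanchored: any line containing five digits directly before a hyphen matches,
# so longer digit runs match too (their last five digits satisfy the pattern).
unanchored=$(grep -E '[0-9]{5}-' "$sample")

# Anchored: the five digits must begin the line, so longer runs are rejected.
anchored=$(grep -E '^[0-9]{5}-' "$sample")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -f "$sample"
```

Note that the anchor only helps when the number is at the very beginning of the line; a five-digit number in the middle of a line needs a different boundary check.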
# 4  
Old 07-27-2009

Hi johnbach,

Thanks very much for replying.
I tried this method, but I am getting
12345-
123456- etc.

I am not getting just the lines that have exactly 5 digits followed by a hyphen.

Thanks
Shiju




---------- Post updated at 03:52 AM ---------- Previous update was at 03:50 AM ----------

Hi ilan,

Thank you very much for replying. I tried

egrep '^[0-9]{5}-' text.txt

but it is still returning
12345-
123456- etc.

I am not getting just the lines that have exactly 5 digits followed by a hyphen, like 12345-

Thanks
Shiju


# 5  
Old 07-27-2009
try this dirty solution,

Code:
egrep  '[0-9]{5}-'  file |egrep -v '[0-9]{6}'
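One caveat with this two-step filter: the second egrep drops every line that has six consecutive digits anywhere, including after the hyphen. A possible single-pattern alternative is sketched below on made-up lines; the group `(^|[^0-9])` is an assumption about what "exactly five digits" should mean here, namely that the five digits are preceded by the start of the line or by a non-digit.

```shell
# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' '12345-678901xyz' '123456-abc' 'id 54321-abc' > "$sample"

# Two-step filter: the first line is lost because "678901" after the hyphen is six digits long.
two_step=$(grep -E '[0-9]{5}-' "$sample" | grep -E -v '[0-9]{6}')

# Single pattern: five digits and a hyphen, not preceded by another digit.
one_step=$(grep -E '(^|[^0-9])[0-9]{5}-' "$sample")

printf 'two-step:\n%s\n\nsingle pattern:\n%s\n' "$two_step" "$one_step"
rm -f "$sample"
```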

# 6  
Old 07-27-2009
Hi John,

I tried that, but unfortunately it didn't work; it returned nothing.

egrep '[0-9]{5}-' text.txt gave output, but
egrep '[0-9]{5}-' text.txt | egrep -v '[0-9]{6}' didn't give any output.

shiju.joseph@linux-kmy7:~/Desktop> egrep '[0-9]{5}-' text.txt
123456-123213sdfsdfsdsdfsdfsd
654331-2342342342342342342342
454545-4353453453453453453453
345345-34534534534534534534534
57645756-32542352345235235234523
4234324157-2314234234234234234
shiju.joseph@linux-kmy7:~/Desktop> egrep '[0-9]{5}-' text.txt |egrep -v '[0-9]{6}'
MEA\shiju.joseph@linux-kmy7:~/Desktop>

Thanks
Shiju
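Looking at the six lines in the transcript above, every one has six or more digits before the hyphen, so an empty result is what an exact five-digit match should give for them. A rough check, assuming the posted lines are saved to a temporary file; the extra line `12345-test` is made up to show a positive match.

```shell
sample=$(mktemp)
# The six lines from the transcript.
cat > "$sample" <<'EOF'
123456-123213sdfsdfsdsdfsdfsd
654331-2342342342342342342342
454545-4353453453453453453453
345345-34534534534534534534534
57645756-32542352345235235234523
4234324157-2314234234234234234
EOF

# Count the lines that start with exactly five digits followed by a hyphen.
before=$(grep -E -c '^[0-9]{5}-' "$sample")

# Hypothetical extra line with exactly five digits before the hyphen.
echo '12345-test' >> "$sample"
after=$(grep -E '^[0-9]{5}-' "$sample")

echo "matches in the posted lines: $before"
echo "match after adding a test line: $after"
rm -f "$sample"
```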
# 7  
Old 07-27-2009
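It can be simpler than that:

grep "^[0-9]\{5\}-" filename

Leave out -v (it prints the lines that do not match) and -w (it also wants a non-word character after the hyphen); the ^ anchor is enough.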
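For reference, a short sketch of what the -v and -w options do with this kind of pattern, on made-up lines. -v selects the lines that do not match, and -w requires the match to be delimited by non-word characters (or line boundaries) on both sides. `grep -E` is used here so the braces need no backslashes; the sample lines are assumptions for illustration.

```shell
# Hypothetical sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' '12345- spaced' '12345-glued' '123456- long' > "$sample"

# -w: "12345-" must be followed by a non-word character or the end of the line,
# so '12345-glued' is skipped; '123456- long' fails because a digit precedes the five digits.
word=$(grep -E -w '[0-9]{5}-' "$sample")

# Adding -v inverts the selection and prints the lines that -w rejected.
inverted=$(grep -E -v -w '[0-9]{5}-' "$sample")

printf 'with -w:\n%s\n\nwith -vw:\n%s\n' "$word" "$inverted"
rm -f "$sample"
```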