counting lines that match pattern


 
# 1  
Old 10-11-2012

I have a file of 1.3 million lines.

Each line has two words, the second one in brackets. On some lines the two words are the same; on other lines they are different.


Example:
Code:
foo      (foo)
bar      (bar)
thae    (awvd)
beladf  (vswvw)


I am sure this can be done with one line of awk or sed, but my brain is done for the day.

I know I can do it with a shell loop, but that would run very slowly for 1.3 million lines.

Last edited by Scrutinizer; 10-12-2012 at 01:33 AM.. Reason: code tags
# 2  
Old 10-11-2012
You have explained the data, but not explained what your expected output will be.
What pattern? ... for example
# 3  
Old 10-11-2012
Sorry, I just need a count of the lines where the two words match.

For the sample data, the output would be "2".
# 4  
Old 10-12-2012
Try:
Code:
grep -c '\(.*\).*(\1)' infile
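
As a quick check, the command can be run against the four sample lines from post #1. This is only a sketch: the temporary file created with mktemp and the variable names `infile` and `count` are assumptions made for the demonstration, not part of the original data.

```shell
# Recreate the four sample lines from post #1 in a temporary file.
infile=$(mktemp)
printf '%s\n' \
  'foo      (foo)' \
  'bar      (bar)' \
  'thae    (awvd)' \
  'beladf  (vswvw)' > "$infile"

# Count the lines where some text is repeated later inside brackets.
count=$(grep -c '\(.*\).*(\1)' "$infile")
echo "$count"    # prints 2

rm -f "$infile"
```

Only the foo and bar lines contain a bracketed copy of earlier text, so they are the two lines counted.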

# 5  
Old 10-12-2012
@Scrutinizer : I thought \(.*\) would consume everything up to the character just before the "(", leaving the following .* with nothing to match. But \(.*\) took exactly the first word, leaving the .* to consume the spaces.

Please help me understand how the .* came to consume the spaces.

Guru.
# 6  
Old 10-12-2012
.* is always greedy and at first matches as much as possible (the whole line). The rest of the pattern, in this case the back-reference in (\1), cannot match at that point, so the regexp engine is forced to back-track: it gives up one character of the matched string at a time and retries, until the overall match succeeds or no possibility is left.
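
One way to see where the group ends up is to let sed print what \1 captured. This is a sketch using the sample lines from post #1; the square brackets in the replacement and the variable name `captured` are only there to make the result visible.

```shell
# Replace the whole match with the captured group in square brackets;
# -n together with the p flag prints only the lines where the pattern matched.
captured=$(printf '%s\n' \
  'foo      (foo)' \
  'bar      (bar)' \
  'thae    (awvd)' \
  'beladf  (vswvw)' |
  sed -n 's/\(.*\).*(\1)/[\1]/p')
echo "$captured"
```

This prints [foo] and [bar] on separate lines: the group has shrunk back to the first word, and the unparenthesised .* is what covers the spaces.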

Last edited by elixir_sinari; 10-13-2012 at 12:22 AM..
# 7  
Old 10-12-2012
I thought of a case where it would not work correctly. If we have
Code:
foobar   (foo)

Then it would still be counted, because \1 can match just a part of the first word, so perhaps we would need something like:
Code:
grep -c '^ *\(.*\) .*(\1)' infile

This assumes that only spaces (no tabs) are used to separate the fields...
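
The difference between the two patterns can be checked by adding the foobar line to the sample data. The awk command at the end is not from this thread; it is an alternative sketch that compares the two fields directly, assuming every line has exactly two whitespace-separated fields (spaces or tabs). The variable names are made up for the demonstration.

```shell
# Sample lines from post #1 plus the problem case from this post.
infile=$(mktemp)
printf '%s\n' \
  'foo      (foo)' \
  'bar      (bar)' \
  'thae    (awvd)' \
  'beladf  (vswvw)' \
  'foobar   (foo)' > "$infile"

# Original pattern: also counts the foobar line.
loose=$(grep -c '\(.*\).*(\1)' "$infile")

# Anchored pattern: the group must start the line and be followed by a space.
anchored=$(grep -c '^ *\(.*\) .*(\1)' "$infile")

# awk alternative (an assumption, not from the thread): count the lines whose
# second field is exactly the first field wrapped in brackets.
fields=$(awk '$2 == ("(" $1 ")") { n++ } END { print n + 0 }' "$infile")

echo "$loose $anchored $fields"    # prints 3 2 2

rm -f "$infile"
```

The awk version splits on any run of blanks, so it does not depend on the separator being spaces, and it makes a single pass with no back-tracking, which may matter on 1.3 million lines.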

Last edited by Scrutinizer; 10-12-2012 at 03:02 AM..