

fgrep - printing pattern and filename


 
# 1  
Old 06-03-2011

Hi,

I have a patternfile with following pattern
cat
dog
cow
pig

Let's say I have thousand files
file0001
file0002
file0003
.
.
.
file1000

Each pattern can occur multiple times in multiple files. How can I search for the patterns so that each pattern/filename pair is printed only once? I want the output to look like this:

cat file0003 ( cat occurs in file0003 5 times but is printed only once)
cat file0500
cat file0699
dog file0001
dog file1000
pig file0999

and so on.
I used fgrep -f patternfile /directory/file* but that prints multiple lines when a pattern occurs several times in the same file.
# 2  
Old 06-03-2011
I didn't try it, so it may need some fixing, but you get the idea:

Code:
while IFS= read -r a
do
# for pattern P, count the matching lines in each file
awk -v P="$a" '$0~P{s[P" in "FILENAME]+=1}END{for(i in s) print i" appears in "s[i]" lines"}' file*
done <patternfile

But this rescans every file once per pattern, so the I/O grows as (number of patterns) x (number of files).
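A variant of the loop that gives the output format asked for in the question (each pattern/filename pair once) could rely on grep -l, which lists a matching file a single time no matter how often the pattern occurs in it. This is only a sketch: the demo directory, file contents and variable names are made up for illustration, and it still reads every file once per pattern.

```shell
#!/bin/sh
# Hypothetical demo data; names and contents are invented for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\ncow\npig\n' > "$demo/patternfile"
printf 'a cat\nanother cat\n' > "$demo/file0001"
printf 'dog and cat\n'        > "$demo/file0002"
printf 'nothing here\n'       > "$demo/file0003"

result=$(
  cd "$demo" || exit 1
  while IFS= read -r pat; do
    # -l prints each matching file name once; -F matches fixed strings like fgrep
    grep -lF -e "$pat" file* | while IFS= read -r f; do
      printf '%s %s\n' "$pat" "$f"
    done
  done < patternfile
)
printf '%s\n' "$result"
rm -rf "$demo"
```

With the demo data this prints "cat file0001", "cat file0002" and "dog file0002"; patterns with no match (cow, pig) produce no line.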

For better performance, I would go another way:

First print all file names along with their content:
Code:
awk '{print FILENAME":"$0}' file* >bigone

and then process the bigone output to do the counting:
Code:
awk -F: 'NR==FNR{P[$0];next}{for(i in P) s[i":"$1]+=gsub(i,i,$0)}END{for(k in s) print k" appears "s[k]" times"}' patternfile bigone

Store each pattern as an index of the associative array 'P'.
Build an associative array 's' indexed by [pattern:FILENAME], holding the number of matches of that pattern in that file.
At the end of the scan, print the result.
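If only the pattern/filename pairs are wanted rather than counts, the same associative-array idea can be sketched in a single awk pass without the intermediate bigone file. Note that this swaps gsub() for index(), so patterns are matched as fixed strings (as fgrep would) and not as regular expressions. The array names pats and seen and the demo files are assumptions for the example, not part of the code above.

```shell
#!/bin/sh
# Hypothetical demo data; names and contents are invented for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\n'           > "$demo/patternfile"
printf 'a cat\nanother cat\n' > "$demo/file0001"
printf 'dog and cat\n'        > "$demo/file0002"

result=$(
  cd "$demo" || exit 1
  awk '
    NR == FNR { pats[$0]; next }    # first file: load the patterns
    {
      for (p in pats)
        # index() is a literal substring test; seen[] suppresses repeats
        if (index($0, p) && !seen[p, FILENAME]++)
          print p, FILENAME
    }
  ' patternfile file* | sort        # for (p in pats) has no fixed order
)
printf '%s\n' "$result"
rm -rf "$demo"
```

Each data file is read once, and a pair is printed the first time it is seen, so a pattern occurring five times in one file still yields one line.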

Alternatively, you can put it all in one pipeline:
Code:
awk '{print FILENAME":"$0}' file* | awk -F: 'NR==FNR{P[$0];next}{for(i in P)  s[i":"$1]+=gsub(i,i,$0)}END{for(k in s) print k" appears "s[k]" times"}'  patternfile -

I didn't test the code, so it may not be perfect, but it should give you the idea.
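One thing to watch in the pipeline above: gsub() runs on the whole line, and that line starts with the "FILENAME:" prefix, so a pattern that also occurs in a file name gets counted there as well. A possible sketch that counts only inside the file content, reading the files directly and using awk's built-in FILENAME, is shown here; the demo file name file1 is chosen on purpose to contain the pattern, and all names are assumptions for the example.

```shell
#!/bin/sh
# Hypothetical demo data: the file NAME contains the pattern "1" on purpose.
demo=$(mktemp -d)
printf '1\n'                 > "$demo/patternfile"
printf 'x1y1\nno match\n1\n' > "$demo/file1"

result=$(
  cd "$demo" || exit 1
  awk '
    NR == FNR { pats[$0]; next }    # first file: load the patterns
    {
      for (p in pats) {
        line = $0                   # count on a copy, leave $0 untouched
        # gsub() returns the number of matches; "&" puts each match back
        count[p ":" FILENAME] += gsub(p, "&", line)
      }
    }
    END { for (k in count) print k " appears " count[k] " times" }
  ' patternfile file1
)
printf '%s\n' "$result"
rm -rf "$demo"
```

Here the result is "1:file1 appears 3 times": two matches on the first line and one on the third, with the "1" in the file name left out of the count.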

....

Code:
$ ls ts*
tst   tst2
$ cat tst
Alpha> lh ru warpA read DL_PM_PA0_C0
Beta> lh ru warpA read DL_PM_PA0_C0
Gamma> lh ru warpA read DL_PM_PA0_C0
Delta> lh ru warpA read DL_PM_PA0_C0
$ lhsh BXP_0_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_0_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_0_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_0_1 warpA read DL_PM_PA0_C0
BXP_0_1: Value 0x01CC9739 (30185273) read from address 0x00000B8F.
BXP_0_1: Value 0x050A2F06 (84553478) read from address 0x00000B8F.
BXP_0_1: Value 0x02563DEF (39206383) read from address 0x00000B8F.
BXP_0_1: Value 0x01CB58B7 (30103735) read from address 0x00000B8F.
$ lhsh BXP_1_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_1_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_1_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_1_1 warpA read DL_PM_PA0_C0
BXP_1_1: Value 0x05033922 (84097314) read from address 0x00000B8F.
BXP_1_1: Value 0x01CCEFB6 (30207926) read from address 0x00000B8F.
BXP_1_1: Value 0x01CED447 (30331975) read from address 0x00000B8F.
BXP_1_1: Value 0x0218E0BA (35184826) read from address 0x00000B8F.
$ lhsh BXP_2_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_2_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_2_1 warpA read DL_PM_PA0_C0
$ lhsh BXP_2_1 warpA read DL_PM_PA0_C0
BXP_2_1: Value 0x0236B631 (37140017) read from address 0x00000B8F.
BXP_2_1: Value 0x01CE0AF3 (30280435) read from address 0x00000B8F.
BXP_2_1: Value 0x050FAD30 (84913456) read from address 0x00000B8F.
BXP_2_1: Value 0x01CCCC5A (30198874) read from address 0x00000B8F.

Code:
$ cat tst2
sos,WINXP,1,2,3,4,5,6,7,,9
sos,WINVISTA,1,2,3,4,5,6,7,,9
sos,MAC,1,2,3,4,5,6,7,,9
sos,LINUX,1,2,3,4,5,6,7,,9
tos,winxp,1,2,3,4,5,6,7,winxp,9
tos,winvista,1,2,3,4,5,6,7,winvista,9
tos,mac,1,2,3,4,5,6,7,mac,9
tos,linux,1,2,3,4,5,6,7,linux,9

f4 is my pattern file for testing:
Code:
$ cat f4
0
1

Code:
$ nawk '{print FILENAME":"$0}' ts* | nawk -F: 'NR==FNR{P[$0];next}{for(i in P) s[i":"$1]+=gsub(i,i,$0)}END{for(k in s) print k" appears "s[k]" times"}' f4 -
1:tst appears 49 times
0:tst2 appears 0 times
1:tst2 appears 8 times
0:tst appears 156 times

Read the colon ":" as "in". For example,
1:tst2 appears 8 times
means that the pattern "1" appears 8 times in the file "tst2".

Last edited by ctsgnb; 06-03-2011 at 08:35 PM..
