Count number of pattern matches per line for all files in directory
Post 302898771 by Don Cragun on Wednesday 23rd of April 2014 09:46:58 PM
Assuming that I am correct in believing that the desired bonus output you provided:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   4   2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   5   2   2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   2   1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1

should have been:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   4   2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   5   2   2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   2   1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   1   1

and with the sets of three spaces changed to tabs, the following script (using awk instead of perl) seems to also do what you want:
Code:
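#!/bin/ksh
# For each input line, count the fields matching comp[0-9] (nm) and the
# longest run of consecutive matching fields (ncM).
awk '
{	nm = nc = ncM = 0	# reset the per-line counters
	for(i = 1; i <= NF; i++)
		if(match($i, /comp[0-9]/)) {
			nm++
			if(++nc > ncM)	# nc is the length of the current run
				ncM = nc
		} else	nc = 0		# a non-matching field ends the run
	# Print only lines that contain at least one match.
	if(nm)	printf("%s\t%d\t%d\t%d\n", FILENAME, FNR, nm, ncM)
}' $(cat IDs)	# the file IDs holds the names of the files to process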

producing the output:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	1	1	1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	3	4	2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	5	2	2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	1	2	1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	3	1	1
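
For anyone who wants to check the logic without the original data files, here is a small self-contained sketch. The function name count_comp, the file name sample.txt and its three lines are all made up for this illustration, and mktemp -d is assumed to be available. As I read the script, the output columns are file name, line number, number of matching fields, and longest run of consecutive matching fields.

```shell
#!/bin/sh
# Hypothetical demo: count_comp and the sample data are illustrative only.
count_comp() {
	awk '
	{	nm = nc = ncM = 0
		for(i = 1; i <= NF; i++)
			if(match($i, /comp[0-9]/)) {
				nm++
				if(++nc > ncM)
					ncM = nc
			} else	nc = 0
		if(nm)	printf("%s\t%d\t%d\t%d\n", FILENAME, FNR, nm, ncM)
	}' "$@"
}

# Build a three-line sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'comp1 x comp2 comp3' 'a b c' 'comp9' > "$tmpdir/sample.txt"

# Run from inside the scratch directory so FILENAME is just sample.txt.
result=$(cd "$tmpdir" && count_comp sample.txt)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Line 1 has three matching fields with a longest run of two, line 2 has no match and is skipped, and line 3 has a single match, so this should print sample.txt 1 3 2 and sample.txt 3 1 1 (tab separated).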

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
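
If the same script has to run on both Solaris and other systems, one possible approach is to pick the awk at run time instead of editing the script. This is only a sketch: the function name pick_awk and the variable AWK are invented here, and the search order simply follows the list above, with plain awk as the fallback.

```shell
#!/bin/sh
# Hypothetical helper: print the first awk from the candidate list that exists.
pick_awk() {
	for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk nawk awk
	do
		if command -v "$candidate" >/dev/null 2>&1
		then
			printf '%s\n' "$candidate"
			return 0
		fi
	done
	return 1
}

AWK=$(pick_awk) || { echo "no suitable awk found" >&2; exit 1; }
echo "using $AWK"
```

The awk command in the script would then be invoked as "$AWK" rather than awk.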