Count number of pattern matches per line for all files in directory
Post 302898771 by Don Cragun on Wednesday 23rd of April 2014 09:46:58 PM
Assuming that I am correct in believing that the desired bonus output you provided:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   4   2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   5   2   2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   2   1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1

should have been:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   1   1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   4   2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   5   2   2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   1   2   1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt   3   1   1

and with the sets of three spaces changed to tabs, the following script (using awk instead of perl) seems to also do what you want:
Code:
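#!/bin/ksh
# IDs holds the names of the files to scan.
awk '
{	# nm:  number of fields on this line that match
	# nc:  length of the current run of adjacent matching fields
	# ncM: longest such run seen on this line
	nm = nc = ncM = 0
	for(i = 1; i <= NF; i++)
		if(match($i, /comp[0-9]/)) {
			nm++
			if(++nc > ncM)
				ncM = nc
		} else	nc = 0
	# Report only lines with at least one match.
	if(nm)	printf("%s\t%d\t%d\t%d\n", FILENAME, FNR, nm, ncM)
}' $(cat IDs)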

producing the output:
Code:
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	1	1	1
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	3	4	2
ACYPI55796-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	5	2	2
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	1	2	1
ACYPI000008-PA.aa.afa.afa.trim_phyml_tree_fullnames_fullhomolog.txt	3	1	1
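
To see where the three numeric columns come from, here is a minimal sketch that applies the same per-line logic to text read from standard input. The function name count_comp and the two sample lines are made up for illustration and are not data from this thread; the file name column is left out because there is no input file.

```shell
# count_comp is a hypothetical helper name: same per-line logic as the
# script above, applied to standard input, without the file name column.
count_comp() {
	awk '
	{	nm = nc = ncM = 0
		for(i = 1; i <= NF; i++)
			if(match($i, /comp[0-9]/)) {
				nm++			# one more matching field
				if(++nc > ncM)		# extend the current run
					ncM = nc
			} else	nc = 0			# a non-match ends the run
		if(nm)	printf("%d\t%d\t%d\n", FNR, nm, ncM)
	}'
}

# Made-up sample: line 1 has three matching fields, two of them adjacent;
# line 2 has no match and is therefore not reported.
printf '%s\n' 'comp1 comp2 other comp3' 'nothing here' | count_comp
```

With that sample, the only line printed should be 1, 3 and 2 separated by tabs: line number 1, three matching fields, and a longest run of two adjacent matches.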

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
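
If you would rather not edit the script by hand, one option is to pick the awk binary at run time. The following is only a sketch under stated assumptions: the function name pick_awk is hypothetical, it assumes a POSIX-style shell such as ksh, and it only checks the two full paths named above before falling back to whatever awk is found on PATH (nawk could be added to the candidates in the same way).

```shell
# pick_awk is a hypothetical helper: print the first candidate that exists
# and is executable, otherwise fall back to plain "awk" from PATH.
pick_awk() {
	for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk; do
		if [ -x "$candidate" ]; then
			printf '%s\n' "$candidate"
			return 0
		fi
	done
	printf '%s\n' awk
}

AWK=$(pick_awk)
# Quick check that the chosen awk runs at all.
"$AWK" 'BEGIN { print 1 + 1 }'
```

The script above could then call "$AWK" in place of awk.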
 
