awk repeats counter


 
# 1  
Old 11-29-2011
awk repeats counter

If the word DOG (followed by several random digits) appears in col 1, how can I count how many times that same DOG* word appears in col 2? This is a very large file.

Thanks!
# 2  
Old 11-29-2011
I really don't understand what you are asking. Please clarify the question.

Thanks.

Last edited by Scott; 11-29-2011 at 02:20 PM.. Reason: Thought it would be more constructive asking for clarity!
# 3  
Old 11-29-2011
Sorry for being so ambiguous. Basically, I want to count how many times a substring found in one column also occurs in another column. That is, if a substring is present in column 1, check column 2 and count how many times it appears there.
# 4  
Old 11-29-2011
Usually posting sample input and an example of the expected output greatly improves response time ...
# 5  
Old 11-29-2011
-- a late example:

col1 col2
DOG1223312 DOG442
DOG009 DOG1223312
DOG93 DOG1223312
DOG1223312

So basically, if a value appears in column 1 and there are repeats of it in column 2, count them.
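
One possible reading of this request is a per-value tally: for each distinct DOG value seen in column 1, report how often that exact value occurs in column 2. The following is only a sketch under that assumption. It uses the sample rows above (header omitted), keeps every distinct value in memory, and the array names `seen`, `order` and `col2` are made up for illustration.

```shell
#!/bin/sh
# Sample rows from the post above (header line omitted); the last row has no column 2.
data='DOG1223312 DOG442
DOG009 DOG1223312
DOG93 DOG1223312
DOG1223312'

# Remember each distinct column-1 value in first-seen order,
# tally every column-2 value, then report the tallies in END.
counts=$(printf '%s\n' "$data" | awk '
    $1 ~ /^DOG[0-9]+$/ && !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
    NF >= 2 { col2[$2]++ }
    END {
        for (i = 1; i <= n; i++)
            printf "%s %d\n", order[i], col2[order[i]]
    }
')
printf '%s\n' "$counts"
```

With the sample rows, DOG1223312 is reported with a count of 2 and the other two values with 0. For a very large file with many distinct values, the in-memory arrays may become the limiting factor.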
# 6  
Old 11-29-2011
Small script that accepts the string and column numbers on the command line, then prints the stats. Assumes file is to be read from stdin.

Code:
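#!/usr/bin/env ksh
# Usage: stats.ksh [string] [col1] [col2] <data-file

# Quote the expansions so arguments containing spaces or glob characters survive intact.
awk -v str="${1:-DOG}" -v col1="${2:-1}" -v col2="${3:-2}" '
    {
        # index() returns nonzero when str occurs anywhere in the field
        if( index( $(col1), str ) )
        {
            if( index( $(col2), str ) )
                both++;     # str found in both columns
            else
                one++;      # str found only in the first column
        }
        else
            if( index( $(col2), str ) )
                two++;      # str found only in the second column
    }
    END {
        printf( "input lines: %d  both: %d  first-only: %d  second-only: %d\n", NR, both, one, two );
    }
'
exit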

If placed in stats.ksh, then invoke like this:
Code:
 stats.ksh DOG 1 2 <data-file

DOG, column 1 and column 2 are the defaults if no parameters are given on the command line.
# 7  
Old 11-30-2011
Thank you,

I tried this script and the output was:

input lines: 32577 both: 12943 first-only: 19634 second-only: 0

I am not sure if the script is taking into account the random numbers after the word DOG (like DOG[0-9]).
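
On the question of the digits: `index()` is a plain substring test, so the script in post #6 counts any field that contains DOG anywhere, whatever follows or precedes it. The totals above are consistent with that (12943 + 19634 = 32577, so every input line had DOG somewhere in column 1). If only "DOG followed by digits" should count, a regular expression such as `/^DOG[0-9]+$/` could be used instead. The sketch below compares the two tests on three hypothetical lines that are not taken from the real file; the array names `loose` and `strict` are illustrative only.

```shell
#!/bin/sh
# Hypothetical test lines, not taken from the poster's file.
sample='DOG123 DOG9
DOGMA HOTDOG
CAT1 DOG77'

# index() accepts any field containing "DOG";
# the regex accepts only DOG followed by one or more digits.
result=$(printf '%s\n' "$sample" | awk '
    {
        for (i = 1; i <= 2; i++) {
            if (index($i, "DOG"))   loose[i]++
            if ($i ~ /^DOG[0-9]+$/) strict[i]++
        }
    }
    END {
        printf "col1 index: %d regex: %d\n", loose[1], strict[1]
        printf "col2 index: %d regex: %d\n", loose[2], strict[2]
    }
')
printf '%s\n' "$result"
```

Here DOGMA and HOTDOG pass the `index()` test but fail the regex, which is why the two counts differ in each column.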
 