AWK: Substring search


# 1
08-17-2011

Hi

I have a table like this:

aaacgt cgt
cggaat acg
acgt
cgtgha
jhaja
I want to know how many times each string in the 2nd column appears as a substring in the 1st column.

For example, the first string of the 2nd column, "cgt", occurs in 3 strings of the 1st column, and "acg" occurs in 2 (aaacgt and acgt).

So my desired output is:

cgt 3
acg 2
Thank you very much in advance :)
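Since the question asks for AWK, here is a minimal sketch (not taken from the thread) that reads the file twice: the first pass stores every 1st-column string, and the second prints, for each non-empty 2nd-column string, how many of those strings contain it. It assumes whitespace-separated columns and counts rows, not repeated occurrences within one row; the shell function name `count_substrings` is hypothetical. `index()` is used instead of `~` so the 2nd-column strings are matched literally rather than as regular expressions.

```shell
# count_substrings FILE  (hypothetical helper name)
# For each non-empty 2nd-column string, print how many 1st-column strings
# contain it. Columns are whitespace-separated; FILE is read twice.
count_substrings() {
  awk '
    # Pass 1 (NR == FNR): remember every first-column string.
    NR == FNR { if ($1 != "") first[++n] = $1; next }

    # Pass 2: for each non-empty second column, count the rows containing it.
    $2 != "" {
      c = 0
      for (i = 1; i <= n; i++)
        if (index(first[i], $2) > 0) c++   # index(): literal substring test
      print $2, c
    }
  ' "$1" "$1"
}
```

After defining the function, run it as `count_substrings your_file`. On the sample table it should print `cgt 3` and `acg 2` ("acg" is found in both aaacgt and acgt). Output follows the order of the 2nd column, so a string that appears there twice is printed twice.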
# 2
08-17-2011
How big is your table (or file)?
# 3
08-17-2011
To clarify: does your file sometimes have two columns and sometimes only one?
# 4
08-17-2011
Does it have to be AWK? If not, try this Perl script:
Code:
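#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./script.pl your_file
my $file = shift or die "usage: $0 file\n";
open my $in, '<', $file or die "cannot open $file: $!\n";

my (@first, @keys);
while (<$in>) {
  my @F = split;                     # split $_ on runs of whitespace
  push @first, $F[0] if defined $F[0];
  push @keys,  $F[1] if defined $F[1];
}
close $in;

for my $s (@keys) {
  my $n = 0;
  for my $f (@first) {
    my @t = $f =~ /\Q$s\E/g;         # every literal match of $s in this row
    $n += @t;
  }
  print "$s $n\n";
}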

Run it as: ./script.pl your_file
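The Perl script above adds up every match within a row (`m//g` in list context), so a row such as `cgtcgt` counts twice for `cgt`, whereas a `grep`-style test counts it once. If that per-occurrence count is what you want in awk, one way is a small helper that repeatedly calls `index()` and skips past each match. This is a sketch that counts non-overlapping occurrences; the names `count_occurrences` and `occ` are hypothetical.

```shell
# count_occurrences FILE  (hypothetical helper name)
# Count every non-overlapping occurrence of each 2nd-column string in the
# 1st column, summed over all rows (not just the number of matching rows).
count_occurrences() {
  awk '
    # Number of non-overlapping occurrences of literal string t in s.
    function occ(s, t,    c, p) {
      c = 0
      while ((p = index(s, t)) > 0) {
        c++
        s = substr(s, p + length(t))   # resume just after this match
      }
      return c
    }
    # Pass 1: remember every first-column string.
    NR == FNR { if ($1 != "") first[++n] = $1; next }
    # Pass 2: total the occurrences of each non-empty second column.
    $2 != "" {
      total = 0
      for (i = 1; i <= n; i++) total += occ(first[i], $2)
      print $2, total
    }
  ' "$1" "$1"
}
```

For the sample table this gives the same numbers as a row count, because no string occurs twice in one row; it only differs for rows like `cgtcgt`. The `$2 != ""` guard also keeps `occ()` from looping forever on an empty search string.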
# 5
08-17-2011
Here is my version, also written in Perl. I know it is a bit crude, but it works. I like bartus11's version, though.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my (@a, %h, @order);

while (my $line = <>)
{
    chomp $line;
    my ($x, $y) = split ' ', $line;

    #   Keep just the first column ($x) in @a.
    push @a, $x if defined $x;

    #   Remember each distinct second-column string, in input order.
    if (defined $y && !exists $h{$y}) { $h{$y} = 0; push @order, $y; }
}

foreach my $key (@order)
{
    #   Number of first-column strings containing $key (literal match).
    $h{$key} = grep { index($_, $key) >= 0 } @a;

    print "$key $h{$key}\n";
}

Run it as:
Code:
cat YOUR_FILE_NAME | ./script.pl

# 6  
08-17-2011
Bug

Thanks a lot for your replies. My file is not that big: it has 30,000 rows, and it always has 2 columns.

I would prefer an awk script because I have recently been trying to learn and work with awk.

So I will wait a while for an AWK script. Thanks again :)
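With 30,000 rows and a 2nd column on every row, testing each row against each keyword with `index()` can mean on the order of 30,000 × 30,000, about 900 million, calls. A sketch that may scale better, assuming the 2nd-column strings come in only a few distinct lengths (all length 3 in the sample): cut each 1st-column string into its substrings of those lengths and look them up in an awk associative array. It still counts rows containing each string, and prints each distinct 2nd-column string once, in order of first appearance; the function name `count_substrings_fast` is hypothetical.

```shell
# count_substrings_fast FILE  (hypothetical helper name)
# Rows-containing counts via hash lookups: each 1st-column string is cut into
# its substrings of the lengths that occur in column 2. FILE is read twice.
count_substrings_fast() {
  awk '
    # Pass 1: collect the wanted strings and the distinct lengths they have.
    NR == FNR {
      if ($2 != "" && !($2 in cnt)) {
        cnt[$2] = 0; order[++k] = $2; lens[length($2)] = 1
      }
      next
    }
    # Pass 2: credit every wanted substring at most once per row.
    $1 != "" {
      split("", seen)                  # portable way to empty an array
      for (L in lens)
        for (i = 1; i + L - 1 <= length($1); i++) {
          s = substr($1, i, L)
          if ((s in cnt) && !(s in seen)) { seen[s] = 1; cnt[s]++ }
        }
    }
    END { for (j = 1; j <= k; j++) print order[j], cnt[order[j]] }
  ' "$1" "$1"
}
```

The work per row grows with the row's length times the number of distinct key lengths rather than with the number of keys, which is the design choice here; if the keys have many different lengths, the plain `index()` scan may be just as good.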
# 7
08-17-2011
If you want a solution using basic Unix commands:
Code:
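#!/bin/ksh
# Usage: ./script.ksh your_file
f=${1:?usage: $0 file}
cut -d' ' -f1 "$f" > b1
cut -s -d' ' -f2 "$f" > b2        # -s: skip rows that have no 2nd column
while IFS= read -r mStr; do
  mTot=$(grep -cF -- "$mStr" b1)  # -F: literal string; -c: count matching lines
  echo "Found ${mTot} for ${mStr}"
done < b2
rm -f b1 b2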
