Full Discussion: AWK: Substring search
Post 302548883 by alister on Friday 19th of August 2011 02:46:08 PM
Quote:
Originally Posted by polsum
I want to know how many times the string in 2nd column appears in the first column as substring.
Neither of the perl suggestions nor the ksh/egrep script handles this properly. They all take the second column and treat it as a regular expression. Should the text in that column contain any regexp metacharacters, they yield erroneous results. And if the text in the second column forms an invalid regular expression, they will error out.

Place a lone asterisk in the second column and you may see something similar to:
Code:
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE / at substring.pl line 26, <> line 6.

Worse, a valid regular expression with metacharacters (.*) will silently return incorrect results. Perhaps your real data consists of nothing but alphanumerics, in which case the code provided should be adequate, but that wasn't made clear and your problem statement asks for fixed string matching.
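
To make the distinction concrete, here is a minimal Python sketch (not one of the solutions posted in this thread). The helper names `count_rows_regex` and `count_rows_fixed` are hypothetical, and the last sample value is invented for illustration. It contrasts using the second-column text as a regular expression with using it as a fixed string.

```python
import re

# First-column values from the sample table, plus one invented value
# ("a.*b") that contains regexp metacharacters literally.
column1 = ["aaacgt", "cggaat", "acgt", "cgtgha", "a.*b"]

def count_rows_regex(needle, rows):
    """Count rows matching needle when it is (mis)used as a regular expression."""
    pattern = re.compile(needle)  # raises re.error if needle is not a valid regexp
    return sum(1 for row in rows if pattern.search(row))

def count_rows_fixed(needle, rows):
    """Count rows that contain needle as a literal substring."""
    return sum(1 for row in rows if needle in row)

print(count_rows_regex(".*", column1))  # matches every row
print(count_rows_fixed(".*", column1))  # matches only the row holding a literal ".*"

try:
    count_rows_regex("*", column1)      # a lone asterisk is not a valid regexp
except re.error as exc:
    print("invalid regexp:", exc)
```

If a regexp engine has to be used anyway, escaping the text first (`re.escape` in Python) is one way to get fixed-string behaviour.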

Quote:
Originally Posted by polsum
Hi

I have a table like this
Code:
aaacgt cgt
cggaat acg
acgt
cgtgha
jhaja

I want to know how many times the string in 2nd column appears in the first column as substring.

For example the first string of 2nd column "cgt" occurs 3 times in the 1st column and "acg" one time.

So my desired output is
Code:
cgt 3
acg 1

Thank you very much in advance.

Quote:
Originally Posted by polsum
Thanks a lot for your replies. My file is not that big...it has 30000 rows. and It always has 2 columns.
The only sample data you've provided contradicts that statement; some of the lines have only one column.

Also, should the string appearing in the second column be searched for in column 1 of all rows or only rows that follow it? If only those that follow, does that include the current row? You should clarify, because you state above that "acg" only occurs once in column 1, and your desired output shows a 1 for "acg", but "acg" actually occurs twice in the first column.

All suggestions so far scan the entire column and in fact will return 2 where you say it should be 1.
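
The three possible readings can be compared on the sample data with a short Python sketch. This is an illustration only, not a proposed solution; the helper `count_rows` and the use of `None` for a missing second column are assumptions made here.

```python
# First and second columns from the sample table; rows without a
# second column are represented with None (an assumption of this sketch).
rows = [
    ("aaacgt", "cgt"),
    ("cggaat", "acg"),
    ("acgt", None),
    ("cgtgha", None),
    ("jhaja", None),
]

def count_rows(needle, start=0):
    """Count rows, from index start onward, whose first column contains needle."""
    return sum(1 for col1, _ in rows[start:] if needle in col1)

for i, (_, needle) in enumerate(rows):
    if needle is None:
        continue
    print(needle,
          count_rows(needle),         # whole column
          count_rows(needle, i),      # current row and the rows after it
          count_rows(needle, i + 1))  # only the rows after the current one
```

On this data the whole-column reading gives 2 for "acg", and the rows-after-only reading gives 2 for "cgt"; only the middle reading yields both 3 and 1. Whether that is what was intended is for the original poster to confirm.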

Also, should a string occurring more than once in column 1 of a single row be counted as one instance or multiple instances? One of the perl solutions and the ksh/egrep script will count "acgacg" as one instance of "acg", while the remaining perl solution would count it as two instances.
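
The two counting rules differ as in the following Python sketch. It is an illustration only; the function names `count_rows_containing` and `count_occurrences` are hypothetical.

```python
def count_rows_containing(needle, values):
    """Each row counts at most once, however often needle appears in it."""
    return sum(1 for value in values if needle in value)

def count_occurrences(needle, values):
    """Every non-overlapping occurrence in every row counts."""
    return sum(value.count(needle) for value in values)

values = ["acgacg"]
print(count_rows_containing("acg", values))  # 1
print(count_occurrences("acg", values))      # 2
```

Note that `str.count` counts non-overlapping occurrences only.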

Regards,
Alister

Last edited by alister; 08-19-2011 at 03:56 PM..
 
