Full Discussion: Text string parsing in awk
Post 302892634 by Don Cragun on Thursday 13th of March 2014 10:14:46 PM
You haven't shown us what a single line in your input file looks like, and you haven't even shown us one complete action section of your awk script. How did you determine that this section of code is your only performance problem?

With the following complete awk script (which incorporates code very similar to the code you showed us):
Code:
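awk '
{	alarm = $0
	# Split the record on runs of blanks; len is the number of fields.
	len = split(alarm,a," ")
	ent = a[3]
	# Rebuild chem by joining fields 4 through len with single spaces.
	chem = a[4]
	for (i = 5; i <= len; i++)
		chem = chem " " a[i]
	print "ent=" ent " chem=" chem
}' file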

and with file containing 10,000 lines starting with the following:
Code:
junk1-1 junk2-1 ent1 chem1-1 chem2-1 chem3-1 chem4-1 chem5-1 chem6-1 chem7-1 chem7-1 chem8-1
junk1-2 junk2-2 ent2 chem1-2 chem2-2 chem3-2 chem4-2 chem5-2 chem6-2 chem7-2 chem7-2 chem8-2
junk1-3 junk2-3 ent3 chem1-3 chem2-3 chem3-3 chem4-3 chem5-3 chem6-3 chem7-3 chem7-3 chem8-3
junk1-4 junk2-4 ent4 chem1-4 chem2-4 chem3-4 chem4-4 chem5-4 chem6-4 chem7-4 chem7-4 chem8-4
junk1-5 junk2-5 ent5 chem1-5 chem2-5 chem3-5 chem4-5 chem5-5 chem6-5 chem7-5 chem7-5 chem8-5
junk1-6 junk2-6 ent6 chem1-6 chem2-6 chem3-6 chem4-6 chem5-6 chem6-6 chem7-6 chem7-6 chem8-6
junk1-7 junk2-7 ent7 chem1-7 chem2-7 chem3-7 chem4-7 chem5-7 chem6-7 chem7-7 chem7-7 chem8-7
junk1-8 junk2-8 ent8 chem1-8 chem2-8 chem3-8 chem4-8 chem5-8 chem6-8 chem7-8 chem7-8 chem8-8
junk1-9 junk2-9 ent9 chem1-9 chem2-9 chem3-9 chem4-9 chem5-9 chem6-9 chem7-9 chem7-9 chem8-9
junk1-10 junk2-10 ent10 chem1-10 chem2-10 chem3-10 chem4-10 chem5-10 chem6-10 chem7-10 chem7-10 chem8-10
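
If you want to build a comparable test file yourself, here is a minimal sketch. It assumes that lines 11 through 10,000 follow the same pattern as the ten lines shown above (including the repeated chem7 field); the function name gen_input is made up for this example.

```shell
# Hypothetical generator: prints n lines in the same shape as the sample above.
# Assumption: every line follows the pattern of the ten lines shown in the post.
gen_input() {
	awk -v n="${1:-10000}" 'BEGIN {
		for (i = 1; i <= n; i++)
			printf "junk1-%d junk2-%d ent%d chem1-%d chem2-%d chem3-%d chem4-%d chem5-%d chem6-%d chem7-%d chem7-%d chem8-%d\n",
				i, i, i, i, i, i, i, i, i, i, i, i
	}'
}
```

Running gen_input 10000 > file would then create the input used by the scripts in this post.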

I get 10,000 lines of output starting with:
Code:
ent=ent1 chem=chem1-1 chem2-1 chem3-1 chem4-1 chem5-1 chem6-1 chem7-1 chem7-1 chem8-1
ent=ent2 chem=chem1-2 chem2-2 chem3-2 chem4-2 chem5-2 chem6-2 chem7-2 chem7-2 chem8-2
ent=ent3 chem=chem1-3 chem2-3 chem3-3 chem4-3 chem5-3 chem6-3 chem7-3 chem7-3 chem8-3
ent=ent4 chem=chem1-4 chem2-4 chem3-4 chem4-4 chem5-4 chem6-4 chem7-4 chem7-4 chem8-4
ent=ent5 chem=chem1-5 chem2-5 chem3-5 chem4-5 chem5-5 chem6-5 chem7-5 chem7-5 chem8-5
ent=ent6 chem=chem1-6 chem2-6 chem3-6 chem4-6 chem5-6 chem6-6 chem7-6 chem7-6 chem8-6
ent=ent7 chem=chem1-7 chem2-7 chem3-7 chem4-7 chem5-7 chem6-7 chem7-7 chem7-7 chem8-7
ent=ent8 chem=chem1-8 chem2-8 chem3-8 chem4-8 chem5-8 chem6-8 chem7-8 chem7-8 chem8-8
ent=ent9 chem=chem1-9 chem2-9 chem3-9 chem4-9 chem5-9 chem6-9 chem7-9 chem7-9 chem8-9
ent=ent10 chem=chem1-10 chem2-10 chem3-10 chem4-10 chem5-10 chem6-10 chem7-10 chem7-10 chem8-10

in 0.26 to 0.27 seconds.

The following script:
Code:
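awk '
{	alarm = $0
	# Match the first three space-terminated fields; chem is everything after them.
	match(alarm, /^[^ ]* [^ ]* [^ ]* /)
	chem = substr(alarm, RLENGTH + 1)
	s = RLENGTH
	# Match the first two fields; ent lies between the two match lengths.
	match(alarm, /^[^ ]* [^ ]* /)
	ent = substr(alarm, RLENGTH + 1, s - RLENGTH - 1)
	print "ent=" ent " chem=" chem
}' file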

with the same input file produces exactly the same output in 0.06 seconds.
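
To check the "exactly the same output" claim on your own data before trusting the faster version, something like the following sketch may help. It wraps the two scripts from this post in shell functions and compares their output; the function names split_version, match_version and same_output are invented for this example.

```shell
# Loop-based version from the post: split the record and rebuild chem.
split_version() {
	awk '{
		len = split($0, a, " ")
		ent = a[3]
		chem = a[4]
		for (i = 5; i <= len; i++)
			chem = chem " " a[i]
		print "ent=" ent " chem=" chem
	}' "$@"
}

# match()/substr() version from the post.
match_version() {
	awk '{
		match($0, /^[^ ]* [^ ]* [^ ]* /)
		chem = substr($0, RLENGTH + 1)
		s = RLENGTH
		match($0, /^[^ ]* [^ ]* /)
		ent = substr($0, RLENGTH + 1, s - RLENGTH - 1)
		print "ent=" ent " chem=" chem
	}' "$@"
}

# Hypothetical helper: succeeds only if both versions agree on the given file.
same_output() {
	[ "$(split_version "$1")" = "$(match_version "$1")" ]
}
```

For example, same_output file && echo identical reports whether the two agree. In bash, time split_version file > /dev/null and time match_version file > /dev/null give a rough timing comparison on your own system.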

These tests were run using awk on Mac OS X Version 10.7.5 on a MacBook Pro laptop. There is no guarantee that you will see this kind of speedup on your system with your data, but it should give you an idea worth examining.
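
One way your data could differ: the match() patterns above assume that fields are separated by exactly one space, while split(alarm, a, " ") treats any run of blanks as a single separator. If your input has tabs or repeated spaces, the two scripts will not pick the same fields. The following is only a sketch of a variant that tolerates runs of spaces and tabs. It was not part of the timing tests above, the name fields_ws is made up, and it skips records with fewer than four fields. Unlike the split() version, it leaves the spacing inside chem as it appears in the input.

```shell
# Hypothetical variant: tolerate runs of spaces and tabs between fields.
fields_ws() {
	awk '{
		# Require three fields, each followed by at least one blank; skip other records.
		if (!match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/))
			next
		chem = substr($0, RLENGTH + 1)   # everything after the third separator
		ent = substr($0, 1, RLENGTH)     # fields 1 through 3 plus separators
		sub(/[ \t]+$/, "", ent)          # drop the separator after field 3
		sub(/^.*[ \t]/, "", ent)         # drop everything before field 3
		print "ent=" ent " chem=" chem
	}' "$@"
}
```

With no file argument the function reads standard input, so printf 'a  b c d e\n' | fields_ws is a quick way to try it on a single line.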