Well. I'm always interested in competing against awk with other languages. So far I obviously cannot compete in brevity (which is very impressive when I see the solutions presented in this forum, though they sometimes twist my brain, and coming back to such a solution later is a horror: WTF was I thinking when I wrote that pile of crazy code?), but I try to compete in maintainability, efficiency (I/O requests and memory economy) and runtime speed:
I don't know if you are even able to use ruby, but here's my suggestion in ruby (just for the fun of learning).
/Rant
Use it like this:
------ Post updated at 01:56 PM ------
Or here with OOP:
------ Post updated at 03:26 PM ------
I would suggest this little change to vgersh99's solution:
Quote:
Originally Posted by vgersh99
------ Post updated at 06:23 PM ------
Edit
My change is not needed. Even the unmodified version from before the input data specification change (with f=1 instead of f++) works.
And well: that awk solution is not really that complicated...
Thanks to both Stomp and vgersh99 for spending your time and helping me out. This latest awk code works magically great and the results obtained are as expected:
As a systems guy, i would always choose awk, if possible.
Not perhaps for parsing xml...
Examine the strace -c <code> reports:
For awk:
And for ruby:
The difference will not be noticed on a system when parsing one file.
But in a situation where you need to parse tens of thousands ...
Possibly the ruby code can be written to do it better, but it is doubtful it will ever surpass the awk code in performance.
That is, assuming two ideal coders each write a program, one in ruby and one in awk, that does one thing as well as it can, and then start forking it.
So, to conclude, in my opinion higher level languages are to be used in situations where your program needs many libraries to ease the job: connecting to multiple API endpoints, databases, versioning systems, complex math and such.
You could do all that in awk, but it would require tremendous effort and would defeat the purpose of short programs which do one thing quickly and efficiently.
This is just my rant
Regards
Peasant.
Noted that the awk solution is far ahead in this case, even though it is used in an inefficient way, reading the input file twice...
Thanks for "strace -c". Never used that before. Good bloat indicator.
------ Post updated at 05:51 PM ------
Well. I thought I had done my ruby script fairly well, but it's an absolute disaster. I generated an XML data file of just 10 MB. This is the result.
AWK Resources Ruby procedural Ruby OOP
I assume the string-concatenation is really bad here.
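One way to check that suspicion: in Ruby, `+=` builds a new String object on every call, while `<<` appends to the existing String in place. A minimal sketch of the two styles follows; the method names and the sample rows are made up for illustration and are not taken from the scripts in this thread.

```ruby
# Hypothetical helpers, not the scripts discussed in this thread.

# Builds the result with +=, which allocates a new String per iteration.
def join_with_plus(parts)
  out = ""
  parts.each { |part| out += part }
  out
end

# Builds the result with <<, which appends to the same String in place.
def join_with_append(parts)
  out = +""
  parts.each { |part| out << part }
  out
end

# Both styles produce the same text; only the allocation pattern differs.
parts = Array.new(1_000) { |i| "<row>#{i}</row>\n" }
puts join_with_plus(parts) == join_with_append(parts)
```

Timing both variants on a large input would show whether concatenation is really where the time goes; no figures are claimed here.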
Ruby OOP (without storing the result in memory)
So if one wants speed and a low memory footprint, one can tune a lot with [high-level-programming-language] or just take awk.
In terms of system calls ruby won here (probably because awk reads the file twice), but it's 10 times slower. I think ruby has a fat base whereas awk is very lean, so the more complex the task is, the less relevant the basic bloat becomes.
Last edited by stomp; 08-16-2018 at 02:05 PM..
Reason: Fixed false results
Today's operating systems and filesystems are smart: they cache, prefetch, and do similar math magic that falls into the probability and combinatorics division.
So far and so deep into the hardware that they give you other users' data when asked nicely.
Filesystems will cache the first 10 MB read, so second read will be amazingly fast(er).
Be sure to take above into consideration during testing.
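A rough sketch of how the cache effect could be kept out of a measurement: run the job once untimed so the file is already cached, then time the following runs. The helper name `timed_runs` and its parameters are assumptions made for this example.

```ruby
# Hypothetical helper: runs the block `warmup` times untimed (so the input is
# already in the page cache), then returns the wall-clock seconds of each of
# the `runs` timed executions.
def timed_runs(runs: 3, warmup: 1)
  warmup.times { yield }
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end
```

It could be called as `timed_runs { File.read("data.xml") }`, where `data.xml` is a placeholder path. This only measures warm-cache times; measuring a cold first read would need the cache to be dropped by other means.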
This was not done to compare ruby or awk per se, just to point out not to limit yourself to certain path, but use the right tool for the task.
As for the strace options, I had read the manual a bit beforehand to find an option, since I was sure GNU stuff has that nicely formatted without effort.
Quote:
Filesystems will cache the first 10 MB read, so second read will be amazingly fast(er).
Of course. I assume the cache makes any significant normal read times disappear here. I can create additional processing overhead by reading in portions that are too small, or improve performance by reading larger chunks. This is good, because it means the times here are processing times only.
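To illustrate the chunk-size point, here is a small sketch that reads the same data with two different chunk sizes and counts the read calls. The helper `read_in_chunks` is made up for this example, and a `StringIO` stands in for a real file; with a real file one would pass `File.open(path, "rb")` instead.

```ruby
require "stringio"

# Hypothetical helper: reads `io` to the end in pieces of `chunk_size` bytes
# and returns [number of read calls that returned data, total bytes read].
def read_in_chunks(io, chunk_size)
  reads = 0
  bytes = 0
  while (chunk = io.read(chunk_size))
    reads += 1
    bytes += chunk.bytesize
  end
  [reads, bytes]
end

# Same 10,000 bytes, once in small and once in larger portions.
data = "x" * 10_000
[100, 4096].each do |size|
  reads, bytes = read_in_chunks(StringIO.new(data), size)
  puts "chunk=#{size} reads=#{reads} bytes=#{bytes}"
end
```

The counter only shows calls made at the Ruby level. Ruby buffers file I/O internally, so the number of actual system calls can differ; `strace -c` remains the way to see those.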
Quote:
This was not done to compare ruby or awk per se, just to point out not to limit yourself to certain path, but use the right tool for the task.
My curiosity here is NOT "the right tool for the right job" at the moment. My point is: is [some high-level-programming-language] too bloated to compete with awk in this single task in terms of speed? And if it cannot compete, how far behind is it?
I already tested the same algorithm that the awk solution uses here, in ruby. It's roughly 3 times faster than my earlier scripts (still 2-3 times slower than awk), but far less elegant than the awk code. That's a first interesting insight, along with the other realization that line-based processing seems to be a lot faster than my chunk-based processing. I also have an idea which parts of my code are the worse ones, and it is good to actually see how much difference those "little" things make.
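For readers who want a picture of what "line-based" means here, a generic sketch follows. It is not the algorithm from this thread (the actual scripts are not reproduced above); the helper name, the tag name `name` and the sample input are all assumptions, and it only handles tags that open and close on the same line.

```ruby
require "stringio"

# Hypothetical helper: walks the input line by line and collects the text of
# every <tag>...</tag> pair found within a single line. No real XML parsing.
def tag_values_by_line(io, tag)
  name = Regexp.escape(tag)
  pattern = %r{<#{name}>(.*?)</#{name}>}
  values = []
  io.each_line do |line|
    # scan returns one single-element array per match; flatten them.
    values.concat(line.scan(pattern).flatten)
  end
  values
end

sample = StringIO.new("<item><name>foo</name></item>\n<item><name>bar</name></item>\n")
p tag_values_by_line(sample, "name")
```

Because only one line is held in memory at a time, this style keeps the memory footprint small regardless of file size, which is also how awk processes its input by default.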