xargs won't help you here. It will put the file names found onto the command line as parameters to awk, just as the shell does in the proposal above. In either case, awk will process that input stream, writing ALL results to stdout. If you want the output grouped by input file name, you need to redirect within awk.
The 1st post in this thread explicitly requested that every field in your input files be searched for PVALUE=number. But the sample data provided never shows more than one such string on an input line, and on lines that do have something matching that pattern, it always appears in the last field on the line. And we have no indication of whether or not the sample data provided in post #5 in this thread is representative of the actual data that needs to be processed. From the code samples posted, it appears that the submitter wants one output file produced for each input file that contains matched lines. The submitter also seems to want 15 copies of awk running in parallel (which only makes sense if those 15 awk commands won't be thrashing CPU and/or disk accesses).
Assuming that parallel processing won't really help much here (and might actually slow down processing), avoiding xargs completely, and assuming that an input line may contain more than one of the patterns above; I would try something more like:
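The code block that originally followed here did not survive. A minimal sketch of what such an awk invocation might look like (the `.fail` output suffix is an assumption taken from later posts in this thread, and the demo input files are stand-ins for the real data):

```shell
# Sketch only: the .fail suffix and the sample data are assumptions.
cd "$(mktemp -d)"
printf 'a b NRHITS=3;PVALUE=0.001\nno match here\n' > file1.txt
printf 'nothing to see\n' > file2.txt

# Scan every field of every line; write matching lines to
# <input-file>.fail, producing one output file per input file.
awk '
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /PVALUE=[0-9.eE+-]+/) {
            print $0 > (FILENAME ".fail")
            next    # one copy of the line is enough
        }
}' *.txt

cat file1.txt.fail
```

Because the redirection happens inside awk, each matched line lands in the output file named after the file it came from, which is the part xargs cannot do for you.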
Note that this doesn't create an output file for every input file; it only creates an output file if one or more lines in the corresponding input file meet your criteria.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post:
thanks Don
Parallel processing is not an issue here as I am doing it on a cluster.
The string to match invariably appears in the third column, but the order of the variables NRHITS and PVALUE within the third column might vary.
While the code does not write separate output files for each input, I am wondering if a combination of xargs and sed can help. If so, how?
Thanks
---------- Post updated at 01:35 PM ---------- Previous update was at 01:12 PM ----------
I received a suggestion of using perl -ne. I used the following command.
OK. I completely misunderstood your example. I thought your field separator was <semicolon>, but now I'm guessing that <tab> is your field separator, and <semicolon> is a subfield separator in your third field.
And you are wrong. The code I suggested produces a separate output file for each input file that contains lines that meet your criteria.
Using your updated description (but assuming that no <semicolon> characters appear anywhere in the 1st two fields in your input files AND assuming that a single <tab> character separates the first three fields), my code adjusted for your new description of the problem is:
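The adjusted code block itself was lost; a sketch under those assumptions follows. The PVALUE threshold of 0.05 is an assumption here: the real failure criterion was in the missing code, and the sample lines are stand-ins for the actual data.

```shell
# Sketch: <tab>-separated fields, <semicolon>-separated subfields in
# field 3; the 0.05 threshold and the sample data are assumptions.
cd "$(mktemp -d)"
printf 'id1\tchr1\tNRHITS=3;PVALUE=0.001\n' > file1.txt
printf 'id2\tchr2\tPVALUE=0.9;NRHITS=5\n'   > file2.txt

awk -F'\t' '
{
    # Split field 3 on <semicolon>; NRHITS and PVALUE may appear
    # in either order, so test each subfield.
    n = split($3, part, /;/)
    for (i = 1; i <= n; i++)
        if (part[i] ~ /^PVALUE=/ && substr(part[i], 8) + 0 < 0.05)
            print $0 > (FILENAME ".fail")
}' *.txt

cat file1.txt.fail
```

The `substr(part[i], 8) + 0` trick skips the 7-character "PVALUE=" prefix and forces a numeric comparison, so the order of the subfields no longer matters.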
And, with the following input files: file1.txt: file2.txt: file3.txt: file4.txt:
It produces the output files: file1.txt.fail: file2.txt.fail: file3.txt.fail:
Note that there is no file4.txt.fail file because no line in file4.txt meets your criteria.
WHY do you insist on xargs? You have received several proposals that work entirely without it, although they may be somewhat off target, as the target is not THAT clear. perl, sed, awk - they will all do what (we think) you need on an input stream of the desired file names.
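To illustrate the point, a plain shell loop over the file names needs neither xargs nor parallel tricks (grep stands in here for whichever awk/sed/perl body you end up with, and the file names are made up):

```shell
# One pass per file, no xargs involved; sample data is made up.
cd "$(mktemp -d)"
printf 'keep PVALUE=0.01\ndrop me\n' > a.txt
printf 'no hits at all\n'            > b.txt

for f in *.txt; do
    # grep exits non-zero when nothing matches, so empty result
    # files are removed again.
    grep 'PVALUE=' "$f" > "$f.fail" || rm -f "$f.fail"
done

ls *.fail
```

The per-file output naming falls out of the loop variable for free, which is exactly what a single xargs-fed pipeline cannot give you.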