I have a very large tab-delimited file from which I am trying to extract the DP= value, put it in $16, and add specific text to $16, with a . (dot) in $11-$15 and $18. Only the lines (there are several) that have the formatting shown below will have an empty $16. Other lines are in a different format, already have a value in $16, and can be skipped. Thank you.
awk
file
desired output (tab-delimited)
Last edited by cmccabe; 11-29-2017 at 10:06 AM..
Reason: fixed format
Other than adding text to the end of lines that start with a #, what is the code you're trying to use doing wrong?
What are you hoping that /DP=/ will print? I would expect it to print 1 if the string DP= appears somewhere in the current line and 0 otherwise. Are you looking for the value between the first = and the first ; in field #8? If so, it would seem that you could split field 8 using /[=;]/ as the ERE for the subfield delimiter and then print subfield 2. But making wild suggestions like this based on a sample size of 1, with no description of that field's contents, is extremely dangerous.
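To make those two points concrete, here is a minimal sketch. The field #8 value is made up for illustration (the real data is not shown in the thread), and taking element 2 is only correct when DP= is the first subfield.

```shell
# Hypothetical field #8 value, assumed to start with DP=.
field8='DP=224;VDB=0.5;AF=1'

# A bare /DP=/ is a match expression: it evaluates to 1 (match) or 0 (no match).
match_flag=$(printf '%s\n' "$field8" | awk '{ print (/DP=/) }')

# Split on "=" or ";" and take the second piece, i.e. the text
# between the first "=" and the first ";".
dp_value=$(printf '%s\n' "$field8" | awk '{ split($0, parts, /[=;]/); print parts[2] }')

echo "$match_flag $dp_value"
```

So printing /DP=/ yields a flag, not the value; the value has to be pulled out of the field explicitly.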
This User Gave Thanks to Don Cragun For This Post:
desired output
other format of lines to skip
If I split $8 on DP= and print the value up to the ;, since that tag is in both lines, won't it print for both? However, since $16 already has a value in the much larger of the two lines, that line can be skipped.
awk
Not sure if the above is any closer. I did not really know what would print, but I understand better now. Thank you.
I also tried the perl below, which executes but does not print the desired output, though it seems close.
perl
output of line to change (seems to extract 224 but doesn't print homref, even though qw=0/0)
Last edited by cmccabe; 11-29-2017 at 03:08 PM..
Reason: fixed format and added perl
Are you just trying to print lines that need to have fields added? Or, do you want to print all lines in the file whether they change or not? Your code makes no attempt to ignore the header/comment lines at the start of your file, but you don't seem to want them to be changed even though none of them have 16 tab separated fields and you try to change every line that doesn't have a field #16.
I suggested using /[=;]/ as an ERE argument for split(); you used [=;] instead (which I would expect to give you a syntax error).
I suggested printing array[2] IF DP=value; is ALWAYS at the start of field #8. You printed array[1] and showed us an example where DP=value; is the third subfield in field #8.
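If DP= cannot be relied on to come first, one alternative is to walk the ;-separated subfields and pick the one that starts with DP=. This is only a sketch of that idea; the field #8 value below is invented for illustration.

```shell
# Hypothetical field #8 in which DP= is the third subfield.
field8='AC=2;AN=2;DP=224;MQ=60'

dp_value=$(printf '%s\n' "$field8" | awk '
{
    n = split($0, parts, ";")           # one array element per subfield
    for (i = 1; i <= n; i++) {
        if (parts[i] ~ /^DP=/) {        # anchored, so "XDP=..." is not matched
            print substr(parts[i], 4)   # drop the leading "DP="
            break
        }
    }
}')

echo "$dp_value"
```

Written longhand like this, it is easier to see which subfield is being tested and what gets printed.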
I know that you like writing awk 1-liners (instead of readable code). But, until you fully understand the syntax of an if statement in awk, you would be much better off writing awk code longhand so you can actually see how the pieces are supposed to fit together. (And please show us the output you get from the code you're running, including the diagnostic messages as well as the normal output.)
Those of us who are trying to help you get tired quickly when there is no clear specification of what you are trying to do and no clear specification of the format of the data you're trying to interpret and modify.
The DP= is in each line of the input file; however, it is always at the start of $8 in the lines that the awk is going to update. $11 will also always be empty in those lines (as in line1; in line2 there is a value, so nothing is done). All lines in the input file are printed, that is, the comments and every data line. In the example, since line2 is good, it is printed as is, but line1 needs to be updated before it is printed. Do the comment lines need to be skipped, as they do not have more than two fields in them? Thank you.
file 11 tab-delimited fields --- the actual file is over 22 million lines in the below format ----
line1 and all lines like it are the ones that the awk updates
line2 and all lines like it are skipped(nothing needs to be done to them)
file
awk
I have syntax errors in the awk but added comments that I hope will help. I use one-liners because, for whatever reason, I have a hard time not using them (as is evident from the syntax errors). Nevertheless, commenting seems to help. Thank you.
Last edited by cmccabe; 11-29-2017 at 07:34 PM..
Reason: fixed format
Assuming that your sample data is representative, the following seems to do what you want:
Note that this assumes that lines that are to be modified only have 10 fields (i.e., 9 <tab>s) as in your sample. It verifies that field #8 starts with DP= and assigns a diagnostic message to field #16 if it doesn't. This seemed safer to me than blindly assuming that the input file is correctly formatted. (And I added a line to my sample input file to be sure that it worked as I expected. It didn't the first two times I tried it.)
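The approach just described can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the code originally posted: the function name fill_fields and the diagnostic text are hypothetical, lines to update are assumed to have exactly 10 tab-separated fields, and field #17 is simply left empty because the thread does not show what belongs there.

```shell
# fill_fields: read tab-delimited text on stdin, write updated text to stdout.
fill_fields() {
    awk '
    BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                      # header/comment lines pass through
    NF == 10 {                                # assumed shape of lines to update
        if ($8 ~ /^DP=/) {
            split($8, parts, /[=;]/)          # parts[2] is the DP value
            dp = parts[2]
        } else {
            dp = "no leading DP= in field 8"  # hypothetical diagnostic text
        }
        for (i = 11; i <= 15; i++) $i = "."
        $16 = dp
        $18 = "."                             # this also creates an empty field 17
    }
    { print }                                 # every other line is printed as is
    '
}

# Made-up 10-field sample line.
printf 'chr1\t100\t.\tA\t.\t50\tPASS\tDP=224;AF=0\tGT\t0/0\n' | fill_fields
```

It would be run as something like fill_fields < file > file.new (both names are placeholders). The comment-line rule comes first so that a # header line which happens to have 10 fields is never modified.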
When the contents of file is:
then the output produced is:
Hopefully, this will come close to what you want.