awk to extract multiple values from file and add two additional fields


 
# 1  
Old 10-10-2016

In the attached file, I am trying to use awk to extract multiple values and create the tab-delimited desired output.
In the output, R_Index is a sequential number and Pre-Enrichment defaults to . (a dot).
I can extract the values that sit beside their keywords, but most values are above the keyword and I cannot extract those. There is most likely a better way to do this, but I have included my attempt as well. Thank you.

The first part of the awk adds R_Index; the second part of the awk defaults Pre-Enrichment to a dot.

Code:
awk -F'\t' -v OFS='\t' '{$0=((NR==1) ? "R_Index" : (NR - 1)) OFS $0} 1' | awk -F'\t' 'NR==1{Q=NF;print} NR>1{for(i=1;i<=Q;i++){if(!$i){$i="."}};print}' OFS="\t" | awk '{for (I=1;I<=NF;I++) if ($I == "Live") {print $(I+2)};}' test.txt|

Desired output (the notes after --- do not exist in test.txt; they are added only for clarification):
Code:
R_Index     1   --- not in test.txt added hopefully in awk
ISPLoading     84%
Pre-Enrichment     .      -- not in test.txt defaulted to .
TotalReads     75,130,408
ReadLength     203 bp
KeySignal     80
UsableSequence     61%
Enrichment     99.2%   --- this is called Live in test.txt
Polyclonal     30.0%
LowQuality     09.0%
TestFragment     88%
AlignedBases     99.1%
UnalignedBases     0.9%


Last edited by cmccabe; 10-10-2016 at 02:27 PM.. Reason: fixed format, added awk
# 2  
Old 10-10-2016
Hello cmccabe,

Could you please state your requirements more clearly? It is not clear which conditions produce your expected output. It is good that you showed us your attempt; please also let us know all the conditions and requirements needed to get the expected output.

Thanks,
R. Singh
# 3  
Old 10-10-2016
In the attached test.txt, each of the $1 strings below can be found, and each has a value above it that I am trying to include as $2.

Code:
          (the --- are the location of the strings and values)
ISP Loading     84%      ---- row 3 $1
TotalReads     75,130,408  ---row 2 $2
ReadLength     203 bp    ---- row 3 $3, the mean value is used
KeySignal     80     ---  row 2 $2
UsableSequence     61%  ---- row 3 $2
Polyclonal     30.0%    --- row 10 $3
LowQuality     09.0%   --- row 11 $3
TestFragment     88%   --- row 20 $3
AlignedBases     99.1%   --- row 29 $3
UnalignedBases     0.9%    ---- row 30 $3
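
One way to pick up a value that sits above its keyword is to remember the previous line and look in the same column. This is only a sketch: the two-line sample and the function name value_above are made up for illustration, and it assumes the value is directly above the label in the same tab-separated column, which may not match the real test.txt.

```shell
# Hypothetical layout: the value sits directly above its label, in the same
# tab-separated column. The real test.txt may be laid out differently.
sample=$(printf '75,130,408\t80\nTotal Reads\tKey Signal\n')

# value_above KEY: read stdin, print KEY and the field above it.
value_above() {
    awk -F'\t' -v OFS='\t' -v key="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == key) print key, prev[i]
            split($0, prev, FS)   # remember this line for the next one
        }
    '
}

printf '%s\n' "$sample" | value_above "Total Reads"
printf '%s\n' "$sample" | value_above "Key Signal"
```

If the value is in a different column than its label, the prev[i] lookup would need a fixed field number instead of i.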

The first portion of the awk, before the first |, adds R_Index in $1 and a sequential number in $2 as the first row of the desired output.

The second portion of the awk, after the first |, is an attempt at defaulting Pre-Enrichment to . in $2, but I am unsure of how to put that label in $1.
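
A possible way to get both extra rows, labels included, is to print them as complete label/value pairs rather than patching empty fields. The sketch below assumes the extracted data is already in label<TAB>value form; the function name add_fields and the sample input are hypothetical.

```shell
# add_fields N: read label<TAB>value lines on stdin, put "R_Index N" first
# and add "Pre-Enrichment ." right after the ISP Loading row.
add_fields() {
    awk -F'\t' -v OFS='\t' -v idx="${1:-1}" '
        BEGIN { print "R_Index", idx }
        { print }
        $1 == "ISP Loading" { print "Pre-Enrichment", "." }
    '
}

# Hypothetical two-row input, just to show the placement.
printf 'ISP Loading\t84%%\nTotal Reads\t75,130,408\n' | add_fields 1
```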


Enrichment is called Live in test.txt and has a value of 99.2%. The third portion of the awk, after the second |, was an attempt to extract that value from test.txt. Since this is the only value that comes after its keyword (not above it), I think I am close.
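
For the Live value, the same $(I+2) idea can also print the new label in $1. This is a sketch only: the sample line is invented (the real wording around Live in test.txt is not shown here), and live_to_enrichment is a hypothetical name.

```shell
# Find the whitespace-separated field "Live" and print the field two
# positions after it under the label "Enrichment".
live_to_enrichment() {
    awk -v OFS='\t' '
        {
            for (i = 1; i <= NF - 2; i++)
                if ($i == "Live") { print "Enrichment", $(i + 2); exit }
        }
    '
}

# Hypothetical input line; "placeholder" stands in for whatever sits
# between Live and its value in the real file.
printf 'Live placeholder 99.2%%\n' | live_to_enrichment
```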

The final output is tab-delimited and looks like this:
Code:
R_Index     1
ISP Loading     84%
Pre-Enrichment     .
Total Reads     75,130,408
Read Length     203 bp
Key Signal     80
UsableSequence     61%
Enrichment     99.2%
Polyclonal     30.0%
Low Quality     09.0%
Test Fragment     88%
Aligned Bases     99.1%
Unaligned Bases     0.9%

I hope this helps, and thank you very much.

I need to update this post because my desired output has changed. I am not in my office and it is too hard to do from my phone, so I will update it from there in about 2 hours. Thank you.

Here is the new edit.
New desired output:
Code:
R_Index     ISP Loading     Pre-Enrichment     Total Reads     Read Length     Key Signal     Usable Sequence     Enrichment     Polyclonal     Low Quality     Test Fragment     Aligned Bases     Unaligned Bases
1     84     .     75130408     203     80     61     99.2     30     9     88     99.1     0.9

Description:
The tab-delimited output has a header in row 1. The header names are the keywords in the txt file from which data is extracted, plus the two additional fields R_Index and Pre-Enrichment. Below is the data, with each line commented only for clarification. I hope it helps, and thank you.
Code:
R_Index 1 -- sequential #
ISP Loading     84% -- % removed
Pre-Enrichment     . -- always a dot
Total Reads     75,130,408 -- commas removed
Read Length     203 bp -- bp removed
Key Signal     80 -- just extracted as is
Usable Sequence     61% -- % removed
Enrichment     99.2% -- called Live in the txt, % removed
Polyclonal     30.0% -- trailing .0 and % removed
Low Quality     09.0% -- leading 0, trailing .0 and % removed
Test Fragment     88% -- % removed
Aligned Bases     99.1% -- % removed
Unaligned Bases     0.9% -- % removed
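
The clean-up rules above could be sketched in a single awk pass that also turns the label/value rows into a header row plus one data row. It assumes the rows are already in label<TAB>value form; to_row is a hypothetical name and the sample input is a shortened version of the list above.

```shell
# to_row: read label<TAB>value lines, clean each value, then print all
# labels as a tab-delimited header and all cleaned values as one data row.
to_row() {
    awk -F'\t' -v OFS='\t' '
        {
            v = $2
            gsub(/,/, "", v)        # 75,130,408 -> 75130408
            sub(/ *bp$/, "", v)     # 203 bp     -> 203
            sub(/%$/, "", v)        # 84%        -> 84
            # Adding 0 normalizes numbers: 09.0 -> 9, 30.0 -> 30, 99.1 stays.
            # A lone dot does not match, so Pre-Enrichment is left as is.
            if (v ~ /^[0-9]+(\.[0-9]+)?$/) v = v + 0
            head = head (NR > 1 ? OFS : "") $1
            row  = row  (NR > 1 ? OFS : "") v
        }
        END { print head; print row }
    '
}

printf 'R_Index\t1\nISP Loading\t84%%\nPre-Enrichment\t.\nTotal Reads\t75,130,408\nRead Length\t203 bp\nLow Quality\t09.0%%\nAligned Bases\t99.1%%\n' | to_row
```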


Last edited by cmccabe; 10-11-2016 at 03:08 PM..