awk to extract multiple values from file and add two additional fields

10-10-2016

In the attached file I am trying to use awk to extract multiple values and create the tab-delimited desired output.
In the output, R_Index is a sequential number and Pre-Enrichment always defaults to a dot (.).
I can extract the values that sit beside their keywords, but most values sit above their keywords and I cannot extract those. There is most likely a better way to do this, but I have included my attempt as well. Thank you.

The first part of the awk adds R_Index, and the second part of the awk defaults Pre-Enrichment to a dot (.).

Code:

awk -F'\t' -v OFS='\t' '{$0=((NR==1) ? "R_Index" : (NR - 1)) OFS $0} 1' |
awk -F'\t' 'NR==1{Q=NF;print} NR>1{for(i=1;i<=Q;i++){if(!$i){$i="."}};print}' OFS="\t" |
awk '{for (I=1;I<=NF;I++) if ($I == "Live") {print $(I+2)};}' test.txt
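For reference, here is a minimal sketch of what the first two stages do when they are folded into a single awk call. The three-column sample is invented for illustration and is not taken from test.txt. One difference from the attempt above is deliberate: the test `!$i` is also true for a field that holds 0, so the sketch compares against the empty string instead.

```shell
# Invented tab-delimited sample: a header row, then one row with an empty second field.
sample=$(printf 'Name\tValue\tNote\nA\t\tx\n')

filled=$(printf '%s\n' "$sample" | awk -F'\t' -v OFS='\t' '
  { $0 = (NR == 1 ? "R_Index" : NR - 1) OFS $0 }   # prepend the header label or a row counter
  NR == 1 { n = NF }                               # remember the column count from the header
  NR > 1  { for (i = 1; i <= n; i++) if ($i == "") $i = "." }   # fill only truly empty fields
  { print }
')
printf '%s\n' "$filled"
```

With this sample the second line comes out as `1`, `A`, `.`, `x` separated by tabs.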

Desired output (the notes after --- do not exist in test.txt; they are added only for clarification):

Code:

R_Index     1   --- not in test.txt added hopefully in awk
ISPLoading     84%
Pre-Enrichment     .      -- not in test.txt defaulted to .
TotalReads     75,130,408
ReadLength     203 bp
KeySignal     80
UsableSequence     61%
Enrichment     99.2%   --- this is called Live in test.txt
Polyclonal     30.0%
LowQuality     09.0%
TestFragment     88%
AlignedBases     99.1%
UnalignedBases     0.9%

test.txt (6.3 KB)

Last edited by cmccabe; 10-10-2016 at 02:27 PM.. Reason: fixed format, added awk

cmccabe

10-10-2016

Hello cmccabe,

Could you please state your requirements more clearly? It is not clear which conditions are needed to produce your expected output. It is good that you are showing us your attempts; please also let us know all the conditions and requirements needed to get your expected output.

Thanks,
R. Singh

RavinderSingh13

10-10-2016

In the attached test.txt each one of the below $1 strings can be found and has a value above it that I am trying to include as $2.

Code:

          (the --- are the location of the strings and values)
ISP Loading     84%      ---- row 3 $1
TotalReads     75,130,408  ---row 2 $2
ReadLength     203 bp    ---- row 3 $3, the mean value is used
KeySignal     80     ---  row 2 $2
UsableSequence     61%  ---- row 3 $2
Polyclonal     30.0%    --- row 10 $3
LowQuality     09.0%   --- row 11 $3
TestFragment     88%   --- row 20 $3
AlignedBases     99.1%   --- row 29 $3
UnalignedBases     0.9%    ---- row 30 $3
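A minimal sketch of one way to pick up a value that sits on the line above its keyword: remember each line in a variable and, when a keyword line arrives, print it with the remembered line. The layout of test.txt is not shown in this thread, so the four-line sample is an assumption, as is the variable name `prev`. If the value is only one field of the previous line, `split(prev, f)` and the field number from the list above could replace `prev`.

```shell
# Assumed layout: the value on one line, its keyword on the next line.
sample='84%
ISP Loading
75,130,408
Total Reads'

pairs=$(printf '%s\n' "$sample" | awk -v OFS='\t' '
  $0 == "ISP Loading" || $0 == "Total Reads" { print $0, prev }   # keyword line: pair it with the line above
  { prev = $0 }                                                   # remember this line for the next record
')
printf '%s\n' "$pairs"
```

The rule order matters: the keyword test runs before `prev` is overwritten, so `prev` still holds the previous line when the keyword matches.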

The first portion of the awk, before the first |, adds R_Index in $1 and numbers it sequentially in $2, as the first row in the desired output.

The second portion of the awk, after the first |, is an attempt at defaulting Pre-Enrichment to . in $2, but I am unsure how to put that label in $1.

Enrichment is called Live in test.txt and has a value of 99.2%. The third portion of the awk, after the second |, was an attempt to extract that value from test.txt. Since this is the only value that comes after its keyword (not above it), I think I am close.
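A minimal sketch of the Live lookup as a stand-alone step, keeping the assumption from the attempt that the value sits two whitespace-separated fields to the right of the keyword. The sample line is a placeholder, not a line from test.txt.

```shell
# Placeholder line; only the relative position of Live and its value matters here.
sample='aaa Live xxx 99.2% bbb'

live=$(printf '%s\n' "$sample" | awk '
  {
    for (i = 1; i + 2 <= NF; i++)   # stop early enough that field i+2 exists
      if ($i == "Live") {           # keyword found in field i
        v = $(i + 2)                # value assumed two fields to the right
        sub(/%$/, "", v)            # drop the trailing percent sign
        print v
        exit                        # stop at the first match
      }
  }')
echo "Enrichment: $live"
```

Capturing the value in a shell variable keeps it separate from the other stages, instead of piping the output of the R_Index stage into an awk that reads test.txt directly.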

The final output is tab-delimited and looks like this:

Code:

R_Index     1
ISP Loading     84%
Pre-Enrichment     .
Total Reads     75,130,408
Read Length     203 bp
Key Signal     80
UsableSequence     61%
Enrichment     99.2%
Polyclonal     30.0%
Low Quality     09.0%
Test Fragment     88%
Aligned Bases     99.1%
Unaligned Bases     0.9%

I hope this helps, and thank you very much.

Edit: my desired output has changed, so I have updated this post.

New desired output:

Code:

R_Index ISP Loading Pre-Enrichment Total Reads Read Length Key Signal Usable Sequence Enrichment Polyclonal Low Quality Test Fragment Aligned Bases Unaligned Bases
     1 84 . 75130408 203 80 61 99.2 30 9 88 99.1 0.9
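A minimal sketch of one way to assemble the two-row layout once label/value pairs are available, one pair per tab-separated line. It assumes the pairs arrive in output order and already cleaned; the variable `idx`, which stands in for the sequential R_Index, is hypothetical. Only three of the fields are shown.

```shell
# Label/value pairs, one per line, tab-separated (values taken from this thread).
pairs=$(printf 'ISP Loading\t84\nTotal Reads\t75130408\nKey Signal\t80\n')

row=$(printf '%s\n' "$pairs" | awk -F'\t' -v OFS='\t' -v idx=1 '
  BEGIN { head = "R_Index"; data = idx }   # first column: label and sequential number
  {
    head = head OFS $1                     # grow the header row
    data = data OFS $2                     # grow the data row
    if ($1 == "ISP Loading") {             # Pre-Enrichment follows ISP Loading and is always "."
      head = head OFS "Pre-Enrichment"
      data = data OFS "."
    }
  }
  END { print head; print data }
')
printf '%s\n' "$row"
```

Building both rows as strings and printing them in `END` keeps the header and the data aligned column for column.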

Description:
The tab-delimited output has a header row in row 1. The headers are the keywords in the txt file from which the data is extracted, plus the two additional fields R_Index and Pre-Enrichment. Below is the data, with each line commented for clarification only. I hope it helps, and thank you.

Code:

R_Index 1 -- sequential #
ISP Loading     84% -- % removed
Pre-Enrichment     . -- always a dot
Total Reads     75,130,408 -- commas removed
Read Length     203 bp -- bp removed
Key Signal     80 -- just extracted as is
Usable Sequence     61% -- % removed
Enrichment     99.2% -- called live in the txt % removed
Polyclonal     30.0% -- trailing .0 and % removed
Low Quality     09.0% -- leading 0, trailing .0 and % removed
Test Fragment     88% -- % removed
Aligned Bases     99.1% -- % removed
Unaligned Bases     0.9% -- % removed
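The cleanup rules listed above can be sketched as a small shell function around awk. The function name `normalize` is hypothetical, and it assumes one raw value per input line. Adding 0 forces a numeric conversion, which drops the leading zero and the trailing .0 in one step; any value that is not numeric after the substitutions would come out as 0, so the dot is passed through unchanged.

```shell
# Hypothetical helper: clean one raw value per input line.
normalize() {
  awk '{
    v = $0
    gsub(/,/, "", v)       # 75,130,408 -> 75130408
    sub(/ *bp$/, "", v)    # 203 bp -> 203
    sub(/%$/, "", v)       # 84% -> 84
    # v + 0 converts to a number: 09.0 -> 9, 30.0 -> 30, 99.1 stays 99.1
    print (v == "." ? v : v + 0)
  }'
}

printf '%s\n' '84%' '.' '75,130,408' '203 bp' '09.0%' '30.0%' '99.1%' '0.9%' | normalize
```

For the eight sample values this prints 84, ., 75130408, 203, 9, 30, 99.1 and 0.9, one per line.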

Last edited by cmccabe; 10-11-2016 at 03:08 PM..

cmccabe
