awk to print specific line in file based on criteria

09-22-2016

In the file below, I am trying to extract a specific instance of path: the one whose adjacent "plugin" value is "/rundb/api/v1/plugin/49/". Thank you.

file

Code:

"path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter", "pluginVersion": "5.0.3.1", "path": "/results/analysis/output/Home/Auto_user_S5-00580-7-Medexome_68_tn_035/plugin_out/variantCaller_out.49", "plugin": "/rundb/api/v1/plugin/41/", "pluginName": "variantCaller", "pluginVersion": "5.0.4.0",

awk

Code:

awk -F"\"[,:]* *\"*" '
        {for (i=1; i<NF; i++)(a=1; a<NF; a++)   {if ($i =="path") and $a =="plugin": /rundb/api/v1/plugin/49/"  print $(i+1)
                                }
        }
' file

desired output

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

cmccabe


09-22-2016

Hello cmccabe,

If your Input_file is the same as the sample shown, then the following may help you.

Code:

awk -F"[\"|:|,]" '{for(i=1;i<=NF;i++){if($i=="path" && $(i+6)=="plugin" && $(i+9)=="/rundb/api/v1/plugin/49/"){print $(i+3)}}}'  Input_file

Output will be as follows.

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

Please do let us know if this helps you.

Thanks,
R. Singh
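If the offsets $(i+3), $(i+6) and $(i+9) look mysterious, it can help to number every field. The sketch below is not from the thread: show_fields is a hypothetical helper name, the record is a shortened copy of the sample line, and it uses -F'[":,]', which splits the same way as R. Singh's separator as long as the data contains no | characters.

```shell
#!/bin/sh
# Hypothetical helper: print every field with its number, using the
# bracket expression [":,] (each ", : or , character is a separator).
show_fields() {
    awk -F'[":,]' '{ for (i = 1; i <= NF; i++) printf "%d [%s]\n", i, $i }'
}

# Shortened copy of the sample line: one path/plugin pair.
record='"path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/",'
printf '%s\n' "$record" | show_fields
```

With this record, field 2 is path, field 5 its value, field 8 plugin and field 11 the plugin URL, which is why the loop compares $(i+6) and $(i+9) and prints $(i+3) when i is 2. The empty and single-space fields in between come from neighboring separator characters such as the " and : in ": ".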

RavinderSingh13


09-22-2016

Another way:

Code:

awk -v adj='plugin": "/rundb/api/v1/plugin/49/' '$0~adj{print p}{p=$4}' FS=\" RS=, file

Last edited by Scrutinizer; 09-22-2016 at 06:16 PM..
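Here is the same one-liner spread over several lines with comments, as a sketch. The function name print_path_before_plugin is hypothetical, and FS and RS are set with -F and -v instead of as trailing operands; the logic (print the value saved from the previous comma-separated chunk whenever the current chunk is the wanted plugin) is unchanged. It assumes path always comes immediately before plugin, as it does in the sample.

```shell
#!/bin/sh
# Commented version of the one-liner above (function name is hypothetical).
# RS="," makes each comma-separated  "key": "value"  chunk its own record;
# FS="\"" splits a chunk such as  "path": "/x"  into "", "path", ": ", "/x", "".
print_path_before_plugin() {
    awk -F '"' -v RS=',' -v adj='plugin": "/rundb/api/v1/plugin/49/' '
        $0 ~ adj { print p }   # wanted plugin chunk: print the value saved from the previous chunk
        { p = $4 }             # save the value of the current chunk for the next record
    ' "$@"
}

sample='"path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter", "pluginVersion": "5.0.3.1", "path": "/results/analysis/output/Home/Auto_user_S5-00580-7-Medexome_68_tn_035/plugin_out/variantCaller_out.49", "plugin": "/rundb/api/v1/plugin/41/", "pluginName": "variantCaller", "pluginVersion": "5.0.4.0",'
printf '%s\n' "$sample" | print_path_before_plugin
```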

Scrutinizer


09-22-2016

Hi cmccabe,
If I correctly understand your specification, you could also try this alternative using your original input field separator specification:

Code:

awk -F"\"[,:]* *\"*" '
{	for(i = 2; i < NF - 3; i++)
		if($i       == "path" &&
		   $(i + 2) == "plugin" &&
		   $(i + 3) == "/rundb/api/v1/plugin/49/")
			print $(i + 1)
}
' file

(and I must say that that is a nicely constructed field separator ERE). You could simplify it a little bit using single quotes to avoid needing to escape the double quotes: -F'"[,:]* *"*'.
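A sketch of that simplification: the loop above wrapped in a hypothetical shell function, with the single-quoted separator. Only the shell quoting changes, so awk sees exactly the same ERE.

```shell
#!/bin/sh
# Same loop as above, with -F'"[,:]* *"*' instead of -F"\"[,:]* *\"*".
# After the empty $1, the fields alternate key, value, key, value, ...
extract_plugin49_path() {
    awk -F'"[,:]* *"*' '
    {   for (i = 2; i < NF - 3; i++)
            if ($i       == "path" &&
                $(i + 2) == "plugin" &&
                $(i + 3) == "/rundb/api/v1/plugin/49/")
                print $(i + 1)
    }' "$@"
}

printf '%s\n' '"path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter", "pluginVersion": "5.0.3.1",' | extract_plugin49_path
```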

Hi R. Singh,
Note that using -F"[\"|:|,]" specifies that any ", |, :, |, or , character is to be treated as a field separator because every character in a bracket expression (other than the backslash used to escape a double-quote in a shell double-quoted string) is a match. There is no alternation in a bracket expression. You could get the same effect (unless you really wanted | to be a field separator) with -F'[":,]' (using a matching bracket expression) or with -F'"|:|,' (using alternation without a bracket expression). And, if there were any | characters in the input, your expression would interpret them as field separators instead of as data.
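A small demonstration of that point, using a made-up record whose value contains a |. The function names are hypothetical; each one prints field 5, which is where the value lands with these separators.

```shell
#!/bin/sh
# Made-up record whose value contains a literal "|".
rec='"name": "a|b",'

# Bracket expression from the earlier post: "|" is just one more separator character.
split_bracket_with_bar() { awk -F"[\"|:|,]" '{ print $5 }'; }
# Bracket expression without "|".
split_bracket() { awk -F'[":,]' '{ print $5 }'; }
# Alternation instead of a bracket expression.
split_alternation() { awk -F'"|:|,' '{ print $5 }'; }

printf '%s\n' "$rec" | split_bracket_with_bar   # prints: a
printf '%s\n' "$rec" | split_bracket            # prints: a|b
printf '%s\n' "$rec" | split_alternation        # prints: a|b
```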

Last edited by Don Cragun; 09-23-2016 at 05:13 AM.. Reason: Fix typo: s/needed/needing/

Don Cragun


09-22-2016

Attached is my complete input, index.html. Using a small section of it, I am running the awk below to capture a specific path (which works great... thank you to all).

I am also trying to capture Aligned Reads, Alignments, and Sample Name as in the desired output.
I put the -- in each line of the desired output to show where the data is coming from, but the -- does not exist in the input. I thought that the awk below would work and added comments describing what I thought would happen at each step, but when it runs, only the path appears in the output.

input

Code:

{"meta": {"limit": 20, "next": "/rundb/api/v1/pluginresult/?limit=20&offset=20", "offset": 0, "previous": null, "total_count": 39}, "objects": [{"apikey": null, "config": {"bamCreate": "on", "compressedType": "tar", "delimiter_select": "_", "fastqCreate": "off", "select_dialog": ["barcodename", "sample", "", "", "", "", ""], "sffCreate": "off", "vcfCreate": "on", "xlsCreate": "off", "zipBAM": "on", "zipFASTQ": "off", "zipSFF": "off", "zipVCF": "on", "zipXLS": "off"}, "duration": "3:07:29.481091", "endtime": "2016-09-21T16:07:48.000524+00:00", "id": 52, "inodes": "16", "jobid": 2254, "owner": "/rundb/api/v1/user/1/", "path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter", "pluginVersion": "5.0.3.1", "reportLink": "/output/Home/Auto_user_S5-00580-4-Medexome_65_028/", "resource_uri": "/rundb/api/v1/pluginresult/52/", "result": "/rundb/api/v1/results/28/", "resultName": "Auto_user_S5-00580-4-Medexome_65", "size": "64889524476", "starttime": "2016-09-21T13:00:19.000043+00:00", "state": "Completed", "store": {}}, {"apikey": null, "config": {"basecalling": "off", "intermediate": "off", "output": "on", "sigproc": "off", "transport": "local_copy", "upload_path": "/media/testnfs"}, "duration": "0:00:04.767467", "endtime": "2016-09-20T23:28:56.000020+00:00", "id": 50, "inodes": "7", "jobid": 2212, "owner": "/rundb/api/v1/user/2/", "path": "/results/analysis/output/Home/Auto_user_S5-00580-7-Medexome_68_tn_035/plugin_out/DataXfer_out.50", "plugin": "/rundb/api/v1/plugin/47/", "pluginName": "DataXfer", "pluginVersion": "5.0.3.0", {"Aligned Reads": "R_2016_09_01_10_24_52_user_S5-00580-4-Medexome", "Configuration": "custom", "Library Type": "Whole Genome", "Target Loci": "Not using", "Target Regions": "LCHv2_IDP", "Trim Reads": false, "barcoded": "true", "barcodes": {"IonXpress_004": {"hotspots": {}, "targets_bed": 
"/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 124, "het_snps": 4411, "homo_indels": 66, "homo_snps": 2572, "no_call": 0, "other": 48, "variants": 7221}}, "IonXpress_005": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 120, "het_snps": 4575, "homo_indels": 61, "homo_snps": 2659, "no_call": 0, "other": 57, "variants": 7472}}, "IonXpress_006": {"hotspots": {}, "targets_bed", Aligned Reads": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome", "Configuration": "custom", "Library Type": "Whole Genome", "Target Loci": "Not using", "Target Regions": "LCHv2_IDP", "Trim Reads": false, "barcoded": "true", "barcodes": {"IonXpress_007": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 208, "homo_indels": 0, "homo_snps": 100, "no_call": 0, "other": 0, "variants": 308}}, "IonXpress_008": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 86, "homo_indels": 0, "homo_snps": 69, "no_call": 0, "other": 0, "variants": 155}}, "IonXpress_009": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 166, "homo_indels": 0, "homo_snps": 70, "no_call": 0, "other": 0, "variants": 236}}},
"barcodes": {"IonXpress_004": {"Alignments": "IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "310.7", "Average base coverage depth per target": "281.9", "Bases in target regions": "10755169", "Number of mapped reads": "33167416", "Number of unmerged targets": "55083", "Percent assigned target reads": "79.76%", "Percent base reads on target": "59.00%", "Percent reads on target": "79.76%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV42", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "93.13%", "Target base coverage at 1x": "99.31%", "Target base coverage at 20x": "99.03%", "Target base coverage at 500x": "10.95%", "Target bases with no strand bias": "98.45%", "Targets with base coverage at 100x": "90.88%", "Targets with base coverage at 1x": "99.19%", "Targets with base coverage at 20x": "99.02%", "Targets with base coverage at 500x": "8.20%", "Targets with full coverage": "99.01%", "Targets with no strand bias": "99.48%", "Total aligned base reads": "5663879075", "Total base reads on target": "3341529577", "Uniformity of base coverage": "97.40%", "Uniformity of base coverage per target": "97.28%"}, "IonXpress_005": {"Alignments": "IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "242.0", "Average base coverage depth per target": "219.8", "Bases in target regions": "10755169", "Number of mapped reads": "25017639", "Number of unmerged targets": "55083", "Percent assigned target reads": "81.67%", "Percent base reads on target": "59.89%", "Percent reads on target": "81.67%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV43", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "87.28%", "Target base coverage at 1x": "99.40%", "Target base coverage at 20x": "99.03%", "Target base coverage at 500x": "4.96%", "Target bases 
with no strand bias": "97.84%", "Targets with base coverage at 100x": "83.97%", "Targets with base coverage at 1x": "99.30%", "Targets with base coverage at 20x": "99.02%", "Targets with base coverage at 500x": "3.67%", "Targets with full coverage": "99.12%", "Targets with no strand bias": "99.09%", "Total aligned base reads": "4345661140", "Total base reads on target": "2602585080", "Uniformity of base coverage": "97.15%", "Uniformity of base coverage per target": "96.99%"}, "IonXpress_006": {"Alignments": "IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "228.4", "Average base coverage depth per target": "207.3", "Bases in target regions": "10755169", "Number of mapped reads": "24654316", "Number of unmerged targets": "55083", "Percent assigned target reads": "78.73%", "Percent base reads on target": "58.13%", "Percent reads on target": "78.73%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV44", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "86.69%", "Target base coverage at 1x": "99.28%", "Target base coverage at 20x": "98.90%", "Target base coverage at 500x": "3.86%", "Target bases with no strand bias": "98.28%", "Targets with base coverage at 100x": "82.95%", "Targets with base coverage at 1x": "99.15%", "Targets with base coverage at 20x": "98.88%", "Targets with base

awk

Code:

awk -F"[]\":{}, ]*" '  # field separator to split and allow for whitespace in name
{for(i = 2; i < NF - 3; i++ # begin for loop and capture desired and read into $i
if($i == "path" &&  # capture path
$(i + 2) == "plugin" &&  # capture path
$(i + 3) == "/rundb/api/v1/plugin/49/") print $(i + 1)  # capture specific path
if ($i =="Aligned Reads") print $(i+1)  # capture Aligned Reads and print string after
if ($i =="Alignments") print $(i+1) RS  # capture Alignments and print string after, add newline for each
if ($i =="Sample Name") print $(i+1) RS  # capture Sample Name and print string after, add newline for each
}  # end loop
' file > output

current output

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

desired output

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52    -- path from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome  -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV42  -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV43  -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV44  -- Sample Name from index.html

@Don Cragun, what does the line
{for(i = 2; i < NF - 3; i++ do?

index.html (121.7 KB)

Last edited by cmccabe; 09-22-2016 at 07:52 PM..

cmccabe


09-23-2016

Quote:

Originally Posted by cmccabe

@Don Cragun, what does the line
{for(i = 2; i < NF - 3; i++ do?

I presume that the line {for(i = 2; i < NF - 3; i++ in your code gave you a syntax error since the closing parenthesis for the for loop is missing. In the code I suggested:

Code:

for(i = 2; i < NF - 3; i++)

runs a for loop with the loop variable (i) starting at 2, continuing while i is less than the number of fields on the line minus 3, and incrementing by 1 every time through the loop. Since the first character of your input record was a " and " is a field separator, we know that the 1st field on every record is empty (so there is no need to look at field #1 to see if it might be the string path). And, since we are looking at four fields in a row to determine whether or not to print the 2nd field in that set of four fields, there is no need to start that check at any of the last three fields on the line.

Note that a for loop only executes the one statement that follows it (unless you use braces to group statements together in a block). So, if you had included the missing closing parenthesis at the end of the for statement, only the 1st if statement in your code would have been run inside the loop. The remaining if statements in your code would only be run once, after the for loop had completed.
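A tiny stand-alone illustration of that rule (not from the thread; the function name is hypothetical). The indentation suggests both printf statements belong to the loop, but only the first one does:

```shell
#!/bin/sh
# Without braces, a for loop owns only the single statement that follows it.
loop_without_braces() {
    awk 'BEGIN {
        for (i = 1; i <= 3; i++)
            printf "in loop: %d\n", i      # runs on every iteration
            printf "after loop: %d\n", i   # misleadingly indented: runs once, after the loop
    }'
}

loop_without_braces
```

This prints "in loop:" for 1, 2 and 3, then a single "after loop: 4", because i has already been incremented past the loop limit when the second printf runs.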

Note that the standards only require awk to process text files whose lines are no longer than LINE_MAX bytes (which is 2048 on most systems), and your index.html is a single line of about 121.7 KB. In practice, I don't know of any awk that fails on input records of up to LINE_MAX bytes, even if the records are not <newline>-terminated. So I used the opening brace character as the record separator in the following script, since opening braces appear regularly and frequently in your input and do not appear in the middle of anything that you want to evaluate together to determine which fields to print. Therefore, the bracket expression in the field separator I used does not contain the opening brace character. And, with <space>* after the bracket expression, there is no need to also include <space> in the bracket expression. And, to get rid of the need for backslashes to escape double quotes, I used single quotes to delimit the -F option-argument.
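To see what RS='{' does to one long line, here is a sketch on a shortened, made-up fragment in the same style as index.html. show_records is a hypothetical name; it skips the empty record that precedes the very first {.

```shell
#!/bin/sh
# With RS="{" every opening brace ends a record, so one long line becomes
# many short records. Empty records (e.g. before the first "{") are skipped.
show_records() {
    awk -v RS='{' 'length($0) > 0 { printf "record: [%s]\n", $0 }'
}

# printf '%s' (no trailing newline) keeps the last record free of a newline.
printf '%s' '{"meta": {"limit": 20}, "objects": [{"id": 52}]}' | show_records
```

Each key/value pair you care about stays inside one record, while the record itself stays short.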

Note that I used else if instead of just if in the search for the field values other than path. There is no need to check whether a field has the value Sample Name, Alignments, or Aligned Reads once we have already determined that that field's value is path. And, it leaves us with a single compound statement in the for loop (so we still don't need braces to include multiple statements in the loop). But, I included braces around the command to be run by the for loop for clarity.

Which leads us to the following awk script:

Code:

awk -F'"[]},:]* *"*' -v RS='{' '
{	for(i = 2; i < NF - 1; i++) {
		if($i       == "path" &&
		   $(i + 2) == "plugin" &&
		   $(i + 3) == "/rundb/api/v1/plugin/49/")
			print $(i+1) "    -- path from " FILENAME
		else if($i  == "Aligned Reads")
			print $(i+1) "    -- Aligned Reads from " FILENAME
		else if ($i == "Alignments")
			print $(i+1) "    -- Alignments from " FILENAME
		else if($i  == "Sample Name")
			print $(i+1) "    -- Sample Name from " FILENAME
	}
}' index.html

which, with the index.html file you attached, produces the output:

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52    -- path from index.html
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome    -- Aligned Reads from index.html
IonXpress_007_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn    -- Alignments from index.html
MEV37    -- Sample Name from index.html
IonXpress_008_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn    -- Alignments from index.html
MEV38    -- Sample Name from index.html
IonXpress_009_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn    -- Alignments from index.html
MEV39    -- Sample Name from index.html
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome    -- Aligned Reads from index.html
IonXpress_004_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn    -- Alignments from index.html
MEV34    -- Sample Name from index.html
IonXpress_005_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn    -- Alignments from index.html
MEV35    -- Sample Name from index.html
IonXpress_006_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn    -- Alignments from index.html
MEV36    -- Sample Name from index.html
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome    -- Aligned Reads from index.html
IonXpress_007_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66    -- Alignments from index.html
MEV45    -- Sample Name from index.html
IonXpress_008_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66    -- Alignments from index.html
MEV46    -- Sample Name from index.html
IonXpress_009_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66    -- Alignments from index.html
MEV47    -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome    -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV42    -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV43    -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV44    -- Sample Name from index.html
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome    -- Aligned Reads from index.html
IonXpress_007_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn    -- Alignments from index.html
MEV45    -- Sample Name from index.html
IonXpress_008_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn    -- Alignments from index.html
MEV46    -- Sample Name from index.html
IonXpress_009_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn    -- Alignments from index.html
MEV47    -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome    -- Aligned Reads from index.html

from the single line of input in that file.

Does this come close to what you wanted?
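To try the script above without the 121.7 KB attachment, here is a sketch that runs the same awk program on a small fragment made up in the shape of index.html. The fragment and the function name demo_index_html are hypothetical (the values are copied from the thread). It writes the fragment into a temporary directory and runs awk from there, so FILENAME prints as index.html.

```shell
#!/bin/sh
# Run the awk program above on a made-up, one-line fragment shaped like index.html.
demo_index_html() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' '{"id": 52, "path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter"}, {"Aligned Reads": "R_2016_09_01_10_24_52_user_S5-00580-4-Medexome", "Configuration": "custom"}, "barcodes": {"IonXpress_004": {"Alignments": "IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Sample Name": "MEV42", "Target Regions": "LCHv2_IDP"}}' > "$dir/index.html"
    # cd so that FILENAME is exactly "index.html", as in the output above
    (cd "$dir" && awk -F'"[]},:]* *"*' -v RS='{' '
    {   for (i = 2; i < NF - 1; i++) {
            if ($i == "path" && $(i + 2) == "plugin" && $(i + 3) == "/rundb/api/v1/plugin/49/")
                print $(i+1) "    -- path from " FILENAME
            else if ($i == "Aligned Reads")
                print $(i+1) "    -- Aligned Reads from " FILENAME
            else if ($i == "Alignments")
                print $(i+1) "    -- Alignments from " FILENAME
            else if ($i == "Sample Name")
                print $(i+1) "    -- Sample Name from " FILENAME
        }
    }' index.html)
    rm -rf "$dir"
}

demo_index_html
```

On this fragment it prints four lines (path, Aligned Reads, Alignments, Sample Name) in the same format as the output above.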

Don Cragun


09-23-2016

Thank you very much for the explanations. I am still working through them, but I think they make sense and will be very helpful later on. I appreciate it.

The awk using the entire one line of index.html returns the above output (which is very close). Each Alignments value contains a run identifier that also appears in path: for example, IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 and /results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52 both contain user_S5-00580-4-Medexome. I would like to use that to group all of the matching strings together, in the same order they are already output. That way any duplicates can be identified (like line 1 and the last line) and removed. If no match is found, the next line is processed (nothing needs to happen). I am not sure why the duplicates are in index.html, but it looks like the API used to retrieve that file returns them. For the Alignments lines, only the portion before the second _ is needed, so for IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 only IonXpress_004. I will write something as well, but I'm sure it will need work.

Thank you very much for all of your help.
.

So, using the above as an example: in IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 (-- Alignments from index.html), the user_S5-00580-4-Medexome portion matches the same portion of /results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52 (-- path from index.html), so all of the user_S5-00580-4-Medexome lines are grouped together, like this:

Code:

/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52    -- path from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome    -- Aligned Reads from index.html
IonXpress_004    -- Alignments from index.html
MEV42    -- Sample Name from index.html
IonXpress_005    -- Alignments from index.html
MEV43    -- Sample Name from index.html
IonXpress_006    -- Alignments from index.html
MEV44    -- Sample Name from index.html

awk attempt

Code:

awk -F'"[]},:]* *"*' -v RS='{' '
{for(i = 2; i < NF - 1; i++) {
if($i == "path" &&
   $(i + 2) == "plugin" &&
   $(i + 3) == "/rundb/api/v1/plugin/49/")
     print $(i+1) "    -- path from " FILENAME
      else if ($i  == "Aligned Reads")
              print $(i+1) | awk '!x[$0]++' "    -- Aligned Reads from " FILENAME
      else if ($i == "Alignments")
              print $(i+1) | awk -F_R_* '{print $1}' "    -- Alignments from " FILENAME  
      else if($i  == "Sample Name")
              print $(i+1) "    -- Sample Name from " FILENAME
    }
}' index.html | awk 'match($0, /_user\([^_]+)/) { print substr( $0, RSTART, RLENGTH )}' > out

1. The awk '!x[$0]++' part is meant to remove duplicates in Aligned Reads.
2. The awk -F_R_* '{print $1}' part is meant to split Alignments at the second _ and remove everything after it.
3. The final match() is meant to group all of the _user strings in Alignments with path (I don't think this will work, as I just removed the _user part from Alignments).

Last edited by cmccabe; 09-23-2016 at 04:17 PM.. Reason: added details, added awk
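The thread stops here, so the following is only a sketch of one possible approach. Piping into nested awk commands inside the awk program cannot work as written (the inner single quotes end the shell quoting of the outer program), so this version post-processes the output of Don Cragun's script with a second awk instead. group_by_run and sample_lines are hypothetical names, and extracting the run ID as "user_" plus the following non-underscore characters of the path is an assumption based on the user_S5-00580-4-Medexome example above.

```shell
#!/bin/sh
# Hypothetical filter for the "value    -- label from index.html" lines
# produced by the awk script earlier in the thread. Assumptions:
#   * the run ID is "user_" plus the following non-"_" characters of the
#     path (here user_S5-00580-4-Medexome);
#   * a group starts at an "Aligned Reads" line, and a repeated Aligned
#     Reads value marks a duplicate group.
group_by_run() {
    awk -F '    -- ' '
    $2 ~ /^path / {
        run = ""
        if (match($1, /user_[^_]*/))
            run = substr($1, RSTART, RLENGTH)
        print
        next
    }
    $2 ~ /^Aligned Reads / {
        # keep this group only if it contains the run ID and was not seen before
        keep = (run != "" && index($1, run) && !seen[$1]++)
        if (keep)
            print
        next
    }
    keep && $2 ~ /^Alignments / {
        split($1, part, "_")              # IonXpress, 004, R, 2016, ...
        print part[1] "_" part[2] FS $2   # keep only the text before the 2nd "_"
        next
    }
    keep && $2 ~ /^Sample Name / { print }
    ' "$@"
}

# A few lines copied from the output shown earlier in the thread.
sample_lines() {
cat <<'EOF'
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52    -- path from index.html
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome    -- Aligned Reads from index.html
IonXpress_007_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn    -- Alignments from index.html
MEV37    -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome    -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV42    -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV43    -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65    -- Alignments from index.html
MEV44    -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome    -- Aligned Reads from index.html
EOF
}

sample_lines | group_by_run
```

On these lines the filter prints the eight lines of the desired output: the 7-Medexome group is skipped because it does not contain the run ID, and the final Aligned Reads line is dropped as a repeat. In real use, the output of the script above would be piped into group_by_run instead of sample_lines.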

cmccabe
