awk to print specific line in file based on criteria
Post 302982055 by cmccabe on Thursday 22nd of September 2016 05:58:25 PM
Attached is my complete input .html. Using a small section of it, I am running the awk below to capture a specific path (which works great, thank you to all). I am also trying to capture Aligned Reads, Alignments, and Sample Name, as in the desired output.
I put the -- in each line of the desired output to show where the data comes from, but the -- does not exist in the input. I thought the awk below would work and added comments on what I thought would happen at each step, but after it runs, only the path appears in the output.

input
Code:
{"meta": {"limit": 20, "next": "/rundb/api/v1/pluginresult/?limit=20&offset=20", "offset": 0, "previous": null, "total_count": 39}, "objects": [{"apikey": null, "config": {"bamCreate": "on", "compressedType": "tar", "delimiter_select": "_", "fastqCreate": "off", "select_dialog": ["barcodename", "sample", "", "", "", "", ""], "sffCreate": "off", "vcfCreate": "on", "xlsCreate": "off", "zipBAM": "on", "zipFASTQ": "off", "zipSFF": "off", "zipVCF": "on", "zipXLS": "off"}, "duration": "3:07:29.481091", "endtime": "2016-09-21T16:07:48.000524+00:00", "id": 52, "inodes": "16", "jobid": 2254, "owner": "/rundb/api/v1/user/1/", "path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter", "pluginVersion": "5.0.3.1", "reportLink": "/output/Home/Auto_user_S5-00580-4-Medexome_65_028/", "resource_uri": "/rundb/api/v1/pluginresult/52/", "result": "/rundb/api/v1/results/28/", "resultName": "Auto_user_S5-00580-4-Medexome_65", "size": "64889524476", "starttime": "2016-09-21T13:00:19.000043+00:00", "state": "Completed", "store": {}}, {"apikey": null, "config": {"basecalling": "off", "intermediate": "off", "output": "on", "sigproc": "off", "transport": "local_copy", "upload_path": "/media/testnfs"}, "duration": "0:00:04.767467", "endtime": "2016-09-20T23:28:56.000020+00:00", "id": 50, "inodes": "7", "jobid": 2212, "owner": "/rundb/api/v1/user/2/", "path": "/results/analysis/output/Home/Auto_user_S5-00580-7-Medexome_68_tn_035/plugin_out/DataXfer_out.50", "plugin": "/rundb/api/v1/plugin/47/", "pluginName": "DataXfer", "pluginVersion": "5.0.3.0", {"Aligned Reads": "R_2016_09_01_10_24_52_user_S5-00580-4-Medexome", "Configuration": "custom", "Library Type": "Whole Genome", "Target Loci": "Not using", "Target Regions": "LCHv2_IDP", "Trim Reads": false, "barcoded": "true", "barcodes": {"IonXpress_004": {"hotspots": {}, "targets_bed": 
"/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 124, "het_snps": 4411, "homo_indels": 66, "homo_snps": 2572, "no_call": 0, "other": 48, "variants": 7221}}, "IonXpress_005": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 120, "het_snps": 4575, "homo_indels": 61, "homo_snps": 2659, "no_call": 0, "other": 57, "variants": 7472}}, "IonXpress_006": {"hotspots": {}, "targets_bed", Aligned Reads": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome", "Configuration": "custom", "Library Type": "Whole Genome", "Target Loci": "Not using", "Target Regions": "LCHv2_IDP", "Trim Reads": false, "barcoded": "true", "barcodes": {"IonXpress_007": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 208, "homo_indels": 0, "homo_snps": 100, "no_call": 0, "other": 0, "variants": 308}}, "IonXpress_008": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 86, "homo_indels": 0, "homo_snps": 69, "no_call": 0, "other": 0, "variants": 155}}, "IonXpress_009": {"hotspots": {}, "targets_bed": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed", "variants": {"het_indels": 0, "het_snps": 166, "homo_indels": 0, "homo_snps": 70, "no_call": 0, "other": 0, "variants": 236}}},
"barcodes": {"IonXpress_004": {"Alignments": "IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "310.7", "Average base coverage depth per target": "281.9", "Bases in target regions": "10755169", "Number of mapped reads": "33167416", "Number of unmerged targets": "55083", "Percent assigned target reads": "79.76%", "Percent base reads on target": "59.00%", "Percent reads on target": "79.76%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV42", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "93.13%", "Target base coverage at 1x": "99.31%", "Target base coverage at 20x": "99.03%", "Target base coverage at 500x": "10.95%", "Target bases with no strand bias": "98.45%", "Targets with base coverage at 100x": "90.88%", "Targets with base coverage at 1x": "99.19%", "Targets with base coverage at 20x": "99.02%", "Targets with base coverage at 500x": "8.20%", "Targets with full coverage": "99.01%", "Targets with no strand bias": "99.48%", "Total aligned base reads": "5663879075", "Total base reads on target": "3341529577", "Uniformity of base coverage": "97.40%", "Uniformity of base coverage per target": "97.28%"}, "IonXpress_005": {"Alignments": "IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "242.0", "Average base coverage depth per target": "219.8", "Bases in target regions": "10755169", "Number of mapped reads": "25017639", "Number of unmerged targets": "55083", "Percent assigned target reads": "81.67%", "Percent base reads on target": "59.89%", "Percent reads on target": "81.67%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV43", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "87.28%", "Target base coverage at 1x": "99.40%", "Target base coverage at 20x": "99.03%", "Target base coverage at 500x": "4.96%", "Target bases 
with no strand bias": "97.84%", "Targets with base coverage at 100x": "83.97%", "Targets with base coverage at 1x": "99.30%", "Targets with base coverage at 20x": "99.02%", "Targets with base coverage at 500x": "3.67%", "Targets with full coverage": "99.12%", "Targets with no strand bias": "99.09%", "Total aligned base reads": "4345661140", "Total base reads on target": "2602585080", "Uniformity of base coverage": "97.15%", "Uniformity of base coverage per target": "96.99%"}, "IonXpress_006": {"Alignments": "IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Average base coverage depth": "228.4", "Average base coverage depth per target": "207.3", "Bases in target regions": "10755169", "Number of mapped reads": "24654316", "Number of unmerged targets": "55083", "Percent assigned target reads": "78.73%", "Percent base reads on target": "58.13%", "Percent reads on target": "78.73%", "Read Filters": "Non-duplicate", "Reference Genome": "hg19", "Sample Name": "MEV44", "Target Regions": "LCHv2_IDP", "Target base coverage at 100x": "86.69%", "Target base coverage at 1x": "99.28%", "Target base coverage at 20x": "98.90%", "Target base coverage at 500x": "3.86%", "Target bases with no strand bias": "98.28%", "Targets with base coverage at 100x": "82.95%", "Targets with base coverage at 1x": "99.15%", "Targets with base coverage at 20x": "98.88%", "Targets with base

awk
Code:
awk -F"[]\":{}, ]*" '  # field separator to split and allow for whitespace in name
{for(i = 2; i < NF - 3; i++)  # begin for loop over the fields, with $i as the current field
if($i == "path" &&  # capture path
$(i + 2) == "plugin" &&  # followed by the plugin key
$(i + 3) == "/rundb/api/v1/plugin/49/") print $(i + 1)  # capture specific path
if ($i == "Aligned Reads") print $(i+1)  # capture Aligned Reads and print string after
if ($i == "Alignments") print $(i+1) RS  # capture Alignments and print string after, add newline for each
if ($i == "Sample Name") print $(i+1) RS  # capture Sample Name and print string after, add newline for each
}  # end loop
' file > output

current output
Code:
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52

desired output
Code:
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52    -- path from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome  -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV42  -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV43  -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65  -- Alignments from index.html
MEV44  -- Sample Name from index.html
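
One possible reading of why only the path is printed, offered as a sketch rather than a confirmed diagnosis: the for loop has no braces around its body, so only the first if belongs to the loop and the other three run once per line after the loop has ended. In addition, the field separator contains a space, so a key such as Sample Name is split into two fields and $i == "Sample Name" can never be true. The version below swaps in a different approach: it splits on the double quote, so every quoted string lands in an even-numbered field. It runs on a shortened, hypothetical sample modelled on the input above. The temporary file and the variable names sample and result are assumptions for illustration, and the sketch assumes no escaped quotes appear inside the strings.

```shell
#!/bin/sh
# Hypothetical, shortened sample modelled on the input in the post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"id": 52, "path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/", "pluginName": "FileExporter"}
{"id": 50, "path": "/results/analysis/output/Home/Auto_user_S5-00580-7-Medexome_68_tn_035/plugin_out/DataXfer_out.50", "plugin": "/rundb/api/v1/plugin/47/", "pluginName": "DataXfer"}
{"Aligned Reads": "R_2016_09_01_10_24_52_user_S5-00580-4-Medexome", "Configuration": "custom"}
{"IonXpress_004": {"Alignments": "IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65", "Reference Genome": "hg19", "Sample Name": "MEV42"}}
EOF

# Split on the double quote: quoted strings (keys and string values) are
# then the even-numbered fields, and spaces inside a key are preserved.
result=$(awk -F'"' '
{
    for (i = 2; i <= NF; i += 2) {      # braces keep every test inside the loop
        # a path counts only when the next key is plugin with the wanted URI
        if ($i == "path" && $(i + 4) == "plugin" &&
            $(i + 6) == "/rundb/api/v1/plugin/49/")
            print $(i + 2)
        if ($i == "Aligned Reads") print $(i + 2)
        if ($i == "Alignments")    print $(i + 2)
        if ($i == "Sample Name")   print $(i + 2)
    }
}' "$sample")
rm -f "$sample"

printf '%s\n' "$result"
```

On this sample the script prints four lines: the FileExporter path, the Aligned Reads value, the Alignments value, and MEV42. The trailing RS from the original is dropped here because print already ends each line with a newline, so adding RS would produce a blank line after every value.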

@Don Cragun, what does the line
{for(i = 2; i < NF - 3; i++)
do?
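
As a rough illustration of that loop header (a sketch, not Don Cragun's own explanation): NF is the number of fields awk found on the current line, and the loop visits field numbers from 2 up to NF - 4, so that $(i + 3) always refers to a field that exists. It can start at 2 because the line begins with a separator, which leaves $1 empty. The snippet below numbers the fields of a shortened, hypothetical line using the same separator characters as the post, written with + instead of * so the pattern can never match an empty string. The variable names line and fields are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical one-line sample modelled on the input in the post.
line='{"id": 52, "path": "/results/out.52", "plugin": "/rundb/api/v1/plugin/49/", "Sample Name": "MEV42"}'

# Print every field with its number so the positions tested in the loop are visible.
fields=$(printf '%s\n' "$line" |
    awk -F'[]":{}, ]+' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Here path is field 4, its value is field 5, plugin is field 6 and the plugin URI is field 7, which is the $i, $(i + 1), $(i + 2), $(i + 3) pattern the script tests. Sample and Name come out as fields 8 and 9 because the space is one of the separator characters.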

Last edited by cmccabe; 09-22-2016 at 07:52 PM..
 
