Attached is my complete input .html. Using a small section of it, I am running the awk below to capture a specific path (which works great... thank you to all). I am also trying to capture Aligned Reads, Alignments, and Sample Name as in the desired output.
I put the -- in each line of the desired output to show where the data is coming from, but the -- does not exist in the input. I thought that the awk below would work and added comments on what I thought would happen at each step, but after it runs only the path appears in the output.
input
Code:
{"meta": {"limit": 20, "next": ...truncated...
awk
Code:
awk -F"[]\":{}, ]*" ' # field separator to split and allow for whitespace in name
{for(i = 2; i < NF - 3; i++ # begin for loop and capture desired and read into $i
if($i == "path" && # capture path
$(i + 2) == "plugin" && # capture path
$(i + 3) == "/rundb/api/v1/plugin/49/") print $(i + 1) # capture specific path
if ($i =="Aligned Reads") print $(i+1) # capture Aligned Reads and print string after
if ($i =="Alignments") print $(i+1) RS # capture Alignments and print string afte, and newline for each
if ($i =="Sample Name") print $(i+1) RS # capture Sample Name and print string after, add newline for each
} # end loop
' file > output
desired output
Code:
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52 -- path from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV42 -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV43 -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV44 -- Sample Name from index.html
@Don Cragun, what does the line {for(i = 2; i < NF - 3; i++ do?
I presume that the line {for(i = 2; i < NF - 3; i++ in your code gave you a syntax error since the closing parenthesis for the for loop is missing. In the code I suggested:
Code:
for(i = 2; i < NF - 3; i++)
runs a for loop with the loop variable (i) starting at 2, incrementing by 1 every time through the loop, and continuing for as long as i is less than the number of fields on the line minus 3. Since the first character of your input record was a " and " is a field separator, we know that the 1st field on every record is empty (so there is no need to look at field #1 to see if it might be the string path). And, since we are looking at four fields in a row to determine whether or not to print the 2nd field in that set of four fields, there is no need to start a comparison at any of the last three fields on the line.
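A small sketch of those loop bounds, using a made-up record of eight space-separated fields (not your index.html), shows which values of i the loop visits and which four fields each pass can compare:

```shell
# Made-up input: eight fields, so NF is 8 and the loop runs for i = 2, 3, 4.
visited=$(echo 'a b c d e f g h' |
  awk '{
    for (i = 2; i < NF - 3; i++)
      printf "i=%d looks at fields %d-%d\n", i, i, i + 3
  }')
printf '%s\n' "$visited"
```

Every $(i + 3) that the loop touches is a field that really exists on the line.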
Note that a for loop only executes one following command (unless you use braces to group commands together in a block). So, if you had included the missing closing parenthesis at the end of the for statement, only the 1st if statement in your code would have been run inside the loop. The remaining if statements in your code would only be run after the for loop had completed.
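A minimal sketch of that difference (the strings printed here are made up for the illustration):

```shell
# Without braces, only the first print is the loop body;
# the second print runs once, after the loop has finished (i is then 4).
no_braces=$(awk 'BEGIN {
  for (i = 1; i <= 3; i++)
    print "in loop " i
  print "after loop " i
}')

# With braces, both prints run on every pass through the loop.
with_braces=$(awk 'BEGIN {
  for (i = 1; i <= 3; i++) {
    print "in loop " i
    print "also in loop " i
  }
}')

printf '%s\n' "$no_braces"
printf '%s\n' "$with_braces"
```

The first command prints four lines and the second prints six.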
Note that the standards only require awk to process text files which have lines no longer than LINE_MAX bytes (which is 2048 on most systems). In practice, I don't know of any awk that fails if input records are limited to LINE_MAX bytes even if the records are not <newline> character terminated. I used the opening brace character as the record separator in the following script since opening braces seem to appear regularly and frequently, and do not appear in the middle of anything that you want to evaluate together to determine which fields to print. Therefore, the bracket expression in the field separator I used does not contain the opening brace character. And, with <space>* after the bracket expression, there is no need to also include <space> in the bracket expression. And, to get rid of the need for backslashes to escape double quotes, I used single quotes to delimit the -F option-argument.
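As a rough sketch of the idea, assuming a made-up one-line sample and a field separator chosen only for this illustration (it is not necessarily the exact separator used in the script), setting RS to { and keeping <space> outside the bracket expression lets a name such as Sample Name stay in one field:

```shell
# Hypothetical one-line sample; the real index.html is not reproduced here.
sample='{"meta": {"limit": 20}, "objects": [{"Sample Name": "MEV42", "path": "/tmp/demo"}]}'

# RS='{' starts a new record at every opening brace.
# The assumed FS matches one or more of ] " : } , each optionally followed
# by spaces, so the space inside "Sample Name" does not split the field.
fields=$(printf '%s\n' "$sample" |
  awk -F'([]":},] *)+' -v RS='{' '
    { for (i = 1; i < NF; i++) if ($i == "Sample Name") print $i " = " $(i + 1) }')
printf '%s\n' "$fields"
```

With space inside the bracket expression instead, Sample and Name would land in separate fields and the comparison against "Sample Name" could never succeed.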
Note that I used else if instead of just if in the search for the field values other than path. There is no need to check to see if a field has the value Sample Name, Alignments, or Aligned Reads once we have already determined that that field's value is path. And, it leaves us with a single compound statement in the for loop (so we still don't need braces to include multiple statements in the loop). But, I included braces around the command to be run by the for loop for clarity.
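To illustrate just the if / else if chain, here is a sketch on a made-up single line. The field separator and the loop bound i < NF are assumptions made for this example so that $(i + 1) always exists; this is not the script discussed in this post:

```shell
# Hypothetical input line; names and values are invented for the sketch.
line='"path": "/tmp/demo_out", "plugin": "/rundb/api/v1/plugin/49/", "Aligned Reads": "run_A", "Sample Name": "S1"'

picked=$(printf '%s\n' "$line" | awk -F'([]":},] *)+' '
{
  for (i = 2; i < NF; i++) {
    # One compound statement: the else if branch is only evaluated
    # when the field is not the path we are looking for.
    if ($i == "path" && $(i + 2) == "plugin" &&
        $(i + 3) == "/rundb/api/v1/plugin/49/")
      print $(i + 1)
    else if ($i == "Aligned Reads" || $i == "Alignments" || $i == "Sample Name")
      print $(i + 1)
  }
}')
printf '%s\n' "$picked"
```

On this made-up line it prints the path value, then the Aligned Reads value, then the Sample Name value, one per line.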
My script, with the index.html file you attached, produces the output:
Code:
/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52 -- path from index.html
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome -- Aligned Reads from index.html
IonXpress_007_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn -- Alignments from index.html
MEV37 -- Sample Name from index.html
IonXpress_008_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn -- Alignments from index.html
MEV38 -- Sample Name from index.html
IonXpress_009_R_2016_09_20_12_47_36_user_S5-00580-7-Medexome_Auto_user_S5-00580-7-Medexome_68_tn -- Alignments from index.html
MEV39 -- Sample Name from index.html
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome -- Aligned Reads from index.html
IonXpress_004_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn -- Alignments from index.html
MEV34 -- Sample Name from index.html
IonXpress_005_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn -- Alignments from index.html
MEV35 -- Sample Name from index.html
IonXpress_006_R_2016_09_20_10_12_41_user_S5-00580-6-Medexome_Auto_user_S5-00580-6-Medexome_67_tn -- Alignments from index.html
MEV36 -- Sample Name from index.html
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome -- Aligned Reads from index.html
IonXpress_007_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66 -- Alignments from index.html
MEV45 -- Sample Name from index.html
IonXpress_008_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66 -- Alignments from index.html
MEV46 -- Sample Name from index.html
IonXpress_009_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66 -- Alignments from index.html
MEV47 -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome -- Aligned Reads from index.html
IonXpress_004_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV42 -- Sample Name from index.html
IonXpress_005_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV43 -- Sample Name from index.html
IonXpress_006_R_2016_09_01_10_24_52_user_S5-00580-4-Medexome_Auto_user_S5-00580-4-Medexome_65 -- Alignments from index.html
MEV44 -- Sample Name from index.html
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome -- Aligned Reads from index.html
IonXpress_007_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn -- Alignments from index.html
MEV45 -- Sample Name from index.html
IonXpress_008_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn -- Alignments from index.html
MEV46 -- Sample Name from index.html
IonXpress_009_R_2016_09_01_13_20_02_user_S5-00580-5-Medexome_Auto_user_S5-00580-5-Medexome_66_tn -- Alignments from index.html
MEV47 -- Sample Name from index.html
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome -- Aligned Reads from index.html
from the single line of input in that file.
Does this come close to what you wanted?