awk output different between two files


 
# 1  
Old 09-21-2016

The awk below works great when run on the contents of file, giving the desired output, with each run printed in the order
Code:
expName
barcodeSampleInfo barcodedSamples

However, when the complete file (attached) is used, I get different output. The same data appears to be there, but the ordering is off. Both data sets are html and I am not sure why there is a difference. Thank you :).

file
Code:
{"barcodeId": "IonXpress", "barcodedSamples": {"MEV34": {"barcodeSampleInfo": {"IonXpress_004": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_004"]}, "MEV35": {"barcodeSampleInfo": {"IonXpress_005": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_005"]}, "MEV36": {"barcodeSampleInfo": {"IonXpress_006": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_006"]}}, "chipDescription": "540", "chipInstrumentType": "S5", "chipType": "540", "date": "2016-09-20T15:14:38+00:00", "expName": "R_2016_09_20_10_12_41_user_S5-00580-6-Medexome", "flows": 500
{"barcodeId": "IonXpress", "barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_007"]}, "MEV46": {"barcodeSampleInfo": {"IonXpress_008": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_008"]}, "MEV47": {"barcodeSampleInfo": {"IonXpress_009": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_009"]}}, "chipDescription": "540", "chipInstrumentType": "S5", "chipType": "540", "date": "2016-09-01T18:22:00+00:00", "expName": "R_2016_09_01_13_20_02_user_S5-00580-5-Medexome", "flows": 500,
{"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 8}, "objects": [{"barcodeId": "IonXpress", "barcodedSamples": {"MEV37": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_007"]}, "MEV38": {"barcodeSampleInfo": {"IonXpress_008": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_008"]}, "MEV39": {"barcodeSampleInfo": {"IonXpress_009": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_009"]}}, "chipDescription": "540", "chipInstrumentType": "S5", "chipType": "540", "date": "2016-09-20T17:49:30+00:00", "expName": "R_2016_09_20_12_47_36_user_S5-00580-7-Medexome", "flows": 500

output from file (desired)
Code:
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
IonXpress_004 MEV34
IonXpress_005 MEV35
IonXpress_006 MEV36
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome
IonXpress_007 MEV45
IonXpress_008 MEV46
IonXpress_009 MEV47
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
IonXpress_007 MEV37
IonXpress_008 MEV38
IonXpress_009 MEV39


awk
Code:
awk -F"[]\":{}, ]*" '
BEGIN   {for (n=split ("expName", T); n>0; n--) SRCH[T[n]] = n
        }
        {for (i=1; i<NF; i++) if ($i in SRCH) print $(i+1)
        }
        {for (i=1; i<NF; i++) if ($i =="barcodeSampleInfo") print $(i+1)" " $(i-1)
        }
' index.html > out

output using the complete file (attached)
Code:
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome
R_2016_09_01_10_24_52_user_S5-00580-4-Medexome
R_2016_08_03_10_42_57_user_S5-00580-2-Medical_Exome
R_2016_08_03_14_04_54_user_S5-00580-3-Medical_Exome
R_2016_07_23_08_40_18_user_S5-00580-1-IQOQ_RUN_Sample_2
R_2016_07_22_17_09_29_user_S5-00580-0-Test_Fragment_Run
IonXpress_007 MEV37
IonXpress_008 MEV38
IonXpress_009 MEV39
IonXpress_004 MEV34
IonXpress_005 MEV35
IonXpress_006 MEV36
IonXpress_007 MEV45
IonXpress_008 MEV46
IonXpress_009 MEV47
IonXpress_004 MEV42
IonXpress_005 MEV43
IonXpress_006 MEV44
IonXpress_001 MEC1
IonXpress_002 MEV40
IonXpress_003 MEV41
IonXpress_001 MEC1
IonXpress_002 MEV40
IonXpress_003 MEV41

# 2  
Old 09-21-2016
Hello cmccabe,

Could you please try the following and let me know if it helps.
Code:
awk 'function remov(a){gsub(/[\{\":,]/,X,a);return a} {if($0 ~ /expName/){getline;W=remov($0);if(Q){print W ORS Q;Q=W=""};};if($0 ~ /MEV/){E=remov($0);getline;getline;Q=Q?Q ORS remov($0) OFS E:remov($0) OFS E;}}' RS=" "   Input_file

Output will be as follows.
Code:
R_2016_09_20_10_12_41_user_S5-00580-6-Medexome
IonXpress_004 MEV34
IonXpress_005 MEV35
IonXpress_006 MEV36
R_2016_09_01_13_20_02_user_S5-00580-5-Medexome
IonXpress_007 MEV45
IonXpress_008 MEV46
IonXpress_009 MEV47
R_2016_09_20_12_47_36_user_S5-00580-7-Medexome
IonXpress_007 MEV37
IonXpress_008 MEV38
IonXpress_009 MEV39

EDIT: Adding a multi-line form of the solution too.
Code:
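awk 'function remov(a) {                # strip { " : , from a token
        gsub(/[{":,]/, "", a)
        return a
     }
     {
        if ($0 ~ /expName/) {           # the next token holds the run name
                getline
                W = remov($0)
                if (Q) {                # print the name ahead of the samples collected so far
                        print W ORS Q
                        Q = W = ""
                }
        }
        if ($0 ~ /MEV/) {               # sample name; its barcode is two tokens further on
                E = remov($0)
                getline
                getline
                Q = Q ? Q ORS remov($0) OFS E : remov($0) OFS E
        }
     }
    ' RS=" "    Input_file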
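The approach above depends on RS=" " turning every space-separated token into its own record, so that each getline steps to the next token. A minimal sketch of that splitting, using a made-up fragment (RUN_A and the tokens variable are placeholders, and -v RS is used here only so the assignment is made before any input is read):

```shell
# Hypothetical fragment; no trailing newline, so the last token stays clean.
tokens=$(printf '{"expName": "RUN_A", "flows": 500}' |
         awk -v RS=" " '{print NR ": " $0}')
printf '%s\n' "$tokens"
```

Record 2 is the token right after "expName":, which is the one a single getline picks up.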

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-21-2016 at 02:52 PM.. Reason: Adding a non-one liner form of solution too now.
# 3  
Old 09-21-2016
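Your complete file is ONE single line, so awk reads it as a single record. The first for loop therefore runs over every field and prints all the expName values before the second loop starts printing the barcode/sample pairs. The sample you posted is split over three lines, which is why the output there alternates per line. If that is not the output you want, both tests need to go into the same loop.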
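To see the effect in isolation, here is a minimal sketch with made-up data (a, b and the numbers are placeholders, not taken from the attached file). Two rules without a pattern run one after the other for each record, so with two records their output alternates, while with everything in one record the first loop finishes before the second starts.

```shell
# Two records: both rules run for line 1, then both for line 2.
two_lines=$(printf 'a 1\nb 2\n' | awk '
        {print "name", $1}
        {print "value", $2}')

# One record: the first loop prints every name before the second loop runs.
one_line=$(printf 'a 1 b 2\n' | awk '
        {for (i = 1; i < NF; i += 2) print "name", $i}
        {for (i = 2; i <= NF; i += 2) print "value", $i}')

printf '%s\n---\n%s\n' "$two_lines" "$one_line"
```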

Try
Code:
awk -F"[]\":{}, ]*" '
        {for (i=1; i<NF; i++)   {if ($i =="expName") print $(i+1)
                                 if ($i =="barcodeSampleInfo") print $(i+1)" " $(i-1)
                                }
        }
' /tmp/index.html
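A small sketch of how the merged loop behaves on one-line input, under some assumptions: the json string and every name in it (S1, BC_001, RUN_A, ...) are invented, and the separator uses + instead of * so the regex cannot match an empty string (my substitution, not what was posted). With a single loop the lines come out in the order the keys appear, and since barcodedSamples precedes expName in data shaped like this, each run name follows its own samples. The second command is one possible way to get the name first, assuming expName always comes after its samples.

```shell
# Made-up one-line input with two runs; all names are placeholders.
json='{"barcodedSamples": {"S1": {"barcodeSampleInfo": {"BC_001": {"reference": "hg19"}}}}, "expName": "RUN_A"}, {"barcodedSamples": {"S2": {"barcodeSampleInfo": {"BC_002": {"reference": "hg19"}}}}, "expName": "RUN_B"}'

# Merged loop: lines come out in the order the keys appear in the input.
merged=$(printf '%s\n' "$json" | awk -F"[]\":{}, ]+" '
        {for (i = 1; i < NF; i++) {
                if ($i == "expName") print $(i+1)
                if ($i == "barcodeSampleInfo") print $(i+1) " " $(i-1)
         }
        }')

# Variant: hold the pairs back until the run name shows up, then print the name first.
name_first=$(printf '%s\n' "$json" | awk -F"[]\":{}, ]+" '
        {for (i = 1; i < NF; i++) {
                if ($i == "barcodeSampleInfo") buf = buf $(i+1) " " $(i-1) "\n"
                if ($i == "expName") {print $(i+1); printf "%s", buf; buf = ""}
         }
        }')

printf '%s\n---\n%s\n' "$merged" "$name_first"
```

A run without any barcodeSampleInfo entries would print just its name in the second variant, because buf is empty at that point.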


Last edited by RudiC; 09-21-2016 at 01:39 PM..