awk to compare each file in two directories by storing in variable
Post 302982654 by cmccabe on Saturday 1st of October 2016 09:44:24 AM
@RudiC and @RavinderSingh13, thank you both for all of your help.

It looks like the script reads all the vcf files from REF and puts them in a variable FN. How do the txt files from VAL get used by the awk? The awk looks at each REF file and compares it to each VAL file, looking for what's common and what's different. If a difference is found, it identifies which file the missing data came from. The awk portion works on individual files, but I have over 500 to compare, so a loop would help; that loop is the part I need help with.

REF: there are 250 files, all located at /home/cmccabe/Desktop/comparison/reference/10bp

Code:
F13_ref_FP_10bp.txt
H19_ref_FP_10bp.txt

Data structure in REF
Code:
Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    Coverage    Score    A(#F,#R)    C(#F,#R)    G(#F,#R)    T(#F,#R)    Ins(#F,#R)    Del(#F,#R)    SNP    Mutation    Frequency    Sanger
12    52200340    52200340    A    C    exonic    SCN8A    4129    28.3    1560;1672    413;453    0;0    0;0    0;2    31;0        c.[5070A>C]+[=]    20.97    
2    51254914    51254914    C    T    exonic    NRXN1    1562    25.5    0;0    536;218    0;0    574;234    0;0    0;0        c.[498G>A]+[=]    51.73    
X    67433722    67433722    C    T    exonic    OPHN1    2747    25.6    0;0    46;37    0;0    1211;1443    1;8    5;5        c.[579G>A]+[579G>A]    96.61

VAL: there are 250 files, all located at /home/cmccabe/Desktop/comparison/validation/files

Code:
F13_epilepsy.vcf
H19_marfan.vcf
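
One way to pair the files in the two directories can be sketched as below. This is only a sketch: `pair_files` is a hypothetical helper name, and it assumes that the text before the first underscore in a file name (F13, H19) is the sample ID shared by a REF file and its VAL file, which the two listings above suggest but do not confirm.

```shell
#!/bin/sh
# Hypothetical helper: print "REF<TAB>VAL" for every pair of files that
# share a sample ID. Assumption: the sample ID is the part of the file
# name before the first "_" (e.g. F13 in F13_ref_FP_10bp.txt).
pair_files() {
    ref_dir=$1
    val_dir=$2
    for ref in "$ref_dir"/*.txt; do
        [ -e "$ref" ] || continue            # no .txt files: glob stayed literal
        id=${ref##*/}                        # strip the directory part
        id=${id%%_*}                         # keep text before the first "_"
        found=0
        for val in "$val_dir"/"$id"_*.vcf; do
            [ -e "$val" ] || continue        # no matching .vcf for this ID
            found=1
            printf '%s\t%s\n' "$ref" "$val"
        done
        [ "$found" -eq 1 ] || printf 'no VAL file for %s\n' "$id" >&2
    done
}
```

Called with the REF directory and the VAL directory as its two arguments, it prints one tab-separated pair per line, which a `while read` loop could then hand to the awk comparison one pair at a time.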

Data structure in VAL
Code:
Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    ExonicFunc.refGene    AAChange.refGene    avsnp147    PopFreqMax    1000G_ALL    1000G_AFR    1000G_AMR    1000G_EAS    1000G_EUR    1000G_SAS    ExAC_ALL    ExAC_AFR    ExAC_AMR    ExAC_EAS    ExAC_FIN    ExAC_NFE    ExAC_OTH    ExAC_SAS    ESP6500siv2_ALL    ESP6500siv2_AA    ESP6500siv2_EA    CG46    dpsi_max_tissue    dpsi_zscore    SIFT_score    SIFT_pred    Polyphen2_HDIV_score    Polyphen2_HDIV_pred    Polyphen2_HVAR_score    Polyphen2_HVAR_pred    LRT_score    LRT_pred    MutationTaster_score    MutationTaster_pred    MutationAssessor_score    MutationAssessor_pred    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    Quality    Reads    Zygosity    Phred    Classification    HGMD    Sanger
chr1    43395635    43395635    C    T    exonic    SLC2A1    .    synonymous SNV    SLC2A1:NM_006516:exon5:c.588G>A:p.P196P    rs2229682    0.23    0.12    0.024    0.21    0.08    0.19    0.15    0.18    0.044    0.19    0.074    0.23    0.21    0.19    0.19    0.15    0.049    0.2    0.12    -0.1558    -0.594    .    .    .    .    .    .    .    .    .    .    .    .    Benign    not_specified    RCV000081436.5    MedGen    CN169374    GOOD    399    het    19
chr1    43396414    43396414    G    A    exonic    SLC2A1    .    synonymous SNV    SLC2A1:NM_006516:exon4:c.399C>T:p.C133C    rs11537641    0.24    0.14    0.08    0.21    0.1    0.19    0.16    0.19    0.094    0.2    0.098    0.24    0.21    0.2    0.2    0.16    0.096    0.2    0.14    -0.0227    -0.121    .    .    .    .    .    .    .    .    .    .    .    .    Benign    not_specified    RCV000081433.6    MedGen    CN169374    GOOD    400    het    21
chr1    172410967    172410967    G    A    exonic    PIGC    .    nonsynonymous SNV    PIGC:NM_002642:exon2:c.796C>T:p.P266S,PIGC:NM_153747:exon2:c.796C>T:p.P266S    rs1063412    0.66    0.45    0.06    0.54    0.66    0.6    0.57    0.55    0.14    0.64    0.64    0.59    0.58    0.57    0.57    0.42    0.15    0.56    0.41    .    .    0.13    T    1.0    D    1.0    D    0.000    D    0.000    P    1.515    L    .    .    .    .    .    GOOD    399    het    19
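
Note that the Chr column is written as 12, 2, X in REF but as chr1 in VAL, so a direct string comparison of that column would never match. A minimal sketch of building a comparable key is shown below. `variant_keys` is a hypothetical name, and the sketch assumes tab-separated files with one header line and Chr, Start, Ref, Alt in columns 1, 2, 4 and 5, as in both layouts above.

```shell
#!/bin/sh
# Hypothetical helper: print one comparison key per data row.
# Assumptions: tab-separated input, one header line per file, and
# columns 1, 2, 4, 5 holding Chr, Start, Ref, Alt.
variant_keys() {
    awk -F'\t' '
        FNR == 1 { next }                 # skip the header row of each file
        {
            chr = $1
            sub(/^chr/, "", chr)          # "chr12" and "12" give the same key
            print chr ":" $2 ":" $4 ":" $5
        }
    ' "$@"
}
```

Running it on a REF file and on its VAL file should give two key lists that can be compared line for line, regardless of the prefix difference.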

Desired output (this example does not use the files above; it compares a REF file to a VAL file and shows what's in common, what's different, and which file the difference comes from; it also includes some additional data from another script):

Code:
Match:
Chr    Start    Ref    Alt    Func.refGene    Gene.refGene    Quality    Reads    Zygosity    Phred
chr15    68521889    C    T    exonic    CLN6    GOOD    50    het    4
chr7    147183143    A    G    intronic    CNTNAP2    GOOD    382    het    22
chr2    167099158    A    G    exonic    SCN9A    GOOD    210    hom    55
Missing in Reference but found in IDP:
Chr    Start    Ref    Alt    Func.refGene    Gene.refGene    Quality    Reads    Zygosity    Phred
chr2    51666313    T    C    intergenic    NRXN1,NONE    GOOD    108    het    7
chr2    166903445    T    C    exonic    SCN1A    GOOD    400    het    28
Missing in IDP but found in Reference:
Chr    Start    Ref    Alt    Func.refGene    Gene.refGene    Mutation Call    Coverage    Score    Mutant Allele Frequency    A(#F,#R)    C(#F,#R)    G(#F,#R)    T(#F,#R)    ins(#F,#R)    del(#F,#R)    SNP db_ref    Region    
2    166210776    C    T    exonic    SCN2A    c.[2994C>T]+[=]    3095    23.1    24.56    0:0    1158:1177    0;0    457;303    1;0    0;0        No low coverage
7    148106478    -    GT    intronic    CNTNAP2    c.3716-5_3716-4insGT    4168    28.6    51.01    0;0    0;1    0;0    2199;1967    1129;997    0;1    rs60451214    No low
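
The three sections above could be produced for one REF/VAL pair roughly as follows. This is a sketch under assumptions, not the script from this thread: `compare_pair` is a hypothetical name, the files are assumed to be tab-separated with one header line, rows are matched on Chr, Start, Ref, Alt (columns 1, 2, 4, 5) with any "chr" prefix removed, and the "IDP" sections are assumed to correspond to the VAL file. It prints whole rows and does not do the column selection or add the extra data shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: compare one REF file ($1) with one VAL file ($2).
compare_pair() {
    awk -F'\t' '
        # Build a key from Chr, Start, Ref, Alt, dropping any "chr" prefix.
        function key(    chr) {
            chr = $1
            sub(/^chr/, "", chr)
            return chr SUBSEP $2 SUBSEP $4 SUBSEP $5
        }
        FNR == 1 {                               # header line of each file
            if (FILENAME == ARGV[1]) refhdr = $0; else valhdr = $0
            next
        }
        FILENAME == ARGV[1] {                    # REF rows: remember them in order
            k = key()
            if (!(k in ref)) order[++r] = k
            ref[k] = $0
            next
        }
        {                                        # VAL rows: matched or VAL-only
            k = key()
            if (k in ref) { both[++m] = $0; seen[k] = 1 }
            else          { onlyval[++v] = $0 }
        }
        END {
            print "Match:"
            print valhdr
            for (i = 1; i <= m; i++) print both[i]
            print "Missing in Reference but found in IDP:"
            print valhdr
            for (i = 1; i <= v; i++) print onlyval[i]
            print "Missing in IDP but found in Reference:"
            print refhdr
            for (i = 1; i <= r; i++)
                if (!(order[i] in seen)) print ref[order[i]]
        }
    ' "$1" "$2"
}
```

The file test uses `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom so that an empty REF file does not cause the VAL rows to be treated as reference rows.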

I hope this helps. I apologize for the long post, but I think these are all the details. Thank you.

Last edited by cmccabe; 10-01-2016 at 10:49 AM.. Reason: added details
 
