Post 302983716 by cmccabe on Saturday 15th of October 2016 01:00:41 PM
awk to match and apply conditions to matching files in directories

I am trying to merge the awk below, which compares two files looking for a match in $2 and then prints the line if two conditions are met.

awk
Code:
 awk 'FNR==NR{A[$2]=$0;next} ($2 in A){if($10>30 && $11>49){print A[$2]}}' F113.txt F113_tvc.bed

This code was improved and provided by @RavinderSingh13, thank you very much. I have ~500 files to process, so I want to take each .txt file in /home/cmccabe/Desktop/comparison/missing and compare it to the .bed file in /home/cmccabe/Desktop/comparison/test_tvc that has the matching numerical prefix. Each pair of files shares a common numerical prefix:

So if there are three files, the three .txt files in /home/cmccabe/Desktop/comparison/missing will look like:
Code:
F113.txt
H123.txt
S111.txt

and the three .bed files in /home/cmccabe/Desktop/comparison/test_tvc will look like:
Code:
F113_tvc.bed
H123_tvc.bed
S111_tvc.bed

So F113.txt would be compared to F113_tvc.bed; the matching numerical prefix is F113.
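
The pairing described above could be sketched as a small shell function. This is only a sketch under assumptions: `pair_files` is a hypothetical helper name, the two directories are passed as arguments, and every .bed file is assumed to be named `<prefix>_tvc.bed` as in the listing above.

```shell
#!/bin/sh
# pair_files TXT_DIR BED_DIR  (hypothetical helper, not part of the original script)
# Prints "txt-path<TAB>bed-path" for every .txt file that has a matching .bed file.
pair_files() {
    txt_dir=$1
    bed_dir=$2
    for txt in "$txt_dir"/*.txt
    do
        [ -f "$txt" ] || continue          # the glob matched nothing
        name=${txt##*/}                    # strip the directory: F113.txt
        prefix=${name%.txt}                # strip the extension: F113
        bed=$bed_dir/${prefix}_tvc.bed     # assumed naming: F113_tvc.bed
        if [ -f "$bed" ]
        then
            printf '%s\t%s\n' "$txt" "$bed"
        else
            printf 'No .bed file found for %s\n' "$name" >&2
        fi
    done
}
```

It might be called as `pair_files /home/cmccabe/Desktop/comparison/missing /home/cmccabe/Desktop/comparison/test_tvc`, with the tab-separated pairs piped into a later step.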

If a match between the $2 values in each file is made and both conditions ($10>30 && $11>49) are met, then the matching line from the .txt file is printed in the output under Match in both files and meet criteria:. If no match is found or the criteria are not met, then the line in the .txt file is printed in the output under Missing in comparison:.

The code below, provided by @Don Cragun, works great, but since my data has changed a bit I made some updates to it:

(code that works perfectly)
Code:
IAm=${0##*/}

InDir1='/home/cmccabe/Desktop/comparison/reference/10bp'
InDir2='/home/cmccabe/Desktop/comparison/validation/files'
OutDir='/home/cmccabe/Desktop/comparison/ref_val'

cd "$InDir1"
for file1 in *.txt
do    # Grab file prefix.
    p=${file1%%_*}

    # Find matching file2.
    file2=$(printf '%s' "$InDir2/$p"_*.vcf)
    if [ ! -f "$file2" ]
    then    printf '%s: No single file matching %s found.\n' "$IAm" \
            "$file1" >&2
        continue
    fi

    # Create matching output filename.
    out=${file2##*/}
    out=${out%.vcf}_comparison.txt

    printf '%s\t%s\t%s\n' "$InDir1/$file1" "$file2" "$OutDir/$out"
done | awk '
BEGIN {    FS = OFS = "\t"
}
{    in1 = $1
    in2 = $2
    out = $3
    print "Reading from " in1
    while((getline < in1) == 1)
        f1[$2 OFS $4 OFS $5]
    close(in1)
    print "Reading from " in2
    while((getline < in2) == 1)
        f2[$2 OFS $4 OFS $5]
    close(in2)
    print "Writing to " out
    print "Match:" > out
    for(k in f1)
        if(k in f2) {
            print k > out
            delete f1[k]
            delete f2[k]
        }
    print "Missing in Reference but found in IDP:" > out
    for(k in f2) {
        print k > out
        delete f2[k]
    }
    print "Missing in IDP but found in Reference:" > out
    for(k in f1) {
        print k > out
        delete f1[k]
    }
    close(out)
    print "***"
}'

Updated version, which does not run (my comments are marked by --):

Code:
IAm=${0##*/}

InDir1='/home/cmccabe/Desktop/comparison/missing'   -- updated path to .txt files
InDir2='/home/cmccabe/Desktop/comparison/test_tvc'  -- updated path to .bed files
OutDir='/home/cmccabe/Desktop/comparison/final'  -- updated path to output

cd "$InDir1"
for file1 in *.txt
do    # Grab file prefix.
    p=${file1%%_*}

    # Find matching file2.
    file2=$(printf '%s' "$InDir2/$p"_*.bed)  -- updated extension
    if [ ! -f "$file2" ]
    then    printf '%s: No single file matching %s found.\n' "$IAm" \
            "$file1" >&2
        continue
    fi

    # Create matching output filename.
    out=${file2##*/}
    out=${out%.vcf}_final.txt  -- updated output

    printf '%s\t%s\t%s\n' "$InDir1/$file1" "$file2" "$OutDir/$out"
done | awk '
BEGIN {    FS = OFS = "\t"
}
{  in1 = $1
    in2 = $2
    out = $3
    print "Reading from " in1
    while((getline < in1) == 1)
        f1[$2]  -- updated to look for each $2 in the .txt file
    close(in1)
    print "Reading from " in2
    while((getline < in2) == 1)
        f2[$2] -- updated to look for each $2  from the .txt file in the matching .bed file
    close(in2)
    print "Writing to " out
    print "Match in both files and meet criteria:" > out
    for(k in f1)
        if(k in f2) {
            print k > out
            delete f1[k]
            delete f2[k]
        }
    print "Missing in comparison:" > out
    for(k in f2) {
        print k > out
        delete f2[k]
    }
    close(out)
    print "***"
}'
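
A few of the parameter expansions in the updated script behave differently with the new filenames, which may be part of why it stops at the "No single file matching" message. The sketch below uses the sample names F113.txt and F113_tvc.bed from this post; the variable names are only illustrative. The `--` annotations would also need to become `#` comments (or be removed) before the shell or awk could parse the script.

```shell
#!/bin/sh
file1=F113.txt

# %%_* removes the longest suffix starting at an underscore.
# F113.txt contains no underscore, so nothing is removed.
p_old=${file1%%_*}
echo "$p_old"        # F113.txt

# Stripping the extension yields the prefix instead.
p_new=${file1%.txt}
echo "$p_new"        # F113

file2=F113_tvc.bed

# %.vcf no longer matches anything, so the .bed extension stays in the name.
out_old=${file2%.vcf}_final.txt
echo "$out_old"      # F113_tvc.bed_final.txt

# Stripping .bed gives the intended output name.
out_new=${file2%.bed}_final.txt
echo "$out_new"      # F113_tvc_final.txt
```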

I am not sure how to apply the two if conditions to the matching $2 values. Below are two sample input files as well as the desired output.

file1 (F113.txt)
Code:
Missing in IDP but found in Reference:
2   166848646   G   A   exonic  SCN1A   68  13  16;20   0;0 17;15   0;0 0;0 0;0     c.[5139C>T]+[=] 52.94
2   166245888   G   A   exonic  SCN1A   68  13  16;20   0;0 17;15   0;0 0;0 0;0     c.[5500G>T]+[=] 32

file2 (F113_tvc.bed)
Code:
Chrom    Position    Gene Sym    Target ID    Type    Zygosity    Genotype    Ref    Variant    Var Freq    Qual    Coverage    Ref Cov    Var Cov
chr2    166245425   SCN2A   AMPL5155065355  SNP Het C/T C   T   54  100   50    23  27
chr2    166848646   SCN1A   AMPL1543060606  SNP Het        G/A   G  A   52.9411764706   100 68  32  36

desired output
Code:
Match in both files and meet criteria:
2   166848646   G   A   exonic  SCN1A   68  13  16;20   0;0 17;15   0;0 0;0 0;0     c.[5139C>T]+[=] 52.94
Missing in comparison:
2   166245888   G   A   exonic  SCN1A   68  13  16;20   0;0 17;15   0;0 0;0 0;0     c.[5500G>T]+[=] 32
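
One way the desired output above might be produced for a single pair of files is sketched here. It is not the code from the thread: `compare_pair` is a hypothetical helper name, both files are assumed to be tab-delimited, and header lines in the .txt file are assumed to have no numeric second field. The .bed file is read first so that only positions passing both thresholds are remembered; each .txt line is then sorted into one of the two sections.

```shell
#!/bin/sh
# compare_pair TXT_FILE BED_FILE  (hypothetical helper)
# Writes the two-section report for one pair of files to standard output.
compare_pair() {
    awk -F'\t' '
        # First operand (.bed): remember each $2 whose line passes both tests.
        # Adding 0 forces a numeric comparison, so the header line fails.
        FILENAME == ARGV[1] {
            if ($10 + 0 > 30 && $11 + 0 > 49)
                pass[$2]
            next
        }
        # Second operand (.txt): skip lines without a numeric $2 (headers).
        $2 !~ /^[0-9]+$/ { next }
        # Keep the whole .txt line, in input order, in the right section.
        ($2 in pass) { hit[++m] = $0; next }
        { miss[++n] = $0 }
        END {
            print "Match in both files and meet criteria:"
            for (i = 1; i <= m; i++) print hit[i]
            print "Missing in comparison:"
            for (i = 1; i <= n; i++) print miss[i]
        }
    ' "$2" "$1"
}
```

It could be called once per pair, for example `compare_pair F113.txt F113_tvc.bed > F113_final.txt`. Storing whole lines in indexed arrays, rather than only the keys as in the updated script, is what lets the report print the original .txt line and keep the input order.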

I hope I have included enough information, and thank you :).
 
