Bash lookup matching digits for second file
Post 302978133 by cmccabe on Tuesday 26th of July 2016 01:20:20 PM
Old 07-26-2016
The bash below (though not optimized) yields the desired result for one entry. That is, depending on the digits in the file manually selected in the first process, the second file is selected automatically using the matching digits along with the full path. The problem is that this seems to work for the first file but not for the others. Thank you.

file manually selected: 123_base_counts.txt
Code:
123_base_counts.txt
456_base_counts.txt

file selected automatically using the matching digits in (/home/user)
Code:
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
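
As a minimal sketch of how the matching digits can be taken from the selected file name: the expansion `${file1%%_*}` removes everything from the first underscore onward. The file name is hard-coded here purely for illustration; in the script it comes from the select menu.

```shell
# file name hard-coded for illustration only
file1=123_base_counts.txt

# %%_* strips the longest suffix that starts with an underscore,
# leaving only the leading digits
prefix=${file1%%_*}

printf 'prefix is: %s\n' "$prefix"   # prints "prefix is: 123"
```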

bash
Code:
# manual selection of file
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools
ANNOVARDIR=/home/user

PS3="please select a file to analyze with a panel: " # specify file1
select file1 in $(cd "${FILESDIR}" && ls); do break; done
file1=$(basename "${FILESDIR}/${file1}")  # keep only the file name
printf "FILE is: %s and will be used\n" "${file1}"

# automatic file based on match
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools # match directory
ANNOVARDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar # search directory
printf "\n\n"
printf "These are all vcf files in the directory: \n"
ls ${ANNOVARDIR}
file1=$(basename "${FILESDIR}/${file1}")  # file matched
file2=("${ANNOVARDIR}/${file1%%_*}"*)  # array of files starting with the digits of file1
printf "file2 is: %s and will be used\n" "${file2[0]}"  # first match

output
Code:
1) 123_base_counts.txt 
2) 456_base_counts.txt 

please select a file to analyze with a panel: 1
FILE is: 123_base_counts.txt and will be used to filter reads, identify target bases and genes less than 20 and 30 reads, create a low coverage bed for visualization, calculate 20x and 30x coverage, and filter the vcf for the 98 gene epilepsy panel

These are all files in the new directory: 
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
file2 is: /home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar/123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt and will be used 

second time results
1) 123_base_counts.txt  
2) 456_base_counts.txt


please select a file to analyze with a panel: 2
FILE is: 456_base_counts.txt and will be used

These are all files in the new directory: 
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
file2 is: /home/user/123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt and will be used
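
One way the lookup might be isolated and checked on its own is sketched below. This is only a sketch under assumptions: `find_match` is a hypothetical helper name, the shortened file names are made up, and a temporary directory stands in for the annovar directory. It takes the digits before the first underscore, looks for a file starting with those digits plus an underscore, and reports when nothing matches.

```shell
# find_match is a hypothetical helper, not part of the original script.
# $1 = selected file name, $2 = directory to search.
# Prints the first file in $2 that starts with the digits of $1 plus "_".
find_match() {
    name=${1##*/}          # drop any leading directory
    prefix=${name%%_*}     # digits before the first underscore
    dir=$2
    # reuse the positional parameters to hold the glob results
    set -- "$dir/${prefix}_"*
    if [ ! -e "$1" ]; then
        # an unmatched glob is left as a literal pattern, so test for existence
        printf 'no file starting with %s_ in %s\n' "$prefix" "$dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# demonstration in a throwaway directory (stand-in for the annovar directory)
demo=$(mktemp -d)
touch "$demo/123_variant.txt" "$demo/456_variant.txt"

file2=$(find_match 456_base_counts.txt "$demo")
printf 'file2 is: %s and will be used\n' "$file2"
```

Including the underscore in the pattern keeps a prefix such as 12 from also matching 123_..., and the existence test means an empty result is reported instead of being passed on silently.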

 
