Bash lookup matching digits for second file


 
# 1  
Old 07-25-2016

In the bash script below, the user selects the file to be used. The leading digits of each file name are unique and are used to automatically locate the next file in the process. The problem I cannot seem to fix is that the second portion needs to reference the full path of the matching file, and currently it does not. Is there a better way? Thank you :).

Files offered by select (the user selects 123_base_counts.txt)
Code:
123_base_counts.txt
456_base_counts.txt

Files matched by the digits in the selected file
Code:
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final (this one is automatically selected because it has the same starting digits as the original file)
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final

These are all files in the directory:
Code:
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt

Bash
Code:
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools
ANNOVARDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar

PS3="please select a file1 to analyze with a panel: " # specify file
select file1 in $(cd ${FILESDIR};ls);do break;done
        file1=`basename ${FILESDIR}/${file1}`
        printf "FILE is: ${file1} and will be used to filter reads, identify target bases and genes less than 20 and 30 reads, create a low coverage bed for vizulization, calculate 20x and 30x coverage, and filter the vcf for the 98 gene epilepsy panel"
logfile=/home/cmccabe/Desktop/NGS/API/5-14-2016/process.log
for file1 in /home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools/$file1; do
     bname=$(basename $file1)
     pref=${bname%%.txt}
     grep -wFf /home/cmccabe/Desktop/NGS/panels/EPILEPSY_unix_trim.bed $file1 > /home/cmccabe/Desktop/NGS/API/5-14-2016/panel/reads/${pref}_EPILEPSY.txt
     done >> "$logfile"
# filter vcf
printf "\n\n"
printf "These are all vcf files in the directory:  \n"
ls ${ANNOVARDIR}
file1=`basename ${FILESDIR}/${file1}`  # file matched
file2=`basename ${ANNOVARDIR}/${file1%%_*}`
path=${ANNOVARDIR}/${file1%%_*}
     printf "The matching identifier for file2 is: ${file2} and will be used filtered using the epilepsy genes\n"
     echo "The full filename is $path"

Current output
Code:
1) 123_base_counts.txt
2) 456_base_counts.txt 
please select a file1 to analyze with a panel: 1
FILE is: 123_base_counts.txt and will be used to filter reads, identify target bases and genes less than 20 and 30 reads, create a low coverage bed for vizulization, calculate 20x and 30x coverage, and filter the vcf for the 98 gene epilepsy panel

These are all vcf files in the directory:  
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt

The matching identifier for file2 is: 123 and will be used filtered using the epilepsy genes
The full file name is /home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar/123
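The output above can be traced to the expansion `${file1%%_*}`, which removes everything from the first underscore onward and so leaves only the digits. Nothing in that line asks the shell to look up a file name, so the result is just the directory followed by `123`. A minimal sketch of the difference a trailing glob makes is shown here; `demo_dir` and the variable names `without_glob` and `with_glob` are illustrative assumptions, and the temporary directory only stands in for ANNOVARDIR.

```shell
#!/usr/bin/env bash
# Sketch with made-up data: demo_dir stands in for ANNOVARDIR.
demo_dir=$(mktemp -d)
touch "$demo_dir/123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt" \
      "$demo_dir/456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt"

file1=123_base_counts.txt
prefix=${file1%%_*}                    # remove the longest suffix matching "_*": 123

without_glob=$demo_dir/$prefix         # directory plus digits only, not a real file
with_glob=("$demo_dir/$prefix"_*)      # unquoted * lets the shell fill in the rest

printf 'prefix:       %s\n' "$prefix"
printf 'without glob: %s\n' "$without_glob"
printf 'with glob:    %s\n' "${with_glob[0]}"

rm -r "$demo_dir"                      # clean up the throwaway directory
```

The `*` has to sit outside the double quotes, otherwise the shell treats it as a literal character and no file name is filled in.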


Last edited by cmccabe; 07-25-2016 at 06:41 PM.. Reason: added current output
# 2  
Old 07-26-2016
We have seen most of this in earlier threads. And, we understand why your current bash code produces the output it produces (although I don't understand some of your code that seems to just be creating extra work for you).

What I don't understand is what output you are hoping to create that is different from the output you are currently getting???
# 3  
Old 07-26-2016
The bash below (though not optimized) yields the desired result for one entry. That is, depending on the digits in the file manually selected in the first process, the second file is selected automatically using the matching digits, along with the full path. The problem is that this seems to work for the first file but not for the others. Thank you :).

file manually selected: 123_base_counts.txt
Code:
123_base_counts.txt
456_base_counts.txt

file selected automatically using the matching digits in (/home/user)
Code:
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt

bash
Code:
# manual selection of file
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools
ANNOVARDIR=/home/user

PS3="please select a file to analyze with a panel: " # specify file1
select file1 in $(cd ${FILESDIR};ls);do break;done
          file1=`basename ${FILESDIR}/${file1}`
          printf "FILE is: ${file1} and will be used\n"

# automatic file based on match
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools # match directory
ANNOVARDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar # search directory
printf "\n\n"
printf "These are all vcf files in the directory: \n"
ls ${ANNOVARDIR}
file1=`basename ${FILESDIR}/${file1}`  # file matched
file2=(${ANNOVARDIR}/${file1%%_*}*)
     printf "file2 is: ${file2} and will be used\n"

output
Code:
1) 123_base_counts.txt 
2) 456_base_counts.txt 

please select a file to analyze with a panel: 1
FILE is: 123_base_counts.txt and will be used to filter reads, identify target bases and genes less than 20 and 30 reads, create a low coverage bed for visualization, calculate 20x and 30x coverage, and filter the vcf for the 98 gene epilepsy panel

These are all files in the new directory: 
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
file2 is: /home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar/123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt and will be used 

second time results
1) 123_base_counts.txt  
2) 456_base_counts.txt


please select a file to analyze with a panel: 2
FILE is: 456_base_counts.txt and will be used

These are all files in the new directory: 
123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
456_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt
file2 is: /home/user/123_variant_strandbias_readcount.vcf.hg19_multianno_removed_final.txt and will be used

# 4  
Old 07-26-2016
Why are you using an array for file2? I thought there was supposed to be a single file in both directories starting with the string that is the number before the first underscore in the name of the file selected in the first directory?

What output do you get when you run your script with tracing enabled:
Code:
bash -xv your_script_name
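If tracing the whole script is too noisy, bash can also trace a few lines from inside the script with `set -x` and `set +x`. The sketch below traces only the lines that build file2; the values assigned to ANNOVARDIR and file1 are placeholders, not the poster's real data, and the `PS4` setting is optional.

```shell
#!/usr/bin/env bash
# Sketch: trace only the lines that build file2 instead of the whole script.
# ANNOVARDIR and file1 hold placeholder values for illustration.
ANNOVARDIR=/tmp
file1=123_base_counts.txt

PS4='+ line ${LINENO}: '   # prefix each traced command with its line number
set -x                     # start tracing (trace is written to stderr)
prefix=${file1%%_*}
file2=$ANNOVARDIR/$prefix
set +x                     # stop tracing
printf 'file2 is: %s\n' "$file2"
```

The trace shows each variable after expansion, which makes it easy to see whether the directory variables hold the values you expect at that point in the script.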

# 5  
Old 07-26-2016
For a shell that has both select and arrays (neither of which are required by the standards), the following seems to work, if I correctly understand what you're trying to do:
Code:
#!/bin/ksh
# manual selection of file
ANNOVARDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/vcf/overall/annovar
FILESDIR=/home/cmccabe/Desktop/NGS/API/5-14-2016/bedtools
PS3="please select a file to analyze with a panel: "

cd "$FILESDIR"
select file1 in $(ls)
do	[ "$file1" != "" ] && break
done
printf "FILE is: ${file1} and will be used\n\n"

# automatic file based on match
cd "$ANNOVARDIR"
printf "These are all vcf files in the directory:\n"
ls
file2=("${file1%%_*}"*)
file2=$PWD/$file2
printf "file2 is: $file2 and will be used\n"

Shells that provide both select and arrays include recent bash and 1993 or later versions of ksh (there may be others). This has been tested with both ksh (version: 93u+ 2012-08-01) and bash (version: 3.2.57(1)-release (x86_64-apple-darwin15)).
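The script above takes the first element of the array, which is fine as long as exactly one file starts with the selected digits. If that assumption might not hold, a small check can fail loudly instead of silently picking the first match. This is only a sketch for bash: `unique_match` is a hypothetical helper name, and it assumes the digits are always followed by an underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when exactly one file in directory $1
# starts with prefix $2 followed by "_", and print that file's full path.
unique_match() {
    local dir=$1 prefix=$2
    local -a matches=()
    local f
    for f in "$dir/$prefix"_*; do
        # An unmatched glob stays as literal text, so keep only real files.
        if [ -e "$f" ]; then
            matches+=("$f")
        fi
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        printf 'expected 1 file for prefix %s in %s, found %d\n' \
            "$prefix" "$dir" "${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}

# Possible use in place of the two file2 assignments (variables as in the script):
# file2=$(unique_match "$ANNOVARDIR" "${file1%%_*}") || exit 1
```

Matching on the digits plus the underscore also keeps a prefix such as 12 from picking up a file that starts with 123.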

Last edited by Don Cragun; 07-26-2016 at 05:30 PM.. Reason: Optimize away a call to printf.
# 6  
Old 07-26-2016
What do you recommend? You are right that

Quote:
I thought there was supposed to be a single file in both directories starting with the string that is the number before the first underscore in the name of the file selected in the first directory
I run the bash as part of a shell script, download.sh. Thank you :).
# 7  
Old 07-26-2016
We may have crossed paths... See if post #5 helps.