Bash to move specific files to directory based on match to file
I am trying to mv each of the .vcf files in the variants folder to the folder in /home/cmccabe/f2 whose name is found in file alongside that .vcf's id. $2 in file will always hold the id of a .vcf in the variants folder. The line in file starting with R_2019 (shown in blue), up to the -v5.6, will always be an exact match to a folder in /home/cmccabe/f2. There may be multiple folders in /home/cmccabe/f2, but only one will have a match in file. There may also be multiple ids, but always only one .vcf for each in /home/cmccabe/f1/variants.
When a match is found between a folder in /home/cmccabe/f2 and the R_ line in file, the id(s) in $2 will be found in /home/cmccabe/f1/variants as .vcf files. Each .vcf is then moved to the matching folder in /home/cmccabe/f2, into the sub-folder variants. This is the last step of a procedure, and I am stuck on it. I have included an attempt in bash with comments, but I'm sure there is a better way. Thank you.
Code:
for file in /home/cmccabe/f1/variants/*.vcf ; do
    bname=$(basename $file) # strip off path
    VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
    f=$(printf '%s' /home/cmccabe/f1/file/${VCF}) # find matching id
    FILE2=$(awk '{print $2}' $f') # set VCF lookup to column
    for RDIR in "$DIR"/R_2019* ; do FOLDER=${RDIR%%-v5.6*}; done ## trim folder match in RDIR from -v5.6 and store in FOLDER
    if [[ $VCF = $FILE2 ]] # only execute file on match
    then
        mkdir -p /home/cmccabe/f2/$FOLDER/variants ## create variants sub-folder
        mv /home/cmccabe/f1/file/$VCF /home/cmccabe/f2/$FOLDER/$VCF/variants ## move vcf to folder/id/variants
    fi ## end if
done ## close loop
What operating system are you using for this exercise?
It seems that the text description of your problem says that everything you need to find the files to be moved and the locations to which they should be moved is found in a file named /home/cmccabe/f1/file, but your script is treating that regular file as a directory. What am I missing?
Furthermore, you go to a lot of work to create a variable named VCF which contains the name of a file after stripping off the .vcf filename extension. But when you start moving the .vcf files, you use $VCF as the name of those files without reinstating the filename extension???
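To illustrate the point about the extension: shell parameter expansion can strip the path and the suffix without calling basename or cut, and the suffix is simply appended again when the file itself is named. This is only a minimal sketch; the sample path is hypothetical.

```shell
# Hypothetical sample path; only the shape of the name matters here.
file=/home/cmccabe/f1/variants/19-0000-LastName-FirstName.vcf

bname=${file##*/}      # drop everything up to the last "/"
id=${bname%.vcf}       # drop the trailing ".vcf" only

echo "id:   $id"
echo "file: $id.vcf"   # reinstate the extension when naming the file
```

Unlike `cut -d. -f1`, `${bname%.vcf}` removes only the final `.vcf`, so an id that itself contains a dot would not be truncated.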
I then got completely lost when you started a loop on all of the R_2019* files in $DIR. Note that the DIR variable is never defined in your script and is never mentioned in your description of what you are trying to do.
I'm having a hard time guessing at what files are being processed by the code:
Code:
FILE2=$(awk '{print $2}' $f')
(which should have "$f" instead of $f'). I'm guessing that this will set FILE2 to a list of filenames that you are then treating as a single filename, but since I don't know the contents of the file selected by $f, I'm lost.
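A small sketch of why that matters, assuming (the thread does not show it) that the lookup file is whitespace-separated with the id in column 2. The sample file and its contents are made up for illustration. Captured into a plain variable, the awk output is one multi-line string, so comparing it with a single id fails as soon as the file has more than one line:

```shell
# Hypothetical stand-in for the lookup file; the real layout is not shown.
f=$(mktemp)
printf '%s\n' 'x 19-0000-LastName-FirstName' 'x 19-0002-L-F' > "$f"

FILE2=$(awk '{print $2}' "$f")   # one string holding both ids, newline-separated
VCF=19-0002-L-F

# Comparing against the whole list does not match a single id.
if [[ $VCF = "$FILE2" ]]; then whole=match; else whole=nomatch; fi
echo "compare against whole list: $whole"

# Comparing one id at a time does.
line_result=nomatch
while IFS= read -r id; do
    [[ $id = "$VCF" ]] && line_result=match
done <<< "$FILE2"
echo "compare per id: $line_result"

rm -f "$f"
```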
I'm assuming that you have tried running your script and it is failing to work. What diagnostics are being printed, or if there aren't any, in what way is it failing to do what you want it to do?
Please indent your code to show its structure. Then comments like "end if" and "end loop" won't be needed and we won't have to wonder where the start of the "if" and "loop" are located. I know the shell doesn't care about indentation, but you are a human and you're asking humans on this forum to read your code. Lack of indentation makes it difficult for humans (including you) to understand what your code is trying to do.
These 2 Users Gave Thanks to Don Cragun For This Post:
/home/cmccabe/f1/file is the path to file, which has all the necessary information for the move (folder name, ids).
The for loop on RDIR was for trimming the R_2019 line in file to match the folder name in /home/cmccabe/f2, but DIR is undefined and maybe should be /home/cmccabe/f1/file. The FILE2=$(awk '{print $2}' $f') was then intended to read each id from file into FILE2. The code executes but nothing is moved, and set -x shows the variables are not being populated correctly, as you already knew. I indented the code above and added comments to help me learn and to help with my logic. Thank you for your help.
I rewrote the script (well, a portion) and most of the variables seem good: $STRING is the same as FILE2; I just changed the name to hopefully be clearer, as I am looking for a string. However, the loop is not working, so only the first id is retained in $STRING. I think I am on the right track, but is there a better way? Thank you.
Code:
set -x
DIR=/home/cmccabe/f1
DEST=/home/cmccabe/f2
for file in "$DIR"/variants/*.vcf ; do
    bname=$(basename $file) # strip off path
    VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
    for f in "$DIR"/file; do
        STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) )
        echo "This is the string" "$STRING"
    done
done
set -x
Code:
cmccabe@Satellite-M645:~$ set -x
cmccabe@Satellite-M645:~$ DIR=/home/cmccabe/f1
+ DIR=/home/cmccabe/f1
cmccabe@Satellite-M645:~$ DEST=/home/cmccabe/f2
+ DEST=/home/cmccabe/f2
cmccabe@Satellite-M645:~$ for file in "$DIR"/variants/*.vcf ; do
> bname=$(basename $file) # strip of path
> VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
> for f in "$DIR"/file; do STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) ); echo "This is the string" "$STRING"; done
> done
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0000-LastName-FirstName.vcf
+ bname=19-0000-LastName-FirstName.vcf
++ echo 19-0000-LastName-FirstName.vcf
++ cut -d. -f1
+ VCF=19-0000-LastName-FirstName
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0002-L-F.vcf
+ bname=19-0002-L-F.vcf
++ echo 19-0002-L-F.vcf
++ cut -d. -f1
+ VCF=19-0002-L-F
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0004-La-Fi.vcf
+ bname=19-0004-La-Fi.vcf
++ echo 19-0004-La-Fi.vcf
++ cut -d. -f1
+ VCF=19-0004-La-Fi
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0020-Las-Fir.vcf
+ bname=19-0020-Las-Fir.vcf
++ echo 19-0020-Las-Fir.vcf
++ cut -d. -f1
+ VCF=19-0020-Las-Fir
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190319-Control.vcf
+ bname=190319-Control.vcf
++ echo 190319-Control.vcf
++ cut -d. -f1
+ VCF=190319-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190320-Control.vcf
+ bname=190320-Control.vcf
++ echo 190320-Control.vcf
++ cut -d. -f1
+ VCF=190320-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
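One likely reading of the trace above: STRING is assigned as an array, and in bash a bare "$STRING" expands to element 0 only, so the echo prints the first id on every pass even if the array holds more. The sketch below uses three ids taken from the trace; that they all appear in the real file is an assumption.

```shell
# Ids borrowed from the trace; the array contents are an assumption.
STRING=(19-0000-LastName-FirstName 19-0002-L-F 19-0004-La-Fi)

first=$STRING            # same as ${STRING[0]}: the first element only
count=${#STRING[@]}      # number of elements actually stored

echo "first element: $first"
echo "element count: $count"

# Membership test: does a given id appear anywhere in the array?
VCF=19-0004-La-Fi
found=no
for id in "${STRING[@]}"; do
    if [[ $id == "$VCF" ]]; then
        found=yes
        break
    fi
done
echo "match for $VCF: $found"
```

Printing "${STRING[@]}" instead of "$STRING" shows every stored id.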
I have been able to get a working solution that produces my desired results, using set -x and the modifications below:
Code:
if [[ $VCF = ${STRING[*]} ]] # only execute file on match
then
    RSTRING=$(awk '/R_2019/' "$DIR"/run) ## search for lines matching R_2019 pattern
    VCFRUN=$(awk -F '\n' -v RS="" -v ref="$VCF" '$0 ~ ref {print $NF}' "$DIR"/file) ## find the record matching $VCF and print its last line ($NF)
    RUN="$(echo $RSTRING|cut -d- -f1,2,3)" ## keep the first three "-"-separated fields of the R_2019 line
    mv "$DIR"/variants/${VCF}.vcf "$DEST"/"$RUN"/"$VCF"/variants ## move vcf to folder in destination
fi
This matched each .vcf and moved the match to the correct run file. Maybe this will help others as well.
Thank you very much for your help.
Last edited by cmccabe; 04-09-2019 at 10:22 PM..
Reason: added comments
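For readers who want the whole procedure in one place, here is a hedged sketch rather than the poster's actual script. It assumes a layout for the lookup file that the thread never shows: one line starting with R_2019 that contains -v5.6, and other lines carrying an id in column 2. The function name move_vcfs and the sandbox paths are hypothetical, and the destination layout run/id/variants follows the mv line in the working solution above.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed layout of "$dir/file" (not shown in the thread):
#   R_2019_demo-v5.6_extra               <- run line
#   1 19-0000-LastName-FirstName         <- id in column 2
move_vcfs() {
    local dir=$1 dest=$2 run id
    # Run name: the R_2019 line, trimmed from "-v5.6" onward.
    run=$(awk '/^R_2019/ {sub(/-v5\.6.*/, ""); print; exit}' "$dir/file")
    [[ -n $run && -d $dest/$run ]] || { echo "no folder for run '$run'" >&2; return 1; }

    # Every other line contributes the id found in column 2.
    while read -r _ id _; do
        [[ -n $id && -f $dir/variants/$id.vcf ]] || continue
        mkdir -p "$dest/$run/$id/variants"
        mv "$dir/variants/$id.vcf" "$dest/$run/$id/variants/"
    done < <(grep -v '^R_2019' "$dir/file")
}

# Demo in a throwaway sandbox rather than the real /home/cmccabe paths.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/f1/variants" "$sandbox/f2/R_2019_demo"
printf '%s\n' 'R_2019_demo-v5.6_extra' '1 19-0000-LastName-FirstName' > "$sandbox/f1/file"
touch "$sandbox/f1/variants/19-0000-LastName-FirstName.vcf"

move_vcfs "$sandbox/f1" "$sandbox/f2"
find "$sandbox/f2" -name '*.vcf'
```

Ids with no matching .vcf are skipped silently, and the function stops early when no folder in the destination matches the run name. Both choices are design assumptions, not something stated in the thread.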