Bash to move specific files to directory based on match to file


 
# 1  
Old 03-26-2019

I am trying to mv each of the .vcf files in the variants folder to the folder in /home/cmccabe/f2 that the .vcf id is listed under in file. $2 in file will always have the id of a .vcf in the variants folder. The line starting with R_2019 in file, up to the -v5.6, will always be an exact match to a folder in /home/cmccabe/f2. There may be multiple folders in /home/cmccabe/f2, but each will have only one match in file. There may also be multiple ids, but there is always only one .vcf per id in /home/cmccabe/f1/variants.

When a match is found between a folder in /home/cmccabe/f2 and the R_ line in file, the id(s) in $2 will be found in /home/cmccabe/f1/variants as a .vcf. Each .vcf is then moved to the matching folder in /home/cmccabe/f2, into the variants sub-folder. This is the last step of a procedure that I am stuck on. I have included an attempt in bash with comments, but I'm sure there is a better way. Thank you.


file in /home/cmccabe/f1

Code:
IonCode_0007 19-0004-La-Fi
IonCode_0009 19-0005-Last-Firs
IonCode_0011 19-0008-LastN-FirstN
IonCode_0013 190320-Control
R_2019_03_12_13_59_54_user_S5-0271-100-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

IonCode_0005 19-0000-LastName-FirstName
IonCode_0001 19-0001-Las-Fir
IonCode_0003 190319-Control
R_2019_03_12_11_10_20_user_S5-0271-99-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

variants folder in /home/cmccabe/f1
Code:
19-0000-LastName-FirstName.vcf
19-0001-Las-Fir.vcf
190319-Control.vcf
19-0004-La-Fi.vcf
19-0005-Last-Firs.vcf
19-0008-LastN-FirstN.vcf
190320-Control.vcf

current structure of /home/cmccabe/f2

Code:
R_2019_03_12_11_10_20_user_S5-0271-99   ---parent directory ---
     - bam    --- sub-folder ---
     - qc     --- sub-folder ---
     - 19-0000-LastName-FirstName
          - variants
     - 19-0001-Last-Firs
          - variants
     - 190319-Control
          - variants
R_2019_03_12_13_59_54_user_S5-0271-100   ---parent directory ---
     - bam    --- sub-folder ---
     - qc     --- sub-folder ---
     - 19-0004-La-Fi
          - variants
     - 19-0005-Last-Firs
          - variants
     - 19-0008-LastN-FirstN
          - variants
     - 190320-Control.vcf
          - variants

desired structure of /home/cmccabe/f2

Code:
R_2019_03_12_11_10_20_user_S5-0271-99   ---parent directory ---
     - bam    --- sub-folder ---
     - qc     --- sub-folder ---
     - 19-0000-LastName-FirstName
          - variants
               19-0000-LastName-FirstName.vcf
     - 19-0001-Last-Firs
          - variants
               19-0001-Last-Firs.vcf
     - 190319-Control
          - variants
               190319-Control.vcf
R_2019_03_12_13_59_54_user_S5-0271-100   ---parent directory ---
     - bam    --- sub-folder ---
     - qc     --- sub-folder ---
     - 19-0004-La-Fi
          - variants
               19-0004-La-Fi.vcf
     - 19-0005-Last-Firs
          - variants
               19-0005-Last-Firs.vcf
     - 19-0008-LastN-FirstN
          - variants
               19-0008-LastN-FirstN.vcf
     - 190320-Control.vcf
          - variants
               190320-Control.vcf

possible bash

Code:
for file in /home/cmccabe/f1/variants/*.vcf ; do
  bname=$(basename $file) # strip of path
  VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
     f=$(printf '%s' /home/cmccabe/f1/file/${VCF})  ## # Find matching id
       FILE2=$(awk '{print $2}' $f') # set VCF lookup to column
          for RDIR in "$DIR"/R_2019* ; do FOLDER=${RDIR%%-v5.6*}; done  ## trim folder match in RDIR from -v5.6 and store in FOLDER
          if [[ $VCF = $FILE2 ]] # only execute file on match
                 then
                    mkdir -p /home/cmccabe/f2/$FOLDER/variants  ## create variants sub-folder
                   mv /home/cmccabe/f1/file/$VCF /home/cmccabe/f2/$FOLDER/$VCF/variants  ## move vcf to folder/id/variants
          fi  ## end if
done  ## close loop


Last edited by cmccabe; 03-27-2019 at 05:02 PM..
# 2  
Old 03-27-2019
What operating system are you using for this exercise?

It seems that the text description of your problem says that everything you need to find the files to be moved and the locations to which they should be moved is found in a file named /home/cmccabe/f1/file, but your script is treating that regular file as a directory. What am I missing?

Furthermore, you go to a lot of work to create a variable named VCF which contains the name of a file after stripping off the .vcf filename extension. But when you start moving the .vcf files, you use $VCF as the name of those files without reinstating the filename extension???
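
To illustrate the point about the extension, here is a minimal sketch (not the poster's code) that strips the path and the .vcf suffix with shell parameter expansion and then adds the suffix back when the file itself has to be named. The variable name id is an assumption made for this example; the path is one of the files listed in the thread. Unlike cut -d. -f1, the ${bname%.vcf} form removes only a trailing .vcf, so an id that happened to contain a dot would survive intact.

```shell
# Example path taken from the file listing in the thread.
file=/home/cmccabe/f1/variants/19-0004-La-Fi.vcf

bname=${file##*/}    # remove everything up to the last "/" (the directory part)
id=${bname%.vcf}     # remove only a trailing ".vcf"

echo "id:        $id"
echo "file name: $id.vcf"   # the extension must be reinstated to refer to the file
```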

I then got completely lost when you started a loop on all of the R_2019* files in $DIR. Note that the DIR variable is never defined in your script and is never mentioned in your description of what you are trying to do.
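
A short sketch of what an unset DIR does in that loop, assuming default shell options (no nullglob or failglob): the pattern becomes /R_2019*, which normally matches nothing, so the loop body runs once with the unexpanded pattern. The ${DIR:?...} expansion is one way to make the shell stop with an error instead; the variable name result is only for this illustration.

```shell
unset DIR    # reproduce the situation in the first script

# With DIR unset, "$DIR"/R_2019* is the pattern /R_2019*.
# If nothing matches, the pattern is handed to the loop as literal text.
for RDIR in "$DIR"/R_2019*; do
  echo "loop saw: $RDIR"
done

# ${DIR:?message} aborts when DIR is unset or empty.
# It runs in a subshell here so that the demonstration itself keeps going.
if ( : "${DIR:?DIR is not set}" ) 2>/dev/null; then
  result="DIR is set"
else
  result="DIR is unset"
fi
echo "$result"
```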

I'm having a hard time guessing at what files are being processed by the code:
Code:
 FILE2=$(awk '{print $2}' $f')

(which should have "$f" instead of $f'). I'm guessing that this will set FILE2 to a list of filenames that you are then treating as a single filename, but since I don't know the contents of the file that has been selected by $f, I'm lost.

I'm assuming that you have tried running your script and it is failing to work. What diagnostics is it printing, or if there aren't any, in what way is it failing to do what you want it to do?

Please indent your code to show its structure. Then comments like "end if" and "end loop" won't be needed and we won't have to wonder where the start of the "if" and "loop" are located. I know the shell doesn't care about indentation, but you are a human and you're asking humans on this forum to read your code. Lack of indentation makes it more difficult for humans (including you) to understand what your code is trying to do.
These 2 Users Gave Thanks to Don Cragun For This Post:
# 3  
Old 03-27-2019
I am using Ubuntu 14.04 as my OS.

/home/cmccabe/f1/file is the path to file, which has all the necessary information for the move (folder name, ids).

The for loop on RDIR was meant to trim the R_2019 line in file so that it matches the folder name in /home/cmccabe/f2, but DIR is undefined and maybe should be /home/cmccabe/f1/file. The FILE2=$(awk '{print $2}' $f') was then intended to read each id from file into FILE2. The code executes but nothing is moved, and set -x shows the variables not being populated correctly, as you already knew. I indented the code above, but I add comments to help me learn and to help me with my logic. Thank you for your help.

I rewrote the script (well, a portion of it) and most of the variables seem good: $STRING is the same as FILE2; I just changed the name to hopefully be clearer, as I am looking for a string. However, the loop is not working, so only the first id is retained in $STRING. I think I am on the right track, but is there a better way? Thank you.

Code:
set -x
DIR=/home/cmccabe/f1
DEST=/home/cmccabe/f2
for file in "$DIR"/variants/*.vcf ; do
  bname=$(basename $file) # strip of path
    VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension   
  for f in "$DIR"/file; do STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) ); echo "This is the string" "$STRING"; done
done

set -x
Code:
cmccabe@Satellite-M645:~$ set -x
cmccabe@Satellite-M645:~$ DIR=/home/cmccabe/f1
+ DIR=/home/cmccabe/f1
cmccabe@Satellite-M645:~$ DEST=/home/cmccabe/f2
+ DEST=/home/cmccabe/f2
cmccabe@Satellite-M645:~$ for file in "$DIR"/variants/*.vcf ; do
>   bname=$(basename $file) # strip of path
>     VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension   
>   for f in "$DIR"/file; do STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) ); echo "This is the string" "$STRING"; done
> done
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0000-LastName-FirstName.vcf
+ bname=19-0000-LastName-FirstName.vcf
++ echo 19-0000-LastName-FirstName.vcf
++ cut -d. -f1
+ VCF=19-0000-LastName-FirstName
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0002-L-F.vcf
+ bname=19-0002-L-F.vcf
++ echo 19-0002-L-F.vcf
++ cut -d. -f1
+ VCF=19-0002-L-F
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0004-La-Fi.vcf
+ bname=19-0004-La-Fi.vcf
++ echo 19-0004-La-Fi.vcf
++ cut -d. -f1
+ VCF=19-0004-La-Fi
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0020-Las-Fir.vcf
+ bname=19-0020-Las-Fir.vcf
++ echo 19-0020-Las-Fir.vcf
++ cut -d. -f1
+ VCF=19-0020-Las-Fir
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190319-Control.vcf
+ bname=190319-Control.vcf
++ echo 190319-Control.vcf
++ cut -d. -f1
+ VCF=190319-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190320-Control.vcf
+ bname=190320-Control.vcf
++ echo 190320-Control.vcf
++ cut -d. -f1
+ VCF=190320-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
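
One reading of the trace above, offered as a hedged note rather than a diagnosis from the thread: the array assignment does receive every id from awk, but in bash $STRING without an index expands only to element 0, which is why every pass prints 19-0000-LastName-FirstName. The inner for loop also runs just once, because "$DIR"/file is a single word and not a list of lines. The sketch below uses three sample lines from the thread in a temporary file standing in for /home/cmccabe/f1/file; the names list and ids are assumptions for this example, and mapfile needs bash 4 or later.

```shell
#!/usr/bin/env bash
# Sample lines copied from the thread; the temp file stands in for /home/cmccabe/f1/file.
list=$(mktemp)
printf '%s\n' \
  'IonCode_0005 19-0000-LastName-FirstName' \
  'IonCode_0001 19-0001-Las-Fir' \
  'IonCode_0003 190319-Control' > "$list"

# Read column 2 of every two-field line into an array, one id per element.
mapfile -t ids < <(awk 'NF == 2 {print $2}' "$list")

echo "unindexed expansion: $ids"       # same as ${ids[0]}, the first id only
echo "number of ids: ${#ids[@]}"
for id in "${ids[@]}"; do              # quoted [@] visits every element
  echo "id: $id"
done
rm -f "$list"
```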


Last edited by cmccabe; 03-27-2019 at 07:12 PM..
# 4  
Old 04-08-2019
I apologize for not getting back to you sooner. (I was distracted for a few days by other activities.)

Have you made any progress on resolving this problem?
These 2 Users Gave Thanks to Don Cragun For This Post:
# 5  
Old 04-09-2019
I have been able to get a working solution that produces my desired results, using set -x and the modifications below:

Code:
if [[ $VCF = ${STRING[*]} ]]  # only execute on match
then
    RSTRING=$(awk '/R_2019/' "$DIR"/run)  ## collect the lines containing the R_2019 pattern
    VCFRUN=$(awk -F '\n' -v RS="" -v ref="$VCF" '$0 ~ ref {print $NF}' "$DIR"/file)  ## find the blank-line-separated block containing $VCF and print its last line
    RUN="$(echo $RSTRING|cut -d- -f1,2,3)"  ## keep the first three "-"-separated fields of the R_2019 line
    mv "$DIR"/variants/${VCF}.vcf "$DEST"/"$RUN"/"$VCF"/variants  ## move vcf to folder in destination
fi

This matched each .vcf and moved the match to the correct run folder. Maybe this will help others as well.

Thank you very much for your help.

Last edited by cmccabe; 04-09-2019 at 10:22 PM.. Reason: added comments
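
For readers who land on this thread, the task as described in the first post could also be sketched in one pass over file. This is an illustration under stated assumptions, not the solution the poster used: it assumes each block of id lines in file is closed by its R_ line (as in the sample), that the run folder under the destination is the R_ line with everything from -v5.6 onward removed, and that each id has a sub-folder named exactly after the id. The function name move_vcfs and its two arguments are hypothetical.

```shell
# move_vcfs SRC DEST   (hypothetical helper; SRC holds "file" and "variants/")
move_vcfs() {
  src_dir=$1
  dest_dir=$2
  tab=$(printf '\t')

  # Emit "run<TAB>id" pairs. Ids are buffered until the R_ line that closes
  # their block; the run name is that line with "-v5.6" and the rest removed.
  awk '
    /^R_/   { run = $0; sub(/-v5\.6.*/, "", run)
              for (i = 1; i <= n; i++) print run "\t" ids[i]
              n = 0; next }
    NF == 2 { ids[++n] = $2 }
  ' "$src_dir/file" |
  while IFS=$tab read -r run id; do
    vcf=$src_dir/variants/$id.vcf
    target=$dest_dir/$run/$id/variants
    if [ -f "$vcf" ] && [ -d "$dest_dir/$run" ]; then
      # Create id/variants if it is missing, then move the file into it.
      mkdir -p "$target" && mv -- "$vcf" "$target/"
    else
      echo "skipping $id: no $vcf or no $dest_dir/$run" >&2
    fi
  done
}
```

With the layout from the thread, the call would be move_vcfs /home/cmccabe/f1 /home/cmccabe/f2. Ids without a .vcf, or run folders that do not exist, are reported on standard error and skipped rather than created, which seemed the safer default for a sketch.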