Bash to trim folder and files within a path that share a common file extension
The bash below trims the folder names. Within each of the folders (there may be more than one, and the format is always the same) are several .bam and matching .bam.bai files. The bash below executes and trims the .bam files as expected, but it repeats the .bam.bai extension after trimming those files (xxxx_0113_xxx_xxx.bam.bai.bam.bai); the same shows up in the set -xv output. I think the .bam extension common to both may be causing the repeat, but I am not sure. Removing the .bam.bai from the mv did not fix the repeats. The end goal is to trim the folders and the files within each of the folders, and I am not sure if nested loops are the best way (probably not). Thank you.
bash to trim folder
Code:
for folder in /home/cmccabe/rename/*/ ; do  ## start loop in subdirectory
    mv "$folder" "${folder%%-v5.6*}"  ## trim folder name
done  ## close loop
for d in /home/cmccabe/rename/* ; do  ## start loop in parentdir
    if [ -d "$d" ]; then  ## grab subdir and store in parentdir/subdir in $d
        subdir="$(basename $d)"  ## define sub-directory
    fi  ## end if
    for bam in "${d}"/*.bam ; do  ## iterate through each file in parentdir and read into bam
        for bai in "${d}"/*.bam.bai ; do  ## iterate through each file in parentdir/subdir and read into bai
            bam_path_removed=$(echo $bam| awk -F/ '{print $NF}')  ## cut text before last /
            bai_path_removed=$(echo $bai| awk -F/ '{print $NF}')  ## cut text before last /
            bam_trim=$(echo "$bam_path_removed"|cut -f1,2,3,4 -d'_')
            bai_trim=$(echo "$bai_path_removed"|cut -f1,2,3,4 -d'_')
            mv "${bam}" "${d}/${bam_trim}".bam  ## rename all bam
            mv "${bai}" "${d}/${bai_trim}".bam.bai  ## rename all bai
        done  ## close loop
    done  ## close loop
done  ## close loop
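A likely explanation for the doubled extension, sketched under assumptions: the inner *.bam.bai glob is expanded again on every pass of the outer loop, so on the second pass it picks up files that were already renamed. Running the same cut call on an already-trimmed name leaves it unchanged, and the mv then appends .bam.bai a second time. The sample filename below is a shortened, made-up stand-in for the real names, and trim_name is a hypothetical helper that only wraps the cut call from the code above.

Code:
```shell
#!/bin/bash
# trim_name: keep the first four '_'-separated fields, like the cut call above.
trim_name() {
    printf '%s\n' "$1" | cut -f1,2,3,4 -d'_'
}

# First pass: the original (shortened, hypothetical) name is trimmed as intended.
first=$(trim_name "xxx_011_xx00_xxx_R_2019_01_30.bam.bai")
echo "${first}.bam.bai"     # xxx_011_xx00_xxx.bam.bai

# Second pass: the already-renamed file has only four fields, so cut
# returns it whole, extension included, and .bam.bai is appended again.
second=$(trim_name "${first}.bam.bai")
echo "${second}.bam.bai"    # xxx_011_xx00_xxx.bam.bai.bam.bai
```

The .bam files escape this because the outer glob is expanded only once; later attempts to move an already-moved .bam simply fail.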
Code:
#!/usr/bin/bash
# We are running this from ROOTDIR, or abort.
ROOTDIR=/home/cmccabe/rename
cd $ROOTDIR || exit 1
# Match the name "R_2019*" to operate on the desired directory names; expand this to be more precise.
MDIR="R_2019*"
# GNU find features are used here, since we do not have information about subdirectories under 'rename'.
# If you have subdirectories and want POSIX, more effort will be required; you did not specify an operating system.
find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
do
    TRIMSTR="${RDIR%%-v5.6*}"
    for FLN in $RDIR/*.bam $RDIR/*.bam.bai
    do
        FLNSUB="_${RDIR/\.\//}"
        echo "mv ${FLN} ${FLN/$FLNSUB}"
    done
    # Now we shall rename the folder, after the files inside have been renamed.
    echo "mv $RDIR ${TRIMSTR}"
done
Remove the echo in front of the mv commands to execute them against the files; otherwise the script will just print the commands on the terminal.
It could probably use some more error handling.
Hope that helps
Regards
Peasant.
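The echo-first idea above can also be wrapped in a small helper, so that switching from preview to execution does not require editing the mv lines. This is only a sketch: run and DRY_RUN are hypothetical names that do not appear in the script above, and the demo files live in a throwaway directory.

Code:
```shell
#!/bin/bash
# run: print the command when DRY_RUN=1, otherwise execute it.
# (run and DRY_RUN are illustrative names, not part of the script above.)
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%q ' "$@"   # %q shows each argument exactly as the shell would need it
        printf '\n'
    else
        "$@"
    fi
}

demo=$(mktemp -d)                     # throwaway directory for the demo
touch "$demo/sample_R_2019_run.bam"   # made-up file name

DRY_RUN=1
run mv "$demo/sample_R_2019_run.bam" "$demo/sample.bam"   # only printed

DRY_RUN=0
run mv "$demo/sample_R_2019_run.bam" "$demo/sample.bam"   # actually executed
```

Because the arguments are passed as separate words rather than as one quoted string, nothing has to be re-quoted or un-quoted when the preview is turned off.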
What you have shown us above makes it look like you may have moved a bunch of your *.bam.bai files into *.bam.bai.bam.bai files and possibly moved *.bam files into *.bam.bam files. The purpose of the echo commands was to make sure that the mv commands that would be executed looked good before actually moving files. The fact that the set -xv output with the echo in place did not show the same filenames as the set -xv output from the mv run seems to imply that something else in your script was changed besides removing the echo in front of the two mv commands.
Since the output from the set -xv trace showed that it was going to rename files in the wrong directory, why did you remove the echo and run it again? The purpose of having the echo in there is so that you can verify that the command being echoed is the command that you want the script to actually perform when run a second time with the echos removed.
Please show us the output from the command:
Code:
find /home/cmccabe/rename/ \( -type d -o -name '*.bam*' \) -exec ls -ld {} +
so we can see how things stand now. Please also tell us what operating system you're using. (PLEASE always tell us what operating system and shell you're using when you start a new thread.)
I don't know if you have tried running the code Peasant suggested in post #2 in this thread. I'm afraid the code Peasant suggested might only work with the file hierarchy you described before any files were moved. (Note that I haven't tried to figure out what his code will do if it starts with your modified file hierarchy instead of what may be the current file hierarchy.) Do you have backups so you can restore that original state? If not, I'm hoping that with the output from the find command above we'll be able to find a way to get to where you want to be without losing any data.
I am using Ubuntu 14.04 as my OS.
Each .bam and .bam.bai is inside an R_2019 directory, but it looks like the files cannot be found. Thank you.
set -xv
Code:
ROOTDIR=/home/cmccabe/rename
ROOTDIR=/home/cmccabe/rename
+ ROOTDIR=/home/cmccabe/rename
cmccabe@Satellite-M645:~$ cd $ROOTDIR || exit 1
cd $ROOTDIR || exit 1
+ cd /home/cmccabe/rename
cmccabe@Satellite-M645:~/rename$ # matching name "R_2019*" to operate on desired directory names, expand this to be precise.
# matching name "R_2019*" to operate on desired directory names, expand this to be precise.
cmccabe@Satellite-M645:~/rename$ MDIR="R_2019*"
MDIR="R_2019*"
+ MDIR='R_2019*'
cmccabe@Satellite-M645:~/rename$ find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
> do
do
> TRIMSTR="${RDIR%%-v5.6*}"
TRIMSTR="${RDIR%%-v5.6*}"
> for FLN in $RDIR/*.bam $RDIR/*.bam.bai
for FLN in $RDIR/*.bam $RDIR/*.bam.bai
> do
do
> FLNSUB="_${RDIR/\.\//}"
FLNSUB="_${RDIR/\.\//}"
> "mv ${FLN} ${FLN/$FLNSUB}"
"mv ${FLN} ${FLN/$FLNSUB}"
> done
done
> # Now we shall rename the folder, after files inside have been renamed.
# Now we shall rename the folder, after files inside have been renamed.
> "mv $RDIR ${TRIMSTR}"
"mv $RDIR ${TRIMSTR}"
> done
done
+ read RDIR
+ find . -mindepth 1 -maxdepth 1 -type d -name 'R_2019*'
+ TRIMSTR=./R_2019_01_30_14_24_53_user_S5-0271-95
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam.bai'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam.bai: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam.bai'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam.bai: No such file or directory
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions ./R_2019_01_30_14_24_53_user_S5-0271-95'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions ./R_2019_01_30_14_24_53_user_S5-0271-95: No such file or directory
+ read RDIR
--- Post updated at 08:01 AM ---
@Don Cragun I just tried the script by @Peasant and posted the results. The files could not be found to trim. I am currently using Ubuntu 14.04 and may be migrating to CentOS 7 in the near future.
I do have backups of the data. I removed the echo because the output looked correct, and since I have backups I performed the mv. Looking back, it was not correct: the .bam files were trimmed as expected, but the .bam.bai files were not. Thank you.
Code:
find /home/cmccabe/rename/ \( -type d -o -name '*.bam*' \) -exec ls -ld {} +
drwxrwxr-x 3 cmccabe cmccabe 4096 Mar 17 07:28 /home/cmccabe/rename/
drwxrwxr-x 2 cmccabe cmccabe 4096 Mar 17 07:34 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
-rw-rw-r-- 1 cmccabe cmccabe 0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
-rw-rw-r-- 1 cmccabe cmccabe 0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai
-rw-rw-r-- 1 cmccabe cmccabe 0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
-rw-rw-r-- 1 cmccabe cmccabe 0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai
Last edited by cmccabe; 03-17-2019 at 10:04 AM..
Reason: added comments
Hi Peasant,
Here is a slightly modified version of your script that just uses options and variable expansions defined by the POSIX standards. But, of course, it still depends on us knowing the pathname of a shell that provides those standard variable expansions. (Note that find isn't needed for this; we can get what we need just using shell pathname expansions.)
Code:
#!/bin/bash
set -xv
# We are running this from ROOTDIR, or abort.
ROOTDIR=/home/cmccabe/rename
cd "$ROOTDIR" || exit 1
# Match the name "R_2019*" to operate on the desired directory names; expand this to be more precise.
for RDIR in R_2019*/
do
    TRIMSTR=${RDIR%%-v5.6*}
    for FLN in "$RDIR"*.bam    # Note that RDIR contains a trailing /.
    do
        FLNSUB=${FLN%_R_2019_*}
        mv "${FLN}" "$FLNSUB.bam"
        # Use the fact that .bam and .bam.bai files are paired.
        mv "${FLN}.bai" "$FLNSUB.bam.bai"
    done
    # Now we shall rename the folder, after the files inside have been renamed.
    mv "$RDIR" "${TRIMSTR}"
done
Hi cmccabe,
If Peasant's script (modified as suggested in post #5) worked for you, the script above should also work and should even run a little bit faster since it doesn't need to invoke find to get the job done.
I hope this helps,
Don
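To see what the two variable expansions in the script above produce, they can be tried on their own without moving anything. This is just an illustration; the directory and file names are copied from the find output posted earlier in the thread.

Code:
```shell
#!/bin/bash
# Names copied from the find output earlier in this thread.
RDIR='R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
FLN="${RDIR}xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam"

# %%pattern removes the longest suffix matching the pattern:
# everything from "-v5.6" to the end, trailing / included.
TRIMSTR=${RDIR%%-v5.6*}
echo "$TRIMSTR"        # R_2019_01_30_14_24_53_user_S5-0271-95

# %pattern removes the shortest suffix matching the pattern. The directory
# part starts with "R_2019_" (no leading underscore), so only the copy
# embedded in the file name matches "_R_2019_*".
FLNSUB=${FLN%_R_2019_*}
echo "$FLNSUB.bam"     # <directory>/xxx_011_xx00_xxx.bam
```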
Thank you both... both scripts work great. I guess I don't understand "" (double quotes): I have noticed they make a difference in the output, but I have to read more about them. I always thought they were for escaping whitespace in a filename or variable. Is that not true? Thanks again.
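On the quoting question: double quotes do protect whitespace, but more generally they turn everything between them into a single word. When an entire command line is quoted, as in the "mv ..." lines of the set -xv trace earlier in the thread, the shell treats that whole string as the name of the command and cannot find it. A small sketch of the difference, using made-up file names in a throwaway directory:

Code:
```shell
#!/bin/bash
demo=$(mktemp -d)                    # throwaway directory for the demo
src="$demo/sample_R_2019_run.bam"    # made-up file names
dst="$demo/sample.bam"
touch "$src"

# Whole command in quotes: one word, so bash looks for a program whose
# name is the entire string, fails, and leaves the file where it was.
whole_status=0
"mv $src $dst" 2>/dev/null || whole_status=$?

# Each argument quoted separately: mv is the command and receives
# exactly two arguments, even if they were to contain spaces.
split_status=0
mv "$src" "$dst" || split_status=$?

echo "whole command quoted: exit status $whole_status"
echo "arguments quoted:     exit status $split_status"
```

The first form ends with a non-zero exit status and the second with zero, which matches the "No such file or directory" messages in the trace.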