Rename file using partial match to another


 
# 1  
Old 10-23-2019

In the code below I am trying to rename the files in the data subfolder of a specific run, based on a partial match of the IonCode_0000_ prefix of each file name to $1 in f1. There will be multiple runs in f1, but each run name stored in $uniq is unique and will be found in f1, and the rename values are stored in $string. The code is commented with what I think is going on. Thank you.


f1
Code:
IonCode_0404 00-0000-xxx-xxx-xxx
IonCode_0402 11-1111-yy-yy-yyy
R_2019_00_00_00_00_00_xxxx_xx1-127-xxx_xxx_xxx_xxx_xx_xx_xx

IonCode_0402 22-2222-zz-zzzz-zzz
R_2019_00_00_00_00_00_xxxx_xx1-126-xxx_xxx_xxx_xxx_xx_xx_xx

IonCode_0404 10-0000-aa-aa-aa
IonCode_0412 55-1111-bb-bbb-bbb
R_2019_00_00_00_00_00_xxxx_xx1-120-xxx_xxx_xxx_xxx_xx_xx_xx

Code:
/path/to/run/R_2019_00_00_00_00_00_xxxx_xx1-127
      data   --- sub-folder ---
      IonCode_0402_xxx.xxx_xxx.bam
      IonCode_0402_xxx.xxx_xxx.bam.bai
      IonCode_0404_xxx.xxx_xxx.bam
      IonCode_0404_xxx.xxx_xxx.bam.bai

Code:
dir=/path/to/run/
for run in "$dir"/R_2019* ; do  ## # matching "R_2019*" to operate on desired directory and expand
  uniq=${run##*/}  ## store run with no path as s5
  cd "$dir"/"$uniq"/data  ## change directory to subfolder
   string=$(awk -F '\n' -v RS="" -v ref="$uniq" '$0 ~ ref {d=split($0, val, " "); for(i=2;i<d;i+=2) printf "%s ",val[i]; printf "\n"}' "$dir"/f1)  ## loop through f1 for unique run and store $2 in string
   for $f in "$dir"/"$s5"/data/*.bam* ; do sample_basename=$(basename "${f}") ;
     rename_file_path="$string" ## define rename string
     cmd=$(sed -n "/$f/,/IonCode_[0-9][0-9][0-9][0-9]_*/{s/\(.*\.bam\) \(.*\)/mv \1 \2/g}" $rename_file_path)  ## rename file in data subfolder matching IonCode_ to f1 and replacing with $2 of f1
  done
done

desired in data
Code:
11-1111-yy-yy-yyy_test.bam
11-1111-yy-yy-yyy_test.bam.bai
00-0000-xxx-xxx-xxx_test.bam
00-0000-xxx-xxx-xxx_test.bam.bai


Last edited by cmccabe; 10-24-2019 at 08:03 PM.. Reason: corrected typo
# 2  
Old 10-23-2019
You were pretty close on this. I fed the awk output into a while read loop through a here-string (<<<). Because the loop changes directory, its body runs in a subshell, which ensures the working directory is restored after each rename pass.

Change echo mv to mv in the code below once you are happy with what it prints.

Code:
dir=/path/to/run/
for run in "$dir"/R_2019* ; do  ## expand "R_2019*" to each run directory
  uniq=${run##*/}  ## run name with the leading path stripped
  while read from to
  do
     (
       cd "$dir"/"$uniq"/data
       for file in *.bam*
       do
          newname=${file/$from/$to}
          [ -f "$file" ] && [ "$newname" != "$file" ] && echo mv "$file" "$newname"
       done
     )
  done <<<$(
     awk -F '\n' -v RS="" -v ref="$uniq" '
         $0 ~ ref {
             d=split($0, val);
             for(i=1;i<d;i++) print val[i];
          }' "$dir"/f1
  )  ## loop through f1 for unique run and populate from and to
done
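
One thing to watch with a loop like the one above: as far as I know, bash releases before 4.4 word-split an unquoted here-string, so the lines printed by awk are joined into a single line and read puts everything after the first word into to. Quoting the command substitution avoids that. The sketch below sidesteps the here-string altogether by piping awk into the loop. It is only a dry run under some assumptions: the helper names list_pairs and plan_renames are made up for illustration, the run directory name is assumed to be a literal prefix of the run line in f1, and the files are assumed to be named IonCode_NNNN_*.bam*.

```shell
# Print the "IonCode new-name" pairs of the f1 block whose run line starts with $2.
list_pairs() {   # usage: list_pairs /path/to/f1 run_name
  awk -v RS="" -F '\n' -v ref="$2" '
    index($NF, ref) == 1 {               # literal prefix test, not a regex match
      for (i = 1; i < NF; i++) print $i  # every line except the run line
    }' "$1"
}

# Dry run: print the mv commands for one run directory.
plan_renames() (   # usage: plan_renames /path/to/run run_name
  data=$1/$2/data
  list_pairs "$1/f1" "$2" | while read -r from to; do
    for file in "$data/${from}_"*.bam*; do
      [ -f "$file" ] || continue         # glob matched nothing
      base=${file##*/}
      echo mv "$file" "$data/${to}_test.bam${base#*.bam}"
    done
  done
)
```

Call it as plan_renames /path/to/run R_2019_00_00_00_00_00_xxxx_xx1-127 and drop the echo once the printed commands look right. Because full paths are used, no cd is needed.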

# 3  
Old 10-24-2019
Looks like the last file in the directory is getting renamed with both matches rather than only its own. Is another loop needed, or all the values on one line instead of separate? Thank you.


Code:
mv IonCode_0404_xxx.xxx_xxx.bam 00-0000-xxx-xxx-xxx IonCode_0402 11-1111-yy-yy-yyy_xxx.xxx_xxx.bam
mv IonCode_0404_xxx.xxx_xxx.bam.bai 00-0000-xxx-xxx-xxx IonCode_0402 11-1111-yy-yy-yyy_xxx.xxx_xxx.bam.bai

desired
Code:
11-1111-yy-yy-yyy_test.bam   ---- this is IonCode_0402_xxx.xxx_xxx ---
11-1111-yy-yy-yyy_test.bam.bai   ---- this is IonCode_0402_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam   ---- this is IonCode_0404_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam.bai ---- this is IonCode_0404_xxx.xxx_xxx ---


Last edited by cmccabe; 10-24-2019 at 01:52 PM.. Reason: fixed format
# 4  
Old 10-24-2019
It looks a bit complicated.
Perhaps you want to do the following?
Code:
dir=/path/to/run
ind=0
while read -r a b c
do
  if [ -n "$b" ]            # "IonCode new-name" line: remember the pair
  then
    fsearch[ind]=$a
    mvto[ind]=$b
    ((ind++))
  elif [ -z "$a" ]          # blank line: start a new block
  then
    ind=0
  else                      # run line: report the pairs collected for it
    while [ "$ind" -gt 0 ]
    do
      ((ind--))
      echo "In $dir/$a/data/ rename ${fsearch[ind]}*.bam* to ${mvto[ind]}_test.bam*" 
    done
  fi
done < "$dir/f1"
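
To go from the report above to actual commands, a single pass over f1 can print one mv per matching file. This is only a sketch under assumptions taken from the sample data in post #1: the function name plan_all is hypothetical, each run directory under $dir is assumed to be named with a prefix of its run line in f1, and the files are assumed to be named IonCode_NNNN_*.bam*. If one run name is a prefix of another, both directories would match, so check the printed output first.

```shell
# Dry run over every block of f1: print one mv command per matching file.
plan_all() (   # usage: plan_all /path/to/run
  dir=$1
  pairs=
  while read -r a b rest; do
    if [ -n "$b" ]; then             # "IonCode new-name" line: remember the pair
      pairs="$pairs$a $b
"
    elif [ -n "$a" ]; then           # run line: closes the current block
      for rundir in "$dir"/R_*/; do
        run=${rundir%/}; run=${run##*/}
        case $a in "$run"*) ;; *) continue ;; esac   # directory name must prefix the run line
        printf '%s' "$pairs" | while read -r from to; do
          for file in "$rundir"data/"${from}_"*.bam*; do
            [ -f "$file" ] || continue               # glob matched nothing
            base=${file##*/}
            echo mv "$file" "${rundir}data/${to}_test.bam${base#*.bam}"
          done
        done
      done
      pairs=
    else                             # blank line: start a new block
      pairs=
    fi
  done < "$dir/f1"
)
```

Run plan_all /path/to/run, review the output, then remove the echo to perform the renames.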

# 5  
Old 10-24-2019
Quote:
Originally Posted by cmccabe
Looks like the last file in the directory is getting renamed with both matches rather than only its own. Is another loop needed, or all the values on one line instead of separate? Thank you.
What do you want to happen here? f1 requires that IonCode_0404 in directory *127* be renamed to both 10-0000-aa-aa-aa and 00-0000-xxx-xxx-xxx. Is this a mistake in the data file, or how should the script handle it?

Quote:
Originally Posted by cmccabe
desired
Code:
11-1111-yy-yy-yyy_test.bam   ---- this is IonCode_0402_xxx.xxx_xxx ---
11-1111-yy-yy-yyy_test.bam.bai   ---- this is IonCode_0402_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam   ---- this is IonCode_0404_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam.bai ---- this is IonCode_0404_xxx.xxx_xxx ---

To get the _test names, change the newname assignment to: newname=${file/$from*.bam/${to}_test.bam}

Last edited by Chubler_XL; 10-24-2019 at 05:15 PM..
# 6  
Old 10-24-2019
Looks like only one pair gets renamed, and with both values. Thank you.


4 original files:
Code:
IonCode_0402_xxx.xxx_xxx.bam
IonCode_0402_xxx.xxx_xxx.bam.bai
IonCode_0404_xxx.xxx_xxx.bam
IonCode_0404_xxx.xxx_xxx.bam.bai

Current:
Code:
IonCode_0402_xxx.xxx_xxx.bam
IonCode_0402_xxx.xxx_xxx.bam.bai
00-0000-xxx-xxx-xxx IonCode_0402 11-1111-yy-yy-yyy_test.bam.bam
00-0000-xxx-xxx-xxx IonCode_0402 11-1111-yy-yy-yyy_test.bam.bam.bai

Desired after rename:
Code:
11-1111-yy-yy-yyy_test.bam   ---- this is IonCode_0402_xxx.xxx_xxx ---
11-1111-yy-yy-yyy_test.bam.bai   ---- this is IonCode_0402_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam   ---- this is IonCode_0404_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam.bai ---- this is IonCode_0404_xxx.xxx_xxx ---

# 7  
Old 10-24-2019
I further qualified my question (see #5 above) - this appears to be a problem with the data file.

If the actual renames were done instead of echo, only the first match would apply, as the file would then have a different name and the second rename would not be attempted. The last two lines of the output below would not occur, as the files have already been renamed by lines 1 and 2:

Code:
$ ./cmccabe_rename 
mv IonCode_0404_xxx.xxx_xxx.bam 00-0000-xxx-xxx-xxx_test.bam
mv IonCode_0404_xxx.xxx_xxx.bam.bai 00-0000-xxx-xxx-xxx_test.bam.bai
mv IonCode_0402_xxx.xxx_xxx.bam 11-1111-yy-yy-yyy_test.bam
mv IonCode_0402_xxx.xxx_xxx.bam.bai 11-1111-yy-yy-yyy_test.bam.bai
mv IonCode_0404_xxx.xxx_xxx.bam 10-0000-aa-aa-aa_test.bam
mv IonCode_0404_xxx.xxx_xxx.bam.bai 10-0000-aa-aa-aa_test.bam.bai
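
Since the outcome then depends on the order of the blocks in f1, it may be worth checking the data file before renaming anything. A rough sketch, assuming that two blocks belong to the same run when their run lines are identical (the function name find_conflicts is made up for illustration):

```shell
# Report IonCode prefixes that map to different new names within the same run.
find_conflicts() {   # usage: find_conflicts /path/to/f1
  awk -v RS="" -F '\n' '
    {
      run = $NF                          # last line of the block names the run
      for (i = 1; i < NF; i++) {
        split($i, pair, " ")             # pair[1] = IonCode prefix, pair[2] = new name
        key = run SUBSEP pair[1]
        if (key in target && target[key] != pair[2])
          printf "%s: %s -> %s and %s\n", run, pair[1], target[key], pair[2]
        else
          target[key] = pair[2]
      }
    }' "$1"
}
```

No output from find_conflicts /path/to/run/f1 means every IonCode prefix has a single new name per run.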


Last edited by Chubler_XL; 10-24-2019 at 05:26 PM..