Rename portion of file based on another file


 
# 1  
Old 12-19-2011

Hello,

I've been searching and reading, but I can't figure out how to solve this problem with my newbie skills.

In my directory, I have a list of files (see dirlist.txt attachment) that I need to merge and rename. I have part of the code figured out (see below). However, I have no idea how to replace the barcode sequence (the 6-letter combination in the center of the name) with the barcode number based on barcodekey.txt. awk seems like the command I need, but I can't figure out how to use it.

Code:
find *gz | while IFS="_" read name lanename date sequence lane readpair number
do
    if [ ! -f "${lanename}_${sequence}_${readpair}.fastq.gz" ]; then
        cat *_"${lanename}"_*_"${sequence}"_*_"${readpair}"* > "${lanename}_${sequence}_${readpair}.fastq.gz"
    fi
done
unset IFS

Could you please help and explain how it works? That way I don't have to come back again for a minor change (e.g., when we decide to change CRC1_bc05 to AA_01_T in 6 months).

As a bonus, is there a way to change CRC (in the lanename) to ColonTrio while leaving the 1 and not messing up the merge [ ! -f ... ] part? Would you recommend doing it before the merge and changing what the [ ! -f ... ] test looks for? Or should I just run sed at the very end, after the done?

Thank you so much,
anjulka
# 2  
Old 12-19-2011
Actually, you can do this with a few changes to your script by building a translation list from your .txt file and using that to change the sequence number. I assume that CRC1 was to become ColonTrio1; wasn't quite sure based on your description.

Code:
typeset -A xlate            # must declare an associative array
while read old new          # build a translation list from old to new
do
    xlate[$old]=$new       # translation table
done <barcodekey.txt     # read in from your old to new translation file

find *gz | while IFS="_" read name lanename date sequence lane readpair number
do
    if [[ ! -f ColonTrio1_${xlate[$sequence]}_${readpair}.fastq.gz ]] # merge and rename
    then
        echo "cat *_${lanename}_*_${sequence}_*_${readpair}* >ColonTrio1_${xlate[$sequence]}_${readpair}.fastq.gz"
    fi
done

I put the 'real' command inside an echo for testing: the script prints what it would do without really doing it, so you can validate the output before running the cat for real. Expect more echoed commands than merges that will actually happen. In test mode no output file is ever created, so every input file that maps to the same merged file passes the if test. In a real run, the first such file creates the merged file, and the test then suppresses the rest. (Hope that makes sense.)
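If the repeated lines in the dry run get in the way, one option is to remember which targets have already been printed. This is only a sketch: the target names below are made up for illustration, and in the real loop `target` would be built from the fields read from each file name.

```shell
# Hypothetical target names, as the dry run would generate them (with repeats).
targets="ColonTrio1_bc05_R1.fastq.gz
ColonTrio1_bc05_R1.fastq.gz
ColonTrio1_bc06_R1.fastq.gz"

seen=""      # space-separated list of targets already reported
printed=0    # number of distinct targets reported
for target in $targets
do
    case " $seen " in
        *" $target "*) continue ;;   # already reported, skip the repeat
    esac
    seen="$seen $target"
    printed=$((printed + 1))
    echo "would create: $target"
done
```

This relies on the target names containing no spaces, which holds for the names discussed in this thread.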

Last edited by agama; 12-19-2011 at 03:22 PM.. Reason: typo -- clarification
# 3  
Old 12-19-2011
Thank you, agama. That looks like an elegant solution.

Two unfortunate problems, however:
1. Our cluster uses bash 3.2.25(1)-release (sorry that I forgot to mention that in my original post). typeset will not let me use -A. -a doesn't seem to have the same effect. declare won't work either. :-(
2. Just putting ColonTrio1 into the output won't work, since I also have CRC2->ColonTrio2 and CRC3->ColonTrio3.
# 4  
Old 12-19-2011
I don't usually use bash, so I didn't realise this before... seems that the script will work under bash without the typeset -A; have a try and see what happens. I tested it under bash, but didn't notice the complaint about the bad option to typeset.

As for the CRCn issue, I should have suggested this, but thought they were all *1 (I only looked at the first few filenames in your sample, so this is likely my fault). You can use something like this:

Code:
lane_n=${lanename:3:1}   # add this just before the cat command

and when substituting use:
Code:
ColonTrio${lane_n}_

which should take the number from the CRC and append it to ColonTrio.
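To see what the substring expansion does on its own, here is a small sketch you can paste into a bash prompt. The lane names are examples in the CRCn form; `${lanename#CRC}` is shown as a possible alternative that strips the CRC prefix and so would also cope with a number of more than one digit.

```shell
# Example lane names in the CRCn form used in this thread.
for lanename in CRC1 CRC2 CRC3
do
    lane_n=${lanename:3:1}     # 1 character starting at offset 3 (offsets count from 0)
    suffix=${lanename#CRC}     # alternative: everything after a leading "CRC"
    newname="ColonTrio${lane_n}"
    echo "$lanename -> $newname (suffix: $suffix)"
done
```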


EDIT: I tested this with bash version 3.2.39. If taking away the typeset altogether doesn't work, it may be a version difference and another workaround will be needed.

---------- Post updated at 15:59 ---------- Previous update was at 15:43 ----------

Here's a solution that should work -- avoids using an associative array in the shell:

Code:
while read old new
do
    eval $old=\"$new\"    # build a list of variables named using old with value that is new. 
done <barcodekey.txt 


find *gz | while IFS="_" read name lanename date sequence lane readpair number
do
    eval xlation=\"\$$sequence\"  # use the old name as a variable to get its translation.
    lane_n=${lanename:3:1}        # save the n after CRC
    if [[ ! -f ColonTrio${lane_n}_${xlation}_${readpair}.fastq.gz ]]    # not already merged
    then
        echo "cat *_${lanename}_*_${sequence}_*_${readpair}* >ColonTrio${lane_n}_${xlation}_${readpair}.fastq.gz"
    fi
done
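Since awk came up in the original question, here is a possible alternative to the eval approach: look each sequence up in the key file with awk. This is a sketch under assumptions. It assumes barcodekey.txt has two whitespace-separated columns (old sequence, then new name), which is what the `read old new` loop above implies. The function name `lookup` and the sample key contents are invented for the example.

```shell
# Hypothetical helper: print column 2 of the key file for the line whose
# column 1 equals the given sequence. Prints nothing if there is no match.
lookup() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Made-up sample key file, standing in for barcodekey.txt.
keyfile=$(mktemp)
printf 'ACAGTG bc05\nGCCAAT bc06\n' > "$keyfile"

xlation=$(lookup GCCAAT "$keyfile")
echo "$xlation"

rm -f "$keyfile"
```

Inside the loop this would replace the eval line with something like `xlation=$(lookup "$sequence" barcodekey.txt)`. It runs awk once per file, which is slower than the eval version but should not matter for a directory listing of this size.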


Last edited by agama; 12-19-2011 at 04:45 PM.. Reason: additional info