Add unique identifier from file to filetype in directory


 
# 8  
Old 11-27-2016
Please try the following. If you like the output, comment out the line that prints and uncomment the line that calls move.
This script parses the input file and works on each directory identified as the last line in each paragraph.

Code:
#!/usr/bin/perl
#
use strict;
use warnings;
use File::Copy;

{
    $/ = "\n\n";
    while(<>) {
        work_space();
    }
}

sub work_space{
    my @lines = split /\n/;
    my $curr_dir = pop @lines;
    for my $info (@lines) {
        my ($pattern, $new_name) = split /\s+/, $info;
        change_filename($curr_dir, $pattern, $new_name);
    }
}

sub change_filename {
    my ($c_dir, $pat, $new) = @_;
    my $base_path = "/home/cmccabe/Desktop/index"; # change me if needed.
    my $working_path = "$base_path/$c_dir";

    opendir my $dir, "$working_path" || return;
    my @files = grep { /$pat.*\.(bam|vcf)/ && -f "$working_path/$_" } readdir $dir;
    for my $f (@files) {
        my ($ext) = $f =~ /(\..*)$/;
        # For testing purposes. Comment this line and remove the # on the one after.
        print "$working_path/$f => GETS RENAMED TO => $working_path/$new$ext\n";
        #move "$working_path/$f", "$working_path/$new$ext";
    }
}


Code:
perl identifier.pl input

Code:
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_007.bam => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV21.bam
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_007.vcf => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV21.vcf
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_007.bam.bai => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV21.bam.bai
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_008.bam => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV22.bam
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_008.vcf => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV22.vcf
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_008.bam.bai => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV22.bam.bai
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_009.bam => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV23.bam
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_009.vcf => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV23.vcf
./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/IonXpress_009.bam.bai => GETS RENAMED TO => ./R_2016_09_21_14_01_15_user_S5-00580-9-Medexome/MEV23.bam.bai
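
The paragraph-at-a-time parsing described in this post can be tried without touching any files. The following is a minimal sketch, not the posted script: `parse_record` is a hypothetical helper, and the sample data is copied from the input shown in this thread and read from an in-memory string instead of a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one blank-line-separated record into its
# directory (last line) and its pattern => identifier pairs (earlier lines).
sub parse_record {
    my ($record) = @_;
    my @lines = grep { /\S/ } split /\n/, $record;
    my $dir = pop @lines;
    my %id_for = map { (split ' ', $_)[0, 1] } @lines;
    return ($dir, \%id_for);
}

# Sample data copied from the thread's input file.
my $input = <<'END';
IonXpress_001 MEC2
IonXpress_002 MEC3
IonXpress_003 MEV48
R_2016_10_21_09_52_37_user_S5-00580-10-Medexome

IonXpress_007 MEV21
IonXpress_008 MEV22
IonXpress_009 MEV23
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome
END

open my $fh, '<', \$input or die "cannot open in-memory file: $!";
{
    local $/ = "\n\n";    # read up to each blank line, as the script does
    while (my $record = <$fh>) {
        my ($dir, $id_for) = parse_record($record);
        print "$dir\n";
        print "  $_ => $id_for->{$_}\n" for sort keys %$id_for;
    }
}
```

Each record prints its directory followed by the pattern and identifier pairs that belong to it, which is the pairing the script relies on when it builds the new file names.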


Last edited by Aia; 11-27-2016 at 11:34 PM.. Reason: Comment about the script.
# 9  
Old 11-28-2016
You might note that in post #1 you said:
Quote:
files in /home/cmccabe/Desktop/index/R_2016_09_21_14_01_15_user_S5-00580-9-Medexome

Code:
MEV21.bam
MEV21.vcf
MEV22.bam
MEV22.vcf
MEV23.bam
MEV23.vcf

desired output
Code:
IonXpress_007.bam
IonXpress_007.vcf
IonXpress_008.bam
IonXpress_008.vcf
IonXpress_009.bam
IonXpress_009.vcf

while in post #7 you said:
Quote:
Files in directory being updated in dir: R_2016_09_21_14_01_15_user_S5-00580-9-Medexome
Code:
IonXpress_007.bam
IonXpress_007.vcf
IonXpress_007.bam.bai
IonXpress_008.bam
IonXpress_008.vcf
IonXpress_008.bam.bai
IonXpress_009.bam
IonXpress_009.vcf
IonXpress_009.bam.bai

input
Code:
IonXpress_001 MEC2
IonXpress_002 MEC3
IonXpress_003 MEV48
R_2016_10_21_09_52_37_user_S5-00580-10-Medexome

IonXpress_007 MEV21
IonXpress_008 MEV22
IonXpress_009 MEV23
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome  --- line matches dir

The identifier is in $2 of the input file. That is what the file in dir should be updated with. The $1 value will match the file name (before it is renamed).

So using the dir in the example:
Code:
MEV21.bam
MEV21.vcf
MEV21.bam.bai
MEV22.bam
MEV22.vcf
MEV22.bam.bai
MEV23.bam
MEV23.vcf
MEV23.bam.bai

So, your original request was to rename MEV21.bam to IonXpress_007.bam and MEV21.vcf to IonXpress_007.vcf.

Now the request is to rename IonXpress_007.bam to MEV21.bam, IonXpress_007.vcf to MEV21.vcf, and IonXpress_007.bam.bai to MEV21.bam.bai.

In other words, the request has changed from moving two files with fixed-length suffixes per prefix to moving three files with varying-length suffixes per prefix, and the direction has reversed: from renaming $2 to $1 to renaming $1 to $2.

There is absolutely no reason given for adding all of the rename commands into your script which seem to be complete no-ops (assuming that the files that are in your folder have the names you say they have with just a prefix found in $1 in input and one of the three suffixes above).

Is there a reason why you need those rename commands in your script? Do your existing filenames contain additional characters between the prefixes in input and the three suffixes you want to process? If so, can a <period> ever be one of those additional characters?

Why is it so important that only one directory be processed at a time instead of renaming the files in all of the subdirectories in one run of your script?

Does Aia's suggestion do what you need? Or is something else still required?
This User Gave Thanks to Don Cragun For This Post:
# 10  
Old 11-28-2016
Quote:
Is there a reason why you need those rename commands in your script? Do your existing filenames contain additional characters between the prefixes in input and the three suffixes you want to process? If so, can a <period> ever be one of those additional characters?

Why is it so important that only one directory be processed at a time instead of renaming the files in all of the subdirectories in one run of your script?

Does Aia's suggestion do what you need? Or is something else still required?
My file names do contain many additional characters that are removed in the rename. However, a <period> is not one of them.

There are a few reasons to process only one directory at a time:
1. Since I am a clinical scientist, it is important to process and log only one directory at a time.
2. Only the oldest directory is processed by the bash script at the beginning.
3. Since many of the lines repeat, duplicates may exist; however, the current directory is unique in input.
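
One way to keep the one-directory-per-run rule while still using a multi-paragraph input file is to pick out only the paragraph whose last line names the wanted directory. This is a minimal sketch under that assumption: `lines_for_dir` is a hypothetical helper that is not part of any script posted in this thread, and the directory name is assumed to be supplied by the caller.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the "pattern identifier" lines of the one
# paragraph whose last line equals $wanted_dir (nothing if no paragraph matches).
sub lines_for_dir {
    my ($text, $wanted_dir) = @_;
    for my $para (split /\n{2,}/, $text) {
        my @lines = grep { /\S/ } split /\n/, $para;
        next unless @lines;
        my $dir = pop @lines;          # last line of the paragraph is the directory
        return @lines if $dir eq $wanted_dir;
    }
    return;
}

# Sample data copied from the input shown earlier in this thread.
my $input = <<'END';
IonXpress_001 MEC2
IonXpress_002 MEC3
IonXpress_003 MEV48
R_2016_10_21_09_52_37_user_S5-00580-10-Medexome

IonXpress_007 MEV21
IonXpress_008 MEV22
IonXpress_009 MEV23
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome
END

print "$_\n"
    for lines_for_dir($input, "R_2016_09_21_14_01_15_user_S5-00580-9-Medexome");
```

Because the directory line is unique in input, at most one paragraph is selected even when the pattern lines repeat across paragraphs.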


Aia's code works except I am getting:
Code:
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 1.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 1.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 1.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 2.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 2.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 2.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 3.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 3.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 3.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 4.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 4.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 4.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 5.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 5.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 5.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 6.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 6.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 6.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 7.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 7.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 7.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 8.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 8.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 8.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 9.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 9.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 9.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 10.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 10.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 10.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 11.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 11.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 11.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 12.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 12.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 12.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 13.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 13.
readdir() attempted on invalid dirhandle $dir at /home/cmccabe/Desktop/NGS/scripts/identifier.pl line 29, <> chunk 13.

Line 29 is the grep. Thank you very much.

Below is the bash used to run the code and ensure the current directory is used:
Code:
#!/bin/bash

# get oldest folder
dir=/home/cmccabe/Desktop/index
{
  read -r -d $'\t' time && read -r -d '' filename
} < <(find "$dir" -maxdepth 1 -mindepth 1 -printf '%T+\t%P\0' | sort -z )
printf 'The oldest folder is %s and was created on %s, analysis was performed using v1.3 of the medex pipeline by %s at %s\n' "$filename" "$time" "$USER" "$(date '+%D %r')" >> /home/cmccabe/Desktop/index/log

# work inside the oldest folder; stop if it cannot be entered
cd "/home/cmccabe/Desktop/index/$filename" || exit 1

# rename bam
rename 's/^([^_]+_[^_]+)_.+$/$1.bam/' *.bam

# rename vcf files
rename 's/^([^_]+_[^_]+)_.+$/$1.vcf/' *.vcf

# rename .bam.bai files
rename 's/^([^_]+_[^_]+)_.+$/$1.bam.bai/' *.bam.bai

# add patient identifier to bam bam.bai and vcf
perl /home/cmccabe/Desktop/NGS/scripts/identifier.pl /home/cmccabe/s5_files/identifier/input


Last edited by cmccabe; 11-28-2016 at 11:09 AM.. Reason: added bash
# 11  
Old 11-28-2016
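Those messages come from directories listed in the input file that the script cannot open. I did not want the program to stop the first time an entry in the input file points to a path that is not available.

Please replace || with or and they will go away. The || operator binds more tightly than the comma, so it tests "$working_path" (which is always true) rather than the result of opendir, and the return never runs.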
Instead of:
Code:
opendir my $dir, "$working_path" || return;

This:
Code:
opendir my $dir, "$working_path" or return;
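
To see the difference in isolation, here is a small sketch. The path is made up and assumed not to exist, and the two sub names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on your system.
my $missing = "/no/such/directory/for/this/demo";

sub with_double_pipe {
    my ($path) = @_;
    # Parsed as: opendir(my $dir, ($path || return ...));
    # $path is a non-empty string, so the return is never reached.
    opendir my $dir, $path || return "returned early";
    return "kept going";
}

sub with_or {
    my ($path) = @_;
    # Parsed as: opendir(my $dir, $path) or return ...;
    opendir my $dir, $path or return "returned early";
    return "kept going";
}

print "||: ", with_double_pipe($missing), "\n";   # prints "||: kept going"
print "or: ", with_or($missing), "\n";            # prints "or: returned early"
```

With ||, the sub carries on after the failed opendir, which is what lets readdir run on an invalid dirhandle and produce the warnings.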


Please run the Perl code outside the bash file. I read your other threads, and the program does not require you to rename the files first. It will work even if they are originally named like this:

Code:
IonXpress_007_MEVxx_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome.bam
IonXpress_008_MEVxy_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome.bam
IonXpress_009_MEVxz_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome.bam
IonXpress_007_MEVxx_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome.vcf
IonXpress_008_MEVxy_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome.vcf

The program does not need to be cd'ed into a particular directory, nor does it depend on any prior transformation. Just make sure that the directory lines are correct in the input file and that these directories live under the path /home/cmccabe/Desktop/index; otherwise, change the line marked "change me if needed."
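
As a rough check of that claim, the two regular expressions from the script can be applied to one of the long names without touching the disk. This is only a sketch: `new_name_for` is a hypothetical helper, and pairing IonXpress_007 with MEV21 is borrowed from the sample input purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the new name the way change_filename() does,
# using the same two regular expressions but no file system access.
sub new_name_for {
    my ($file, $pat, $new) = @_;
    return unless $file =~ /$pat.*\.(bam|vcf)/;   # same filter as the grep
    my ($ext) = $file =~ /(\..*)$/;               # from the first period to the end
    return "$new$ext";
}

my $long = "IonXpress_007_MEVxx_R_2016_11_18_10_45_10_user_S5-00580-14-Medexome";
for my $suffix (".bam", ".vcf", ".bam.bai") {
    my $result = new_name_for("$long$suffix", "IonXpress_007", "MEV21");
    print "$long$suffix => $result\n";
}
```

The extension is taken from the first period in the name, so this works only as long as no period appears before the suffix, which matches what was said earlier in the thread about the file names.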

Last edited by Aia; 11-28-2016 at 12:07 PM..
# 12  
Old 11-28-2016
The script does run without error now; however, the files do not update in the directory. Below is the code I use; the perl is run before the rest of the bash.

Is this correct or am I missing something? Thank you for all of your help.

Code:
#!/bin/bash

# add identifier to bam bam.bai and vcf
perl /home/cmccabe/Desktop/NGS/scripts/identifier.pl /home/cmccabe/s5_files/identifier/input

# get oldest folder
dir=/home/cmccabe/Desktop/index
{
  read -r -d $'\t' time && read -r -d '' filename
} < <(find "$dir" -maxdepth 1 -mindepth 1 -printf '%T+\t%P\0' | sort -z )
printf 'The oldest folder is %s and was created on %s, analysis was performed using v1.3 of the medex pipeline by %s at %s\n' "$filename" "$time" "$USER" "$(date '+%D %r')" >> /home/cmccabe/Desktop/index/log

/home/cmccabe/s5_files/identifier/input
Code:
IonXpress_001 MEC2
IonXpress_002 MEC3
IonXpress_003 MEV53
R_2016_11_10_10_37_08_user_S5-00580-12-Medexome

IonXpress_004 MEV49
IonXpress_005 MEV50
IonXpress_006 MEV51
R_2016_10_21_12_39_06_user_S5-00580-11-Medexome

IonXpress_001 MEC2
IonXpress_002 MEC3
IonXpress_003 MEV48
R_2016_10_21_09_52_37_user_S5-00580-10-Medexome

IonXpress_007 MEV21
IonXpress_008 MEV22
IonXpress_009 MEV23
R_2016_09_21_14_01_15_user_S5-00580-9-Medexome

IonXpress_001 MEC1
IonXpress_002 MEC32
IonXpress_003 MEC33
R_2016_09_21_11_26_19_user_S5-00580-8-Medexome

current directory /home/cmccabe/Desktop/index/R_2016_09_21_11_26_19_user_S5-00580-8-Medexome
Code:
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai

after script is run files are renamed to:
Code:
MEC1.bam
MEC1.vcf
MEC1.bam.bai
MEV32.bam
MEV32.vcf
MEV32.bam.bai
MEV33.bam
MEV33.vcf
MEV33.bam.bai

# 13  
Old 11-28-2016
At the end of post #12 you showed some renamed files. Is that not what you want?

But previously you said:
Quote:
however the files do not update in the directory.
Do you mean that there are no logs in /home/cmccabe/Desktop/index/log?

Otherwise, I do not understand what you mean.
This User Gave Thanks to Aia For This Post:
# 14  
Old 11-28-2016
Desired output (what the files should be renamed to after the script is run):

Code:
MEC1.bam
MEC1.vcf
MEC1.bam.bai
MEV32.bam
MEV32.vcf
MEV32.bam.bai
MEV33.bam
MEV33.vcf
MEV33.bam.bai

But this is what the directory currently looks like after the script is run:
Code:
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_001_MEVxx_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_002_MEVxy_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.vcf
IonXpress_003_MEVxw_R_2016_09_21_14_01_15_user_S5-00580-9-Medexome.bam.bai

Maybe I typed something wrong, but I do get a log in the index directory. Thank you.