Rename file in directory using contents within each file


# 1  

In the example below there are two generic .vcf files (genome.S1.vcf and genome.S2.vcf) in a directory. There won't always be exactly two generic files, but I am trying to use bash to rename each of them with a specific text (a unique identifier) found within each .vcf. The text will always be different, but it will always be in the same position (after the word FORMAT) on the same line (the one that starts with #CHROM). Each .vcf is tab-delimited. I am not sure if my attempt is the best way, but hopefully it helps. Thank you.


genome.S1.vcf
Code:
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NAME1_S1
chr10	323215	.	A	.	.	LowGQX	END=323313;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0
chr10	323314	.	C	.	.	LowGQX;LowDepth	END=323397;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	0/0:3:1:0:1

genome.S2.vcf
Code:
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	11-1111-ID_S5
chr10	323215	.	A	.	.	LowGQX	END=323313;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0
chr10	323314	.	C	.	.	LowGQX;LowDepth	END=323385;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0

desired (each vcf in directory renamed with unique identifier)

NAME1_S1.vcf
Code:
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NAME1_S1
chr10	323215	.	A	.	.	LowGQX	END=323313;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0
chr10	323314	.	C	.	.	LowGQX;LowDepth	END=323397;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	0/0:3:1:0:1

11-1111-ID_S5.vcf
Code:
...
...
...
##FILTER=<ID=NotGenotyped,Description="Locus contains forcedGT input alleles which could not be genotyped">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	11-1111-ID_S5
chr10	323215	.	A	.	.	LowGQX	END=323313;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0
chr10	323314	.	C	.	.	LowGQX;LowDepth	END=323385;BLOCKAVG_min30p3a	GT:GQX:DP:DPF:MIN_DP	.:.:0:0:0

bash
Code:
cd /path/to/files
for f in *.vcf ; do # loop through all vcf files
    new="$(head -1 "$f" | awk '{print $10}').vcf" # store value of $10 in new
    if [ ! -f "$new" ]; then # only rename if no file named $new exists yet
        echo "Renaming $f to $new" # log rename
        mv "$f" "$new" # rename original to new
    fi # close if
done # close loop


Last edited by cmccabe; 12-26-2019 at 08:07 PM..
# 2  
Hi
Try this
Code:
new=$(awk '/FORMAT/ {print $10}' "$f")

or
Code:
new=$(sed '/FORMAT/!d;s/^.*\t//' "$f")

If the file is large, it is useful to add the "q" (quit) command so that sed stops reading at the first match
Code:
new=$(sed '/FORMAT/!d;s/^.*\t//;q' "$f")
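
Dropped into the loop from post #1, the extraction might look like the sketch below. It is only a sketch under a few assumptions: rename_vcfs is a made-up function name, the identifier is taken to be the 10th tab-separated field, and the pattern is anchored on ^#CHROM rather than FORMAT because real VCF headers usually also contain ##FORMAT=<...> lines that a plain /FORMAT/ would match as well.

```shell
#!/usr/bin/env bash
# Sketch: rename every *.vcf in a directory after the sample name found in
# column 10 of its "#CHROM" header line. rename_vcfs is a hypothetical name.
rename_vcfs() {
    local dir="$1" f id new
    for f in "$dir"/*.vcf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # Tab-delimited input; exit at the first #CHROM line so that
        # large files are not read to the end.
        id=$(awk -F'\t' '/^#CHROM/ {print $10; exit}' "$f")
        [ -n "$id" ] || continue         # no header line or no 10th column
        new="$dir/$id.vcf"
        [ "$f" = "$new" ] && continue    # already has the right name
        if [ -e "$new" ]; then
            echo "Skipping $f: $new already exists" >&2
        else
            echo "Renaming $f to $new"
            mv -- "$f" "$new"
        fi
    done
}
```

It would be called as rename_vcfs /path/to/files. Files without a #CHROM line are left alone, and an existing target is never overwritten.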

--- Post updated at 07:37 ---

You can try with one command
Code:
awk '/FORMAT/ {system("mv "FILENAME" "$10".vcf")}' *.vcf

This User Gave Thanks to nezabudka For This Post:
# 3  
Sorry, I should have added a check so that an existing file is not overwritten
Code:
awk '/FORMAT/ {system("mv -n "FILENAME" "$10".vcf")}' *.vcf

This User Gave Thanks to nezabudka For This Post:
# 4  
Note: -n (no clobber) for the mv command is a non-standard extension to the POSIX standard. Alternatively, try -i for interactive use (but some systems ignore -i when mv is not run interactively, so test this too), or (better) test for the existence of the target file beforehand.
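
A portable way to test beforehand could look like the following sketch. safe_mv is a hypothetical helper name, not a standard utility, and because the test and the mv are two separate steps this is not atomic: another process could still create the target in between.

```shell
# Sketch of a POSIX-only "no clobber" rename; safe_mv is a made-up helper name.
# The existence test and the mv are separate steps, so this is not atomic.
safe_mv() {
    src=$1
    dst=$2
    if [ -e "$dst" ] || [ -L "$dst" ]; then   # -L also catches dangling symlinks
        printf 'safe_mv: %s already exists, %s left untouched\n' "$dst" "$src" >&2
        return 1
    fi
    mv -- "$src" "$dst"
}
```

For example, safe_mv genome.S1.vcf NAME1_S1.vcf renames the file only if NAME1_S1.vcf does not exist yet, and returns a non-zero status otherwise.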

Last edited by Scrutinizer; 12-27-2019 at 03:58 AM..
These 2 Users Gave Thanks to Scrutinizer For This Post:
# 5  
Quote:
Originally Posted by nezabudka
Hi
.....................................................
You can try with one command
Code:
awk '/FORMAT/ {system("mv "FILENAME" "$10".vcf")}' *.vcf

Hello nez,

How are you?
I hope you are doing fine.

Regarding this solution: IMHO we could avoid renaming an Input_file with system() while that same Input_file is still being read, since that might cause issues.

I would instead check each line for the string FORMAT and only print the shell command that does the rename. (I am assuming each Input_file needs only one rename, because once the file being read has been renamed it can no longer be found under its old name.)

So here I print shell commands, using the same condition as in your code:

Code:
awk -v s1="'" '/FORMAT/{new=$10 ".vcf"; print "if [[ -e " s1 new s1 " ]]; then echo " s1 "File " new " is already present." s1 "; else mv " s1 FILENAME s1 OFS s1 new s1 "; fi"}' *.vcf

For the sample file genome.S1.vcf from post #1, the output will be as follows.

Code:
if [[ -e 'NAME1_S1.vcf' ]]; then echo 'File NAME1_S1.vcf is already present.'; else mv 'genome.S1.vcf' 'NAME1_S1.vcf'; fi


The above only prints the rename commands; if the OP is happy with them, we could pipe the output to bash to actually rename the files.

Code:
awk -v s1="'" '/FORMAT/{new=$10 ".vcf"; print "if [[ -e " s1 new s1 " ]]; then echo " s1 "File " new " is already present." s1 "; else mv " s1 FILENAME s1 OFS s1 new s1 "; fi"}' *.vcf | bash

Apologies if I missed something here; I just wanted to share my views. Cheers.


Thanks,
R. Singh
These 2 Users Gave Thanks to RavinderSingh13 For This Post:
# 6  
Hi and thanks
No, you can see what will happen
Code:
cat>>file.txt<<EOF
1
2
FORMAT new
3
4
EOF
cp file.txt file2.txt
ls
file.txt file2.txt
awk '/FORMAT/ {system("mv -n "FILENAME" "$2)}; {print $0, FILENAME}' *.txt
1 file2.txt
2 file2.txt
3 file2.txt
FORMAT new file2.txt
4 file2.txt #<<<-not changed
5 file2.txt
6 file2.txt
1 file.txt
2 file.txt
3 file.txt
FORMAT new file.txt
4 file.txt
5 file.txt
6 file.txt
ls
file.txt new

The rename does not disturb the reading: awk just keeps reading through its open file descriptors.
These 2 Users Gave Thanks to nezabudka For This Post:
# 7  
Hi Ravinder,

As long as a mv operation is performed on the same file system - as is the case here - that should not pose a problem, since mv then only manipulates directory data: A file name is nothing more than a directory entry, a pointer (a hard link) to the file itself.

When a process opens a file for reading, the operating system creates an entry (a file descriptor) to represent that file and keeps the information about the opened file in memory. From then on the process no longer uses the directory entry.

The mv operation is thus free to manipulate the directory entry.

So for the process that has opened and is reading the file, nothing changes as the directory data is being changed.
When it is done reading it just closes the file descriptor.

Also, the file list expanded by the glob is expanded before being passed to the awk script, so new file names are not passed to the script.
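
This can be tried out with a small sketch like the one below. The file names are throwaway examples created in a temporary directory, and it assumes the rename stays on the same file system.

```shell
# Sketch: a reader that already has the file open is unaffected by mv.
# old.txt and new.txt are throwaway example names in a temporary directory.
tmp=$(mktemp -d)
printf 'line1\nline2\n' > "$tmp/old.txt"

exec 3< "$tmp/old.txt"            # open the file on descriptor 3
IFS= read -r first <&3            # read the first line
mv "$tmp/old.txt" "$tmp/new.txt"  # rename while descriptor 3 is still open
IFS= read -r second <&3           # reading continues from the same open file
exec 3<&-                         # close the descriptor

echo "$first $second"
ls "$tmp"
rm -rf "$tmp"
```

This prints "line1 line2" followed by "new.txt": the second read still succeeds even though the name old.txt no longer exists.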

S.
These 2 Users Gave Thanks to Scrutinizer For This Post: