FILE_ID extraction from file name and saving it in a CSV file after looping through each folder

09-14-2012

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by princetd001

I executed your script and got a result, but it is not comma separated.

Can you help me resolve it?

I tried your line of code: echo ${id/~(+E)^[0]+/} $name
It does not work for me; it gives sh[10]: ${id/~(+E)^[0]+/}: bad substitution

script
#!/usr/bin/env ksh
OUTFILE=test.txt
find 20[0-1][0-9] -type f | while read path
do
name=${path##*/}
name=${name%.trns*}
id=${name%_*}
id=${id##*_}
id=${id##*000}
echo "id: $id"
echo "file name: $name"
done > ${OUTFILE}
exit

MY SCRIPT RESULT
id: 20532
file name: sasmm_fsbc_durds_id00020532_t20120112192606.dat
id: 20533
file name: sasmm_fsbc_durds_id00020533_t20120212192606.dat
id: 20534
file name: sasmm_fsbc_durds_id00020534_t20120312192606.dat

This is how I got the result,

but I need it in a comma-separated CSV file.

I need to address one more issue: each folder may contain another file with the same name but a different extension, as shown below:

sasmm_fsbc_durds_id00020227_t20120901005046.aud.trnsfr.gz
sasmm_fsbc_durds_id00020227_t20120901005046.dat.trnsfr.gz
sasmm_fsbc_durds_id00020228_t20120901015112.aud.trnsfr.gz
sasmm_fsbc_durds_id00020228_t20120901015112.dat.trnsfr.gz
sasmm_fsbc_durds_id00020229_t20120901025124.aud.trnsfr.gz

I want to omit files whose names end with "aud.trnsfr.gz" and only consider file names ending with "dat.trnsfr.gz".

How do I omit files ending with "aud.trnsfr.gz" while looping through folders and files?

If you really want a CSV, why don't any of your postings show desired output containing a comma? Anyway, the following minor change to my earlier posted script should meet your currently stated requirements:

Code:

#!/bin/ksh
printf "file_id,file_name\n"
find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        file=$(basename "$path" .trnsfr.gz)
        id=${file#sasmm_fsbc_durds_id000}
        id=${id%%_t*}
        printf "%s,%s\n" "$id" "$file"
done
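
To see what the two parameter expansions in the loop do to a single name, here is a minimal sketch that can be run without any directory tree. The function name extract_id is only for illustration and is not part of the script above; it also strips the directory and suffix with expansions instead of calling basename.

```shell
#!/bin/sh
# Hypothetical helper (not in the posted script): print "id,file" for one path.
extract_id() {
    file=${1##*/}                       # strip any leading directories
    file=${file%.trnsfr.gz}             # drop the trailing .trnsfr.gz
    id=${file#sasmm_fsbc_durds_id000}   # remove the fixed prefix through id000
    id=${id%%_t*}                       # remove "_t" and everything after it
    printf '%s,%s\n' "$id" "$file"
}

extract_id 2012/01/12/sasmm_fsbc_durds_id00020532_t20120112192606.dat.trnsfr.gz
# prints: 20532,sasmm_fsbc_durds_id00020532_t20120112192606.dat
```

The .aud.trnsfr.gz files never reach the loop in the posted script because the -name pattern given to find only matches names ending in .dat.trnsfr.gz.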

Don Cragun


09-14-2012

Registered User

25, 0

Join Date: Jun 2012

Last Activity: 17 May 2013, 12:56 PM EDT

Posts: 25

Thanks Given: 0

Thanked 0 Times in 0 Posts

I need one more solution, this time for replacing a few characters in a file name. As I mentioned above, my folder structure and files are constructed as given in my previous posts.

Code:

 
folder years -> folder months -> folder days

In each day folder there will be files, and their file names are as mentioned in the posts above:

Code:

sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz

Now I have to loop through each folder sequentially (year 2009, 2010, 2011, etc.), go into the day folders, and modify the file names from March 2011 to 26 Jan 2012.

Code:

 
sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz

This is the first file, placed in the folder 2011 -> 03 -> 01.

From this file onwards, modify the file name as follows:

Code:

from
sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz
to
sasmm_fsbc_durds_id0007111_t20110301010023.dat.trnsfr.gz

SECOND FILE file name modification

Code:

 
from 
sasmm_fsbc_durds_id00020080_t20110301020123.dat
to
sasmm_fsbc_durds_id0007112_t20110301020123.dat

Here, 7111 is assigned to the very first file and then incremented by one for each following file.

That is, I am looping through each file, replacing the characters between id000 and _t with 7111, and incrementing by 1 sequentially.

Code:

 
sasmm_fsbc_durds_id00020318_t20110311022510.dat
sasmm_fsbc_durds_id00020319_t20110311032555.dat
sasmm_fsbc_durds_id00020320_t20110311042632.dat
sasmm_fsbc_durds_id00020321_t20110311052657.dat
sasmm_fsbc_durds_id00020322_t20110311062730.dat
 
will be modified into 
 
sasmm_fsbc_durds_id0007111_t20110311022510.dat
sasmm_fsbc_durds_id0007112_t20110311032555.dat
sasmm_fsbc_durds_id0007113_t20110311042632.dat
sasmm_fsbc_durds_id0007114_t20110311052657.dat
sasmm_fsbc_durds_id0007115_t20110311062730.dat

It goes on until it reaches the year 2012, month January [01], and date 26.

Code:

 
from 2011 -> 03 -> 01
to   2012 -> 03 - >26

thanks in advance

princetd001


09-14-2012


Before we start on a new request, please tell us if any of our suggestions did what you wanted before this last set of changes so we have some idea as to whether or not we have finally correctly interpreted what you're asking us to do.

Then please clarify your requirements:

1. Do you want only the files with names matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz to be renamed, or do you want every file matching id000[0-9]*_t to be renamed?
2. Do you want the file names to restart at 7111 in each directory processed, or do you want files in all of the directories processed to be treated as a single list numbered starting at 7111 and incrementing for each file processed?
3. Do you have a backup of this directory hierarchy in case something goes horribly wrong during the renaming process?
4. Are you absolutely sure that no file existing before this renaming process begins has a name that will match any new file name that will be created by this renaming process?
5. When you say you want these changes applied to files for dates:
Code:
```
from 2011 -> 03 -> 01
to   2012 -> 03 - >26
```
is that range inclusive or exclusive? (I'm assuming you want files in 2011/03/01 renamed, but it isn't at all clear whether you want files in 2012/03/26 renamed.)
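
On the question of name collisions, one way to check rather than assume is to feed the proposed new path names, one per line, to a small filter that reports any that already exist. This is only a sketch; the function name report_collisions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read proposed new path names on standard input and
# report each one that already exists. Returns non-zero if any were found.
report_collisions() {
    status=0
    while IFS= read -r newpath
    do
        if [ -e "$newpath" ]; then
            printf 'collision: %s\n' "$newpath" >&2
            status=1
        fi
    done
    return $status
}
```

If it reports nothing and returns zero, none of the proposed names is currently in use in the hierarchy.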

Don Cragun


09-16-2012


1. Do you want only the files with names matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz to be renamed, or do you want every file matching id000[0-9]*_t to be renamed?

Quote:

All files matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz

2. Do you want the file names to restart at 7111 in each directory processed, or do you want files in all of the directories processed to be treated as a single list numbered starting at 7111 and incrementing for each file processed?

Quote:

I want to rename each file with an ID that starts at 7111 and increments across all directories. After the renaming process, the files will stay in the same folder structure; we are just renaming the files in their directories.
March 01 2011 - 7111 --> assuming there is only one file in this folder

March 02 2011 - 7112 --> this is in the directory for the 2nd day, in the March folder, in the 2011 folder, again assuming there is only one file

This continues with the assumption of one file in each day folder. There can be many files in each folder; this is just for the sake of example.

After the 31 day folders in March:
Apr 01 2011 - 7142; for the file in day folder 01, inside the April folder, inside the 2011 folder, the file_id in the file name is replaced with 7142.

3. Do you have a backup of this directory hierarchy in case something goes horribly wrong during the renaming process?

Quote:

I will take backup of all files

4. Are you absolutely sure that no file existing before this renaming process begins has a name that will match any new file name that will be created by this renaming process?

Quote:

No, these are freshly created IDs; no file names will conflict.

5. When you say you want these changes applied to files for dates:
Code:
---------
from 2011 -> 03 -> 01
to 2012 -> 03 - >26
---------
is that range inclusive or exclusive? (I'm assuming you want files in 2011/03/01 renamed, but it isn't at all clear whether you want files in 2012/03/26 renamed.)

Quote:

2011/03/01 [inclusive] to 2012/03/25(inclusive)

---------- Post updated at 08:24 AM ---------- Previous update was at 08:20 AM ----------

sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz

You have written id00020079 as id000[0-9]*; does this [0-9] cover all numeric IDs, including those starting from 7111?

princetd001


09-16-2012


The following script creates a file containing the mv commands needed to rename the files as you requested, then runs those commands, and then removes that file. Before running this script, I strongly suggest commenting out the last two lines, running the modified script, and verifying that the command file it creates performs the file moves that you want. This script is written using ksh, but it should also work with at least bash and sh.

Code:

#!/bin/ksh
newID=7111
find 2011/0[3-9] 2011/1[0-2] 2012/0[1-2] 2012/03/[01][0-9] 2012/03/2[0-5] \
    -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        oldID=${path##*id000}
        oldID=${oldID%_t*}
        newpath=${path%${oldID}_t*}$newID${path##*id000$oldID}
        newID=$((newID + 1))
        printf "mv \"%s\" \"%s\"\n" "$path" "$newpath"
done > mv_commands.$$
. ./mv_commands.$$
rm mv_commands.$$
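
As a worked example of the three expansions that build the new path name, here is a minimal sketch that applies them to one path without touching any files. The function name new_path is only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the path name that results from replacing the
# old ID in $1 with the new ID in $2, using the same expansions as the loop.
new_path() {
    path=$1
    newID=$2
    oldID=${path##*id000}     # 20079_t20110301010023.dat.trnsfr.gz
    oldID=${oldID%_t*}        # 20079
    # everything before the old ID + new ID + everything after the old ID
    printf '%s\n' "${path%${oldID}_t*}$newID${path##*id000$oldID}"
}

new_path 2011/03/01/sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz 7111
# prints: 2011/03/01/sasmm_fsbc_durds_id0007111_t20110301010023.dat.trnsfr.gz
```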

---------- Post updated at 10:01 AM ---------- Previous update was at 09:44 AM ----------

Quote:

Originally Posted by princetd001

... ... ...
You have written id00020079 as id000[0-9]*; does this [0-9] cover all numeric IDs, including those starting from 7111?

I forgot to mention this in my last posting. Instead of the command:

Code:

printf "mv \"%s\" \"%s\"\n" "$path" "$newpath"

in the script in my last posting, I could have just used:

Code:

mv "$path" "$newpath"

but if there are enough files in one of the directories being processed it would be possible to end up unintentionally renaming one or more of the renamed files (possibly even creating an infinite loop of mv commands). This isn't likely since we're renaming files rather than creating additional files, but the standards don't guarantee that a file will be found at all nor that a file will only be found once if a directory is being changed while the find utility is processing that directory. Using the two step process given in my script avoids this possible complication.
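
A related point: find also does not promise any particular order for the names it prints. If the new IDs must follow the date order of the directories, one option is to sort the list before numbering it. The following is only a sketch under that assumption; the function name gen_mv_commands is made up, and it relies on the YYYY/MM/DD directory layout so that sorting the path names as text puts them in date order (and, within a day, in old ID order).

```shell
#!/bin/sh
# Hypothetical variant of the command generator: $1 is the first new ID,
# the remaining arguments are the directories to search. It only prints
# mv commands; it does not rename anything itself.
gen_mv_commands() {
    newID=$1
    shift
    find "$@" -type f -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' |
    sort |
    while IFS= read -r path
    do
        oldID=${path##*id000}
        oldID=${oldID%_t*}
        newpath=${path%${oldID}_t*}$newID${path##*id000$oldID}
        printf 'mv "%s" "%s"\n' "$path" "$newpath"
        newID=$((newID + 1))
    done
}
```

Calling it as gen_mv_commands 7111 followed by the same directory operands used in the script, with output redirected to the command file, keeps the two step process: the list is complete before any file is renamed.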

Don Cragun


09-17-2012


FILE_ID EXTRACTION from FILE_NAME DATE RANGE

In my first question (file_id extraction from file_name), what if I need to extract file_ids for a date range, such as January 26 2012 to today? How do I specify it?

---------- Post updated at 04:44 PM ---------- Previous update was at 04:40 PM ----------

Code:

 
#!/bin/ksh
printf "file_id,file_name\n"
find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        file=$(basename "$path" .trnsfr.gz)
        id=${file#sasmm_fsbc_durds_id000}
        id=${id%%_t*}
        printf "%s,%s\n" "$id" "$file"
done

I tested this solution; it is working absolutely fine for file_id extraction. Thanks a lot, Don! In case I want to extract the file_id and file_name combination into a CSV file for a given date range (for example, Jan 26 2012 to today), where do I need to make the change, and what would the change be?

Thanks..!

princetd001


09-17-2012


Quote:

Originally Posted by princetd001

In my first question (file_id extraction from file_name), what if I need to extract file_ids for a date range, such as January 26 2012 to today? How do I specify it?


The current script selects all directories for years 2000 through 2999 (this comes from the 2[0-9][0-9][0-9] to select those directories in the find command). So for January 26-31, 2012 you need 2012/01/2[6-9] and 2012/01/3[01] and for February 1, 2012 through today you can use 2012/0[2-9]. So replacing:

Code:

find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path

in my script with:

Code:

find 2012/01/2[6-9] 2012/01/3[01] 2012/0[2-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path

will give you that restricted range.
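
If the range changes often, an alternative to rewriting the directory patterns each time is to let find search a wider set of directories and filter on the date encoded in the path. This is only a sketch: the function name in_range is made up, and it assumes each path has exactly the form YYYY/MM/DD/file with no leading ./ or other prefix.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the YYYY/MM/DD directory part of path $1
# lies between the dates $2 and $3, both given as YYYYMMDD and both inclusive.
in_range() {
    dir=${1%/*}                              # e.g. 2012/01/26
    day=$(printf '%s' "$dir" | tr -d '/')    # e.g. 20120126
    [ "$day" -ge "$2" ] && [ "$day" -le "$3" ]
}

in_range 2012/01/26/x.dat.trnsfr.gz 20120126 20120917 && echo "in range"
in_range 2012/01/25/x.dat.trnsfr.gz 20120126 20120917 || echo "out of range"
```

Inside the while loop, a line such as in_range "$path" 20120126 "$(date +%Y%m%d)" || continue placed before the printf would then skip anything outside January 26, 2012 through today.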

Don Cragun
