FILE_ID extraction from file name and save it in CSV file after looping through each folders


 
# 8  
Old 09-14-2012
Quote:
Originally Posted by princetd001
I executed your script and got a result, but it is not comma separated.

Can you help me resolve this?

I tried your line of code: echo ${id/~(+E)^[0]+/} $name
It does not work for me; it gives sh[10]: ${id/~(+E)^[0]+/}: bad substitution


script
#!/usr/bin/env ksh
OUTFILE=test.txt
find 20[0-1][0-9] -type f | while read path
do
name=${path##*/}
name=${name%.trns*}
id=${name%_*}
id=${id##*_}
id=${id##*000}
echo "id: $id"
echo "file name: $name"
done > ${OUTFILE}
exit




MY SCRIPT RESULT
id: 20532
file name: sasmm_fsbc_durds_id00020532_t20120112192606.dat
id: 20533
file name: sasmm_fsbc_durds_id00020533_t20120212192606.dat
id: 20534
file name: sasmm_fsbc_durds_id00020534_t20120312192606.dat


This is how I got the result,

but I need it in a comma-separated CSV file.

I need to address one more issue: each folder may contain another file with the same name but a different extension, as shown below

sasmm_fsbc_durds_id00020227_t20120901005046.aud.trnsfr.gz
sasmm_fsbc_durds_id00020227_t20120901005046.dat.trnsfr.gz
sasmm_fsbc_durds_id00020228_t20120901015112.aud.trnsfr.gz
sasmm_fsbc_durds_id00020228_t20120901015112.dat.trnsfr.gz
sasmm_fsbc_durds_id00020229_t20120901025124.aud.trnsfr.gz

I want to omit files whose names end with "aud.trnsfr.gz" and only consider file names ending with "dat.trnsfr.gz".


How do I omit files ending with "aud.trnsfr.gz" while looping through the folders and files?
If you really want a CSV, why don't any of your postings show desired output containing a comma? Anyway, the following minor change to my earlier posted script should meet your currently stated requirements:
Code:
#!/bin/ksh
printf "file_id,file_name\n"
find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        file=$(basename "$path" .trnsfr.gz)
        id=${file#sasmm_fsbc_durds_id000}
        id=${id%%_t*}
        printf "%s,%s\n" "$id" "$file"
done
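
To see what the two parameter expansions in the script above do, here is a minimal sketch that applies the same steps to single paths taken from this thread. The function name id_csv_line is made up for illustration, and the case test only stands in for the find -name pattern; neither is part of the posted script.

```shell
#!/bin/sh
# Hypothetical helper: print "id,name" for one path, or nothing for
# files that are not .dat.trnsfr.gz transfer files.
id_csv_line() {
    case $1 in
        *.dat.trnsfr.gz) ;;          # keep only the .dat transfer files
        *) return 0 ;;               # skip .aud.trnsfr.gz and anything else
    esac
    file=${1##*/}                    # strip the leading directories
    file=${file%.trnsfr.gz}          # strip the transfer suffix
    id=${file#sasmm_fsbc_durds_id000}    # drop the fixed prefix
    id=${id%%_t*}                    # drop the timestamp and extension
    printf '%s,%s\n' "$id" "$file"
}

# prints: 20227,sasmm_fsbc_durds_id00020227_t20120901005046.dat
id_csv_line 2012/09/01/sasmm_fsbc_durds_id00020227_t20120901005046.dat.trnsfr.gz
# prints nothing, because the .aud file is skipped
id_csv_line 2012/09/01/sasmm_fsbc_durds_id00020227_t20120901005046.aud.trnsfr.gz
```

Redirecting the output of the full script to a file (for example with > file_ids.csv, a name chosen here only as an example) gives the CSV file.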

# 9  
Old 09-14-2012
I need one more solution, for replacing a few characters in a file name. As I mentioned above, my folder structure and files are laid out as described in my previous posts.

Code:
 
folder years -> folder months -> folder days

In each day folder there will be files, named as in the posts above:

Code:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz

Now I have to loop through each folder sequentially (years 2009, 2010, 2011, etc.), go into the day folders, and modify the file names for the period from March 2011 to 26 Jan 2012.

Code:
 
sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz

This is the first file, placed in the folder 2011 -> 03 -> 01.

From this file onwards, modify the file names as follows:
Code:
 
from
sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz
to
sasmm_fsbc_durds_id0007111_t20110301010023.dat.trnsfr.gz

Second file name modification:

Code:
 
from 
sasmm_fsbc_durds_id00020080_t20110301020123.dat
to
sasmm_fsbc_durds_id0007112_t20110301020123.dat

Here I am assigning 7111 to the very first file and then incrementing by one for each following file.

That is, I am looping through each file and replacing the characters between id000 and _t with 7111, incrementing by 1 sequentially.

Code:
 
sasmm_fsbc_durds_id00020318_t20110311022510.dat
sasmm_fsbc_durds_id00020319_t20110311032555.dat
sasmm_fsbc_durds_id00020320_t20110311042632.dat
sasmm_fsbc_durds_id00020321_t20110311052657.dat
sasmm_fsbc_durds_id00020322_t20110311062730.dat
 
will be modified into 
 
sasmm_fsbc_durds_id0007111_t20110311022510.dat
sasmm_fsbc_durds_id0007112_t20110311032555.dat
sasmm_fsbc_durds_id0007113_t20110311042632.dat
sasmm_fsbc_durds_id0007114_t20110311052657.dat
sasmm_fsbc_durds_id0007115_t20110311062730.dat

It goes on until it reaches the year 2012, month January [01], and date 26.

Code:
 
from 2011 -> 03 -> 01
to   2012 -> 03 - >26

thanks in advance
# 10  
Old 09-14-2012
Before we start on a new request, please tell us if any of our suggestions did what you wanted before this last set of changes so we have some idea as to whether or not we have finally correctly interpreted what you're asking us to do.

Then please clarify your requirements:
  1. Do you want only the files with names matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz to be renamed, or do you want every file matching id000[0-9]*_t to be renamed?
  2. Do you want the file names to restart at 7111 in each directory processed, or do you want files in all of the directories processed to be treated as a single list numbered starting at 7111 and incrementing for each file processed?
  3. Do you have a backup of this directory hierarchy in case something goes horribly wrong during the renaming process?
  4. Are you absolutely sure that no file existing before this renaming process begins has a name that will match any new file name that will be created by this renaming process?
  5. When you say you want these changes applied to files for dates:
    Code:
    from 2011 -> 03 -> 01
    to   2012 -> 03 - >26

    is that range inclusive or exclusive? (I'm assuming you want files in 2011/03/01 renamed, but it isn't at all clear whether you want files in 2012/03/26 renamed.)
# 11  
Old 09-16-2012
1. Do you want only the files with names matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz to be renamed, or do you want every file matching id000[0-9]*_t to be renamed?

Quote:
All files matching sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz
2. Do you want the file names to restart at 7111 in each directory processed, or do you want files in all of the directories processed to be treated as a single list numbered starting at 7111 and incrementing for each file processed?

Quote:
I want to rename every file with an ID that increments from 7111 across all directories. After renaming, the files stay in the same folder structure; we are only renaming files within their directories.
March 01 2011 - 7111 --> assuming there is only one file in this folder

March 02 2011 - 7112 --> this file is in the directory for the 2nd day, inside the March folder of the 2011 folder, again assuming there is only one file

The assumption of one file in each day folder is just for the sake of the example; there can be many files in each folder.

After the 30 day folders in March (30 days in this example):
Apr 01 2011 - 7141: for the file in day folder 01, inside the April folder, inside the 2011 folder, the file_id in the file name is replaced with 7141.



3. Do you have a backup of this directory hierarchy in case something goes horribly wrong during the renaming process?

Quote:
I will take a backup of all files.
4. Are you absolutely sure that no file existing before this renaming process begins has a name that will match any new file name that will be created by this renaming process?

Quote:
No, these are freshly created IDs; no file names will conflict.
5. When you say you want these changes applied to files for dates:
Code:
---------
from 2011 -> 03 -> 01
to 2012 -> 03 - >26
---------
is that range inclusive or exclusive? (I'm assuming you want files in 2011/03/01 renamed, but it isn't at all clear whether you want files in 2012/03/26 renamed.)
Quote:

2011/03/01 (inclusive) to 2012/03/25 (inclusive)
---------- Post updated at 08:24 AM ---------- Previous update was at 08:20 AM ----------

sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz

You wrote id00020079 as id000[0-9]*; does this [0-9] cover all numeric IDs, starting from 7111?
# 12  
Old 09-16-2012
The following script creates a file containing the mv commands needed to rename the files as you requested, runs those commands, and then removes that file. Before running this script, I strongly suggest commenting out the last two lines, running the modified script, and verifying that the command file it creates performs the file moves you want. This script is written using ksh, but it should also work with at least bash and sh.
Code:
#!/bin/ksh
newID=7111
find 2011/0[3-9] 2011/1[0-2] 2012/0[1-2] 2012/03/[01][0-9] 2012/03/2[0-5] \
    -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        oldID=${path##*id000}
        oldID=${oldID%_t*}
        newpath=${path%${oldID}_t*}$newID${path##*id000$oldID}
        newID=$((newID + 1))
        printf "mv \"%s\" \"%s\"\n" "$path" "$newpath"
done > mv_commands.$$
. ./mv_commands.$$
rm mv_commands.$$
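
As a quick check of the newpath expression, the sketch below applies the same expansions to the first file name from the request. The helper name new_path is hypothetical; it only wraps the three assignments from the script so they can be tried on one path without touching any files.

```shell
#!/bin/sh
# Hypothetical helper: compute the renamed path for one file and one new ID,
# using the same parameter expansions as the script above.
new_path() {
    path=$1
    newID=$2
    oldID=${path##*id000}        # e.g. 20079_t20110301010023.dat.trnsfr.gz
    oldID=${oldID%_t*}           # e.g. 20079
    # Everything up to and including "id000", then the new ID,
    # then everything after the old ID.
    printf '%s\n' "${path%${oldID}_t*}${newID}${path##*id000${oldID}}"
}

# prints: 2011/03/01/sasmm_fsbc_durds_id0007111_t20110301010023.dat.trnsfr.gz
new_path 2011/03/01/sasmm_fsbc_durds_id00020079_t20110301010023.dat.trnsfr.gz 7111
```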

---------- Post updated at 10:01 AM ---------- Previous update was at 09:44 AM ----------

Quote:
Originally Posted by princetd001
... ... ...
You wrote id00020079 as id000[0-9]*; does this [0-9] cover all numeric IDs, starting from 7111?
I forgot to mention this in my last posting. Instead of the command:
Code:
printf "mv \"%s\" \"%s\"\n" "$path" "$newpath"

in the script in my last posting, I could have just used:
Code:
mv "$path" "$newpath"

but if there are enough files in one of the directories being processed it would be possible to end up unintentionally renaming one or more of the renamed files (possibly even creating an infinite loop of mv commands). This isn't likely since we're renaming files rather than creating additional files, but the standards don't guarantee that a file will be found at all nor that a file will only be found once if a directory is being changed while the find utility is processing that directory. Using the two step process given in my script avoids this possible complication.
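
A related point is the order in which find reports files: it is not guaranteed to be chronological, while the request was for IDs that increase day by day. One possible variation, offered as a sketch rather than a replacement for the script, is to snapshot the list into a file, sort it, and only then read from the snapshot. The function name rename_sorted, its arguments, and the list file name are assumptions made for this example, and it only prints the mv commands instead of running them.

```shell
#!/bin/sh
# Sketch: snapshot and sort the file list first, then generate the renames
# from the snapshot. Usage (hypothetical): rename_sorted FIRST_ID DIR...
rename_sorted() {
    newID=$1
    shift
    list=${TMPDIR:-/tmp}/rename_list.$$
    # Paths look like year/month/day/file with zero-padded numbers,
    # so a plain sort puts them in date order.
    find "$@" -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' |
        sort > "$list"
    while IFS= read -r path
    do
        oldID=${path##*id000}
        oldID=${oldID%_t*}
        newpath=${path%${oldID}_t*}$newID${path##*id000$oldID}
        # Dry run: print the command so it can be reviewed first.
        printf 'mv "%s" "%s"\n' "$path" "$newpath"
        newID=$((newID + 1))
    done < "$list"
    rm -f "$list"
}
```

Called as rename_sorted 7111 2011/0[3-9] 2011/1[0-2] 2012/0[1-2] 2012/03/[01][0-9] 2012/03/2[0-5], it would print one mv command per file in date order. Because the list is fixed before any rename happens, the loop cannot pick up a file it has already renamed.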
# 13  
Old 09-17-2012
FILE_ID EXTRACTION from FILE_NAME DATE RANGE

In my first question (file_id extraction from file_name), what if I need to extract file_ids for a date range, such as January 26 2012 to today? How do I specify it?

---------- Post updated at 04:44 PM ---------- Previous update was at 04:40 PM ----------

Code:
 
#!/bin/ksh
printf "file_id,file_name\n"
find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path
do
        file=$(basename "$path" .trnsfr.gz)
        id=${file#sasmm_fsbc_durds_id000}
        id=${id%%_t*}
        printf "%s,%s\n" "$id" "$file"
done

I tested this solution; it works absolutely fine for file_id extraction. Thanks a lot, Don! If I want to extract the file_id and file_name combinations into a CSV file for a given date range, for example Jan 26 2012 to today, where do I need to make a change, and what would the change be?

Thanks..!
# 14  
Old 09-17-2012
Quote:
Originally Posted by princetd001
In my first question (file_id extraction from file_name), what if I need to extract file_ids for a date range, such as January 26 2012 to today? How do I specify it?

The current script selects all directories for years 2000 through 2999 (this comes from the 2[0-9][0-9][0-9] to select those directories in the find command). So for January 26-31, 2012 you need 2012/01/2[6-9] and 2012/01/3[01] and for February 1, 2012 through today you can use 2012/0[2-9]. So replacing:
Code:
find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path

in my script with:
Code:
find 2012/01/2[6-9] 2012/01/3[01] 2012/0[2-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' | while read path

will give you that restricted range.
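
The directory patterns above work well when the range lines up with whole days and months. For an arbitrary range, one alternative (a sketch, not something from the posted script) is to turn the year/month/day part of each path into a YYYYMMDD number and compare it with the start and end dates. The names in_range and csv_for_range are hypothetical, and the sketch assumes paths look exactly like year/month/day/file, which is what find prints when started from the year directories.

```shell
#!/bin/sh
# in_range START END PATH: succeed when PATH (year/month/day/file) falls
# within START..END inclusive, both given as YYYYMMDD.
in_range() {
    dir=${3%/*}                              # year/month/day
    day=$(printf '%s' "$dir" | tr -d '/')    # YYYYMMDD
    [ "$day" -ge "$1" ] && [ "$day" -le "$2" ]
}

# csv_for_range START END: print the CSV for .dat files in that date range.
csv_for_range() {
    printf 'file_id,file_name\n'
    find 2[0-9][0-9][0-9] -name 'sasmm_fsbc_durds_id000[0-9]*_t?*.dat.trnsfr.gz' |
    while IFS= read -r path
    do
        in_range "$1" "$2" "$path" || continue   # outside the range: skip
        file=$(basename "$path" .trnsfr.gz)
        id=${file#sasmm_fsbc_durds_id000}
        id=${id%%_t*}
        printf '%s,%s\n' "$id" "$file"
    done
}
```

For January 26, 2012 through today this could be called as csv_for_range 20120126 "$(date +%Y%m%d)". It visits every year directory, so it is slower than the restricted find command above, but the range can be changed without rewriting the patterns.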