Extracting FILE_ID from file names and saving it in a CSV file after looping through each folder
My files are located on a UNIX server. I want to extract the file_id and file_name from each file and save them in a CSV file. How do I do that?
I have folders in a Unix environment; the directory structure is as follows:
year folder -> 12 month folders inside it -> 30/31 day folders inside each month
I ran the ls command in the top folder;
the years are as follows:
2009 2010 2011 2012
I ran cd into the 2012 folder,
ran ls in the 2012 folder,
and then ran the same commands for September.
There are folders for each year: 2009, 2010, 2011 and 2012.
Each year folder has 12 folders, one per month: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12.
Each month folder has up to 31 folders for the days: 1, 2, 3, etc... 29, 30, 31.
Each day folder contains files.
The file names look like this:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
I want one CSV file with two columns: the first for file_id and the second for file_name.
To obtain the file_id value, loop through each folder and get each file name, then take the substring between "sasmm_fsbc_durds_id000" and "_t" and store it in the file_id column; store the file name in the file_name column.
In the example above, for the file sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz:
read the file name sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz,
cut out 20532 and save it in the file_id column, and save the file name in the second column = sasmm_fsbc_durds_id00020532_t20100313192606.dat
The CSV file will look like
The file_id is to be cut from the file name. If you look at the file names closely, you can see that
the file_ids come right after 000; in the example file names above they are 20532 and 20513.
How do I loop through the year 2012, its 12 month folders and the 31 day folders inside them, and create
a CSV file which has the data described above?
I am very new to Unix, please help me out. If you could provide code, that would be great.
Thanks.
The output CSV file should look like this
Do we need to search for files recursively in each folder, or go down to each day folder?
Moderator's Comments:
edit by bakunin: Please view this code tag video for how to use code tags when posting code and data.
In addition, please do not use all-caps routinely. All-caps is like spice - use it to make SOMETHING STAND OUT, but overdo it and it's tasteless.
First you say that filenames in the directory 2012/09/13 are:
Code:
sasmm_fsbc_durds_id00020532_t20100313192606.dat.trnsfr.gz
sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
and then you say you want the entire filename to be the second field in your output file and say that that field should be:
Code:
sasmm_fsbc_durds_id00020532_t20100313192606.dat
sasmm_fsbc_durds_id00020513_t20120913003312.dat
What happened to the .trnsfr.gz at the end of the filenames?
Is the file_id field always supposed to be a string of decimal digits, or could other characters appear in the file_id?
Is there any chance that there will be more than one occurrence of _t in a filename after sasmm_fsbc_durds_id000?
Should an error be reported if other files exist under 2???/[01][0-9]/[0-3][0-9] with filenames that don't start with sasmm_fsbc_durds_id000 and contain _t after that?
Yes, in each folder the file names end with
.dat.trnsfr.gz
but when they are entered in the CSV file under the file_name column, they should omit
.trnsfr.gz
As for file_id,
it is a number; it should be extracted from the file name itself.
In your code, you have not specified the output file as CSV.
Are you looping through all files inside all folders in a year?
Which code is used for extracting the id from the file name?
How do you specify the column names in the output file?
Do you know how to write the same logic as a simple shell script?
---------- Post updated at 10:18 PM ---------- Previous update was at 09:59 PM ----------
If I use this loop, will it loop through all folders?
FILES=`ls -1`
for FILE in $FILES
do
---------- Post updated at 10:27 PM ---------- Previous update was at 10:18 PM ----------
I ran your script; it gives an error message:
[/work/users/po/prince]$ ./testSBI.sh
./testSBI.sh[8]: id=${id:2}: bad substitution
Your code:
Quote:
#!/usr/bin/env ksh
find 20[0-1][0-9] -type f | while read path
do
name=${path##*/}
name=${name%.trns*}
id=${name%_*}
id=${id##*_}
id=${id:2}
echo ${id/~(+E)^[0]+/} $name
done >output-file
---------- Post updated at 10:39 PM ---------- Previous update was at 10:27 PM ----------
I removed the line of code that causes the error
and executed your script without it; it throws an error again:
./testSBI.sh[10]: ${id/~(+E)^[0]+/}: bad substitution
Quote:
In your code, you have not specified the output file as CSV.
Are you looping through all files inside all folders in a year?
Which code is used for extracting the id from the file name?
You can set the output file name however you want. Replace output-file with CSV, or whatever you want the output filename to be. The find command will list all files under all directories whose names match 20[0-1][0-9] (that is, 2000 - 2019), so yes, in a way we are looping through all files, but letting find do the work rather than the script.
The code that extracts the ID from the name is:
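These are the parameter expansions from the script quoted above, wrapped in a function so they can be tried on a single name. Treat it as a sketch: the function name extract_id is made up for this illustration, and ${id#??} is swapped in for ${id:2} (same effect here, it drops the two characters "id") because the substring form is not understood by every shell.

```shell
# extract_id is a hypothetical helper, not part of the original script.
extract_id() {
    name=${1##*/}        # strip any leading directory path
    name=${name%.trns*}  # drop the .trnsfr.gz suffix
    id=${name%_*}        # drop the trailing _t<timestamp>.dat part
    id=${id##*_}         # keep what follows the last underscore: id00020513
    id=${id#??}          # drop the two characters "id" (original: ${id:2})
    printf '%s\n' "$id"
}

extract_id 2012/09/13/sasmm_fsbc_durds_id00020513_t20120913003312.dat.trnsfr.gz
# prints 00020513 (the leading zeros are still present at this point)
```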
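The leading zeros are removed as the variable is expanded in the echo, by the ${id/~(+E)^[0]+/} substitution shown in the quoted script above (a pattern replacement that needs ksh93).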
Quote:
How do you specify the column names in the output file?
You made no mention of column names, only that the ID was to be first and the filename was to be second. The code prints ID followed by filename. Per your example there is no comma; I was a bit confused with your initial post as you indicated that the file was comma separated values (csv) yet you didn't indicate that the columns should be separated that way.
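If a header line and comma separators are wanted after all, a minimal sketch could look like the following. The column names file_id and file_name come from the first post; the helper names write_csv_header and write_csv_row are made up here, and the sketch assumes the file names never contain commas or double quotes (true for the sample names), so no CSV quoting is done.

```shell
# Hypothetical helpers; column names are taken from the first post.
write_csv_header() {
    printf '%s,%s\n' file_id file_name
}

write_csv_row() {    # $1 = file id, $2 = file name
    printf '%s,%s\n' "$1" "$2"
}

write_csv_header
write_csv_row 20532 sasmm_fsbc_durds_id00020532_t20100313192606.dat
```

Redirecting the combined output to a file (for example with > output.csv, a name chosen only for illustration) gives a header row followed by one row per file.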
Quote:
Do you know how to write the same logic as a simple shell script?
The code I posted is a simple shell script.
Quote:
If I use this loop, will it loop through all folders?
FILES=`ls -1`
for FILE in $FILES
do
Yes, for the entries of the current directory (on its own it will not descend into subdirectories), but it's bad form if you ask me. Something like this would be better:
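One common alternative, sketched here and not necessarily what was originally posted, is to let the shell expand a glob pattern instead of capturing the output of ls, so that names containing spaces stay intact. The pattern assumes the year/month/day layout from the first post, and list_day_files is a hypothetical name.

```shell
# list_day_files is a hypothetical helper; $1 is a year directory such as 2012.
list_day_files() {
    for FILE in "$1"/*/*/*           # year/month/day/file
    do
        [ -f "$FILE" ] || continue   # skip directories and unmatched patterns
        printf '%s\n' "$FILE"
    done
}
```

Calling list_day_files 2012 prints one path per regular file found exactly three levels below the year directory; anything deeper or shallower is not visited, which is where find is the more general tool.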
Quote:
I ran your script; it gives an error message:
[/work/users/po/prince]$ ./testSBI.sh
./testSBI.sh[8]: id=${id:2}: bad substitution
Were you using ksh (Korn Shell)? Bash cannot handle the last substitution which eliminates the leading zeros from the ID. If you cannot use ksh, then you'll need to change the echo and delete the zeros with sed or some other mechanism.
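As a sketch of the sed route mentioned above: the function below removes leading zeros with a plain sed substitution, which does not depend on any particular shell. The name strip_zeros_sed is made up for this illustration.

```shell
# strip_zeros_sed is a hypothetical helper; it relies only on standard sed.
strip_zeros_sed() {
    printf '%s\n' "$1" | sed 's/^0*//'   # delete every zero at the start of the line
}

strip_zeros_sed 00020532   # prints 20532
```

In the loop, the echo line could then print "$(strip_zeros_sed "$id") $name" instead of using the ksh-only substitution. Note that an ID consisting only of zeros would come out empty.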
The following seems to do what you requested. You say that you want to create a CSV file, but by definition a CSV file has fields that are separated by commas. You don't show any commas in any of your sample output. This script uses a tab to separate output fields to get the headers to line up with the following data. Although it is written using ksh, it should also work with at least bash and sh:
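A minimal sketch along the lines described, to be read as an illustration and not as the exact script from this reply. It assumes the file_id is whatever sits between sasmm_fsbc_durds_id000 and the first _t (as stated in the first post), that file names contain no newlines, and that the year directories sit directly under the starting directory (the current directory unless one is given as the first argument).

```shell
#!/bin/sh
# Sketch only. Prints a tab-separated header and one line per matching file.
extract() {
    top=${1:-.}
    printf 'file_id\tfile_name\n'
    find "$top" -type f -path "$top/2[0-9][0-9][0-9]/*" \
        -name 'sasmm_fsbc_durds_id000*_t*.dat.trnsfr.gz' |
    while IFS= read -r path
    do
        name=${path##*/}                    # file name without directories
        name=${name%.trnsfr.gz}             # keep everything up to .dat
        id=${name#sasmm_fsbc_durds_id000}   # drop the fixed prefix
        id=${id%%_t*}                       # drop _t and what follows it
        printf '%s\t%s\n' "$id" "$name"
    done
}

extract "$@"
```

The order of the lines is whatever order find reports the files in; pipe the output through sort if a fixed order matters.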
Note that this will ignore any files found in and under the year directories that don't match your filename specifications.
To run it, save the above code in a file (e.g., extract) in the same directory where the year directories reside, make it executable by issuing the command:
Code:
chmod +x extract
and then issue the command:
Code:
./extract > output_file
If you leave off > output_file, the output will be written to your terminal. If you want to save the output in a file with a name other than output_file, replace it with any name you want.