Hi everyone,
I've got an extensive collection of seismic files that I am trying to turn into a workable subsurface data set. It's all real-time history, and it is being loaded onto the main Linux computer from a collection of about 1000 CDs. There are about 4000 seismic files on each CD, and the CDs were generated by approximately 20 seismic stations over the course of five to eight years.
There are gaps in the record, as some of the stations went offline and came back online over that eight-year history.
One of the tools I need to develop is a workable database that can inventory the various seismic files and record the times when data is on hand and the times when there are gaps in the seismic record.
Each of the files to be cataloged has its creation date embedded in the file name. As an example:
Quote:
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223330.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223932.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226224537.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225141.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225743.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230348.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230950.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226231554.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232157.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232803.out
Note that the first file name (20011226223330.out) breaks the time down (in GMT) as 2001, 12, 26, 22, 33, 30, which translates to 2001-Dec-26 22:33:30 GMT as the initial time within the file.
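To make that decoding concrete, here is the sort of thing I have in mind in Python (just a sketch; the function name is mine, for illustration):

```python
from datetime import datetime, timezone

def decode_name(filename):
    """Decode the start time embedded in a name like 20011226223330.out."""
    stamp = filename.split(".")[0]                    # "20011226223330"
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# decode_name("20011226223330.out") → 2001-12-26 22:33:30+00:00
```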
What I would like to do, first of all, is:
1) Pick an appropriate RDBMS that is workable via SQL and automated scripting
2) Develop a script that can:
a. recursively move through all of the files within the directory structure,
b. extract the file name of each .OUT file as well as the station designator found in the folder name,
c. create an entry within the database consisting of:
ca. the decoded date & time sourced from the file name,
cb. the station name designator,
cc. the actual file name, including the full path, and
d. place the entry into the database to catalog the file.
This would help me identify where it is that I have data and where I have gaps.
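As a rough starting point, here is a sketch of the whole pipeline using SQLite (just one candidate RDBMS — it ships with Python and needs no server). The root path, the table layout, and the assumption that the station designator is the immediate parent folder of each file are all mine and would need adjusting to the real directory structure:

```python
import os
import sqlite3
from datetime import datetime

ROOT = "/data/seismic"        # hypothetical top of the CD dump; adjust to taste

conn = sqlite3.connect("seismic.db")
conn.execute("""CREATE TABLE IF NOT EXISTS segments (
                  start_time TEXT,
                  station    TEXT,
                  path       TEXT PRIMARY KEY)""")

for dirpath, _dirs, files in os.walk(ROOT):          # step a: recurse
    for name in files:
        if not name.lower().endswith(".out"):
            continue
        station = os.path.basename(dirpath)          # step b: assume folder = station
        try:                                         # step ca: decode the name
            start = datetime.strptime(name.split(".")[0], "%Y%m%d%H%M%S")
        except ValueError:
            continue                                 # skip oddly named files
        conn.execute("INSERT OR REPLACE INTO segments VALUES (?,?,?)",
                     (start.isoformat(sep=" "),      # steps c/d: catalog the file
                      station,
                      os.path.join(dirpath, name)))

conn.commit()
conn.close()
```

With the catalog in place, gap-hunting becomes a SQL query over `start_time` per `station` rather than a directory crawl each time.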
However, I am really new to scripting, and my database skills are like, 22 years out of date.
If you have the time and insight, could you maybe give me a hint on where to start? It's been a big project thus far and it is getting bigger. Putting together a database that inventories the data segments would be greatly beneficial, especially when I move on to the next phase: time validation of the existing data, and conversion of that data into a researchable format.
---------- Post updated at 01:01 PM ---------- Previous update was at 12:40 PM ----------
One thing that comes to mind is a recursive ls command that outputs to an ASCII file the full path name of each .OUT file within the entire directory structure. That would contain all of the raw information, which could then be parsed, maybe with a script, into a .CSV-format text file? :shrug:
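Rather than parsing ls output (which gets fragile with unusual file names), maybe a script could walk the tree and emit the CSV directly. A sketch along those lines, with the column order (time, station, path) and the station-equals-parent-folder assumption being my own guesses:

```python
import csv
import os
import sys
from datetime import datetime

def dump_csv(root, out=sys.stdout):
    """Write one CSV row per .out file found under root:
    decoded start time, station (assumed parent folder), full path."""
    writer = csv.writer(out)
    writer.writerow(["start_time", "station", "path"])
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".out"):
                start = datetime.strptime(name.split(".")[0], "%Y%m%d%H%M%S")
                writer.writerow([start.isoformat(sep=" "),
                                 os.path.basename(dirpath),
                                 os.path.join(dirpath, name)])

if __name__ == "__main__":
    dump_csv(sys.argv[1] if len(sys.argv) > 1 else ".")
```

The resulting CSV could then be bulk-loaded into whatever database gets picked (e.g. `.import` in the sqlite3 shell).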