I need a database and a plan of attack!


 
# 1  

Hi everyone,

I've got an extensive collection of seismic files that I am trying to turn into a workable subsurface data collection. It's all real-time history, and it is being loaded onto the main Linux computer from a collection of about 1000 CDs. There are about 4000 seismic files on each CD, and these CDs were generated by approximately 20 seismic stations over the course of five to eight years.

There are gaps in the record, as some of the stations went offline and came back online throughout the eight-year history.

One of the tools that I need to develop is a workable database that inventories the seismic files and records the time spans for which data is on hand and the time spans where there are gaps in the seismic record.

Each of the files to be inventoried has its creation date and time embedded in the file name. As an example:

Quote:
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223330.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223932.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226224537.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225141.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225743.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230348.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230950.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226231554.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232157.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232803.out
Note that the first file name (20011226223330.out) breaks the time down (in GMT) as 2001, 12, 26, 22, 33, 30, which translates to 2001-Dec-26 22:33:30 GMT as the initial time within the file.
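The decoding described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming every name is exactly fourteen digits followed by .out; the function name parse_start_time is hypothetical and not part of any existing tool.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_start_time(filename: str) -> datetime:
    """Decode a name such as 20011226223330.out into a GMT/UTC datetime."""
    stem = Path(filename).stem  # drops any directory part and the ".out" suffix
    # YYYYMMDDhhmmss, as described in the post
    return datetime.strptime(stem, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


print(parse_start_time("20011226223330.out").isoformat())
# prints 2001-12-26T22:33:30+00:00
```

A name that does not fit the pattern raises ValueError, which a cataloging script could catch in order to log and skip stray files.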

First of all, I would like to:

1) Pick an appropriate database and RDBMS that can be driven via SQL and automated scripting

2) Develop a script that can:
a. recursively move through all of the files within the directory structure,
b. find the file name of each .out file, as well as the station designator as found in the folder name,
c. Create an entry within the database consisting of
ca. The decoded date & time that is sourced from the file name
cb. The station name designator
cc. The actual file name including full path
d. Place the entry into the database to catalog the file.
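
Steps a through d could be sketched roughly as follows. The thread does not settle on a database, so SQLite (through Python's built-in sqlite3 module) is purely an assumption here, chosen because it needs no server and is driven by plain SQL. The table name seismic_files, its columns, and the function name catalog are all hypothetical, and the station is assumed to be the first folder below the collection root.

```python
import sqlite3
from datetime import datetime
from pathlib import Path

# Hypothetical schema: one row per file, keyed by its full path.
SCHEMA = """
CREATE TABLE IF NOT EXISTS seismic_files (
    path       TEXT PRIMARY KEY,   -- full path, so re-runs do not duplicate rows
    station    TEXT NOT NULL,      -- taken from the folder name
    start_time TEXT NOT NULL       -- decoded from the file name, GMT
)
"""


def catalog(root, db_path):
    """Walk root recursively and record every *.out file; return rows written."""
    root = Path(root).resolve()
    written = 0
    con = sqlite3.connect(db_path)
    try:
        con.execute(SCHEMA)
        for path in sorted(root.rglob("*.out")):  # note: matches lowercase .out only
            if not path.is_file():
                continue
            try:
                start = datetime.strptime(path.stem, "%Y%m%d%H%M%S")
            except ValueError:
                continue  # name does not parse as a timestamp; skip it
            # Assumption: the first folder below root is the station designator.
            station = path.relative_to(root).parts[0]
            con.execute(
                "INSERT OR REPLACE INTO seismic_files VALUES (?, ?, ?)",
                (str(path), station, start.isoformat(sep=" ")),
            )
            written += 1
        con.commit()
    finally:
        con.close()
    return written
```

Once the table is filled, an ordinary query such as SELECT station, MIN(start_time), MAX(start_time), COUNT(*) FROM seismic_files GROUP BY station would give a first overview of what is on hand per station.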

This would help me identify where it is that I have data and where I have gaps.
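
Spotting the gaps could then be as simple as comparing consecutive start times. In the listing above the files start roughly six minutes apart (362 to 366 seconds), so the sketch below flags any pair spaced more than ten minutes apart. That threshold is an assumption, as is the function name find_gaps, and the last timestamp in the sample list is invented to create a gap.

```python
from datetime import datetime, timedelta


def find_gaps(start_times, max_spacing=timedelta(minutes=10)):
    """Return (previous_start, next_start) pairs spaced more than max_spacing apart."""
    ordered = sorted(start_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_spacing]


# First three names come from the listing; the fourth is made up for illustration.
names = ["20011226223330", "20011226223932", "20011226224537", "20011227010000"]
times = [datetime.strptime(n, "%Y%m%d%H%M%S") for n in names]

for before, after in find_gaps(times):
    print(f"gap between {before} and {after}")
# prints: gap between 2001-12-26 22:45:37 and 2001-12-27 01:00:00
```

This would need to be run per station, since each station has its own record.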

However, I am really new to scripting, and my database skills are like, 22 years out of date.

If you have the time and insight, could you maybe give me a hint on where to start? It's been a big project thus far and is getting bigger. Putting a database together that inventories the data segments would be greatly beneficial, especially when I move on to the next phase, which is time validation of the existing data and the conversion of that data into a researchable format.

---------- Post updated at 01:01 PM ---------- Previous update was at 12:40 PM ----------

One thing that comes to mind is a recursive ls command that writes the full path name of each .out file within the entire directory structure to an ASCII file. That file would contain all of the raw information, which could then be parsed, maybe with a script, into a CSV text file?
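
As a rough sketch of that idea, the listing and the CSV step can be combined in one short Python script. The function name write_inventory and the column names are hypothetical; the extension is matched case-insensitively on the assumption that both .out and .OUT spellings may occur.

```python
import csv
import os


def write_inventory(root, csv_path):
    """Write one CSV row (folder, file name, full path) per .out file under root."""
    count = 0
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["folder", "file_name", "full_path"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.lower().endswith(".out"):  # accepts .out and .OUT
                    full = os.path.abspath(os.path.join(dirpath, name))
                    writer.writerow([os.path.basename(dirpath), name, full])
                    count += 1
    return count
```

The resulting file can be opened in a spreadsheet or imported into most databases, which makes it a reasonable stepping stone before committing to a schema.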
# 2  
To get the full path of each *.out file, use "find" rather than "ls":
Code:
find /directory/with/outs -type f -name "*.out"

# 3  
A few things need clarifying:
1000 CDs, OK
4000 records (or files) per CD, but from one station or from many? (That can make things harder...)
etc...

20 stations, OK
recording over 8 years...
How did you copy them onto the Linux box? Per station? Per year? Are they in subdirectories? What is the structure?
# 4  
Directory structure

My directory structure is broken up by CD name and by station name.
I've named the discs with the three- or four-letter designator of the originating station, followed by a two-digit number representing the year of origin, followed by an arbitrary letter N, and then the disc sequence number within the collection. So, for instance, "SEY06N01" would represent the first disc in the sequence from the year 2006 for station SEY.
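
That naming scheme is regular enough to take apart with a regular expression. The sketch below is only an illustration: the function name parse_disc_name is hypothetical, the designator is assumed to be uppercase letters, and every two-digit year is assumed to belong to the 2000s.

```python
import re

# Three or four letters, two-digit year, the letter N, then the disc sequence.
DISC_NAME = re.compile(r"^(?P<station>[A-Z]{3,4})(?P<year>\d{2})N(?P<seq>\d+)$")


def parse_disc_name(name):
    """Split a disc label such as SEY06N01 into (station, year, sequence)."""
    m = DISC_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized disc name: {name!r}")
    # Assumption: all two-digit years fall in the 2000s.
    return m["station"], 2000 + int(m["year"]), int(m["seq"])


print(parse_disc_name("SEY06N01"))
# prints ('SEY', 2006, 1)
```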

The directory structure then looks as follows:
Quote:
~/Seismic_Data_Collection/ /* Only one collection */
~/Seismic_Data_Collection/SEY /* About twenty folders (one per station) at this level */
~/Seismic_Data_Collection/SEY/SEY06N01/ /* About four thousand files at this level, or perhaps one more level of three or four subfolders in case of a station outage. */

~/Seismic_Data_Collection/SEY/SEY06N01/20060110235959.out
/* About four thousand of these files that contain raw seismic data from the originating station*/
There are about twenty station subdirectories within the directory Seismic_Data_Collection, and within each station subdirectory there may be as many as 360 subfolders. That figure would assume an unbroken ten-year record for that particular station, which is not the case. However, a couple of the stations do have a couple hundred CDs. SEY, for instance, has records from 2001 through 2010. Each CD folder contains only data from that particular station, and every file within each folder is supposed to be identical in length (in terms of bytes) and thus in recording time.

---------- Post updated at 02:05 PM ---------- Previous update was at 01:51 PM ----------

Hey, I think that find command might come to the rescue. I got some ideas - maybe a combination where the find is used to put out the full path, then an ls is used to gather file size & creation date. The database will be necessary when I move to the next phase which is to validate the time synchronization of continuous segments of data found within adjacent files.
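
A rough Python equivalent of the find-plus-ls combination is sketched below. It collects the size and the modification time (which is the timestamp that ls -l reports) for each .out file, and it also flags files whose size differs from the most common one, on the assumption that a different byte count hints at a short or damaged recording. The function names file_facts and odd_sized are hypothetical.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def file_facts(root):
    """Yield (full_path, size_in_bytes, modified_time_utc) for each .out file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".out"):
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
                yield full, st.st_size, mtime


def odd_sized(facts):
    """Return the entries whose size differs from the most common size."""
    facts = list(facts)
    if not facts:
        return []
    usual, _count = Counter(size for _path, size, _mtime in facts).most_common(1)[0]
    return [f for f in facts if f[1] != usual]
```

For example, odd_sized(file_facts("/path/to/one/disc/folder")) would list the files worth a closer look; the path shown is a placeholder.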

I'll keep thinking about it. Thanks for the feedback.

Last edited by ws6transam; 02-22-2012 at 02:57 PM..
 
