Hi everyone,
I've got an extensive collection of seismic files that I am trying to turn into a workable subsurface data set. It's all real-time history, and it is being loaded onto the main Linux computer from a collection of about 1000 CDs. There are about 4000 seismic files on each CD, and the CDs were generated by approximately 20 seismic stations over the course of five to eight years.
There are gaps in the record, as some of the stations went offline and came back online over that eight-year history.
One of the tools I need to develop is a workable database that can inventory the various seismic files and record the times when data is on hand and the times when there are gaps in the seismic record.
Each of the files to be cataloged has its creation date embedded in the file name. As an example:
Quote:
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223330.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226223932.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226224537.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225141.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226225743.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230348.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226230950.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226231554.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232157.out
-rwxrwxr-x 1 msugws users 174592 Nov 1 14:52 20011226232803.out
Note that the first file name (20011226223330.out) breaks the time down (in GMT) as 2001, 12, 26, 22, 33, 30, which translates to 2001-Dec-26 22:33:30 GMT as the initial time within the file.
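To make that decoding concrete, here is the sort of thing I have in mind in Python (just a sketch; the function name is mine, for illustration):

```python
from datetime import datetime, timezone

def decode_name(filename):
    """Decode the start time embedded in a name like 20011226223330.out."""
    stamp = filename.split(".")[0]                    # "20011226223330"
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# decode_name("20011226223330.out") → 2001-12-26 22:33:30+00:00
```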
What I would like to do, first of all, is:
1) Pick an appropriate RDBMS that is workable via SQL and automated scripting
2) Develop a script that can:
a. recursively move through all of the files within the directory structure,
b. extract the file name of each .OUT file as well as the station designator found in the folder name,
c. create an entry within the database consisting of:
ca. the decoded date & time sourced from the file name,
cb. the station name designator,
cc. the actual file name, including the full path, and
d. place the entry into the database to catalog the file.
This would help me identify where it is that I have data and where I have gaps.
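As a rough starting point, here is a sketch of the whole pipeline using SQLite (just one candidate RDBMS — it ships with Python and needs no server). The root path, the table layout, and the assumption that the station designator is the immediate parent folder of each file are all mine and would need adjusting to the real directory structure:

```python
import os
import sqlite3
from datetime import datetime

ROOT = "/data/seismic"        # hypothetical top of the CD dump; adjust to taste

conn = sqlite3.connect("seismic.db")
conn.execute("""CREATE TABLE IF NOT EXISTS segments (
                  start_time TEXT,
                  station    TEXT,
                  path       TEXT PRIMARY KEY)""")

for dirpath, _dirs, files in os.walk(ROOT):          # step a: recurse
    for name in files:
        if not name.lower().endswith(".out"):
            continue
        station = os.path.basename(dirpath)          # step b: assume folder = station
        try:                                         # step ca: decode the name
            start = datetime.strptime(name.split(".")[0], "%Y%m%d%H%M%S")
        except ValueError:
            continue                                 # skip oddly named files
        conn.execute("INSERT OR REPLACE INTO segments VALUES (?,?,?)",
                     (start.isoformat(sep=" "),      # steps c/d: catalog the file
                      station,
                      os.path.join(dirpath, name)))

conn.commit()
conn.close()
```

With the catalog in place, gap-hunting becomes a SQL query over `start_time` per `station` rather than a directory crawl each time.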
However, I am really new to scripting, and my database skills are like, 22 years out of date.
If you have the time and insight, could you maybe give me a hint on where to start? It's been a big project thus far and it is getting bigger. Putting together a database that inventories the data segments would be greatly beneficial, especially when I move on to the next phase: time validation of the existing data, and conversion of that data into a researchable format.
---------- Post updated at 01:01 PM ---------- Previous update was at 12:40 PM ----------
One thing that comes to mind is a recursive ls command that outputs to an ASCII file the full path name of each .OUT file within the entire directory structure. That would contain all of the raw information, which could then be parsed, maybe with a script, into a .CSV-format text file? :shrug:
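Rather than parsing ls output (which gets fragile with unusual file names), maybe a script could walk the tree and emit the CSV directly. A sketch along those lines, with the column order (time, station, path) and the station-equals-parent-folder assumption being my own guesses:

```python
import csv
import os
import sys
from datetime import datetime

def dump_csv(root, out=sys.stdout):
    """Write one CSV row per .out file found under root:
    decoded start time, station (assumed parent folder), full path."""
    writer = csv.writer(out)
    writer.writerow(["start_time", "station", "path"])
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".out"):
                start = datetime.strptime(name.split(".")[0], "%Y%m%d%H%M%S")
                writer.writerow([start.isoformat(sep=" "),
                                 os.path.basename(dirpath),
                                 os.path.join(dirpath, name)])

if __name__ == "__main__":
    dump_csv(sys.argv[1] if len(sys.argv) > 1 else ".")
```

The resulting CSV could then be bulk-loaded into whatever database gets picked (e.g. `.import` in the sqlite3 shell).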