Flat file "database"

11-22-2010

I'm wondering about the best way to store large amounts of data sorted by date/time, in a way that allows fast retrieval of date ranges. Deletion isn't necessary.

A database hardly seems ideal: databases aren't optimized for sorted data, and they allow things I don't need, like deleting arbitrary records. A text flat file would work fine, except for one question: how do you seek in one without reading the whole thing? It's theoretically possible, assuming the data is sorted, but I don't know a way to do this from the shell. I considered writing my own application to seek quickly within a huge flat file and/or split the data into separate flat files by date, until I realized I might be reinventing the wheel. Are there any standard tools for this?

Corona688

11-22-2010

You can try sgrep - it does binary searches on huge sorted files very efficiently.

See SourceForge or the sgrep home page.

We use it for searching giant XML data files. Judging by the home page, it has gained a number of features since our earlier version, but we don't seem to need them.

jim mcnamara

11-22-2010

I suppose it depends on the volume of data. If you are storing everything in a single file, then many shell commands (grep, for example) will read the whole file, but if you have a record of the line number, you could use

Code:

sed -n "${i}p" "$filename"

It's a bit heavyweight, but it will print line ${i} for you.

An alternative might be to split the records so that you have one file per day. That reduces the volume read by tools such as grep, assuming you know the day.

You could then compress the older files and use

Code:

zcat "$filename" | grep -e "$mysearch"

although there is, of course, the overhead of the zcat. Perhaps if you settled on a rule such as "anything over 30 days old is compressed because it is rarely used", you could build a script that looks for the plain file and, if only the compressed file exists, uses the command above.
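That fallback logic might look something like the sketch below. The function name `search_day` and the layout (one `DAY.log` file per day, renamed to `DAY.log.gz` once compressed) are assumptions made for illustration, not anything from the posts above.

```shell
#!/bin/sh
# search_day DIR DAY PATTERN
# Hypothetical layout: DIR/DAY.log for recent data, DIR/DAY.log.gz once compressed.
search_day() {
    dir=$1 day=$2 pattern=$3
    plain="$dir/$day.log"
    if [ -f "$plain" ]; then
        # Recent data: search the uncompressed file directly.
        grep -e "$pattern" "$plain"
    elif [ -f "$plain.gz" ]; then
        # Older data: decompress to a pipe. gzip -dc does the same job as zcat.
        gzip -dc "$plain.gz" | grep -e "$pattern"
    else
        echo "no data for $day" >&2
        return 1
    fi
}
```

Usage would be along the lines of `search_day /var/data 2010-11-21 "$mysearch"`, where the directory is whatever holds the daily files.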

Robin
Liverpool/Blackburn
UK

Do let us know how you get on or if there are more questions.

rbatte1

11-22-2010

Quote:

Originally Posted by rbatte1

I suppose it depends on the volume of data. If you are storing everything in a single file, then many shell commands (grep, for example) will read the whole file, but if you have a record of the line number, you could use

Code:

sed -n "${i}p" "$filename"

Exactly the problem. It's going to be big, eventually, and I don't want to read the entire file. I don't want to use a real database either, since the data's intended to be sorted, which would add a lot of overhead there too.

On the other hand it's certainly possible to be intelligent about the way you read a flat file. Things like GNU tac and GNU tail use seeking when possible to get approximately where they want and fine-tune from there, instead of reading all 3 gigs like a moron until they find the 10 lines they need.
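The "seek to roughly the right place, then fine-tune" idea can be approximated from the shell as a binary search over byte offsets. This is only a sketch under several assumptions: every line starts with a timestamp that sorts as plain text (for example `YYYY-MM-DD HH:MM:SS`), the file is sorted, has no blank lines, and ends with a newline, and your `tail -c +N` seeks on regular files instead of reading from the start (GNU tail is assumed to). The function names are made up for this example.

```shell
#!/bin/sh
# first_line_at FILE POS: print the first complete line that starts at byte
# offset POS or later (0-based). Prints nothing at end of file.
first_line_at() {
    if [ "$2" -eq 0 ]; then
        head -n 1 "$1"
    else
        # tail -c +POS starts at offset POS-1, so line 1 of its output is a
        # partial, empty, or earlier line; discard it and keep line 2.
        tail -c +"$2" "$1" | sed -n '2{p;q;}'
    fi
}

# lower_bound FILE KEY: print the smallest offset P such that the first line
# starting at or after P has a timestamp >= KEY (or P is past the last line).
lower_bound() {
    file=$1 key=$2
    klen=${#key}
    lo=0
    hi=$(($(wc -c < "$file")))
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        line=$(first_line_at "$file" "$mid")
        stamp=$(printf '%s' "$line" | cut -c1-"$klen")
        # \< is a string comparison (supported by bash, dash and ksh).
        if [ -n "$line" ] && [ "$stamp" \< "$key" ]; then
            lo=$((mid + 1))
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

# print_range FILE FROM TO: print records with FROM <= timestamp <= TO.
# FROM and TO may be prefixes, e.g. 2010-11-22 covers that whole day.
print_range() {
    start=$(lower_bound "$1" "$2")
    if [ "$start" -eq 0 ]; then
        cat "$1"
    else
        tail -c +"$start" "$1" | sed 1d
    fi | awk -v to="$3" '{ if (substr($0, 1, length(to)) > to) exit; print }'
}
```

Each probe costs a process or two, so this only pays off on large files, but the number of probes grows with the logarithm of the file size, not with the size itself.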

Quote:

An alternate might be to split the records so that you have one file per day

Yes, I thought of that. One file per month might be more appropriate if I don't want to generate massive numbers of tiny tiny files, but anyway.

The point is, I'm looking for a smart tool to store in and retrieve from such a tree. Beginning to think there is no such thing, might have to build one myself...
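For what it's worth, a month-per-file tree can be driven from a couple of small shell functions. The sketch below is not an existing tool. It assumes a hypothetical layout of `ROOT/YYYY/MM.log`, records of the form `YYYY-MM-DD HH:MM:SS text`, and records arriving in timestamp order so that each file stays sorted. The names `store` and `fetch` are invented here.

```shell
#!/bin/sh
# store ROOT RECORD: append a record to the file for its month.
store() {
    root=$1 rec=$2
    year=$(printf '%s' "$rec" | cut -c1-4)
    month=$(printf '%s' "$rec" | cut -c6-7)
    mkdir -p "$root/$year"
    printf '%s\n' "$rec" >> "$root/$year/$month.log"
}

# fetch ROOT FROM TO: print records with FROM <= timestamp <= TO.
# FROM and TO are timestamps or prefixes of at least YYYY-MM.
fetch() {
    root=$1 from=$2 to=$3
    first=$(printf '%s' "$from" | cut -c1-7 | tr -d '-')   # e.g. 201011
    last=$(printf '%s' "$to" | cut -c1-7 | tr -d '-')
    # The glob expands in sorted order, which is chronological for this layout.
    for f in "$root"/[0-9][0-9][0-9][0-9]/[0-9][0-9].log; do
        [ -f "$f" ] || continue
        # Turn ROOT/2010/11.log into 201011 for a numeric comparison.
        m=$(printf '%s' "$f" | sed 's|.*/\([0-9]*\)/\([0-9]*\)\.log$|\1\2|')
        [ "$m" -lt "$first" ] && continue
        [ "$m" -gt "$last" ] && break
        # Only months inside the range are opened; filter their lines here.
        awk -v from="$from" -v to="$to" '
            substr($0, 1, length(from)) >= from &&
            substr($0, 1, length(to)) <= to' "$f"
    done
}
```

Months outside the requested range are never opened, so the cost of a query depends on the size of the months it touches and not on the size of the whole archive.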

Corona688

11-22-2010

All depends on whether your sorted data is fixed or variable length records. Direct access to a specific fixed length record is an easy nut to crack. Quickly accessing variable length records in a flat file is a whole different game. Which is it?
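To illustrate the fixed-length case: if every record occupies exactly the same number of bytes, record N starts at byte N times the record length, and `dd` can be asked to skip straight to it. The sketch below assumes a made-up helper name `get_record` and records padded to a fixed width including the trailing newline. Whether `dd` really seeks past the skipped blocks or reads through them depends on the implementation, so treat the speed claim as an assumption to check on your platform.

```shell
#!/bin/sh
# get_record FILE RECLEN N [COUNT]: print COUNT records (default 1) starting
# at record number N (0-based) from a file of fixed-length records, each
# RECLEN bytes long including its trailing newline.
get_record() {
    dd if="$1" bs="$2" skip="$3" count="${4:-1}" 2>/dev/null
}
```

For example, with 8-byte records, `get_record data.txt 8 1000000` would print record 1000000 without needing the line numbers that a sed-based lookup relies on.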

fpmurphy

11-22-2010

Variable length. I don't see a simple way to do it in shell with fixed-length either, though, since a shell can't seek.

---------- Post updated at 03:29 PM ---------- Previous update was at 02:35 PM ----------

I've got an application partly written for this now, so, solved, I guess.

Corona688

11-22-2010

SQLite? Berkeley DB?

frank_rizzo
