Flat file "database"
# 1  
Old 11-22-2010
I'm wondering about the best way to store large amounts of data sorted by date/time, in a way that allows fast retrieval of date ranges. Deletion isn't necessary.

A database hardly seems ideal, since they're not optimized for sorted data, and allow things I don't need like the deletion of arbitrary records. A text flat file would work fine, except how to seek in one without reading the whole thing? It's theoretically possible assuming the data's sorted, but I don't know a way to do this from shell, so I considered writing my own application to quickly seek within a huge flat file and/or split into different flat files based on date, until I realized I might be reinventing the wheel. Are there any standard tools for this?
# 2  
Old 11-22-2010
You can try sgrep - it does binary searches on huge sorted files very efficiently.

See sourceforge or sgrep homepage:
Sgrep - Home page

We use it for searching giant XML data files. Looking at the homepage, it has gained a bunch of features since the version we run, but we don't seem to need them.
# 3  
Old 11-22-2010
I suppose it depends on the volume of data. If you store everything in a single file, many shell commands (grep, for example) will read the whole file, but if you keep a record of the line number, you could use
Code:
sed -n "${i}{p;q;}" "$filename"

It's a bit heavyweight, since sed still reads every line up to ${i}, but it will print line ${i}, and the q stops it reading any further.

An alternative might be to split the records so that you have one file per day; that reduces the volume read by tools such as grep, assuming you know the day.

You could then compress the older files and use
Code:
zcat "$filename" | grep -- "$mysearch"

although there is, of course, the overhead of the zcat. Perhaps if you settled on a rule such as "anything over 30 days old is compressed because it is rarely used", you could build a script that looks for the plain file and, if only the compressed file exists, uses the above.
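
A rough sketch of that lookup follows. The function name search_day, the one-file-per-day naming (YYYY-MM-DD.log, gzipped to YYYY-MM-DD.log.gz once old) and the argument order are assumptions made for illustration, not anything from a standard tool. It uses gzip -dc, which does the same job as zcat for .gz files.

```sh
# Search one day's log, whether or not it has been compressed yet.
# Assumed (hypothetical) layout: $logdir/YYYY-MM-DD.log, or .log.gz once old.
# The body runs in a subshell ( ... ), so its variables stay local and
# "exit" only leaves the function, not the calling shell.
search_day() (
    logdir=$1 day=$2 pattern=$3
    plain=$logdir/$day.log
    if [ -f "$plain" ]; then
        grep -- "$pattern" "$plain"
    elif [ -f "$plain.gz" ]; then
        gzip -dc "$plain.gz" | grep -- "$pattern"
    else
        echo "no log for $day" >&2
        exit 1
    fi
)
```

Called as, say, search_day /var/log/myapp 2010-10-01 "$mysearch", it only pays the decompression cost when the plain file is gone.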


Robin
Liverpool/Blackburn
UK


Do let us know how you get on or if there are more questions.
# 4  
Old 11-22-2010
Quote:
Originally Posted by rbatte1
I suppose it depends on the volume of data. If you store everything in a single file, many shell commands (grep, for example) will read the whole file, but if you keep a record of the line number, you could use
Code:
sed -n "${i}{p;q;}" "$filename"

Exactly the problem. It's going to be big, eventually, and I don't want to read the entire file. I don't want to use a real database either, since the data's intended to be sorted, which would add a lot of overhead there too.

On the other hand it's certainly possible to be intelligent about the way you read a flat file. Things like GNU tac and GNU tail use seeking when possible to get approximately where they want and fine-tune from there, instead of reading all 3 gigs like a moron until they find the 10 lines they need.
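
To make the idea concrete, here is a minimal sketch of a binary search over byte offsets using nothing but tail, head and sed. It assumes every line starts with a timestamp that sorts as plain text (something like 2010-11-22 15:29:00), that the file has no blank lines, and that tail -c +N seeks rather than reads on a regular file, which depends on the implementation. The name seek_from is made up for the example.

```sh
# Print a sorted log from the first line whose leading timestamp is >= $2.
# Assumes: $1 is a regular file, every line starts with a sortable timestamp,
# and there are no blank lines. Runs in a subshell so variables stay local.
seek_from() (
    file=$1 target=$2
    lo=0
    hi=$(( $(wc -c < "$file") ))   # file size in bytes
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if [ "$mid" -eq 0 ]; then
            line=$(head -n 1 "$file")
        else
            # Start at byte offset mid-1, skip up to the next newline, and
            # keep the first line that starts at or after offset mid.
            line=$(tail -c +"$mid" "$file" | sed -n '2{p;q;}')
        fi
        # Compare only as many characters as the target timestamp has.
        key=$(printf '%s\n' "$line" | cut -c "1-${#target}")
        if [ -n "$line" ] && expr "x$key" \< "x$target" > /dev/null; then
            lo=$(( mid + 1 ))      # that line is too early: look further on
        else
            hi=$mid                # in range, or past the last line
        fi
    done
    if [ "$lo" -eq 0 ]; then
        cat "$file"
    else
        # Offset lo-1 is the start of the last too-early line; drop it.
        tail -c +"$lo" "$file" | sed 1d
    fi
)
```

Each probe spawns a few processes, but the number of probes grows with the logarithm of the file size, so a multi-gigabyte file needs only a few dozen of them. Piping the output into something that exits at the end of the wanted range gives a date-range query.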

Quote:
An alternative might be to split the records so that you have one file per day
Yes, I thought of that. One file per month might be more appropriate if I don't want to generate massive numbers of tiny tiny files, but anyway.

The point is, I'm looking for a smart tool to store in and retrieve from such a tree. Beginning to think there is no such thing, might have to build one myself...
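
For the retrieval half, a sketch along these lines would pick out the files covering a range of months. The one-file-per-month naming (YYYY-MM.log) and the function name month_files are assumptions for the sake of the example.

```sh
# Print the monthly files covering a range of months, oldest first.
# Assumed (hypothetical) layout: $dir/YYYY-MM.log; from/to are given as YYYY-MM.
# Runs in a subshell so its variables stay local.
month_files() (
    dir=$1 from=$2 to=$3
    y=${from%-*};  m=${from#*-};  m=${m#0}    # strip a leading zero so that
    ty=${to%-*};   tm=${to#*-};   tm=${tm#0}  # 08 and 09 are not read as octal
    while [ "$y" -lt "$ty" ] || { [ "$y" -eq "$ty" ] && [ "$m" -le "$tm" ]; }; do
        f=$(printf '%s/%04d-%02d.log' "$dir" "$y" "$m")
        [ -f "$f" ] && printf '%s\n' "$f"     # months with no data are skipped
        m=$(( m + 1 ))
        if [ "$m" -gt 12 ]; then m=1; y=$(( y + 1 )); fi
    done
)
```

Something like month_files /data/log 2010-09 2010-11 then yields at most three file names, and only the first and last of them need any searching inside to trim the range to exact dates.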
# 5  
Old 11-22-2010
All depends on whether your sorted data is fixed or variable length records. Direct access to a specific fixed length record is an easy nut to crack. Quickly accessing variable length records in a flat file is a whole different game. Which is it?
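
For the fixed-length case, a minimal sketch with dd, assuming every record (including its trailing newline) is exactly the same number of bytes. The function name fixed_record is made up for illustration.

```sh
# Print record number $3 (counting from 0) of a file of fixed-length records.
# $2 is the record length in bytes, including the trailing newline.
# dd skips $3 blocks of $2 bytes and then copies exactly one block.
fixed_record() {
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null
}
```

With 5-byte records, fixed_record data.txt 5 1 prints the second record without a line-by-line scan of what comes before it.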
# 6  
Old 11-22-2010
Variable length. I don't see a simple way to do it in shell with fixed-length either, though, since a shell can't seek.

---------- Post updated at 03:29 PM ---------- Previous update was at 02:35 PM ----------

I've got an application partly written for this now, so, solved, I guess.
# 7  
Old 11-22-2010
SQLite? Berkeley DB?