Flat file "database"


 
# 8  
Old 11-22-2010
How big is "huge"?
What Database Engines or High Level Languages do you have available?
# 9  
Old 11-22-2010
I worked with HXTT on a JDBC flat-file CSV product that lets you keep directories of zip files of delimited flat files and create a table from selected files, using file wildcards that reach right down into the zip archive. So part of the secret to storing sorted, delimited, compressed data is partitioning it into files by key range. A zip can deliver the decompressed content of one file to stdout without reading the whole archive, unlike a compressed tar. I wrote much of this into the wiki on flat file databases.

BTW, partitioning as sorting: I once took all the US stock trades for one day and split them up with a simple C tool that found the symbol in each record and wrote the record to a file of the same name. When it reached 200 names (file streams open for writing; this OS had a limit of 256 fds), it did a popen() of itself and sent the misses downstream. Since the data was already in time order, it came out partitioned by symbol and sorted by time in one pass, with no sort. I might even have the code around somewhere. The point is that records for the most popular stocks did not travel far down the pipeline, so it was fast and multiprocessor friendly.
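The original C tool isn't posted, so here is only a rough Python sketch of the idea as described above. Assumptions of mine: records arrive as (symbol, line) pairs, the names `partition_stage` and `partition` are made up, and the later stages run one after another in the same process instead of being popen()ed children running in parallel.

```python
import os
import tempfile

def partition_stage(records, out_dir, max_open=200):
    """Write each record to a file named after its key; return the misses.

    Only the first `max_open` distinct keys get a file in this stage.
    Records for any other key are handed back for the next stage.
    """
    handles = {}
    misses = []
    try:
        for key, line in records:
            fh = handles.get(key)
            if fh is None:
                if len(handles) >= max_open:
                    misses.append((key, line))  # send downstream
                    continue
                fh = handles[key] = open(os.path.join(out_dir, key), "a")
            fh.write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return misses

def partition(records, out_dir, max_open=200):
    """Run stages until nothing is left over; return the number of stages."""
    stages = 0
    pending = list(records)
    while pending:
        pending = partition_stage(pending, out_dir, max_open)
        stages += 1
    return stages

# Hypothetical trade records, already in time order.
trades = [
    ("IBM", "09:30:00,IBM,100\n"),
    ("MSFT", "09:30:01,MSFT,200\n"),
    ("ORCL", "09:30:02,ORCL,50\n"),
    ("IBM", "09:30:03,IBM,300\n"),
    ("ORCL", "09:30:04,ORCL,75\n"),
]

with tempfile.TemporaryDirectory() as out_dir:
    stages = partition(trades, out_dir, max_open=2)
    files = {}
    for name in sorted(os.listdir(out_dir)):
        with open(os.path.join(out_dir, name)) as fh:
            files[name] = fh.read()

print(stages, sorted(files))
```

A key is either handled entirely by one stage or missed entirely by it, so each output file keeps the input's time order. Symbols that show up early (usually the busy ones) are caught by the first stage, which matches the point about popular stocks not going far down the pipeline.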
# 10  
Old 11-22-2010
Quote:
I don't see a simple way to do it in shell with fixed-length either, though, since a shell can't seek.
The Korn shell, for one, can do extended I/O. Consider the following trivial example:
Code:
#!/bin/ksh93

TMP=file.$$

cat <<EOT >$TMP
aaa
bbb
ccc
ddd
eee
fff
EOT

# open file descriptor 3 for read/write
command exec 3<> $TMP || exit 1

# check file descriptor 3 position
print
print "At offset: $(3<#)"
if (($(3<#) != 0))
then
   print "Not at offset 0"
   exit 1
fi

# read in the first line and print it
read -u3
print $REPLY
print "At offset $(3<#) after reading line"
print

# search forward for string "ddd"
3<#"ddd"
print "At offset $(3<#) after search forward for 'ddd'"
read -u3
print $REPLY
print

# seek to absolute offset 8, check that we got there and, if so, read line
if (( $(3<# ((8))) != 8))
then
  print "Not at offset 8"
  exit 1
fi
print "At offset $(3<#) after specifying absolute offset of 8"
read -u3
print $REPLY
print

# seek to EOF, which should be at offset 24 (6 lines of 4 bytes), so check.
if (( $(3<#((EOF))) != 4*6 ))
then
   print "Not at EOF"
   exit 1
fi
print "At offset $(3<#) after specifying EOF"
print

# back up one line, i.e. 4 characters
3<#((CUR - 4))
print "At offset $(3<#) after backing up 4 characters"
read -u3
print $REPLY
print

redirect 3<&- || echo 'cannot close FD 3'

rm $TMP

This outputs
Code:
At offset: 0 
aaa 
At offset 4 after reading line 

At offset 12 after search forward for 'ddd' 
ddd 

At offset 8 after specifying absolute offset of 8 
ccc 

At offset 24 after specifying EOF 

At offset 20 after backing up 4 characters 
fff

# 11  
Old 11-23-2010
You can seek with head and tail using any shell. However, for rows to be addressable by seek offset, you need either an index to look up the seek address or fixed-size rows, so you can multiply to find row N (assuming something tells you that N is the row you want). Fixed-size rows are space wasters. And if you have an index, why not put the data on the leaf?
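For what it's worth, the "multiply to find row N" arithmetic looks like this as a small Python sketch. It assumes 4-byte records (3 data bytes plus a newline), the same layout as the ksh93 demo file earlier in the thread; `read_row` and `RECORD_LEN` are names I made up.

```python
import os
import tempfile

RECORD_LEN = 4  # assumed: 3 data bytes plus a newline per row

def read_row(path, n, record_len=RECORD_LEN):
    """Return fixed-size row n (0-based) by seeking to n * record_len."""
    with open(path, "rb") as f:
        f.seek(n * record_len)
        return f.read(record_len)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.dat")
    with open(path, "wb") as f:
        f.write(b"aaa\nbbb\nccc\nddd\neee\nfff\n")
    row3 = read_row(path, 3)
    # With fixed-size rows the row count falls out of the file size.
    row_count = os.path.getsize(path) // RECORD_LEN

print(row3, row_count)
```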

unzip seeks for you, and the data is normally stored compressed, which may let it stream faster than uncompressed data read straight from disk (CPUs are faster than disk drives). If you partition your data into many modest-sized files, zip can store them with relative paths for quick access.
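As a rough illustration of the zip idea, here is a Python sketch using the standard zipfile module rather than the unzip command. The partition layout (one member per day under a year/month path) and the names `write_partitions` and `read_members` are assumptions of mine, not anything from the thread.

```python
import fnmatch
import io
import os
import tempfile
import zipfile

def write_partitions(zip_path, partitions):
    """Store each partition as a compressed member under its relative path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in partitions.items():
            zf.writestr(name, text)

def read_members(zip_path, pattern):
    """Yield (member name, line) for members whose path matches pattern.

    Only the matching members are decompressed; the archive's directory
    tells the reader where each member starts.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if fnmatch.fnmatchcase(name, pattern):
                with zf.open(name) as member:
                    for line in io.TextIOWrapper(member, encoding="ascii"):
                        yield name, line.rstrip("\n")

# Hypothetical logger data, one member per day.
partitions = {
    "2010/11/21.csv": "2010-11-21 00:00,2.9\n2010-11-21 12:00,3.0\n",
    "2010/11/22.csv": "2010-11-22 00:00,3.1\n",
    "2010/12/01.csv": "2010-12-01 00:00,3.4\n",
}

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "logger.zip")
    write_partitions(zip_path, partitions)
    november = list(read_members(zip_path, "2010/11/*.csv"))

print(november)
```

The wildcard plays the role of the key range: picking the members to read is the coarse part of the search, and whatever happens inside each member is the fine part.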
# 12  
Old 11-23-2010
Quote:
Originally Posted by frank_rizzo
sqllite ? Berkley Database?
SQLite doesn't seem appropriate; a relational database won't really take advantage of sorted data. Select a range of dates and it won't be able to do a binary search to find the start and end; it'll either do a table scan or consult some mammoth index.

I'm less familiar with Berkeley DB, but as a key-value store it wouldn't appear to have any particular facility for sorted data either.

Quote:
Originally Posted by methyl
How big is "huge"?
300 records a day doesn't sound like a lot, but that's about 100,000 records a year for one logger. And there might be many, ultimately. It could be hundreds of megs to several gigs of data if you wait long enough, and all of it should remain reasonably accessible.
Quote:
What Database Engines or High Level Languages do you have available?
I'm open to most open-source solutions. I've been using MySQL for most database tasks, but it, like relational databases in general, doesn't seem suited to large amounts of sorted data. Considering the complexity of the data (or rather, the lack of it), it seems overkill in any case.

But, as I've said, I think I have this problem solved. I've made a fairly simple C application that partitions data across a configurable number of sorted flat files based on their first key. It can also select arbitrary ranges from them without grinding through a giant index.
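The C application itself isn't shown, so the following is only a Python sketch of how a range select on one sorted flat file can work without an index: binary-search the byte offsets for the first key in range, then read forward until the keys pass the end of the range. The record layout (a sortable key, then a comma) and the function names are assumptions of mine.

```python
import os
import tempfile

def key_of(line):
    """The key is everything before the first comma (assumed layout)."""
    return line.split(b",", 1)[0]

def line_at_or_after(f, pos):
    """Return (start offset, line) for the first line starting at or after pos."""
    if pos > 0:
        f.seek(pos - 1)
        f.readline()  # consume through the newline that ends the previous line
    else:
        f.seek(0)
    start = f.tell()
    return start, f.readline()

def lower_bound(f, target):
    """Return the offset of the first line whose key is >= target."""
    f.seek(0, os.SEEK_END)
    lo, hi = 0, f.tell()
    while lo < hi:
        mid = (lo + hi) // 2
        _, line = line_at_or_after(f, mid)
        if not line or key_of(line) >= target:
            hi = mid        # answer starts at or before this line
        else:
            lo = mid + 1    # answer starts after this line
    return line_at_or_after(f, lo)[0]

def select_range(path, first, last):
    """Return all lines with first <= key <= last from a file sorted by key."""
    rows = []
    with open(path, "rb") as f:
        f.seek(lower_bound(f, first))
        for line in f:
            if key_of(line) > last:
                break
            rows.append(line)
    return rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logger.csv")
    with open(path, "wb") as f:
        f.write(b"2010-11-19,3.1\n2010-11-20,3.4\n2010-11-21,2.9\n"
                b"2010-11-22,3.0\n2010-11-23,3.3\n")
    middle = select_range(path, b"2010-11-20", b"2010-11-22")
    early = select_range(path, b"2010-01-01", b"2010-11-19")
    late = select_range(path, b"2010-12-01", b"2010-12-31")

print(middle, early, late)
```

Keys are compared as raw bytes, which is only right for keys that sort lexically, such as ISO dates. With the data partitioned across several files by first key, the same search would run inside whichever files cover the requested range.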
# 13  
Old 11-23-2010
Quote:
I've made a fairly simple C application to partition data into a number of sorted flat files based on their first key, it can also select arbitrary ranges from them without grinding a giant index.
Sounds very zip friendly, too!
# 14  
Old 11-23-2010
Hi.

I haven't used this much, but it fits the words of your request ... cheers, drl

Flat File Extractor | Download Flat File Extractor software for free at SourceForge.net