I'm wondering about the best way to store large amounts of data sorted by date/time, in a way that allows fast retrieval of date ranges. Deletion isn't necessary.
A database hardly seems ideal, since they're not optimized for sorted data and allow things I don't need, like the deletion of arbitrary records. A flat text file would work fine, except how do I seek in one without reading the whole thing? It's theoretically possible, assuming the data's sorted, but I don't know a way to do this from the shell. So I considered writing my own application to quickly seek within a huge flat file and/or split it into different flat files based on date, until I realized I might be reinventing the wheel. Are there any standard tools for this?
We use it for searching giant XML data files. Judging by the homepage, it has gained a bunch of features since our earlier version, but we don't seem to need them.
I suppose it depends on the volume of data. If you are storing everything in a single file, then many shell commands (grep, for example) will read the whole file, but if you have a record of the line number, you could use
It's a bit heavyweight, but it will read you line ${i}
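The command from the post above did not survive, so this is not it. It is only a minimal Python sketch of the same idea, reading one line by its number. The function name `read_line` is made up for illustration. Note that it still scans everything before line i, which is the cost being described.

```python
from itertools import islice


def read_line(path, i):
    """Return line number i (1-based, i >= 1) without its trailing newline.

    Returns None if the file has fewer than i lines. The file is read
    sequentially up to line i, so the cost grows with i.
    """
    with open(path, encoding="utf-8") as f:
        # islice skips the first i-1 lines and yields at most one line.
        line = next(islice(f, i - 1, i), None)
    return None if line is None else line.rstrip("\n")
```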
An alternative might be to split the records so that you have one file per day. That reduces the volume read by tools such as grep, assuming you know the day.
You could then compress the older files and use
although there is, of course, the overhead of zcat. Perhaps if you settled on a rule such as "anything over 30 days old is compressed because it is rarely used", then you could build a script that looks for the plain file and, if only the compressed file exists, uses the above.
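A rough Python sketch of that script, under assumptions of my own: one file per day named `YYYY-MM-DD.log`, with the compressed copy named `YYYY-MM-DD.log.gz`. The naming scheme and both function names are hypothetical, and every `*.log` file in the directory is assumed to follow that scheme.

```python
import gzip
import shutil
from datetime import date, timedelta
from pathlib import Path


def open_day(directory, day):
    """Open the text file for one day, falling back to the gzip copy."""
    plain = Path(directory) / f"{day.isoformat()}.log"
    packed = Path(directory) / f"{day.isoformat()}.log.gz"
    if plain.exists():
        return open(plain, "rt", encoding="utf-8")
    if packed.exists():
        # Decompression happens on the fly, which is the overhead mentioned above.
        return gzip.open(packed, "rt", encoding="utf-8")
    raise FileNotFoundError(f"no data file for {day.isoformat()}")


def compress_older_than(directory, today, max_age_days=30):
    """Gzip every daily file dated more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    for plain in sorted(Path(directory).glob("*.log")):
        if date.fromisoformat(plain.stem) < cutoff:
            with open(plain, "rb") as src, gzip.open(f"{plain}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            plain.unlink()  # keep only the compressed copy
```

Callers never need to know whether a given day has been compressed yet; `open_day` hides that, at the price of the decompression cost for old days.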
Robin
Liverpool/Blackburn
UK
Do let us know how you get on or if there are more questions.
Quote:
I suppose it depends on the volume of data. If you are storing everything into a single file, then many shell commands will read the whole file, (grep for example) but if you have a record of the line number, you could use
Exactly the problem. It's going to be big, eventually, and I don't want to read the entire file. I don't want to use a real database either, since the data's intended to be sorted, which would add a lot of overhead there too.
On the other hand it's certainly possible to be intelligent about the way you read a flat file. Things like GNU tac and GNU tail use seeking when possible to get approximately where they want and fine-tune from there, instead of reading all 3 gigs like a moron until they find the 10 lines they need.
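To make the seeking idea concrete, here is a minimal Python sketch of a binary search over byte offsets in a sorted text file. It is not how tac or tail work internally, just the same principle. It assumes every line starts with a timestamp whose byte order matches chronological order (ISO 8601, for example), and the function names are invented for illustration.

```python
import os


def _line_start_at_or_after(f, pos):
    """Return the offset of the first line that starts at or after pos."""
    if pos == 0:
        return 0
    f.seek(pos - 1)
    f.readline()  # consume the rest of the line containing byte pos-1
    return f.tell()


def find_first_at_or_after(f, target):
    """Offset of the first line whose leading bytes are >= target.

    f must be opened in binary mode. Only O(log n) lines are read.
    """
    size = f.seek(0, os.SEEK_END)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        start = _line_start_at_or_after(f, mid)
        f.seek(start)
        line = f.readline()
        if not line or line[:len(target)] >= target:
            hi = mid          # answer is at or before this probe
        else:
            lo = start + 1    # this line is too early; look after it
    return _line_start_at_or_after(f, lo)


def read_range(path, start, end):
    """Yield lines with start <= timestamp prefix < end (both bytes)."""
    with open(path, "rb") as f:
        f.seek(find_first_at_or_after(f, start))
        for line in iter(f.readline, b""):
            if line[:len(end)] >= end:
                break
            yield line.decode("utf-8").rstrip("\n")
```

For example, `read_range("data.log", b"2020-01-02", b"2020-01-04")` would yield the lines for the 2nd and 3rd without touching most of the file. Variable-length lines are fine here because each probe resynchronizes on the next newline.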
Quote:
An alternate might be to split the records so that you have one file per day
Yes, I thought of that. One file per month might be more appropriate if I don't want to generate massive numbers of tiny tiny files, but anyway.
The point is, I'm looking for a smart tool to store in and retrieve from such a tree. Beginning to think there is no such thing, might have to build one myself...
All depends on whether your sorted data is fixed or variable length records. Direct access to a specific fixed length record is an easy nut to crack. Quickly accessing variable length records in a flat file is a whole different game. Which is it?
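For the fixed-length case, a small Python sketch of why it is the easy one: record i lives at byte offset i times the record size, so both direct access and binary search need no scanning. The 64-byte layout used here (an 8-byte big-endian integer timestamp plus a 56-byte payload) is purely an assumption for illustration, as are the function names.

```python
import os
import struct

# Hypothetical layout: 8-byte big-endian timestamp + 56-byte payload = 64 bytes.
RECORD = struct.Struct(">q56s")


def read_record(f, index):
    """Read record number index (0-based) by seeking straight to it."""
    f.seek(index * RECORD.size)
    ts, payload = RECORD.unpack(f.read(RECORD.size))
    return ts, payload.rstrip(b"\0")


def first_index_at_or_after(f, target_ts):
    """Binary search for the first record with timestamp >= target_ts."""
    count = f.seek(0, os.SEEK_END) // RECORD.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if read_record(f, mid)[0] < target_ts:
            lo = mid + 1
        else:
            hi = mid
    return lo  # equals count if every record is earlier than target_ts
```

With variable-length records there is no such formula for the offset, which is why they need either a resynchronizing search on newlines or a separate index of offsets.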