Best way to dump metadata to a file: when, and by whom?
Hi,
my application (actually a library) indexes a file of many GB, producing tables (arrays of offset and length of the indexed data) for later reuse. The tables produced are pretty big too, so big that my process runs out of memory (3 GB limit) when indexing more than about 8 GB of file. Although I could fork another process to work around the memory limit, that would not fix the problem, so I'd like to dump the tables to a file in order to free the memory and to avoid re-indexing the same file more than once.

Bear in mind that currently the tables produced are kept in memory in a singly linked list, shared with another thread that uses it to produce another list of filtered data. I'd rather not change this scheme. The other thread only accesses the list once the whole file has been indexed.

Now, the questions I'm asking myself are:

- When and how is it best to dump the tables to a file? Dumping each table as soon as it gets full doesn't sound very efficient to me. Would I keep nothing in memory? Would the linked list always be empty? If I decide to keep N tables in memory and dump every N tables, how do I avoid checking how many tables I have in memory at every cycle?

- Who should dump the metadata to the file? A different thread? The same thread that indexes the data? I also don't want to produce metadata files when the processed file is smaller than a gigabyte (the small-file case), but at the same time I don't want to complicate the indexer code, which right now is pretty simple: parse, find the data, create a table entry, add it to the table. If the table is full, create another one and add it to the linked list.

- Let's say I figure out (thanks to you) the best way, in my case, to dump the metadata. What policy should I use to load the data back so that the other thread can filter the index data without radically changing the way it works now (i.e. through the linked list)?
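One possible answer to the first two questions, sketched under stated assumptions: the indexer keeps at most N sealed (full) tables in memory and spills all of them to a file when the N-th one fills. The spill test lives inside the existing "table is full" branch, so the per-entry path does no extra counting. Passing a NULL file covers the small-file case, and the dump simply runs on the indexing thread, which is the simplest option when the consumer only reads after indexing has finished. Every name here (index_t, index_add, seal_current, spill_sealed, TABLE_CAP) and the on-disk record layout (a native size_t count followed by that many entries) is hypothetical, not the poster's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_CAP 4096              /* entries per table (assumed size) */

typedef struct { uint64_t offset, length; } entry_t;

typedef struct table {
    struct table *next;
    size_t count;
    entry_t entries[TABLE_CAP];
} table_t;

/* Hypothetical indexer state. */
typedef struct {
    table_t *head, *tail;           /* sealed (full) tables still in RAM */
    table_t *current;               /* table being filled right now */
    size_t in_memory;               /* number of sealed tables in RAM */
    size_t max_in_memory;           /* N: spill once this many are sealed */
    FILE *spill;                    /* NULL for small inputs: never spill */
    size_t spilled;                 /* tables already written to the file */
} index_t;

int index_init(index_t *idx, size_t max_in_memory, FILE *spill) {
    idx->head = idx->tail = NULL;
    idx->in_memory = idx->spilled = 0;
    idx->max_in_memory = max_in_memory;
    idx->spill = spill;
    idx->current = calloc(1, sizeof *idx->current);
    return idx->current != NULL ? 0 : -1;
}

/* Writes every sealed table as [count][entries...] and frees it. */
static int spill_sealed(index_t *idx) {
    while (idx->head != NULL) {
        table_t *t = idx->head;
        if (fwrite(&t->count, sizeof t->count, 1, idx->spill) != 1 ||
            fwrite(t->entries, sizeof(entry_t), t->count, idx->spill) != t->count)
            return -1;
        idx->head = t->next;
        free(t);
        idx->spilled++;
    }
    idx->tail = NULL;
    idx->in_memory = 0;
    return 0;
}

/* Runs only when the current table is full, so the "how many tables are
   in memory?" check never happens on the per-entry fast path. */
static int seal_current(index_t *idx) {
    table_t *t = idx->current;
    idx->current = NULL;            /* t now belongs to the sealed list */
    t->next = NULL;
    if (idx->tail != NULL) idx->tail->next = t; else idx->head = t;
    idx->tail = t;
    idx->in_memory++;
    if (idx->spill != NULL && idx->in_memory >= idx->max_in_memory &&
        spill_sealed(idx) != 0)
        return -1;
    idx->current = calloc(1, sizeof *idx->current);
    return idx->current != NULL ? 0 : -1;
}

/* Same shape as the existing indexer: add an entry, new table when full. */
int index_add(index_t *idx, uint64_t offset, uint64_t length) {
    if (idx->current == NULL) return -1;        /* an earlier call failed */
    if (idx->current->count == TABLE_CAP && seal_current(idx) != 0)
        return -1;
    assert(idx->current->count < TABLE_CAP);
    idx->current->entries[idx->current->count++] = (entry_t){ offset, length };
    return 0;
}

void index_free(index_t *idx) {
    while (idx->head != NULL) {
        table_t *next = idx->head->next;
        free(idx->head);
        idx->head = next;
    }
    free(idx->current);
    idx->current = NULL;
    idx->tail = NULL;
}
```

A caller could, for example, open the spill file with fopen(path, "w+b") only when the input is at least 1 GB and pass NULL otherwise. Because the records are raw native-endian structs, such a file is only safe to read back with the same build on the same kind of machine.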
One solution that comes to mind, which would avoid a drastic change to my scheme, is to create a "list manager" that provides an interface for adding elements to the list and retrieving them. This entity (either a thread or a process) would take care of keeping some of the data in memory (in the linked list) and the rest in the file.

Please share your skills and experience with me! :-) Thanks in advance.

Regards,
S.
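For the loading side (the last question and the list-manager idea), a hypothetical read cursor can hide where each table lives: it first streams spilled tables back from the file, one table at a time into a single reusable buffer, and then walks the in-memory linked list. This sketch assumes that spilled tables were written oldest first as a native size_t count followed by that many entries, and that the filtering thread starts only after indexing has finished, so no locking is shown. The names cursor_t, cursor_open and cursor_next are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_CAP 4096              /* entries per table (assumed size) */

typedef struct { uint64_t offset, length; } entry_t;

typedef struct table {
    struct table *next;
    size_t count;
    entry_t entries[TABLE_CAP];
} table_t;

/* Hypothetical read cursor: spilled tables first, then in-memory ones. */
typedef struct {
    FILE *spill;                    /* spilled tables, oldest first; may be NULL */
    table_t *mem;                   /* next in-memory table to visit */
    table_t *cur;                   /* table currently being consumed */
    size_t pos;                     /* next entry inside cur */
    table_t *buf;                   /* one reusable buffer for spilled tables */
} cursor_t;

int cursor_open(cursor_t *c, FILE *spill, table_t *mem) {
    c->spill = spill;
    c->mem = mem;
    c->cur = NULL;
    c->pos = 0;
    c->buf = malloc(sizeof *c->buf);
    if (c->buf == NULL) return -1;
    if (spill != NULL) rewind(spill);   /* read from the first record */
    return 0;
}

/* Makes the next table current. Returns 0 on success, 1 when all data
   has been consumed, -1 on a read error or a corrupt record. */
static int cursor_load(cursor_t *c) {
    if (c->spill != NULL) {
        size_t n;
        if (fread(&n, sizeof n, 1, c->spill) == 1) {
            if (n > TABLE_CAP ||
                fread(c->buf->entries, sizeof(entry_t), n, c->spill) != n)
                return -1;
            c->buf->count = n;
            c->cur = c->buf;
            c->pos = 0;
            return 0;
        }
        if (ferror(c->spill)) return -1;
        c->spill = NULL;                /* file exhausted: switch to memory */
    }
    if (c->mem == NULL) return 1;
    c->cur = c->mem;
    c->mem = c->mem->next;
    c->pos = 0;
    return 0;
}

/* Copies the next entry to *out. Returns 1 if an entry was produced,
   0 at the end of the data, -1 on error. */
int cursor_next(cursor_t *c, entry_t *out) {
    assert(c->buf != NULL);             /* cursor_open must have succeeded */
    while (c->cur == NULL || c->pos == c->cur->count) {
        int rc = cursor_load(c);
        if (rc != 0) return rc == 1 ? 0 : -1;
    }
    *out = c->cur->entries[c->pos++];
    return 1;
}

void cursor_close(cursor_t *c) {
    free(c->buf);                       /* the caller still owns spill and mem */
    c->buf = NULL;
}
```

Because the filtering thread pulls entries through cursor_next instead of following the next pointers itself, its loop changes very little, and the extra memory stays at one buffered table no matter how much was spilled.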