The UNIX and Linux Forums  
#1, 06-29-2009, emitrax (Registered User)
Best way to dump metadata to file: when and by who?

Hi,

My application (actually a library) indexes a file of many GB, producing tables (arrays of the offset and length of each piece of indexed data) for later reuse. The tables are pretty big too, so big that my process runs out of memory (3GB limit) when indexing more than about 8GB of file. I could fork another process to work around the memory limit, but that would not fix the underlying problem, so I'd like to dump the tables to a file, both to free the memory and to avoid re-indexing the same file more than once.

Bear in mind that the tables are currently kept in memory in a singly linked list, shared with another thread that uses it to produce another list of filtered data, so I'd rather not change this scheme. The other thread only accesses the list once the whole file has been indexed.

Now, the questions I'm asking myself are:

- When and how is it best to dump the tables to a file?

Dumping each table as soon as it gets full doesn't sound very efficient to me. Would I keep nothing in memory? Would the linked list always be empty? If I decide to keep N tables in memory and dump every N, how do I avoid checking how many tables I have in memory at every cycle?

- Who should dump the metadata to the file? A different thread? The same thread that indexes the data? I also wouldn't like to produce metadata files when the file being processed is smaller than a gigabyte (the small file case), but at the same time I don't want to complicate the indexer code, which right now is pretty simple: parse, find the data, create a table entry, add it. If the table is full, create another one and add it to the linked list.

- Let's say I figure out (thanks to you) the best way (in my case) to dump the metadata. What policy should I use to load the data back, so that the other thread can filter the indexed data without radically changing the way it works now (i.e. through the linked list)?

One solution that comes to mind, and that would avoid a drastic change to my scheme, is to create a "list manager" that provides an interface to add elements to the list and retrieve them. This entity (either a thread or a process) would take care of keeping some of the data in memory (in the linked list) and the rest in the file.
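
To make the "list manager" idea concrete, here is a minimal sketch in C, not a definitive design. Every name in it (`table_store`, `store_add`, `store_flush`, `store_close`) is hypothetical, and the table layout (256 entries of two 64-bit fields, 4 KB in total) is an assumption. It does no locking: it assumes only the indexing thread touches the store until indexing has finished.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENTRIES_PER_TABLE 256   /* assumption: 256 * 16 bytes = 4 KB */

struct entry { uint64_t offset, length; };

struct table {
    struct entry  entries[ENTRIES_PER_TABLE];
    struct table *next;
};

struct table_store {
    struct table *head, *tail;  /* tables still held in memory */
    size_t        in_mem;       /* number of tables in the list */
    size_t        max_in_mem;   /* spill threshold (the "N" in the post) */
    size_t        spilled;      /* tables already written to the file */
    const char   *path;         /* spill file, created on first flush */
    FILE         *fp;
};

void store_init(struct table_store *s, const char *path, size_t max_in_mem)
{
    memset(s, 0, sizeof *s);
    s->path = path;
    s->max_in_mem = max_in_mem;
}

/* Write every in-memory table to the spill file and free it.
 * Returns 0 on success, -1 on I/O error. */
int store_flush(struct table_store *s)
{
    if (s->head == NULL)
        return 0;
    if (s->fp == NULL && (s->fp = fopen(s->path, "wb")) == NULL)
        return -1;
    while (s->head != NULL) {
        struct table *t = s->head;
        if (fwrite(t->entries, sizeof t->entries, 1, s->fp) != 1)
            return -1;
        s->head = t->next;
        free(t);
        s->in_mem--;
        s->spilled++;
    }
    s->tail = NULL;
    return 0;
}

/* Append a heap-allocated table; the store takes ownership of it.
 * The only per-table overhead is one integer comparison. */
int store_add(struct table_store *s, struct table *t)
{
    assert(s != NULL && t != NULL);
    t->next = NULL;
    if (s->tail != NULL)
        s->tail->next = t;
    else
        s->head = t;
    s->tail = t;
    if (++s->in_mem < s->max_in_mem)
        return 0;
    return store_flush(s);
}

/* Close the spill file if one was ever opened. */
int store_close(struct table_store *s)
{
    int rc = 0;
    if (s->fp != NULL) {
        rc = (fclose(s->fp) == 0) ? 0 : -1;
        s->fp = NULL;
    }
    return rc;
}
```

In this sketch the indexer keeps calling `store_add` exactly where it used to append to the list. The spill file is only created on the first flush, so with a large enough `max_in_mem` the small file case never touches the disk.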

Please share with me your skill and experience! :-)

Thanks in advance.

Regards,
S.
#2, 06-30-2009, otheus (Forum Staff, Moderator)
Wow, what a question. Are you re-engineering a database system?
Quote:
- When and how it's best time to dump the tables to a file?
On slightly-less than gigabyte boundaries. Actually, 256 kB blocks also work very well.
Quote:
- Who should dump the metadata produced to file? Different thread?
If it's in a different thread, what's the point? You can't just free the memory if the other thread still has a lock on it.
Quote:
What policy should I use to load the data
I don't think that's answerable unless one really knows your existing software architecture.
#3, 07-08-2009, emitrax (Registered User)
Quote:
Originally Posted by otheus
Wow, what a question. Are you re-engineering a database system?
Nope. I'm just trying to write as efficient an application as possible, one that needs to dump index tables, and I'd like to learn as much as possible from the experience.

Quote:
Originally Posted by otheus
On slightly-less than gigabyte boundaries. Actually, 256 kB blocks also work very well.
Do you mean I should fwrite a 256KB buffer at a time? Currently I have a list where every element (table) is an array of N entries, for a total size of 4KB per array, and I dump each table with a single fwrite.
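
One way to get 256 KB writes without changing the per-table `fwrite` is to give the stream a larger stdio buffer with `setvbuf`. The sketch below assumes that approach; `open_batched` and `write_table` are hypothetical helper names, and how the C library actually batches the writes is implementation-dependent.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_BYTES  4096            /* table size stated in the post */
#define IO_BUF_BYTES (256 * 1024)    /* block size suggested in the reply */

/* Open `path` for writing with a fully buffered stream of IO_BUF_BYTES.
 * On success *buf_out receives the buffer, which the caller must free
 * only after fclose(). Returns NULL on failure. */
FILE *open_batched(const char *path, char **buf_out)
{
    assert(path != NULL && buf_out != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return NULL;
    char *buf = malloc(IO_BUF_BYTES);
    /* setvbuf must be called before any other operation on the stream */
    if (buf == NULL || setvbuf(fp, buf, _IOFBF, IO_BUF_BYTES) != 0) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    *buf_out = buf;
    return fp;
}

/* Same single fwrite per table as before; the tables accumulate in the
 * stream buffer instead of reaching the OS one at a time. */
int write_table(FILE *fp, const void *table)
{
    return fwrite(table, TABLE_BYTES, 1, fp) == 1 ? 0 : -1;
}
```

With these sizes, up to 64 tables fit in the buffer before the library has to hand data to the operating system.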

Quote:
Originally Posted by otheus
If it's in a different thread, what's the point? You can't just free the memory if the other thread still has a lock on it.

I don't think that's answerable unless one really knows your existing software architecture.
Basically, one thread (A) indexes the file while another thread (B) waits for it to finish, and then uses the tables produced (which I used to keep in memory) to process the data in the file. The problem is that the files being indexed are huge (~30GB) and produce more than 4GB of tables, which I can't keep in memory (limit of 3GB per process), so at one point or another I have to dump the tables to a file in order to free the memory.

The other thread (B), based on a flag, reads the tables either from the file or from the list in memory.
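
The flag could be hidden behind a small cursor so that thread B's loop looks the same in both cases. This is only a sketch: `table_cursor` and its functions are hypothetical names, the 256-entry table layout is an assumption, and the file is assumed to contain nothing but tables written back to back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ENTRIES_PER_TABLE 256   /* assumption: 256 * 16 bytes = 4 KB */

struct entry { uint64_t offset, length; };

struct table {
    struct entry  entries[ENTRIES_PER_TABLE];
    struct table *next;
};

struct table_cursor {
    FILE               *fp;    /* non-NULL: tables come from the file */
    const struct table *node;  /* otherwise: next list node to visit */
    struct entry        buf[ENTRIES_PER_TABLE];  /* scratch for file reads */
};

void cursor_from_list(struct table_cursor *c, const struct table *head)
{
    c->fp = NULL;
    c->node = head;
}

void cursor_from_file(struct table_cursor *c, FILE *fp)
{
    assert(fp != NULL);
    c->fp = fp;
    c->node = NULL;
}

/* Return the entries of the next table, or NULL when there are no more.
 * For a file cursor, NULL can also mean a read error (check ferror()),
 * and the returned pointer is only valid until the next call. */
const struct entry *cursor_next(struct table_cursor *c)
{
    if (c->fp != NULL) {
        if (fread(c->buf, sizeof c->buf, 1, c->fp) != 1)
            return NULL;
        return c->buf;
    }
    if (c->node == NULL)
        return NULL;
    const struct entry *e = c->node->entries;
    c->node = c->node->next;
    return e;
}
```

Thread B would pick `cursor_from_list` or `cursor_from_file` once, based on the flag, and then call `cursor_next` until it returns NULL.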

Thanks for your help,
S.
#4, 07-08-2009, otheus (Forum Staff, Moderator)
I cannot help other than to quote an old software design maxim:

Quote:
don't reinvent the wheel
#5, 07-08-2009, emitrax (Registered User)
Quote:
Originally Posted by otheus
I cannot help other than to quote an old software design maxim:
You mean I should use a database, like sqlite, to hold the tables?
#6, 07-08-2009, otheus (Forum Staff, Moderator)
Which database depends primarily on how many indexable and unique columns you have, and on the ratio of readers to writers. sqlite? LOL. I was thinking more along the lines of MySQL or Berkeley DB (Sleepycat).
#7, 07-08-2009, emitrax (Registered User)
Quote:
Originally Posted by otheus
Which database depends primarily on how many indexable and unique columns you have, and on the ratio of readers to writers. sqlite? LOL. I was thinking more along the lines of MySQL or Berkeley DB (Sleepycat).
That's why I wouldn't want to use a database. The work involved, and the dependency it adds, is not worth it in my case (IMHO).

I only have one writer, and one reader.

Data are written sequentially, and never modified. Write once, read many.

I thought an ad-hoc solution would be the best way to go.

I appreciate your thoughts on this.
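
As an illustration of what such an ad-hoc, write-once file could look like: with fixed-size 4KB tables appended one after another, table number k sits at byte offset k * 4096, so the reader can fetch any table directly. The sketch below assumes a POSIX system; `open_index`, `append_table` and `read_table_at` are hypothetical names. `_FILE_OFFSET_BITS` is defined so that offsets beyond 2GB work in a 32-bit glibc process.

```c
#define _XOPEN_SOURCE 700      /* request POSIX.1-2008 declarations (pread) */
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit glibc builds */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define TABLE_BYTES 4096       /* table size stated in the thread */

/* Open the index file: append-only for the single writer,
 * read-only for the single reader. Returns a descriptor or -1. */
int open_index(const char *path, int writer)
{
    if (writer)
        return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    return open(path, O_RDONLY);
}

/* Append one table, retrying on partial writes and EINTR. */
int append_table(int fd, const void *table)
{
    const char *p = table;
    size_t left = TABLE_BYTES;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Read table number k (0-based) into out.
 * Returns 0 on success, -1 on error or if the file is too short. */
int read_table_at(int fd, uint64_t k, void *out)
{
    assert(out != NULL);
    char  *p    = out;
    size_t left = TABLE_BYTES;
    off_t  pos  = (off_t)(k * TABLE_BYTES);
    while (left > 0) {
        ssize_t n = pread(fd, p, left, pos);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;   /* I/O error, or EOF before a full table */
        p += n;
        left -= (size_t)n;
        pos += n;
    }
    return 0;
}
```

Because the data is never modified after being written, the reader needs no coordination with the writer beyond knowing how many tables have been completely appended.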

Thanks,
S.