Post 302329710 by emitrax on Monday 29th of June 2009 09:05:09 AM
Best way to dump metadata to file: when and by who?

Hi,

my application (actually a library) indexes files of many gigabytes, producing tables (arrays of offsets and lengths of the indexed data) for later reuse. The tables produced are pretty big too, so big that I run out of memory in my process (3 GB limit) when indexing more than about 8 GB of file. Although I could fork another process to work around the memory limit, that would not really fix the problem, so I'd like to dump the tables to a file in order to free the memory, and to avoid re-indexing the same file more than once.
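To make the idea concrete, here is a minimal sketch of what dumping a table could look like. The `index_entry` layout and the function names are my own illustration, not the poster's actual library: each table is just an array of (offset, length) pairs, so a full table can be written to the dump file in one `fwrite` and freed, then reloaded later with one `fread`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Hypothetical index entry: offset and length of one indexed record. */
typedef struct {
    long long offset;
    long long length;
} index_entry;

/* Append a full table of entries to the dump file in binary form,
   so the in-memory copy can be freed afterwards. */
int dump_table(FILE *fp, const index_entry *entries, size_t count)
{
    if (fwrite(entries, sizeof(index_entry), count, fp) != count)
        return -1;
    return 0;
}

/* Load a previously dumped table back into freshly allocated memory. */
index_entry *load_table(FILE *fp, size_t count)
{
    index_entry *entries = malloc(count * sizeof(index_entry));
    if (!entries)
        return NULL;
    if (fread(entries, sizeof(index_entry), count, fp) != count) {
        free(entries);
        return NULL;
    }
    return entries;
}
```

Note that a raw binary dump like this is only valid for reloading on the same machine and build, since it bakes in the struct layout and endianness.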

Bear in mind that currently the tables produced are kept in memory in a singly-linked list, shared with another thread that uses it to produce another list of filtered data, so I'd rather not change this scheme. The other thread only accesses the list once the whole file has been indexed.
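A sketch of what one node in such a singly-linked list of tables might look like (the field names are illustrative, not taken from the real library). The useful trick is that a node can stay in the list even after its table is dumped: the `entries` pointer goes to NULL and the node remembers where the table sits in the dump file.

```c
#include <stdlib.h>
#include <assert.h>

typedef struct { long long offset, length; } index_entry;

/* Hypothetical node: one table of entries, either resident in memory
   or already spilled to the dump file. */
typedef struct table_node {
    index_entry *entries;     /* NULL once this table has been dumped */
    size_t count;             /* number of entries in the table */
    long file_offset;         /* position in the dump file, -1 if in memory */
    struct table_node *next;
} table_node;

/* Append a freshly filled table at the tail of the list. */
table_node *list_append(table_node **tail, index_entry *entries, size_t count)
{
    table_node *n = calloc(1, sizeof *n);
    if (!n)
        return NULL;
    n->entries = entries;
    n->count = count;
    n->file_offset = -1;      /* still resident in memory */
    if (*tail)
        (*tail)->next = n;
    *tail = n;
    return n;
}
```

With this layout the filtering thread can still walk the list front to back; it only has to reload a table (by `file_offset`) when `entries` is NULL.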

Now, the questions I'm asking myself are:

- When is the best time to dump the tables to a file, and how?

Dumping each table as soon as it gets full doesn't sound very efficient to me. Would I then keep nothing in memory, leaving the linked list always empty? If I decide to keep N tables in memory and dump every N, how do I avoid checking how many tables I have in memory on every cycle?
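One way to avoid a per-cycle check: the counter only needs to be touched when a table fills up, which happens once per thousands of entries, not once per entry. A hedged sketch (names and the batch size are my own, not from the post):

```c
#include <assert.h>

/* Illustrative policy: keep at most MAX_RESIDENT full tables in memory. */
#define MAX_RESIDENT 4

typedef struct {
    unsigned resident;   /* full tables currently held in memory */
} dump_policy;

/* Called once per COMPLETED table, never per entry, so the per-entry
   cost of the policy is zero. Returns 1 when the caller should dump
   the resident batch of tables now. */
int table_completed(dump_policy *p)
{
    if (++p->resident >= MAX_RESIDENT) {
        p->resident = 0;
        return 1;
    }
    return 0;
}
```

The indexer's hot loop (parse, create entry, add) stays untouched; only the "table is full, create another one" branch calls `table_completed`.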

- Which thread should dump the metadata to file? A different thread, or the same thread that indexes the data? I also wouldn't want to produce metadata files when the processed file is less than a gigabyte (the small-file case), but at the same time I wouldn't want to complicate the code of the indexer, which right now is pretty simple: parse, find the data, create a table entry, add it; if the table is full, create another one and add it to the linked list.
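If a separate writer thread does the dumping, the indexer only needs a small hand-off queue: push a full table and keep parsing, while the writer blocks until work arrives. A sketch of such a queue under that assumption (all names are illustrative):

```c
#include <pthread.h>
#include <stdlib.h>
#include <assert.h>

typedef struct job {
    void *table;              /* the full table to dump */
    struct job *next;
} job;

typedef struct {
    job *head, *tail;
    int done;                 /* set when indexing is finished */
    pthread_mutex_t lock;
    pthread_cond_t ready;
} job_queue;

void queue_init(job_queue *q)
{
    q->head = q->tail = NULL;
    q->done = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
}

/* Indexer side: hand a full table to the writer and keep parsing. */
void queue_push(job_queue *q, void *table)
{
    job *j = malloc(sizeof *j);
    j->table = table;
    j->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = j; else q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
}

/* Writer side: block until a table arrives; NULL means no more work. */
void *queue_pop(job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->head && !q->done)
        pthread_cond_wait(&q->ready, &q->lock);
    job *j = q->head;
    void *table = NULL;
    if (j) {
        q->head = j->next;
        if (!q->head) q->tail = NULL;
        table = j->table;
        free(j);
    }
    pthread_mutex_unlock(&q->lock);
    return table;
}
```

The small-file case then costs nothing extra: simply never start the writer thread (and never push) when the input is under the threshold.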

- Let's say I figure out (thanks to you) the best way, in my case, to dump the metadata. What policy should I use to load the data back, so that the other thread can filter the indexed data without radically changing the way it works now (i.e. through the linked list)?

One solution that comes to mind, and that would avoid a drastic change to my scheme, is to create a "list manager" that would provide an interface to add and retrieve elements from the list. This entity (either a thread or a process) would take care of keeping some data in memory (the linked list) and the rest in the file.
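The "list manager" idea could look something like this. This is a toy, in-memory-only version of the interface (names hypothetical): the indexer calls `lm_add`, the filtering thread calls `lm_next`, and the manager is the only code that would know whether entries live in memory or in the dump file; a real implementation would spill full tables to disk inside `lm_add` and reload them transparently when the read cursor reaches them in `lm_next`.

```c
#include <stdlib.h>
#include <assert.h>

typedef struct {
    long long offset, length;
} lm_entry;

typedef struct {
    lm_entry *entries;
    size_t count, cap, cursor;
} list_manager;

list_manager *lm_open(void)
{
    return calloc(1, sizeof(list_manager));
}

/* Indexer side: append one (offset, length) entry. */
int lm_add(list_manager *lm, long long offset, long long length)
{
    if (lm->count == lm->cap) {
        size_t cap = lm->cap ? lm->cap * 2 : 64;
        lm_entry *e = realloc(lm->entries, cap * sizeof *e);
        if (!e)
            return -1;
        lm->entries = e;
        lm->cap = cap;
    }
    lm->entries[lm->count].offset = offset;
    lm->entries[lm->count].length = length;
    lm->count++;
    return 0;
}

/* Filtering side: sequential retrieval. Returns 1 and fills the out
   parameters while entries remain, 0 at the end of the index. */
int lm_next(list_manager *lm, long long *offset, long long *length)
{
    if (lm->cursor >= lm->count)
        return 0;
    *offset = lm->entries[lm->cursor].offset;
    *length = lm->entries[lm->cursor].length;
    lm->cursor++;
    return 1;
}
```

Because the filtering thread already reads the list sequentially and only after indexing finishes, this cursor-style interface would match its access pattern without changing how it works.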

Please share your skills and experience! :-)

Thanks in advance.

Regards,
S.
 
