GNU libextractor 0.5.20c (Default branch)


 
07-14-2008

libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library for obtaining meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.

License: GNU General Public License (GPL)

Changes:
This release fixes the locale paths used for translations and makes plugin loading and unloading thread-safe. It also resolves some linkage errors on OpenBSD and adds an experimental thumbnail extractor based on ffmpeg, which is not enabled by default due to security concerns.

LIBEXTRACTOR(3) 					     Library Functions Manual						   LIBEXTRACTOR(3)

NAME
libextractor - meta-information extraction library 0.6.0

SYNOPSIS
#include <extractor.h>

const char *EXTRACTOR_metatype_to_string(enum EXTRACTOR_MetaType type);
const char *EXTRACTOR_metatype_to_description(enum EXTRACTOR_MetaType type);
enum EXTRACTOR_MetaType EXTRACTOR_metatype_get_max(void);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_defaults(enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add(struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_last(struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_config(struct EXTRACTOR_PluginList *prev, const char *config, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_remove(struct EXTRACTOR_PluginList *prev, const char *library);
void EXTRACTOR_plugin_remove_all(struct EXTRACTOR_PluginList *plugins);
void EXTRACTOR_extract(struct EXTRACTOR_PluginList *plugins, const char *filename, const void *data, size_t size, EXTRACTOR_MetaDataProcessor proc, void *proc_cls);
int EXTRACTOR_meta_data_print(void *handle, const char *plugin_name, enum EXTRACTOR_MetaType type, enum EXTRACTOR_MetaFormat format, const char *data_mime_type, const char *data, size_t data_len);

EXTRACTOR_VERSION

DESCRIPTION
GNU libextractor is a simple library for keyword extraction. It does not support every file format, but it provides a simple plugin mechanism through which extractors for additional formats can be added quickly, even without recompiling libextractor. libextractor typically ships with dozens of plugins that can be used to obtain meta data from common file types.

To write your own plugin for a file type, all you need to do is write a small library that implements a single function with this signature:

int EXTRACTOR_name_extract(const char *data, size_t datasize, EXTRACTOR_MetaDataProcessor proc, void *proc_cls, const char *options);

data is a pointer to the contents of the file and datasize is the size of data in bytes. The extract function must call proc for each item of meta data that it finds. The interpretation of options is up to the plugin. The function should return 0 if proc always returned 0, and 1 otherwise. Once proc has returned a non-zero value, it should not be called again. An example implementation can be found in html_extractor.c.

Plugins should be found and used automatically once they are installed in the plugin directory (typically something like /usr/lib/libextractor/).

The extract application gives an example of how to use libextractor. Basic use consists of three steps: load the plugins (for example with EXTRACTOR_plugin_add_defaults), extract the meta data with EXTRACTOR_extract, and finally unload the plugins with EXTRACTOR_plugin_remove_all.

Textual meta data obtained from libextractor is supposed to be UTF-8 encoded if the text encoding is known; plugins are supposed to convert meta data to UTF-8 if necessary. The EXTRACTOR_meta_data_print function converts the UTF-8 keywords to the character set of the current locale before printing them.

SEE ALSO
extract(1)

LEGAL NOTICE
libextractor is released under the GPL and is a GNU package (http://www.gnu.org/).

BUGS
A couple of file formats (on the order of 10^3) are not recognized...

AUTHORS
extract was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/.

Dec 14, 2009								   LIBEXTRACTOR(3)