EXTRACT(1)                                                    General Commands Manual                                                   EXTRACT(1)

NAME
extract - determine meta-information about a file

SYNOPSIS
extract [ -bghLnvV ] [ -H hash-algorithm ] [ -i ] [ -l library ] [ -p type ] [ -x type ] file ...

DESCRIPTION
This manual page documents version 0.6.0 of the extract command. extract tests each file specified in the argument list in an attempt to infer meta-information from it. Each file is subjected to the meta-data extraction libraries from libextractor. libextractor classifies meta-information (also referred to as keywords) into types. A list of all types can be obtained with the -L option.

OPTIONS
-b  Display the output in BibTeX format.

-g  Use grep-friendly output (all keywords on a single line for each file). Use the verbose option to print the filename first, followed by the keywords. Use the verbose option twice to also display the keyword types. This option will not print keyword types or non-textual metadata.

-h  Print a brief summary of the options.

-i  Run plugins in-process (for debugging). By default, each plugin is run in its own process.

-l libraries
    Use the specified libraries to extract keywords. The general format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] where LIBRARYNAME is a libextractor-compatible library, typically of the form jpeg. A minus before the library name indicates that this library should be removed from the existing list. To run only a few selected plugins, use -l in combination with -n.

-L  Print a list of all known keyword types.

-n  Do not use the default set of extractors (typically all standard extractors, currently mp3, ogg, jpg, gif, png, tiff, real, html, pdf and mime-types); use only the extractors specified with the -l option.

-p type
    Print only the keywords matching the specified type. By default, all keywords that are found and not removed as duplicates are printed.

-v  Print the version number and exit.

-V  Be verbose. This option can be specified multiple times to increase verbosity further.

-x type
    Exclude keywords of the specified type from the output. By default, all keywords that are found and not removed as duplicates are printed.

SEE ALSO
libextractor(3) - description of the libextractor library

EXAMPLES
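The list syntax accepted by -l, [[-]LIBRARYNAME[:[-]LIBRARYNAME]*], can be illustrated with a small stand-alone C sketch. This is not code taken from extract or libextractor: the struct lib_action and the function parse_library_spec are hypothetical names introduced only to show how such a colon-separated list, with an optional leading minus per entry, might be split into "add" and "remove" requests.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one entry of a -l library list. */
struct lib_action {
    char name[64];   /* library name without the leading minus */
    int  removal;    /* 1 if the entry was prefixed with '-', else 0 */
};

/*
 * Split a spec of the form [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] into actions.
 * Returns the number of entries stored in out, or -1 if an entry is empty,
 * an entry is too long for the name buffer, or there are more than max_out
 * entries. An empty spec is valid and yields 0 entries.
 */
int parse_library_spec(const char *spec, struct lib_action *out, size_t max_out)
{
    size_t count = 0;
    const char *p = spec;

    if (*p == '\0')
        return 0;                       /* the whole list is optional */

    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        int is_removal = 0;

        if (len > 0 && *p == '-') {     /* leading minus: remove this library */
            is_removal = 1;
            p++;
            len--;
        }
        if (len == 0 || len >= sizeof out[0].name || count == max_out)
            return -1;

        memcpy(out[count].name, p, len);
        out[count].name[len] = '\0';
        out[count].removal = is_removal;
        count++;

        if (end == NULL)
            break;                      /* last entry consumed */
        p = end + 1;                    /* continue after the colon */
    }
    return (int)count;
}
```

With this sketch, the spec "jpeg:-png" yields two entries: "jpeg" marked for addition and "png" marked for removal. How extract itself resolves names such as jpeg or png.so to plugin files is not covered here.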
$ extract test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg

$ extract -V -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg

$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1

$ extract -nV -l png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction

LEGAL NOTICE
libextractor and the extract tool are released under the GPL. libextractor is a GNU package.

BUGS
A couple of file-formats (on the order of 10^3) are not recognized...

AUTHORS
extract was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/

libextractor 0.6.0                                              Dec 20, 2009                                                    EXTRACT(1)

LIBEXTRACTOR(3) 					     Library Functions Manual						   LIBEXTRACTOR(3)

NAME
libextractor - meta-information extraction library 0.6.0

SYNOPSIS
#include <extractor.h>

const char *EXTRACTOR_metatype_to_string(enum EXTRACTOR_MetaType type);
const char *EXTRACTOR_metatype_to_description(enum EXTRACTOR_MetaType type);
enum EXTRACTOR_MetaType EXTRACTOR_metatype_get_max(void);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_defaults(enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add(struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_last(struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_config(struct EXTRACTOR_PluginList *prev, const char *config, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_remove(struct EXTRACTOR_PluginList *prev, const char *library);
void EXTRACTOR_plugin_remove_all(struct EXTRACTOR_PluginList *plugins);
void EXTRACTOR_extract(struct EXTRACTOR_PluginList *plugins, const char *filename, const void *data, size_t size, EXTRACTOR_MetaDataProcessor proc, void *proc_cls);
int EXTRACTOR_meta_data_print(void *handle, const char *plugin_name, enum EXTRACTOR_MetaType type, enum EXTRACTOR_MetaFormat format, const char *data_mime_type, const char *data, size_t data_len);
EXTRACTOR_VERSION

DESCRIPTION
GNU libextractor is a simple library for keyword extraction. libextractor does not support all formats, but it provides a simple plugin mechanism so that you can quickly add extractors for additional formats, even without recompiling libextractor. libextractor typically ships with dozens of plugins that can be used to obtain meta data from common file types.

If you want to write your own plugin for some file type, all you need to do is write a small library that implements a single function with this signature:

int EXTRACTOR_name_extract(const char *data, size_t datasize, EXTRACTOR_MetaDataProcessor proc, void *proc_cls, const char *options);

data is a pointer to the contents of the file and datasize is the size of data. The extract function must call proc for each item of meta data that it finds. The interpretation of options is up to the plugin. The function should return 0 if proc always returned 0, otherwise 1. Once proc has returned a non-zero value, proc should not be called again. An example implementation can be found in html_extractor.c. Plugins should be found and used automatically once they are installed in the respective directory (typically something like /usr/lib/libextractor/).

The application extract gives an example of how to use libextractor. The basic use of libextractor is to load the plugins (for example with EXTRACTOR_plugin_add_defaults), then to extract the keyword list using EXTRACTOR_extract, and finally to unload the plugins (with EXTRACTOR_plugin_remove_all).

Textual meta data obtained from libextractor is supposed to be UTF-8 encoded if the text encoding is known. Plugins are supposed to convert meta data to UTF-8 if necessary. The EXTRACTOR_meta_data_print function converts the UTF-8 keywords to the character set of the current locale before printing them.

SEE ALSO
extract(1)

LEGAL NOTICE
libextractor is released under the GPL and is a GNU package (http://www.gnu.org/).

BUGS
A couple of file-formats (on the order of 10^3) are not recognized...

AUTHORS
extract was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/.

                                                                Dec 14, 2009                                               LIBEXTRACTOR(3)
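The plugin contract described under DESCRIPTION for libextractor(3) (call proc for each item found, stop calling it once it returns non-zero, and return 0 only if proc always returned 0) can be sketched in C as follows. This is a minimal illustration, not the html_extractor.c example the page refers to. To keep it compilable without <extractor.h>, the enums and the EXTRACTOR_MetaDataProcessor typedef below are stand-ins: the typedef is assumed to mirror the parameter list shown for EXTRACTOR_meta_data_print in the SYNOPSIS, and the enumerator names, the plugin name "sketch", and the helper sketch_count_items are hypothetical. A real plugin would include <extractor.h> and follow its conventions for the data passed to proc.

```c
#include <stddef.h>

/* Stand-ins so the sketch builds without <extractor.h>; enumerator names are hypothetical. */
enum EXTRACTOR_MetaType   { SKETCH_TYPE_COMMENT = 1 };
enum EXTRACTOR_MetaFormat { SKETCH_FORMAT_UTF8 = 1 };

/* Assumed to match the parameter list of EXTRACTOR_meta_data_print in the SYNOPSIS. */
typedef int (*EXTRACTOR_MetaDataProcessor)(void *cls, const char *plugin_name,
                                           enum EXTRACTOR_MetaType type,
                                           enum EXTRACTOR_MetaFormat format,
                                           const char *data_mime_type,
                                           const char *data, size_t data_len);

/*
 * Hypothetical plugin entry point: reports every non-empty line of the input
 * as one item. Returns 0 if proc always returned 0, otherwise 1, and never
 * calls proc again after it has returned a non-zero value.
 */
int EXTRACTOR_sketch_extract(const char *data, size_t datasize,
                             EXTRACTOR_MetaDataProcessor proc, void *proc_cls,
                             const char *options)
{
    size_t start = 0;
    size_t i;

    (void)options;                      /* this sketch ignores plugin options */

    for (i = 0; i <= datasize; i++) {
        if (i < datasize && data[i] != '\n')
            continue;                   /* still inside the current line */
        if (i > start &&
            proc(proc_cls, "sketch", SKETCH_TYPE_COMMENT, SKETCH_FORMAT_UTF8,
                 "text/plain", data + start, i - start) != 0)
            return 1;                   /* caller asked to stop: do not call proc again */
        start = i + 1;                  /* next line begins after the newline */
    }
    return 0;
}

/* Hypothetical caller-side state: count items and request a stop at a limit. */
struct sketch_counter { int seen; int limit; };

int sketch_count_items(void *cls, const char *plugin_name,
                       enum EXTRACTOR_MetaType type,
                       enum EXTRACTOR_MetaFormat format,
                       const char *data_mime_type,
                       const char *data, size_t data_len)
{
    struct sketch_counter *c = cls;

    (void)plugin_name; (void)type; (void)format;
    (void)data_mime_type; (void)data; (void)data_len;
    c->seen++;
    return c->seen >= c->limit;         /* non-zero tells the plugin to stop */
}
```

Passing sketch_count_items with a limit of 2 over the three-line input "one\ntwo\nthree" makes the plugin stop after the second line and return 1; with a limit larger than the number of lines it visits all three and returns 0.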