Problem with extract PDFs from huge files. Post: 303045999

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format. I need a script to do a repeatable task as follows: 1. Create folders named after the strings that start with "item_*" in the input file 2. Create a file "contents" in each folder having "license.txt(tab...

2. Shell Programming and Scripting

How to extract a piece of information from a huge file

Hello All, I need some assistance to extract a piece of information from a huge file. The file is like this one : database information ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc os information cccccccccccccccccc cccccccccccccccccc...

3. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is...

4. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a...

5. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ...

6. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different file: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . .

7. Shell Programming and Scripting

Search pdfs in command line

Hi, I'm trying to search for a particular phrase in a large number of PDFs in a particular directory. What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears. find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase" ...

8. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an input file with unsorted data which is of size >50GB and write the unique records to a new file. I'm trying and already tried out a variety of options posted in similar threads/forums. But no luck so far.. Any suggestions please ? Thanks !!

9. Shell Programming and Scripting

Extract few content from a huge list of files

I have a huge list of files (about 300,000) which have a pattern like this. .I 1 .U 87049087 .S Am J Emerg .M Allied Health Personnel/*; Electric Countershock/*; .T Refibrillation managed by EMT-Ds: .P ARTICLE. .W Some patients converted from ventricular fibrillation to organized...

10. Shell Programming and Scripting

Bash script monitor directory and subdirectories for new pdfs

I need bash script that monitor folders for new pdf files and create xml file for rss feed with newest files on the list. I have some script, but it reports errors. #!/bin/bash SYSDIR="/var/www/html/Intranet" HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs" FEEDTITLE="Najnoviji dokumenti na...

LEARN ABOUT CENTOS

tracker-extract

tracker-extract(1)						   User Commands						tracker-extract(1)

NAME

       tracker-extract - Extract metadata from a file.

SYNOPSIS

       tracker-extract [OPTION...] FILE...

DESCRIPTION

       tracker-extract reads the file and MIME type provided on stdin and extracts the metadata from this file; then it displays the metadata on
       the standard output.

       NOTE: If a FILE is not provided then tracker-extract will run for 30 seconds waiting for DBus calls before quitting.

OPTIONS

       -?, --help
	      Show summary of options.

       -v, --verbosity=N
	      Set verbosity to N. This overrides the config value.  Values include 0=errors, 1=minimal, 2=detailed and 3=debug.

       -f, --file=FILE
	      The FILE to extract metadata from. The FILE argument can be either a local path or a URI. It also does not have to be an absolute
	      path.

       -t, --mime=MIME
	      The MIME type to use for the file. If one is not provided, it will be guessed automatically.

       -d, --disable-shutdown
	      Disable shutting down after 30 seconds of inactivity.

       -i, --force-internal-extractors
	      Use this option to force internal extractors over 3rd parties like libstreamanalyzer.

       -m, --force-module=MODULE
	      Force a particular module to be used. This is here as a convenience for developers wanting to test their MODULE file. Only the
	      MODULE name has to be specified, not the full path. Typically, a MODULE is installed to /usr/lib/tracker-0.7/extract-modules/. This
	      option can be used with or without the .so part of the name too; for example, you can use --force-module=foo.

	      Modules are shared objects which are dynamically loaded at run time. These files must have the .so suffix to be loaded and must
	      contain the correct symbols to be authenticated by tracker-extract. For more information see the libtracker-extract reference
	      documentation.

       -V, --version
	      Show binary version.

EXAMPLES

       Using command line to extract metadata from a file:

	       $ tracker-extract -v 3 -f /path/to/some/file.mp3

       Using a specific module to extract metadata from a file:

	       $ tracker-extract -v 3 -f /path/to/some/file.mp3 -m mymodule
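
       The single-file invocation above can be repeated over a whole directory. The following is only a sketch under stated
       assumptions: the function name extract_dir and the EXTRACT_CMD override are hypothetical helpers introduced here, not part
       of tracker-extract; only the -v and -f options documented under OPTIONS are used.

```shell
#!/bin/sh
# Sketch: run tracker-extract once per regular file in a directory.
# EXTRACT_CMD is a hypothetical override (not a tracker feature) so that
# the loop can be tried with a stub command instead of the real binary.
EXTRACT_CMD="${EXTRACT_CMD:-tracker-extract}"

extract_dir() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty glob.
        [ -f "$f" ] || continue
        printf '== %s ==\n' "$f"
        # -v 0 limits logging to errors; -f names the file to read.
        "$EXTRACT_CMD" -v 0 -f "$f" || printf 'failed: %s\n' "$f" >&2
    done
}
```

       For example, extract_dir /path/to/some/directory prints one header line per file followed by whatever the extractor
       writes to standard output. Passing -f explicitly on each call avoids the 30-second wait described in the NOTE above.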

ENVIRONMENT

       TRACKER_EXTRACTORS_DIR
	      This is the directory which tracker uses to load the shared libraries from (used for extracting metadata for specific file types).
	      These are needed on each invocation of tracker-store. If unset it will default to the correct place. This is used mainly for testing
	      purposes. The default location is /usr/lib/tracker-0.10/extract-modules/.

       TRACKER_EXTRACTOR_RULES_DIR
	      This is the directory which tracker uses to load the rules files from. The rules files describe extractor modules and their
	      supported MIME types. The default location is /usr/share/tracker/extract-rules/.

       TRACKER_USE_CONFIG_FILES
	      Don't use GSettings, instead use a config file similar to how settings were saved in 0.10.x. That is, a file which is much like an
	      .ini file. These are saved to $HOME/.config/tracker/
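
       Because TRACKER_EXTRACTORS_DIR and TRACKER_EXTRACTOR_RULES_DIR are described as mainly for testing, it may be convenient to
       set them for a single command only rather than exporting them in the shell. A minimal sketch, assuming a hypothetical
       wrapper named run_with_test_modules (not part of tracker) and directory paths of your own choosing:

```shell
#!/bin/sh
# Sketch: run one command with alternate extractor and rules directories.
# Usage: run_with_test_modules MODULES_DIR RULES_DIR COMMAND [ARGS...]
# The wrapper name and its argument order are illustrative assumptions.
run_with_test_modules() {
    modules_dir=$1
    rules_dir=$2
    shift 2
    # env sets the variables for the child process only, so the calling
    # shell's environment is left unchanged.
    env TRACKER_EXTRACTORS_DIR="$modules_dir" \
        TRACKER_EXTRACTOR_RULES_DIR="$rules_dir" \
        "$@"
}
```

       A call might look like run_with_test_modules ./extract-modules ./extract-rules tracker-extract -v 3 -f /path/to/some/file.mp3,
       where the two directories are placeholders for a local test build.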

SEE ALSO

       tracker-store(1), tracker-sparql(1), tracker-stats(1), tracker-info(1).

       /usr/lib/tracker-0.10/extract-modules/

       /usr/share/tracker/extract-rules/

GNU
								     July 2007							tracker-extract(1)
