UNIX for Beginners Questions & Answers: Post 303045990 by mrAibo on Tuesday 21st of April 2020 06:09:11 AM
Problem extracting PDFs from huge files

Hello Unix experts,

We have a problem.
We have some binary files of about 25 GB each. These files contain many (millions of) embedded PDF files.
How can we extract the PDFs from such huge files? With small files I managed it with this command:
Code:
awk -v FS="(%PDF-1.4|%%EOF)" '{print $2}' FILE > OUTPUTDIR

Each PDF file begins with %PDF-1.? and ends with %%EOF,
but the command doesn't work on such big files, so we need another way to extract them.

Thanks in advance!
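One possible approach, sketched under assumptions that are not in the original post. With the command above, awk still uses its default record separator (the newline), so each record is an arbitrary newline-delimited slice of binary data rather than one PDF; print $2 keeps only the bytes between the first and second marker in that record, dropping the %PDF- header and the %%EOF trailer; and "> OUTPUTDIR" writes everything into a single file named OUTPUTDIR, not into a directory. The sketch below makes %%EOF the record separator instead, so awk holds at most one PDF (plus any bytes in front of it) in memory at a time, and writes each PDF to its own numbered file. It assumes that every PDF starts with %PDF- and ends at the first %%EOF after it; a PDF saved with incremental updates contains several %%EOF markers and would be cut short at the first one. The function name extract_pdfs and the 00000001.pdf naming are hypothetical.

```shell
# Sketch: split a large binary file into one file per embedded PDF.
# Assumptions (not from the original post):
#   - each PDF starts with "%PDF-" and ends at the first "%%EOF" after it;
#   - the awk used accepts a regular expression as RS (GNU awk and mawk do);
#     GNU awk is preferred because it handles NUL bytes in binary input.
# extract_pdfs is a hypothetical helper name.

extract_pdfs() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1

    if command -v gawk >/dev/null 2>&1; then AWK=gawk; else AWK=awk; fi

    # LC_ALL=C: treat the data as raw bytes, not as multibyte text.
    # RS="%%EOF": each record ends at a PDF trailer, so awk holds only one
    # PDF (plus any bytes in front of it) in memory at a time.
    LC_ALL=C "$AWK" -v RS='%%EOF' -v dir="$outdir" '
    {
        start = index($0, "%PDF-")    # position of the PDF header
        if (start == 0) next          # no PDF in this chunk (e.g. trailing data)
        n++
        file = sprintf("%s/%08d.pdf", dir, n)
        # Write from the header on and put back the "%%EOF" that RS consumed;
        # print appends ORS, i.e. a newline after the marker.
        print substr($0, start) "%%EOF" > file
        close(file)                   # keep only one output file open at a time
    }
    END { printf "%d PDF(s) written to %s\n", n, dir > "/dev/stderr" }
    ' "$input"
}
```

Usage, after defining the function in the shell: extract_pdfs FILE OUTPUTDIR. Each output file is closed right after it is written, so only one output file is open at any time even with millions of PDFs.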
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format. I need a script to do some repeatable task as follows: 1. Needs to create folders as the strings starts with "item_*" from the input file 2. Create a file "contents" in each folders having "license.txt(tab... (5 Replies)
Discussion started by: srsahu75

2. Shell Programming and Scripting

How to extract a piece of information from a huge file

Hello All, I need some assistance to extract a piece of information from a huge file. The file is like this one : database information ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc os information cccccccccccccccccc cccccccccccccccccc... (2 Replies)
Discussion started by: Marcor

3. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Discussion started by: cliffyiu

4. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a... (1 Reply)
Discussion started by: jiapei100

5. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ... (4 Replies)
Discussion started by: ad23

6. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different files: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . . (4 Replies)
Discussion started by: patrick87

7. Shell Programming and Scripting

Search pdfs in command line

Hi, I'm trying to search for a particular phrase in a large number of PDFs in a particular directory. What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears. find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase" ... (2 Replies)
Discussion started by: lost.identity

8. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an input file with unsorted data which is of size >50GB and write the unique records to a new file. I'm trying and already tried out a variety of options posted in similar threads/forums. But no luck so far.. Any suggestions please ? Thanks !! (9 Replies)
Discussion started by: Kannan K

9. Shell Programming and Scripting

Extract few content from a huge list of files

I have a huge list of files (about 300,000) which have a pattern like this. .I 1 .U 87049087 .S Am J Emerg .M Allied Health Personnel/*; Electric Countershock/*; .T Refibrillation managed by EMT-Ds: .P ARTICLE. .W Some patients converted from ventricular fibrillation to organized... (1 Reply)
Discussion started by: shoaibjameel123

10. Shell Programming and Scripting

Bash script monitor directory and subdirectories for new pdfs

I need bash script that monitor folders for new pdf files and create xml file for rss feed with newest files on the list. I have some script, but it reports errors. #!/bin/bash SYSDIR="/var/www/html/Intranet" HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs" FEEDTITLE="Najnoviji dokumenti na... (20 Replies)
Discussion started by: markus1981
innoextract(1)						      General Commands Manual						    innoextract(1)

NAME
innoextract - tool to extract installers created by Inno Setup

SYNOPSIS
innoextract [-behlLqstv] [-c color] [-p progress] installers ...

DESCRIPTION
innoextract is a tool that can extract installer executables created by Inno Setup. innoextract will extract files from the installers specified on the command line. To extract a multi-part installer with external data files, only the executable (.exe) file needs to be given as an argument to innoextract.

OPTIONS
-c, --color [enable]
       By default innoextract will try to detect if the terminal supports shell escape codes and enable or disable color output accordingly. Pass 1 or true to --color to force color output. Pass 0 or false to never output color codes.
--dump
       Don't convert Windows paths to UNIX paths and don't substitute variables in paths.
-e, --extract
       Extract all files to the current directory. This is the default action. You may only specify one of --extract, --list and --test.
-h, --help
       Show a list of the supported options.
--language [lang]
       Extract only language-independent files and files for the given language. By default all files are extracted.
--license
       Show license information.
-l, --list
       List files contained in the installer but don't extract anything. You may only specify one of --extract, --list and --test.
-L, --lowercase
       Convert filenames stored in the installer to lower-case before extracting.
-p, --progress [enable]
       By default innoextract will try to detect if the terminal supports shell escape codes and enable or disable progress bar output accordingly. Pass 1 or true to --progress to force progress bar output. Pass 0 or false to never show a progress bar.
-q, --quiet
       Less verbose output.
-s, --silent
       Don't output anything except errors and warnings.
-t, --test
       Test archive integrity but don't write any output files. You may only specify one of --extract, --list and --test.
-v, --version
       Show the innoextract version number and supported Inno Setup versions.

LIMITATIONS
innoextract currently only supports extracting all the data. There is no support for extracting individual files, components or languages.

Included scripts and checks are not executed.

Data is always extracted to the current directory and the mapping from Inno Setup variables like the application directory to subdirectories is hard-coded.

innoextract does not check if an installer includes multiple files with the same name and will continually overwrite the destination file when extracting.

Names for data files in multi-file installers must follow the standard naming scheme.

Encrypted installers are not supported.

SEE ALSO
cabextract(1), unshield(1)

BUGS
No known bugs.

AUTHOR
Daniel Scharrer (daniel@constexpr.org)

1.2    2012-04-01    innoextract(1)
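The LIMITATIONS section states that, in the version documented here, data is always extracted to the current directory and that files with the same name overwrite each other. When several installers are unpacked in one directory, their files would therefore mix and could overwrite one another. A minimal sketch that keeps them apart is to run innoextract from a fresh directory per installer, using only the --silent option listed above. The function name extract_each and the "<name>.d" directory naming are hypothetical.

```shell
# Sketch: extract several Inno Setup installers, each into its own directory.
# Assumption: innoextract always extracts into the current directory (as the
# man page above says), so we cd into a fresh directory before calling it.
# extract_each and the "<name>.d" naming are hypothetical.

extract_each() {
    for exe in "$@"; do
        # Absolute path to the installer, because we change directory below.
        case $exe in
            /*) path=$exe ;;
            *)  path=$(pwd)/$exe ;;
        esac
        dir=$(basename "$exe" .exe).d
        mkdir -p "$dir" || return 1
        # Subshell: the cd does not affect the caller's working directory.
        # --silent: don't output anything except errors and warnings.
        ( cd "$dir" && innoextract --silent "$path" ) || {
            echo "failed: $exe" >&2
            return 1
        }
    done
}
```

Usage: extract_each setup1.exe setup2.exe unpacks into setup1.d/ and setup2.d/ respectively.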
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.