Visit The New, Modern Unix Linux Community


Problem with extract PDFs from huge files.


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Problem with extract PDFs from huge files.
# 1  
Problem with extract PDFs from huge files.

Hello Unix experts,

we have a problem Smilie
We have some binary files ~25GB. In this files are many (millions) PDF files included.
How we can extract them from such huge files? In small files I got it with the command:
Code:
awk -v FS="(%PDF-1.4|%%EOF)" '{print $2}' FILE > OUTPUTDIR

so the PDF file begins with PDF-1.? and ends with %%EOF
but it don't works on such big files. So we need another way to extract them.

Thanks in advance!
# 2  
Quote:
We have some binary files
What kind of binary file, exactly? What is the file extension?
# 3  
This is the IBM Spectrum Protect (TSM) bfs files.

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #332
Difficulty: Easy
It is possible to format a write-protected DVD-R hard disk.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Bash script monitor directory and subdirectories for new pdfs

I need bash script that monitor folders for new pdf files and create xml file for rss feed with newest files on the list. I have some script, but it reports errors. #!/bin/bash SYSDIR="/var/www/html/Intranet" HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs" FEEDTITLE="Najnoviji dokumenti na... (20 Replies)
Discussion started by: markus1981
20 Replies

2. Shell Programming and Scripting

Extract few content from a huge list of files

I have a huge list of files (about 300,000) which have a pattern like this. .I 1 .U 87049087 .S Am J Emerg .M Allied Health Personnel/*; Electric Countershock/*; .T Refibrillation managed by EMT-Ds: .P ARTICLE. .W Some patients converted from ventricular fibrillation to organized... (1 Reply)
Discussion started by: shoaibjameel123
1 Replies

3. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an input file with unsorted data which is of size >50GB and write the unique records to a new file. I'm trying and already tried out a variety of options posted in similar threads/forums. But no luck so far.. Any suggestions please ? Thanks !! (9 Replies)
Discussion started by: Kannan K
9 Replies

4. Shell Programming and Scripting

Search pdfs in command line

Hi, I'm trying to search for a particular phrase in a large number of PDFs in a particular directory. What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears. find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase" ... (2 Replies)
Discussion started by: lost.identity
2 Replies

5. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different file: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . . (4 Replies)
Discussion started by: patrick87
4 Replies

6. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ... (4 Replies)
Discussion started by: ad23
4 Replies

7. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a... (1 Reply)
Discussion started by: jiapei100
1 Replies

8. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Discussion started by: cliffyiu
3 Replies

9. Shell Programming and Scripting

How to extract a piece of information from a huge file

Hello All, I need some assistance to extract a piece of information from a huge file. The file is like this one : database information ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc os information cccccccccccccccccc cccccccccccccccccc... (2 Replies)
Discussion started by: Marcor
2 Replies

10. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format.I need a script to do some repeatable task as follows: 1. Needs to create folders as the strings starts with "item_*" from the input file 2. Create a file "contents" in each folders having "license.txt(tab... (5 Replies)
Discussion started by: srsahu75
5 Replies

Featured Tech Videos