Problem with extract PDFs from huge files. Post: 303045999

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format.I need a script to do some repeatable task as follows: 1. Needs to create folders as the strings starts with "item_*" from the input file 2. Create a file "contents" in each folders having "license.txt(tab...

2. Shell Programming and Scripting

How to extract a piece of information from a huge file

Hello All, I need some assistance to extract a piece of information from a huge file. The file is like this one : database information ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc os information cccccccccccccccccc cccccccccccccccccc...

3. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is...

4. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a...

5. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ...

6. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different file: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . .

7. Shell Programming and Scripting

Search pdfs in command line

Hi, I'm trying to search for a particular phrase in a large number of PDFs in a particular directory. What I've done so far only prints out the line, but I haven't been able to display in which file the phrase appears. find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase" ...

8. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an input file with unsorted data which is of size >50GB and write the unique records to a new file. I'm trying and already tried out a variety of options posted in similar threads/forums. But no luck so far.. Any suggestions please ? Thanks !!

9. Shell Programming and Scripting

Extract few content from a huge list of files

I have a huge list of files (about 300,000) which have a pattern like this. .I 1 .U 87049087 .S Am J Emerg .M Allied Health Personnel/*; Electric Countershock/*; .T Refibrillation managed by EMT-Ds: .P ARTICLE. .W Some patients converted from ventricular fibrillation to organized...

10. Shell Programming and Scripting

Bash script monitor directory and subdirectories for new pdfs

I need bash script that monitor folders for new pdf files and create xml file for rss feed with newest files on the list. I have some script, but it reports errors. #!/bin/bash SYSDIR="/var/www/html/Intranet" HTTPLINK="http://TYPE.IP.ADDRESS.HERE/pdfs" FEEDTITLE="Najnoviji dokumenti na...

LEARN ABOUT OPENSOLARIS

bfs

bfs(1)								   User Commands							    bfs(1)

NAME

       bfs - big file scanner

SYNOPSIS

       /usr/bin/bfs [-] filename

DESCRIPTION

       The bfs command is (almost) like ed(1) except that it is read-only and processes much larger files. Files can be up to 1024K bytes and 32K
       lines, with up to 512 characters, including new-line, per line (255 for 16-bit machines). bfs is usually more efficient than ed(1) for
       scanning a file, since the file is not copied to a buffer. It is most useful for identifying sections of a large file where csplit(1) can
       be used to divide it into more manageable pieces for editing.
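The sketch below is not bfs. It is a minimal Python illustration of the same idea, assuming the only goal is to locate section boundaries: read a large file one line at a time, so nothing is copied into an in-memory buffer, and report the numbers of lines matching a regular expression so they can be handed to csplit(1). The function name find_section_starts is hypothetical.

```python
import re
import tempfile
from typing import List


def find_section_starts(path: str, pattern: str) -> List[int]:
    """Return the 1-based numbers of lines matching pattern.

    The file object is iterated line by line, so the whole file is never
    held in memory at once.
    """
    regex = re.compile(pattern)
    starts: List[int] = []
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                starts.append(lineno)
    return starts


if __name__ == "__main__":
    # Small stand-in for a large file with two marked sections.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("header\n== part 1 ==\na\nb\n== part 2 ==\nc\n")
        name = tmp.name
    print(find_section_starts(name, r"^== part"))  # [2, 5]
```

With a result of [2, 5], a command such as csplit bigfile 2 5 would write line 1, lines 2 through 4, and lines 5 onward to separate files (xx00, xx01, and xx02 by default).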

       Normally, the size of the file being scanned is printed, as is the size of any file written with the w (write) command. The optional -
       suppresses printing of sizes. Input is prompted with * if P and a carriage return are typed, as in ed(1). Prompting can be turned off
       again by inputting another P and carriage return. Note that messages are given in response to errors if prompting is turned on.

       All address expressions described under ed(1) are supported. In addition, regular expressions may be surrounded with two symbols besides  /
       and ?:

       >    indicates downward search without wrap-around, and

       <    indicates upward search without wrap-around.
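As a rough model of the difference between ed-style searches and the > and < forms, the following Python sketch scans from the line after (or before) dot and either wraps around the buffer or stops at the last (or first) line. The function search and its wrap flag are hypothetical names; the sketch assumes, as in ed(1), that a wrapping search examines dot itself last.

```python
import re
from typing import List, Optional


def search(lines: List[str], dot: int, pattern: str,
           downward: bool = True, wrap: bool = True) -> Optional[int]:
    """Return the 1-based number of the next line matching pattern, or None.

    The scan starts on the line after (downward) or before (upward) dot.
    wrap=True models ed-style /re/ and ?re? searches; wrap=False models the
    bfs > and < forms, which stop at the last or first line.
    """
    regex = re.compile(pattern)
    n = len(lines)
    step = 1 if downward else -1
    for i in range(1, n + 1):  # at most one full lap, ending back on dot
        candidate = dot + step * i
        if not 1 <= candidate <= n:
            if not wrap:
                return None  # ran off the end of the file
            candidate = (candidate - 1) % n + 1  # wrap back into 1..n
        if regex.search(lines[candidate - 1]):
            return candidate
    return None


if __name__ == "__main__":
    text = ["size a", "other", "size b", "other"]
    print(search(text, 3, "size"))                  # 1 (wrapped past the end)
    print(search(text, 3, "size", wrap=False))      # None
    print(search(text, 1, "size", downward=False))  # 3
```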

       There is a slight difference in mark names; that is, only the letters a through z may be used, and all 26 marks are remembered.

   bfs Commands
       The e, g, v, k, p, q, w, =, !, and null commands operate as described under ed(1). Commands such as ---, +++-, +++=, -12, and +4p are
       accepted. Note that 1,10p and 1,10 will both print the first ten lines. The f command only prints the name of the file being scanned;
       there is no remembered file name. The w command is independent of output diversion, truncation, or crunching (see the xo, xt, and xc
       commands, below). The following additional commands are available:

       xf file

	   Further commands are taken from the named file. When an end-of-file is reached, an interrupt signal is received, or an error occurs,
	   reading resumes with the file containing the xf. The xf commands may be nested to a depth of 10.

       xn

	   List the marks currently in use (marks are set by the k command).

       xo [file]

	   Further output from the p and null commands is diverted to the named file, which, if necessary, is created mode 666 (readable and
	   writable by everyone), unless your umask setting (see umask(1)) dictates otherwise. If file is missing, output is diverted to the
	   standard output. Note that each diversion causes truncation or creation of the file.
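The creation and truncation behavior described for xo corresponds to opening a file with O_CREAT and O_TRUNC and a requested mode of 0666, which the system then masks with the umask. A minimal Python sketch of that behavior using the standard os.open call; open_diversion is a hypothetical helper, not part of bfs, and a POSIX system is assumed.

```python
import os
import tempfile


def open_diversion(path: str) -> int:
    """Open path for writing the way xo is documented to behave.

    The file is created with mode 0666 (before the umask is applied) if it
    does not exist, and truncated to zero length if it does.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The kernel masks 0o666 with the process umask, e.g. umask 022 -> 0o644.
    return os.open(path, flags, 0o666)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "diverted.txt")
    for text in (b"first diversion\n", b"second\n"):
        fd = open_diversion(target)  # each diversion truncates or creates
        os.write(fd, text)
        os.close(fd)
    with open(target, "rb") as fh:
        print(fh.read())  # b'second\n'
```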

       : label

	   This  positions  a  label in a command file. The label is terminated by new-line, and blanks between the : (colon) and the start of the
	   label are ignored. This command may also be used to insert comments into a command file, since labels need not be referenced.

       ( . , . )xb/regular expression/label

	   A jump (either upward or downward) is made to label if the command succeeds. It fails under any of the following conditions:

	       1.     Either address is not between 1 and $.

	       2.     The second address is less than the first.

	       3.     The regular expression does not match at least one line in the specified range, including the first and last lines.
	   On success, . (dot) is set to the line matched and a jump is made to label. This command is the only one that does not issue an error
	   message on bad addresses, so it may be used to test whether addresses are bad before other commands are executed. Note that the
	   command xb/^/ label is an unconditional jump.

	   The xb command is allowed only if it is read from someplace other than a terminal. If it is read from a pipe, only a downward jump is
	   possible.

       xt number

	   Output from the p and null commands is truncated to, at most, number characters. The initial number is 255.

       xv[digit][spaces][value]

	   The variable name is the specified digit following the xv. The commands xv5100 or xv5 100 both assign the value 100 to the variable 5.
	   The command xv61,100p assigns the value 1,100p to the variable 6. To reference a variable, put a % in front of the variable name. For
	   example, using the above assignments for variables 5 and 6:

	     1,%5p
	     1,%5
	     %6

	   will all print the first 100 lines.

	   g/%5/p

	   would globally search for the characters 100 and print each line containing a match. To escape the special meaning of %, a \ must
	   precede it.

	   g/".*\%[cds]/p

	   could be used to match and list %c, %d, or %s formats (for example, "printf"-like statements) of characters, decimal integers, or
	   strings. Another feature of the xv command is that the first line of output from a UNIX system command can be stored into a variable.
	   The only requirement is that the first character of value be an !. For example:

	     .w junk
	     xv5!cat junk
	     !rm junk
	     !echo "%5"
	     xv6!expr %6 + 1

	   would put the current line into variable 5, print it, and increment the variable 6 by one. To escape the special meaning of ! as the
	   first character of value, precede it with a \.

	   xv7\!date

	   stores the value !date into variable 7.
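The % substitution rules described for xv can be modeled in a few lines of Python. This is a sketch under stated assumptions: the helper name expand is hypothetical, a reference to an unset variable is assumed to expand to an empty string (the page does not say), and \% is assumed to become a bare % before the command runs.

```python
import re
from typing import Dict

# Matches either an escaped percent (\%) or a variable reference %0 .. %9.
_REF = re.compile(r"\\%|%([0-9])")


def expand(command: str, variables: Dict[str, str]) -> str:
    """Substitute %digit references in a bfs-style command line."""
    def replace(m: re.Match) -> str:
        if m.group(1) is None:  # matched \% : emit a literal percent sign
            return "%"
        # Assumption: an unset variable expands to nothing.
        return variables.get(m.group(1), "")
    return _REF.sub(replace, command)


if __name__ == "__main__":
    v = {"5": "100", "6": "1,100p"}
    for cmd in ("1,%5p", "1,%5", "%6", "g/%5/p", 'g/".*\\%[cds]/p'):
        print(expand(cmd, v))
```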

       xbz label
       xbn label

	   These two commands test the last saved return code from the execution of a UNIX system command (!command) and jump to the specified
	   label if that code is zero (xbz) or nonzero (xbn), respectively. The two examples below both search for the next five lines containing
	   the string size:

	   Example 1:
			   xv55
			   : l
			   /size/
			   xv5!expr %5 - 1
			   !if 0%5 != 0 exit 2
			   xbn l

	   Example 2:
			   xv45
			   : l
			   /size/
			   xv4!expr %4 - 1
			   !if 0%4 = 0 exit 2
			   xbz l
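Both examples implement a counted loop: a variable starts at 5, each successful /size/ search decrements it, and the conditional jump repeats the search until the counter reaches zero. A Python sketch of the same control flow follows; next_n_matches is a hypothetical name, and unlike an ed-style search it does not wrap around but simply stops at the last line.

```python
import re
from typing import List, Sequence


def next_n_matches(lines: Sequence[str], dot: int, pattern: str, n: int = 5) -> List[int]:
    """Return the 1-based numbers of the next n lines after dot matching pattern."""
    regex = re.compile(pattern)
    remaining = n          # like xv55: the counter starts at 5
    found: List[int] = []
    lineno = dot
    while remaining != 0:  # like the xbn/xbz test on the counter
        lineno += 1
        if lineno > len(lines):
            break          # no wrap-around in this sketch
        if regex.search(lines[lineno - 1]):  # like /size/
            found.append(lineno)
            remaining -= 1  # like xv5!expr %5 - 1
    return found


if __name__ == "__main__":
    print(next_n_matches(["size"] * 7, 0, "size"))  # [1, 2, 3, 4, 5]
```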

       xc [switch]

	   If switch is 1, output from the p and null commands is crunched; if switch is 0, it is not. Without an argument, xc reverses switch.
	   Initially, switch is set for no crunching. Crunched output has strings of tabs and blanks reduced to one blank and blank lines
	   suppressed.
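Crunching and truncation (see xt above) act as simple per-line output filters. The Python sketch below models them with hypothetical helper names. It assumes lines without trailing newlines, treats whitespace-only lines as blank, and applies crunching before truncation; the page specifies none of these details.

```python
import re
from typing import Iterable, List


def crunch(lines: Iterable[str]) -> List[str]:
    """Model 'xc 1': squeeze runs of tabs and blanks to one blank, drop blank lines."""
    out: List[str] = []
    for line in lines:
        squeezed = re.sub(r"[ \t]+", " ", line)
        if squeezed.strip():  # assumption: whitespace-only lines count as blank
            out.append(squeezed)
    return out


def truncate(lines: Iterable[str], number: int = 255) -> List[str]:
    """Model 'xt number': keep at most `number` characters of each line."""
    return [line[:number] for line in lines]


if __name__ == "__main__":
    raw = ["a\t\t b", "", "   ", "long   line here"]
    print(truncate(crunch(raw), 6))  # ['a b', 'long l']
```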

OPERANDS

       The following operand is supported:

       filename    Any file up to 1024K bytes and 32K lines, with up to 512 characters, including new-line, per line (255 for 16-bit machines).
		   filename can be a section of a larger file which has been divided into more manageable sections for editing by the use of
		   csplit(1).

EXIT STATUS

       The following exit values are returned:

       0     Successful completion without any file or command errors.

       >0    An error occurred.

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       +-----------------------------+-----------------------------+
       |      ATTRIBUTE TYPE         |      ATTRIBUTE VALUE        |
       +-----------------------------+-----------------------------+
       |Availability                 |SUNWesu                      |
       +-----------------------------+-----------------------------+

SEE ALSO

       csplit(1), ed(1), umask(1), attributes(5)

DIAGNOSTICS

       Message is ? for errors in commands, if prompting is turned off. Self-explanatory error messages are displayed when prompting is on.

SunOS 5.11							    20 May 1996 							    bfs(1)
