Problem with extract PDFs from huge files. Post 303045999 by mrAibo on Tuesday 21st of April 2020 10:03:53 AM
These are IBM Spectrum Protect (TSM) bfs files.
 

bfs(1)								   User Commands							    bfs(1)

NAME
     bfs - big file scanner

SYNOPSIS
     /usr/bin/bfs [-] filename

DESCRIPTION
     The bfs command is (almost) like ed(1) except that it is read-only and processes much larger files. Files can be up to 1024K bytes and 32K lines, with up to 512 characters, including new-line, per line (255 for 16-bit machines). bfs is usually more efficient than ed(1) for scanning a file, since the file is not copied to a buffer. It is most useful for identifying sections of a large file where csplit(1) can be used to divide it into more manageable pieces for editing.

     Normally, the size of the file being scanned is printed, as is the size of any file written with the w (write) command. The optional - suppresses printing of sizes. Input is prompted with * if P and a carriage return are typed, as in ed(1). Prompting can be turned off again by inputting another P and carriage return. Note that messages are given in response to errors if prompting is turned on.

     All address expressions described under ed(1) are supported. In addition, regular expressions may be surrounded with two symbols besides / and ?: > indicates downward search without wrap-around, and < indicates upward search without wrap-around. There is a slight difference in mark names; that is, only the letters a through z may be used, and all 26 marks are remembered.

   bfs Commands
     The e, g, v, k, p, q, w, =, !, and null commands operate as described under ed(1). Commands such as ---, +++-, +++=, -12, and +4p are accepted. Note that 1,10p and 1,10 will both print the first ten lines. The f command only prints the name of the file being scanned; there is no remembered file name. The w command is independent of output diversion, truncation, or crunching (see the xo, xt, and xc commands, below). The following additional commands are available:

     xf file
          Further commands are taken from the named file. When an end-of-file is reached, an interrupt signal is received, or an error occurs, reading resumes with the file containing the xf. The xf commands may be nested to a depth of 10.

     xn
          List the marks currently in use (marks are set by the k command).

     xo [file]
          Further output from the p and null commands is diverted to the named file, which, if necessary, is created mode 666 (readable and writable by everyone), unless your umask setting (see umask(1)) dictates otherwise. If file is missing, output is diverted to the standard output. Note that each diversion causes truncation or creation of the file.

     : label
          This positions a label in a command file. The label is terminated by new-line, and blanks between the : (colon) and the start of the label are ignored. This command may also be used to insert comments into a command file, since labels need not be referenced.

     ( . , . )xb/regular expression/label
          A jump (either upward or downward) is made to label if the command succeeds. It fails under any of the following conditions:

          1. Either address is not between 1 and $.
          2. The second address is less than the first.
          3. The regular expression does not match at least one line in the specified range, including the first and last lines.

          On success, . (dot) is set to the line matched and a jump is made to label. This command is the only one that does not issue an error message on bad addresses, so it may be used to test whether addresses are bad before other commands are executed. Note that the command, xb/^/ label, is an unconditional jump. The xb command is allowed only if it is read from someplace other than a terminal. If it is read from a pipe, only a downward jump is possible.

     xt number
          Output from the p and null commands is truncated to, at most, number characters. The initial number is 255.

     xv[digit][spaces][value]
          The variable name is the specified digit following the xv. The commands xv5100 or xv5 100 both assign the value 100 to the variable 5. The command xv61,100p assigns the value 1,100p to the variable 6. To reference a variable, put a % in front of the variable name. For example, using the above assignments for variables 5 and 6:

               1,%5p
               1,%5
               %6

          will all print the first 100 lines.

               g/%5/p

          would globally search for the characters 100 and print each line containing a match. To escape the special meaning of %, a \ must precede it.

               g/".*\%[cds]/p

          could be used to match and list %c, %d, or %s formats (for example, "printf"-like statements) of characters, decimal integers, or strings.

          Another feature of the xv command is that the first line of output from a UNIX system command can be stored into a variable. The only requirement is that the first character of value be an !. For example:

               .w junk
               xv5!cat junk
               !rm junk
               !echo "%5"
               xv6!expr %6 + 1

          would put the current line into variable 5, print it, and increment the variable 6 by one. To escape the special meaning of ! as the first character of value, precede it with a \.

               xv7\!date

          stores the value !date into variable 7.

     xbz label
     xbn label
          These two commands test the last saved return code from the execution of a UNIX system command (!command) for a zero (xbz) or nonzero (xbn) value, respectively, and jump to the specified label. The two examples below both search for the next five lines containing the string size:

          Example 1:

               xv55
               : l
               /size/
               xv5!expr %5 - 1
               !if 0%5 != 0 exit 2
               xbn l

          Example 2:

               xv45
               : l
               /size/
               xv4!expr %4 - 1
               !if 0%4 = 0 exit 2
               xbz l

     xc [switch]
          If switch is 1, output from the p and null commands is crunched; if switch is 0, it is not. Without an argument, xc reverses switch. Initially, switch is set for no crunching. Crunched output has strings of tabs and blanks reduced to one blank and blank lines suppressed.

OPERANDS
     The following operand is supported:

     filename
          Any file up to 1024K bytes and 32K lines, with up to 512 characters, including new-line, per line (255 for 16-bit machines). filename can be a section of a larger file which has been divided into more manageable sections for editing by the use of csplit(1).

EXIT STATUS
     The following exit values are returned:

     0    Successful completion without any file or command errors.
     >0   An error occurred.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     |Availability                 |SUNWesu                      |
     +-----------------------------+-----------------------------+

SEE ALSO
     csplit(1), ed(1), umask(1), attributes(5)

DIAGNOSTICS
     Message is ? for errors in commands, if prompting is turned off. Self-explanatory error messages are displayed when prompting is on.

SunOS 5.11                         20 May 1996                         bfs(1)
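
As an illustration only, the output handling described for the xc (crunch) and xt (truncate) commands can be sketched in Python. This is not bfs source code. The function names crunch and truncate are hypothetical, and the sketch assumes that a line holding only tabs and blanks counts as a blank line and that crunching is applied before truncation; the man page does not state either point.

```python
import re


def crunch(lines):
    """Mimic the xc rule: runs of tabs and blanks become one blank,
    and blank lines are suppressed.

    Assumption: a line made up only of tabs/blanks is treated as blank.
    """
    result = []
    for line in lines:
        squeezed = re.sub(r"[ \t]+", " ", line)
        if squeezed.strip() == "":
            continue  # suppress blank lines
        result.append(squeezed)
    return result


def truncate(lines, number=255):
    """Mimic the xt rule: keep at most `number` characters per line.

    The default of 255 matches the initial value given in the man page.
    """
    return [line[:number] for line in lines]
```

For example, truncate(crunch(lines), 72) would model a session in which xc 1 and xt 72 are both in effect, under the ordering assumption stated above.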