Shell Programming and Scripting. Post 302281619 by frustrated1 on Thursday 29th of January 2009 07:33:21 AM
Removing lines from large files.. quickest method?

Hi

I have some files that can contain anything up to 100k lines, e.g. file100k.
I have another file called file5k, and I need to produce filec, which will contain everything in file100k minus whatever matches in file5k.

ie.
File100k contains
1FP
2FP
3FP

File5k contains
2FP

I would normally do a grep pattern search with a for loop or something, so I would output the entire contents of file100k into filec except anything found in file5k.

The problem is that with 100k entries to search 5 thousand times, it takes some time with normal unix tools (it can take 10-15 mins for one of these 100k files), and I am wondering whether there is a way to do this faster, maybe with a perl command or something.

Hope I am making sense... can you help out??
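
For what it's worth, here is a minimal sketch of one common way to avoid running grep once per pattern: hand the whole of file5k to a single grep as a pattern file, or load it into an awk array. It is not from the original post. It assumes each entry must match a complete line exactly (hence -x), and the scratch directory and the filec_awk name are only there for illustration.

```shell
#!/bin/sh
# Sample data from the post, written to a scratch directory (hypothetical location).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1FP\n2FP\n3FP\n' > file100k
printf '2FP\n' > file5k

# One pass over file100k instead of one grep per entry in file5k:
#   -F  treat the patterns as fixed strings, not regular expressions
#   -x  match whole lines only (assumption: entries must match exactly)
#   -v  print the lines that do NOT match
#   -f  read every pattern from file5k
grep -F -x -v -f file5k file100k > filec

# Alternative: remember every line of the first file in an awk array,
# then print only the lines of the second file that were not seen.
awk 'FILENAME == ARGV[1] { skip[$0]; next } !($0 in skip)' file5k file100k > filec_awk

cat filec
```

Both commands read each file once, so the run time should grow with the total size of the two files rather than with the number of lines in one multiplied by the number of lines in the other. If partial matches are wanted instead of whole-line matches, dropping -x from the grep command is the usual adjustment.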
 
