02-18-2008
omitting lines from file A that are in file B
I've got file A with (say) 1M lines in it ... ASCII text, space delimited ...
I've got file B with (say) 10M lines in it ... same structure.
I want to remove any lines from A that appear (identically) in B and print the remaining (say) 900K lines. (And I want to do it in zero time of course!)
The best I've come up with so far is to mark the lines from A, sort both files together, and run an awk script over the result, so that a marked line is printed only if the following (or previous) line isn't "identical" apart from the mark.
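Roughly, that mark / sort / awk idea could look like the sketch below. It assumes no line contains a tab (so a tab can separate each line from its mark) and that getting the survivors back in sorted order, rather than A's original order, is acceptable. The function name omit_lines and the marks 0 and 1 are placeholders chosen for illustration.

```shell
#!/bin/bash
# Sketch only: mark each line with its source, sort, then filter with awk.
# Assumption: no input line contains a tab character.

omit_lines() {
    local a=$1 b=$2
    {
        # Mark every line: 0 = came from B, 1 = came from A.
        awk '{ print $0 "\t0" }' "$b"
        awk '{ print $0 "\t1" }' "$a"
    } |
    # Byte-order sort puts identical lines next to each other, B copy first.
    LC_ALL=C sort |
    awk -F'\t' '
        # Remember the most recent B line (forced to a string, so that
        # "1.0" and "1" are not treated as equal numbers).
        $2 == "0" { last = $1 ""; have = 1; next }
        # Print an A line only when no identical B line precedes it.
        !(have && ($1 "") == last) { print $1 }
    '
}
```

Called as `omit_lines fileA fileB > remaining.txt`, this prints the lines of A that have no identical twin in B. Most of the time goes into sorting all of A and B together, which is the cost this approach cannot avoid.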
But after 1000 years of shell programming I've GOT to believe I'm missing an easier/faster solution ... I'm using bash and cygwin tools - and compiling is not an option.
ADVthanksANCE for your help!
=Gneen