04-26-2013
Find common numbers from two very large files using awk or the like
I've got two files that each contain a 16-digit number in positions 1-16. The first file has 63,120 entries all sorted numerically. The second file has 142,479 entries, also sorted numerically.
I want to read through each file and output the entries that appear in both. So far I've had no success with comm -12, nor with grep -f. I've had some success with sdiff, but it's not entirely accurate, as it misses some matches.
What I need is a script that loops through one file to see if an entry corresponds to the other file, but this is beyond my skills.
I am using sh on HP-UX 11.31, so I can't use nawk or gawk, etc.
Thank you for your assistance.
Last edited by Scottie1954; 04-26-2013 at 05:52 PM..
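One possible approach, offered as a minimal sketch rather than a tested HP-UX solution: load the 16-character keys from the first file into an awk array, then scan the second file and print every key that is already in the array. The function name common_keys and the file names in the usage line are hypothetical. The sketch assumes the stock awk on the system supports the "in" operator and the NR==FNR idiom (both are standard awk features), and that the first file is not empty. Comparing only positions 1-16 means anything that follows the number on a line (extra fields, trailing blanks, a carriage return) cannot spoil the match.

```shell
#!/bin/sh
# common_keys FILE1 FILE2   (hypothetical helper name)
# Prints the 16-character keys of FILE2 that also occur in FILE1.
common_keys() {
    awk '
        # First file: NR equals FNR only while the first file is read.
        # Remember each key (columns 1-16) and move to the next line.
        NR == FNR { seen[substr($0, 1, 16)] = 1; next }

        # Second file: extract the key and print it if it was seen before.
        {
            key = substr($0, 1, 16)
            if (key in seen) print key
        }
    ' "$1" "$2"
}
```

Usage would look like `common_keys file1 file2 > common.txt`. Passing the smaller file (63,120 entries) first keeps the array small. Output follows the order of the second file, and a key repeated in the second file is printed once per occurrence.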