Find common numbers from two very large files using awk or the like Post: 302799579

9 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Get un common numbers from two files

Hi, I have two files: abc : 50040 123123 31703 cde: 104 97 50040 123123 31703 36609 50534

2. Shell Programming and Scripting

To find all common lines from 'n' no. of files

Hi, I have one situation. I have some 6-7 no. of files in one directory & I have to extract all the lines which exist in all these files. means I need to extract all common lines from all these files & put them in a separate file. Please help. I know it could be done with the help of...

3. UNIX for Dummies Questions & Answers

Grep alternative to handle large numbers of files

I am looking for a file with 'MCR0000000716214' in it. I tried the following command: grep MCR0000000716214 * The problem is that the folder I am searching in has over 87000 files and I am getting the following: bash: /bin/grep: Arg list too long Is there any command I can use that can...

4. Shell Programming and Scripting

Drop common lines at head/tail of a large set of files

Hi! I have a large set of pairs of text files (each pair in their own subdirectory) and each pair shares head/tail (a couple of first and last lines) but differs in the middle part. I need to delete the heads/tails and keep only the middle portions in which they differ. The lengths of heads/tails...

5. UNIX for Advanced & Expert Users

Find common Strings in two large files

Hi , I have a text file in the format DB2: DB2: WB: WB: WB: WB: and a second text file of the format Time=00:00:00.473 Time=00:00:00.436 Time=00:00:00.016 Time=00:00:00.027 Time=00:00:00.471 Time=00:00:00.436 the last string in both the text files is of the...

6. Shell Programming and Scripting

finding common numbers (contents) across 2 or 3 files

I have 3 files which are tab delimited and have numbers in it. file 1 1 2 3 4 5 6 7 File 2 3 5 7 8 File 3 1

7. Shell Programming and Scripting

Find common numbers and print yes or no

Hi I have 2 files with following data First file, sp|Q676U5|A16L1_HUMAN, Autophagy-related protein 16-1 OS=Homo sapiens GN=ATG16L1 PE=1 SV=2, Maximum coiled-coil residue probability: 0.657 in position 163. Maximum dimeric residue probability: 0.288 in position 163. ...

8. Shell Programming and Scripting

Find Common Values Across Two Files

9. Shell Programming and Scripting

Find common files between two directories

I have two directories Dir 1 /home/sid/release1 Dir 2 /home/sid/release2 I want to find the common files between the two directories Dir 1 files /home/sid/release1>ls -lrt total 16 -rw-r--r-- 1 sid cool 0 Jun 19 12:53 File123 -rw-r--r-- 1 sid cool 0 Jun 19 12:53...

LEARN ABOUT ULTRIX

sort

sort(1) General Commands Manual sort(1)

Name
sort - sort file data

Syntax
sort [options] [-k keydef] [+pos1[-pos2]] [file...]

Description
The command sorts lines of all the named files together and writes the result on the standard output. The name `-' means the standard
input. If no input files are named, the standard input is sorted.

Options
The default sort key is an entire line. Default ordering is lexicographic by bytes in machine collating sequence. The ordering is
affected globally by the following options, one or more of which may appear.

-b Ignores leading blanks (spaces and tabs) in field comparisons.

-d Sorts data according to dictionary ordering: letters, digits, and blanks only.

-f Folds uppercase to lowercase while sorting.

-i Ignore characters outside the ASCII range 040-0176 in nonnumeric comparisons.

-k keydef The keydefargument is a key field definition. The format is field_start, [field_end] [type], where field_start and field_end
are the definition of the restricted search key, and type is a modifier from the option list [bdfinr]. These modifiers have the
functionality, for this key only, that their command line counter-parts have for the entire record.

-n Sorts fields with numbers numerically. An initial numeric string, consisting of optional blanks, optional minus sign, and zero
or more digits with optional decimal point, is sorted by arithmetic value. (Note that -0 is taken to be equal to 0.) Option n
implies option b.

-r Reverses the sense of comparisons.

-tx Uses specified character as field separator.

The notation +pos1 -pos2 restricts a sort key to a field beginning at pos1 and ending just before pos2. Pos1 and pos2 each have the form
m.n, optionally followed by one or more of the options bdfinr, where m tells a number of fields to skip from the beginning of the line and
n tells a number of characters to skip further. If any options are present they override all the global ordering options for this key. If
the b option is in effect n is counted from the first nonblank in the field; b is attached independently to pos2. A missing .n means .0; a
missing -pos2 means the end of the line. Under the -tx option, fields are strings separated by x; otherwise fields are nonempty nonblank
strings separated by blanks.

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal
are ordered with all bytes significant.

These are additional options:

-c Checks sorting order and displays output only if out of order.

-m Merges previously sorted data.

-o name Uses specified file as output file. This file may be the same as one of the inputs.

-T dir Uses specified directory to build temporary files.

-u Suppresses all duplicate entries. Ignored bytes and bytes outside keys do not participate in this comparison.

Examples
Print in alphabetical order all the unique spellings in a list of words. Capitalized words differ from uncapitalized.
sort -u +0f +0 list

Print the password file, sorted by user id number (the 3rd colon-separated field).
sort -t: +2n /etc/passwd

Print the first instance of each month in an already sorted file of (month day) entries. The options -um with just one input file make the
choice of a unique representative from a set of equal lines predictable.
sort -um +0 -1 dates

Restrictions
Very long lines are silently truncated.

Diagnostics
Comments and exits with nonzero status for various trouble conditions and for disorder discovered under option c.

Files
/usr/tmp/stm*, /tmp/* first and second tries for temporary files

See Also
comm(1), join(1), rev(1), uniq(1)

sort(1)