"Find common numbers from two very large files using awk or the like"

Post #302799623 by alister on Friday 26th of April 2013 08:47:13 PM

Quote:
Originally Posted by hanson44
You're right. In this case, the numeric sort has no effect for good or ill, but it is superfluous and should not be used.
To make sure I made my point, please allow me to reiterate: In every case, it is a mistake to feed a numerically sorted file to a tool which only understands lexicographic sorting. In some cases, such as this one, it may not hurt, but it is never the right thing to do.

Tools which require lexicographic sorting include comm, join, and uniq.
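To illustrate, here is a minimal sketch. The file names and values are made up for this example and are not the OP's data. It shows that a numerically sorted file fails a lexicographic order check, and that comm gives the expected result once both inputs are sorted lexicographically in the same locale (the C locale is assumed here).

```shell
#!/bin/sh
# Illustrative data only: the file names and values below are made up.
tmp=$(mktemp -d)
printf '%s\n' 9 10 100 > "$tmp/file1"     # already in numeric order
printf '%s\n' 100 9 25 > "$tmp/file2"     # unsorted

# sort -c only checks order. In the C locale "9" sorts after "10",
# so a numerically sorted file fails the lexicographic check.
if LC_ALL=C sort -c "$tmp/file1" 2>/dev/null; then
    numeric_is_lexicographic=yes
else
    numeric_is_lexicographic=no
fi

# Sort both inputs the way comm expects, then keep only the common lines.
LC_ALL=C sort "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort "$tmp/file2" > "$tmp/file2.sorted"
common=$(LC_ALL=C comm -12 "$tmp/file1.sorted" "$tmp/file2.sorted")

echo "numeric order passes lexicographic check: $numeric_is_lexicographic"
printf '%s\n' "$common"
rm -rf "$tmp"
```

The common lines come out in lexicographic order, not numeric order. If numeric order is wanted in the result, it can be piped through sort -n afterwards, since nothing downstream depends on the ordering at that point.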

join requires special attention: by default it expects input sorted with sort -b, but if join's -t option is used, sort's -b must not be.
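A rough sketch of the two cases follows. The data and file names are invented for illustration, the join field is assumed to be the first field, and the C locale is assumed for both sort and join.

```shell
#!/bin/sh
# Illustrative data only: file names and contents are made up.
tmp=$(mktemp -d)

# Case 1: default join (blank-separated fields, leading blanks ignored).
# Sort on the join field with -b so sort ignores the same leading blanks.
printf '%s\n' '30 x' ' 4 y' > "$tmp/a.txt"
printf '%s\n' '4 p' '30 q'  > "$tmp/b.txt"
LC_ALL=C sort -b -k 1,1 "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort -b -k 1,1 "$tmp/b.txt" > "$tmp/b.sorted"
joined_default=$(LC_ALL=C join "$tmp/a.sorted" "$tmp/b.sorted")

# Case 2: join -t with an explicit delimiter.
# Sort with the same -t delimiter and without -b.
printf '%s\n' '30,x' '4,y'       > "$tmp/a.csv"
printf '%s\n' '4,p' '30,q' '5,r' > "$tmp/b.csv"
LC_ALL=C sort -t, -k 1,1 "$tmp/a.csv" > "$tmp/a.csv.sorted"
LC_ALL=C sort -t, -k 1,1 "$tmp/b.csv" > "$tmp/b.csv.sorted"
joined_delim=$(LC_ALL=C join -t, "$tmp/a.csv.sorted" "$tmp/b.csv.sorted")

printf '%s\n' "$joined_default"
printf '%s\n' "$joined_delim"
rm -rf "$tmp"
```

In both cases the sort key is restricted to the join field with -k 1,1, so the sort order and join's idea of the key should agree.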

Quote:
Originally Posted by hanson44
Code:
$ comm -1 -2 file1 file2

But the OP said there was some problem with this.
And that piqued my curiosity, because it should work if the actual data does not deviate from the form of the sample data provided in post #4.

Regards,
Alister

Last edited by alister; 04-26-2013 at 11:17 PM.
 


Unix & Linux Forums Content Copyright 1993-2019. All Rights Reserved.