Sponsored Content
Top Forums UNIX for Dummies Questions & Answers how to select lines from one file based on another file Post 302632323 by drl on Sunday 29th of April 2012 08:01:37 PM
Old 04-29-2012
Hi.

Here is a more complex approach. The augmented data files include duplicates and lines that are missing in the other file:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join on common field, preserve ordering.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C nl sort join sed

pl " Input data files data[12]:"
head data[12]

pl " Results of adding line numbers and sorting on field 2, remove duplicates:"
# To allow duplicates, remove the "-u" option.
nl data1 | sort -u -k2,2 > f1
nl data2 | sort -u -k2,2 > f2
head f[12]

pl " Result of join on field 2, sort on line numbers, remove line numbers:"
join -j 2 -o 2.1 1.2 1.3 1.4 1.5  f1 f2 |
sort -k1,1n |
sed 's/[^ ]* //'

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
nl (GNU coreutils) 6.10
sort (GNU coreutils) 6.10
join (GNU coreutils) 6.10
sed GNU sed version 4.1.5

-----
 Input data files data[12]:
==> data1 <==
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9
4-40 4 4 4

==> data2 <==
1-20 1 20
1-30 1 30
5-60 5 60
5-60 5 60
7-70 7 70
7-70 7 70

-----
 Results of adding line numbers and sorting on field 2, remove duplicates:
==> f1 <==
     3	1-20 7 8 9
     1	1-30 1 2 3
     4	4-40 4 4 4
     2	5-60 4 5 6

==> f2 <==
     1	1-20 1 20
     2	1-30 1 30
     3	5-60 5 60
     5	7-70 7 70

-----
 Result of join on field 2, sort on line numbers, remove line numbers:
1-20 7 8 9
1-30 1 2 3
5-60 4 5 6

See man pages for details ... cheers, drl
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Select a portion of file based on query

Hi friends :) I am having a small problem and ur help is needed... I have a long file from which i want to select only some portions after filtering (grep). My file looks like : header xxyy lmno xxyy wxyz footer header abcd xy pqrs footer . . (14 Replies)
Discussion started by: vanand420
14 Replies

2. Shell Programming and Scripting

select records from one file based on a second file

Hi all: I have two files: file1: 74 DS 9871 199009871 1 1990 4 1 165200 Sc pr de te sa ox 1.0 1.0 13.0000 35.7560 5.950 3.0 3.0 13.0100 35.7550 5.970 ** 74 DS 99004 74DS99004 6738 1990 4 1 165200 Eb pr de te sa ox 1.0 1.0 13.0000 ... (7 Replies)
Discussion started by: rleal
7 Replies

3. Shell Programming and Scripting

Select some lines from a txt file and create a new file with awk

Hi there, I have a text file with several colums separated by "|;#" I need to search the file extracting all columns starting with the value of "1" or "2" saving in a separate file just the first 7 columns of each row maching the criteria, with replacement of the saparators in the nearly created... (4 Replies)
Discussion started by: capnino
4 Replies

4. Shell Programming and Scripting

Select lines in which column have value greater than some percent of total file lines

i have a file in following format 1 32 3 4 6 4 4 45 1 45 4 61 54 66 4 5 65 51 56 65 1 12 32 85 now here the total number of lines are 8(they vary each time) Now i want to select only those lines in which the values... (6 Replies)
Discussion started by: vaibhavkorde
6 Replies

5. UNIX for Dummies Questions & Answers

How to randomly select lines from a text file

I have a text file with 1000 lines, I want to randomly select 200 lines from it and print them as output. How do I go about doing that? Thanks! (7 Replies)
Discussion started by: evelibertine
7 Replies

6. Shell Programming and Scripting

Short program to select lines from a file based on a second file

Hello, I use UBUNTU 12.04. I want to write a short program using awk to select some lines in a file based on a second file. My first file has this format with about 400,000 lines and 47 fields: SNP1 1 12.1 SNP2 1 13.2 SNP3 1 45.2 SNP4 1 23.4 My second file has this format: SNP2 SNP3... (1 Reply)
Discussion started by: Homa
1 Replies

7. Shell Programming and Scripting

Select lines from a file based on a criteria

Hi I need to select lines from a txt file, I have got a line starting with ZMIO:MSISDN= and after a few line I have another line starting with 'MOBILE STATION ISDN NUMBER' and another one starting with 'VLR-ADDRESS' I need to copy these three lines as three different columns in a separate... (3 Replies)
Discussion started by: Tlcm sam
3 Replies

8. Shell Programming and Scripting

Script to select the rows from the feed file based on the input value provided

Hi Folks, I have the below feed file named abc1.txt in which you can see there is a title and below is the respective values in the rows and it is completely pipe delimited file ,. ... (3 Replies)
Discussion started by: punpun66
3 Replies

9. UNIX for Dummies Questions & Answers

Select last update data based on file name

Hi All, I need to remove all files except the most update data based on date on filename Input data_AIDS_20150312.txt data_AIDS_20150311.txt data_AIDS_20150411.txt data_AIDS_20140312.txt the most updated data is data_AIDS_20150411.txt, so I'll remove other files. My expected output... (3 Replies)
Discussion started by: radius
3 Replies

10. UNIX for Dummies Questions & Answers

Select lines based on character length

Hi, I've got a file like this: 22 22:35645163:T:<CN0>:0 0 35645163 T <CN0> 22 rs140738445:20902439:TTTTTTTG:T 0 20902439 T TTTTTTTG 22 rs149602065:40537763:TTTTTTG:T 0 40537763 T TTTTTTG 22 rs71670155:50538408:TTTTTTG:T 0 50538408 T TTTTTTG... (3 Replies)
Discussion started by: zajtat
3 Replies
COMM(1) 							   User Commands							   COMM(1)

NAME
comm - compare two sorted files line by line SYNOPSIS
comm [OPTION]... FILE1 FILE2 DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line. When FILE1 or FILE2 (not both) is -, read standard input. With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files. -1 suppress column 1 (lines unique to FILE1) -2 suppress column 2 (lines unique to FILE2) -3 suppress column 3 (lines that appear in both files) --check-order check that the input is correctly sorted, even if all input lines are pairable --nocheck-order do not check that the input is correctly sorted --output-delimiter=STR separate columns with STR --total output a summary -z, --zero-terminated line delimiter is NUL, not newline --help display this help and exit --version output version information and exit Note, comparisons honor the rules specified by 'LC_COLLATE'. EXAMPLES
comm -12 file1 file2 Print only lines present in both file1 and file2. comm -3 file1 file2 Print lines in file1 not in file2, and vice versa. AUTHOR
Written by Richard M. Stallman and David MacKenzie. REPORTING BUGS
GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report comm translation bugs to <http://translationproject.org/team/> COPYRIGHT
Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
join(1), uniq(1) Full documentation at: <http://www.gnu.org/software/coreutils/comm> or available locally via: info '(coreutils) comm invocation' GNU coreutils 8.28 January 2018 COMM(1)
All times are GMT -4. The time now is 09:02 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy