Sponsored Content
Top Forums UNIX for Dummies Questions & Answers how to select lines from one file based on another file Post 302632323 by drl on Sunday 29th of April 2012 08:01:37 PM
Old 04-29-2012
Hi.

Here is a more complex approach. The augmented data files include duplicates and lines that are missing in the other file:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join on common field, preserve ordering.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C nl sort join sed

pl " Input data files data[12]:"
head data[12]

pl " Results of adding line numbers and sorting on field 2, remove duplicates:"
# To allow duplicates, remove the "-u" option.
nl data1 | sort -u -k2,2 > f1
nl data2 | sort -u -k2,2 > f2
head f[12]

pl " Result of join on field 2, sort on line numbers, remove line numbers:"
join -j 2 -o 2.1 1.2 1.3 1.4 1.5  f1 f2 |
sort -k1,1n |
sed 's/[^ ]* //'

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
nl (GNU coreutils) 6.10
sort (GNU coreutils) 6.10
join (GNU coreutils) 6.10
sed GNU sed version 4.1.5

-----
 Input data files data[12]:
==> data1 <==
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9
4-40 4 4 4

==> data2 <==
1-20 1 20
1-30 1 30
5-60 5 60
5-60 5 60
7-70 7 70
7-70 7 70

-----
 Results of adding line numbers and sorting on field 2, remove duplicates:
==> f1 <==
     3	1-20 7 8 9
     1	1-30 1 2 3
     4	4-40 4 4 4
     2	5-60 4 5 6

==> f2 <==
     1	1-20 1 20
     2	1-30 1 30
     3	5-60 5 60
     5	7-70 7 70

-----
 Result of join on field 2, sort on line numbers, remove line numbers:
1-20 7 8 9
1-30 1 2 3
5-60 4 5 6

See man pages for details ... cheers, drl
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Select a portion of file based on query

Hi friends :) I am having a small problem and ur help is needed... I have a long file from which i want to select only some portions after filtering (grep). My file looks like : header xxyy lmno xxyy wxyz footer header abcd xy pqrs footer . . (14 Replies)
Discussion started by: vanand420
14 Replies

2. Shell Programming and Scripting

select records from one file based on a second file

Hi all: I have two files: file1: 74 DS 9871 199009871 1 1990 4 1 165200 Sc pr de te sa ox 1.0 1.0 13.0000 35.7560 5.950 3.0 3.0 13.0100 35.7550 5.970 ** 74 DS 99004 74DS99004 6738 1990 4 1 165200 Eb pr de te sa ox 1.0 1.0 13.0000 ... (7 Replies)
Discussion started by: rleal
7 Replies

3. Shell Programming and Scripting

Select some lines from a txt file and create a new file with awk

Hi there, I have a text file with several colums separated by "|;#" I need to search the file extracting all columns starting with the value of "1" or "2" saving in a separate file just the first 7 columns of each row maching the criteria, with replacement of the saparators in the nearly created... (4 Replies)
Discussion started by: capnino
4 Replies

4. Shell Programming and Scripting

Select lines in which column have value greater than some percent of total file lines

i have a file in following format 1 32 3 4 6 4 4 45 1 45 4 61 54 66 4 5 65 51 56 65 1 12 32 85 now here the total number of lines are 8(they vary each time) Now i want to select only those lines in which the values... (6 Replies)
Discussion started by: vaibhavkorde
6 Replies

5. UNIX for Dummies Questions & Answers

How to randomly select lines from a text file

I have a text file with 1000 lines, I want to randomly select 200 lines from it and print them as output. How do I go about doing that? Thanks! (7 Replies)
Discussion started by: evelibertine
7 Replies

6. Shell Programming and Scripting

Short program to select lines from a file based on a second file

Hello, I use UBUNTU 12.04. I want to write a short program using awk to select some lines in a file based on a second file. My first file has this format with about 400,000 lines and 47 fields: SNP1 1 12.1 SNP2 1 13.2 SNP3 1 45.2 SNP4 1 23.4 My second file has this format: SNP2 SNP3... (1 Reply)
Discussion started by: Homa
1 Replies

7. Shell Programming and Scripting

Select lines from a file based on a criteria

Hi I need to select lines from a txt file, I have got a line starting with ZMIO:MSISDN= and after a few line I have another line starting with 'MOBILE STATION ISDN NUMBER' and another one starting with 'VLR-ADDRESS' I need to copy these three lines as three different columns in a separate... (3 Replies)
Discussion started by: Tlcm sam
3 Replies

8. Shell Programming and Scripting

Script to select the rows from the feed file based on the input value provided

Hi Folks, I have the below feed file named abc1.txt in which you can see there is a title and below is the respective values in the rows and it is completely pipe delimited file ,. ... (3 Replies)
Discussion started by: punpun66
3 Replies

9. UNIX for Dummies Questions & Answers

Select last update data based on file name

Hi All, I need to remove all files except the most update data based on date on filename Input data_AIDS_20150312.txt data_AIDS_20150311.txt data_AIDS_20150411.txt data_AIDS_20140312.txt the most updated data is data_AIDS_20150411.txt, so I'll remove other files. My expected output... (3 Replies)
Discussion started by: radius
3 Replies

10. UNIX for Dummies Questions & Answers

Select lines based on character length

Hi, I've got a file like this: 22 22:35645163:T:<CN0>:0 0 35645163 T <CN0> 22 rs140738445:20902439:TTTTTTTG:T 0 20902439 T TTTTTTTG 22 rs149602065:40537763:TTTTTTG:T 0 40537763 T TTTTTTG 22 rs71670155:50538408:TTTTTTG:T 0 50538408 T TTTTTTG... (3 Replies)
Discussion started by: zajtat
3 Replies
COMM(1) 							   User Commands							   COMM(1)

NAME
comm - compare two sorted files line by line SYNOPSIS
comm [OPTION]... FILE1 FILE2 DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line. With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files. -1 suppress column 1 (lines unique to FILE1) -2 suppress column 2 (lines unique to FILE2) -3 suppress column 3 (lines that appear in both files) --check-order check that the input is correctly sorted, even if all input lines are pairable --nocheck-order do not check that the input is correctly sorted --output-delimiter=STR separate columns with STR --help display this help and exit --version output version information and exit Note, comparisons honor the rules specified by `LC_COLLATE'. EXAMPLES
comm -12 file1 file2 Print only lines present in both file1 and file2. comm -3 file1 file2 Print lines in file1 not in file2, and vice versa. AUTHOR
Written by Richard M. Stallman and David MacKenzie. REPORTING BUGS
Report comm bugs to bug-coreutils@gnu.org GNU coreutils home page: <http://www.gnu.org/software/coreutils/> General help using GNU software: <http://www.gnu.org/gethelp/> Report comm translation bugs to <http://translationproject.org/team/> COPYRIGHT
Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
join(1), uniq(1) The full documentation for comm is maintained as a Texinfo manual. If the info and comm programs are properly installed at your site, the command info coreutils 'comm invocation' should give you access to the complete manual. GNU coreutils 8.12.197-032bb September 2011 COMM(1)
All times are GMT -4. The time now is 06:10 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy