How to select lines from one file based on another file


 
# 1  
Old 04-29-2012

Hi,

I would like to know how I can select lines from one file based on an ID column it shares with a second file, keeping the line order of the second file.

Example of file1:
ID A B C D
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9

Example of file2:
ID chr pos
1-20 1 20
1-30 1 30
5-60 5 60

Desired output file:
ID A B C D
1-20 7 8 9
1-30 1 2 3
5-60 4 5 6

Thanks.
# 2  
Old 04-29-2012
Code:
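# Pass 1 (file1): remember each line under its ID in column 1.
# Pass 2 (file2): print the remembered file1 line for each ID, in file2 order.
awk ' FILENAME=="file1" {arr[$1]=$0; next}
        FILENAME=="file2"  {print arr[$1]} ' file1 file2 > newfile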

This only works if both files contain exactly the same set of "key fields". If file2 has a key that is not in file1, you get a blank line. If file1 has a key that is not in file2, that line never gets printed.
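
If the blank lines are unwanted, one possible variant is to print only for keys that were actually stored. This is a minimal sketch, not tested beyond the sample data: it recreates the two example files from the question in a scratch directory (the `mktemp -d` directory and the extra `9-99` row in file2 are my own additions for illustration) and uses `NR==FNR` in place of the `FILENAME` tests, which assumes file1 is not empty.

```shell
# Work in a scratch directory so no existing file1/file2 is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question.
cat > file1 <<'EOF'
ID A B C D
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9
EOF

# Same as the question, plus a hypothetical key (9-99) that file1 lacks.
cat > file2 <<'EOF'
ID chr pos
1-20 1 20
1-30 1 30
5-60 5 60
9-99 9 99
EOF

# NR==FNR is true only while awk reads the first file named (file1).
# "$1 in arr" skips file2 keys that file1 does not have, so no blank lines.
awk 'NR==FNR {arr[$1]=$0; next}
     $1 in arr {print arr[$1]}' file1 file2 > newfile

cat newfile
```

Keys that appear only in file1 are still dropped, which matches the desired output in the question. If file1 repeats a key, the last line with that key is the one kept.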
# 3  
Old 04-29-2012
Hi.

Here is a more complex approach. The augmented data files include duplicate lines and keys that are missing from the other file:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join on common field, preserve ordering.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C nl sort join sed

pl " Input data files data[12]:"
head data[12]

pl " Results of adding line numbers and sorting on field 2, remove duplicates:"
# To allow duplicates, remove the "-u" option.
nl data1 | sort -u -k2,2 > f1
nl data2 | sort -u -k2,2 > f2
head f[12]

pl " Result of join on field 2, sort on line numbers, remove line numbers:"
# 2.1 is the data2 line number; -o takes one comma-separated list (POSIX form).
join -j 2 -o 2.1,1.2,1.3,1.4,1.5 f1 f2 |
sort -k1,1n |
sed 's/[^ ]* //'

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
nl (GNU coreutils) 6.10
sort (GNU coreutils) 6.10
join (GNU coreutils) 6.10
sed GNU sed version 4.1.5

-----
 Input data files data[12]:
==> data1 <==
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9
4-40 4 4 4

==> data2 <==
1-20 1 20
1-30 1 30
5-60 5 60
5-60 5 60
7-70 7 70
7-70 7 70

-----
 Results of adding line numbers and sorting on field 2, remove duplicates:
==> f1 <==
     3	1-20 7 8 9
     1	1-30 1 2 3
     4	4-40 4 4 4
     2	5-60 4 5 6

==> f2 <==
     1	1-20 1 20
     2	1-30 1 30
     3	5-60 5 60
     5	7-70 7 70

-----
 Result of join on field 2, sort on line numbers, remove line numbers:
1-20 7 8 9
1-30 1 2 3
5-60 4 5 6

See man pages for details ... cheers, drl
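
For comparison, the same result on the augmented data can probably be had from a single awk pass over both files, without sorting. This is a rough sketch under stated assumptions, not drl's method: the `seen` array, the `result` file name and the `mktemp -d` scratch directory are my own choices, and keeping the first line for a repeated ID in data1 is an arbitrary decision.

```shell
# Work in a scratch directory so no existing data1/data2 is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Augmented data files as shown in the script output above.
cat > data1 <<'EOF'
1-30 1 2 3
5-60 4 5 6
1-20 7 8 9
4-40 4 4 4
EOF

cat > data2 <<'EOF'
1-20 1 20
1-30 1 30
5-60 5 60
5-60 5 60
7-70 7 70
7-70 7 70
EOF

# Pass 1 (data1): keep the first line seen for each ID.
# Pass 2 (data2): print the stored line once per ID, in data2 order,
#                 skipping IDs that data1 does not contain.
awk 'NR==FNR { if (!($1 in arr)) arr[$1]=$0; next }
     ($1 in arr) && !seen[$1]++ { print arr[$1] }' data1 data2 > result

cat result
```

To allow duplicates, as with removing "-u" from the sort commands, drop the `!seen[$1]++` test.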
 