Merge two files with non-overlapping identities


 
# 1  
Old 01-07-2013

Hi All,
I wish to merge two files:
file1: with header
Code:
rsSNP-ID Chromosome Chr-Pos
rs171 1 175261679
rs242 1 20869461
rs538 1 6160958

file2: without header
Code:
disease:AAT deficiency:M0525101 rs1243168        20109307       1
disease:AAT deficiency:M0525101 rs4900229        20109307       1
disease:Abdominal Pain:PA446220 rs11209026       17068223       1
disease:Abdominal Pain:PA446220 rs11706052       20061166       1

I want to merge these two files on the rsID (column 1 in file1, column 2 in file2) to create a file like this:
Code:
rs171 1 175261679  20109307       1 disease:AAT deficiency:M0525101
rs242 1 20869461  17068223       1 disease:Abdominal Pain:PA446220
rs538 1 6160958   17068223       1 disease:Abdominal Pain:PA446220

However, quite a lot of rsIDs appear only in file1. For those, I wish to create a line like this:
Code:
rsXXX 1 6160958   ""       0  ""

How can I do that?

Last edited by Scott; 01-07-2013 at 04:20 PM.. Reason: Code tags
# 2  
Old 01-07-2013
Not sure how you matched the lines. What keys them together?
# 3  
Old 01-07-2013
If the field to match in file2 is field 3 (the rsID, when fields are split on whitespace), try:
Code:
awk 'NR==FNR {a[$3]=$4" "$5" "$1" "$2; next} {if ($1 in a) {print $0, a[$1]} else {print $0, "\"\"", 0, "\"\""}}' file2 file1
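
The one-liner above relies on the rsID being field 3 of file2, which only holds when the disease label contains exactly one space. A minimal sketch of a variant that does not depend on that, assuming the last three whitespace-separated fields of file2 are always the rsID, the number, and the flag, and that the header of file1 should be dropped from the output (neither assumption was confirmed by the original poster). The `demo_` file names are made up for this example, and the rs242 row in the second file is hypothetical, added only so that one ID matches:

```shell
#!/bin/sh
# Sample file1 from the question (with header).
cat > demo_file1 <<'EOF'
rsSNP-ID Chromosome Chr-Pos
rs171 1 175261679
rs242 1 20869461
rs538 1 6160958
EOF

# First two rows are from the question; the rs242 row is HYPOTHETICAL.
cat > demo_file2 <<'EOF'
disease:AAT deficiency:M0525101 rs1243168        20109307       1
disease:Abdominal Pain:PA446220 rs11209026       17068223       1
disease:Hypothetical Example:X0000000 rs242      12345678       1
EOF

awk '
NR==FNR {                      # first file on the command line (demo_file2)
    label = $0                 # strip the last three fields to get the label
    sub(/[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*$/, "", label)
    a[$(NF-2)] = $(NF-1) " " $NF " " label   # index by rsID
    next
}
FNR==1 { next }                # assumption: skip the header of demo_file1
($1 in a) { print $0, a[$1]; next }          # rsID present in both files
{ print $0, "\"\"", 0, "\"\"" }              # rsID only in demo_file1
' demo_file2 demo_file1 > demo_merged.txt

cat demo_merged.txt
```

With this sample data, rs171 and rs538 get the `"" 0 ""` padding and rs242 gets `12345678 1 disease:Hypothetical Example:X0000000` appended. If an rsID occurs on several lines of file2, only the last one is kept, the same as in the one-liner above.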

# 4  
Old 01-10-2013
What should be matched? Are the fields separated by whitespace, by ":", or by both?
 