Post 303042724 by fondan on Saturday 4th of January 2020 11:17:26 AM
Merge 4 bim files by keeping only the overlapping variants (unique rs values)

Dear community, I am facing a problem and kindly ask for your help:

I have 4 different data sets that come from 3 different types of array.

In each file, column 1 is the chromosome, column 2 is the SNP id, etc. Let's say I have the following (bim) datasets:


x2014:
Code:
1       rs3094315       0       752566  G       A
1       rs3131972       0       752721  G       A

... 550,000 more rows


x2016:
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

...


x2017:
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

...



x2018:
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

... 550K more rows




How can I merge all files together, without having any duplicate values based on the 2nd column (rs_id)?
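One possible way to approach this, as a minimal Python sketch rather than a definitive answer: read every file, keep the first row seen for each value in column 2, and count how many files each id occurs in. The function name `merge_bim` and the `overlap_only` switch are hypothetical names made up for this illustration. It assumes whitespace-separated rows with the id in column 2, and that "overlapping" means the id is present in all of the input files; with `overlap_only=False` it instead keeps every id once (a plain de-duplicated merge).

Code:
```python
from collections import Counter


def merge_bim(paths, overlap_only=True):
    """Return one row per variant id (column 2), in first-seen order.

    Hypothetical helper for illustration only.
    overlap_only=True  -> keep ids found in every input file
    overlap_only=False -> keep every id once (de-duplicated union)
    """
    first_row = {}             # variant id -> first row seen for that id
    files_with_id = Counter()  # variant id -> number of files containing it

    for path in paths:
        seen_here = set()      # ids already counted for this file
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue   # skip blank or malformed lines
                variant_id = fields[1]
                first_row.setdefault(variant_id, line.rstrip("\n"))
                if variant_id not in seen_here:
                    seen_here.add(variant_id)
                    files_with_id[variant_id] += 1

    if overlap_only:
        return [row for vid, row in first_row.items()
                if files_with_id[vid] == len(paths)]
    return list(first_row.values())
```

For example, `merge_bim(["x2014.bim", "x2016.bim", "x2017.bim", "x2018.bim"])` would return the rows whose column 2 value appears in all four files (the `.bim` file names here are assumed from the dataset labels above). Note that this only handles the bim text files; the genotype files that belong to them would still have to be subset to the same ids, for instance with PLINK's `--extract` option.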

Last edited by vbe; 01-04-2020 at 12:40 PM. Reason: code tags please
 
