Sponsored Content
Top Forums UNIX for Beginners Questions & Answers Merge 4 bim files by keeping only the overlapping variants (unique rs values ) Post 303042724 by fondan on Saturday 4th of January 2020 11:17:26 AM
Old 01-04-2020
Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Dear community, I am facing a problem and I kindly ask your help:


I have 4 different data sets consisted from 3 different types of array.



On each file, column 1 is chromosome position, column 2 is SNP id etc... Lets say I have the following (bim) datasets:


x2014:
Code:
1       rs3094315       0       752566  G       A
1       rs3131972       0       752721  G       A

....more 550.000


x2016:
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

...


x2017
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

...



x2018:
Code:
0       200610-10       0       0       G       A
0       200610-108      0       0       G       A

.....more 550K rows




How can I merge all files together, without having any duplicate values based on the 2nd column (rs_id)?

Last edited by vbe; 01-04-2020 at 12:40 PM.. Reason: code tage please
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Need to find only unique values for a given tag across the files

Need to find only unique values for a given tag across the files: For eg: Test1: <Tag1>aaa</Tag1> <Tag2>bbb</Tag2> <Tag3>ccc</Tag3> Test2: <Tag1>aaa</Tag1> <Tag2>ddd</Tag2> <Tag3>eee</Tag3> Test3: <Tag1>aaa</Tag1> <Tag2>ddd</Tag2> <Tag3>eee</Tag3> Test4: (8 Replies)
Discussion started by: sudheshnaiyer
8 Replies

2. Shell Programming and Scripting

comparing 2 text files to get unique values??

Hi all, I have got a problem while comparing 2 text files and the result should contains the unique values(Non repeatable). For eg: file1.txt 1 2 3 4 file2.txt 2 3 So after comaping the above 2 files I should get only 1 and 4 as the output. Pls help me out. (7 Replies)
Discussion started by: smarty86
7 Replies

3. Shell Programming and Scripting

merge files with same row values

Hi everyone, I'm just wondering how could I using awk language merge two files by comparison of one their row. I mean, I have one file like this: file#1: 21/07/2009 11:45:00 100.0000000 27.2727280 21/07/2009 11:50:00 75.9856644 25.2492676 21/07/2009 11:55:00 51.9713287 23.2258072... (4 Replies)
Discussion started by: tonet
4 Replies

4. Shell Programming and Scripting

sort split merge -u unique

Hi, this is about sorting a very large file (like 10 gb) to keep lines with unique entries across SOME of the columns. The line originally looked like this: sort -u -k2,2 -k3,3n -k4,4n -k5,5n -k6,6n file_unsorted > file_sorted please note the -u flag. The problem is that this single... (4 Replies)
Discussion started by: jbr950
4 Replies

5. UNIX for Dummies Questions & Answers

How to count specific columns and merge with unique ones?

Hi. I am not sure the title gives an optimal description of what I want to do. I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times data occur in specific columns,... (0 Replies)
Discussion started by: JamesT
0 Replies

6. UNIX for Dummies Questions & Answers

Merge two files with non-overlapping identities

Hi All, I wish to merge two files: file1: with header rsSNP-ID Chromosome Chr-Pos rs171 1 175261679 rs242 1 20869461 rs538 1 6160958 file2: without header disease:AAT deficiency:M0525101 rs1243168 20109307 1 disease:AAT deficiency:M0525101 rs4900229 20109307 1... (3 Replies)
Discussion started by: luoruicd
3 Replies

7. Shell Programming and Scripting

Compare multiple files, identify common records and combine unique values into one file

Good morning all, I have a problem that is one step beyond a standard awk compare. I would like to compare three files which have several thousand records against a fourth file. All of them have a value in each row that is identical, and one value in each of those rows which may be duplicated... (1 Reply)
Discussion started by: nashton
1 Replies

8. Shell Programming and Scripting

Identify the overlapping and non overlapping regions

file1 chr pos1 pos2 pos3 pos4 1)chr1 1000 2000 3000 4000 2)chr1 1380 1480 6800 7800 3)chr1 6700 7700 1200 2200 4)chr2 8500 9500 5670 6670 file2 chr pos1 pos2 pos3 pos4 1)chr2 8500 9500 5000 6000 2)chr1 6700 7700 1200 2200 3)chr1 1380 1480 6700 7700 4)chr1 1000 2000 4900 5900 I... (2 Replies)
Discussion started by: data_miner
2 Replies

9. Shell Programming and Scripting

Count Unique values from multiple lists of files

Looking for a little help here. I have 1000's of text files within a multiple folders. YYYY/ /MM /1000's Files Eg. 2014/01/1000 files 2014/02/1237 files 2014/03/1400 files There are folders for each year and each month, and within each monthly folder there are... (4 Replies)
Discussion started by: whegra
4 Replies

10. Shell Programming and Scripting

How to merge two files with unique values matching.?

I have one script as below: #!/bin/ksh Outputfile1="/home/OutputFile1.xls" Outputfile2="/home/OutputFile2.xls" InputFile1="/home/InputFile1.sql" InputFile2="/home/InputFile2.sql" echo "Select hobby, class, subject, sports, rollNumber from Student_Table" >> InputFile1 echo "Select rollNumber... (3 Replies)
Discussion started by: Sharma331
3 Replies
INSERT(7)							   SQL Commands 							 INSERT(7)

NAME
INSERT - create new rows in a table SYNOPSIS
INSERT INTO table [ ( column [, ...] ) ] { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query } [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] DESCRIPTION
INSERT inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from a query. The target column names can be listed in any order. If no list of column names is given at all, the default is all the columns of the table in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query. The values sup- plied by the VALUES clause or query are associated with the explicit or implicit column list left-to-right. Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none. If the expression for any column is not of the correct data type, automatic type conversion will be attempted. The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. You must have INSERT privilege on a table in order to insert into it. If a column list is specified, you only need INSERT privilege on the listed columns. Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query. PARAMETERS
table The name (optionally schema-qualified) of an existing table. column The name of a column in table. The column name can be qualified with a subfield name or array subscript, if needed. (Inserting into only some fields of a composite column leaves the other fields null.) DEFAULT VALUES All columns will be filled with their default values. expression An expression or value to assign to the corresponding column. DEFAULT The corresponding column will be filled with its default value. query A query (SELECT statement) that supplies the rows to be inserted. Refer to the SELECT [select(7)] statement for a description of the syntax. output_expression An expression to be computed and returned by the INSERT command after each row is inserted. The expression can use any column names of the table. Write * to return all columns of the inserted row(s). output_name A name to use for a returned column. OUTPUTS
On successful completion, an INSERT command returns a command tag of the form INSERT oid count The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. Otherwise oid is zero. If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and val- ues defined in the RETURNING list, computed over the row(s) inserted by the command. EXAMPLES
Insert a single row into table films: INSERT INTO films VALUES ('UA502', 'Bananas', 105, '1971-07-13', 'Comedy', '82 minutes'); In this example, the len column is omitted and therefore it will have the default value: INSERT INTO films (code, title, did, date_prod, kind) VALUES ('T_601', 'Yojimbo', 106, '1961-06-16', 'Drama'); This example uses the DEFAULT clause for the date columns rather than specifying a value: INSERT INTO films VALUES ('UA502', 'Bananas', 105, DEFAULT, 'Comedy', '82 minutes'); INSERT INTO films (code, title, did, date_prod, kind) VALUES ('T_601', 'Yojimbo', 106, DEFAULT, 'Drama'); To insert a row consisting entirely of default values: INSERT INTO films DEFAULT VALUES; To insert multiple rows using the multirow VALUES syntax: INSERT INTO films (code, title, did, date_prod, kind) VALUES ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'), ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy'); This example inserts some rows into table films from a table tmp_films with the same column layout as films: INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07'; This example inserts into array columns: -- Create an empty 3x3 gameboard for noughts-and-crosses INSERT INTO tictactoe (game, board[1:3][1:3]) VALUES (1, '{{" "," "," "},{" "," "," "},{" "," "," "}}'); -- The subscripts in the above example aren't really needed INSERT INTO tictactoe (game, board) VALUES (2, '{{X," "," "},{" ",O," "},{" ",X," "}}'); Insert a single row into table distributors, returning the sequence number generated by the DEFAULT clause: INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets') RETURNING did; COMPATIBILITY
INSERT conforms to the SQL standard, except that the RETURNING clause is a PostgreSQL extension. Also, the case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard. Possible limitations of the query clause are documented under SELECT [select(7)]. SQL - Language Statements 2010-05-14 INSERT(7)
All times are GMT -4. The time now is 06:16 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy