Merge 4 bim files by keeping only the overlapping variants (unique rs values)
Post 303042789 by fondan on Tuesday 7th of January 2020 04:21:55 AM
Thank you for your replies. I know it may seem easy, but I am a beginner with Bash.

@nezabudka

Yes, x2014 and x2015 are the file names! They look like the 2nd one:
Code:
cat x2014
1       rs3094315       0       752566  G       A
1       rs3131972       0       752721  G       A
etc..
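
For reference, here is a minimal sketch of how the merge named in the thread title might be approached with awk. It assumes that "overlapping" means an rs ID (column 2) present in all four files, that the other two files are called x2016 and x2017 (hypothetical names, only x2014 and x2015 are mentioned in the post), and that the row from the first file is the one to keep. The rsDEMO rows are made-up filler so the demo has something to discard.

```shell
#!/usr/bin/env bash
# Sketch only: keep the variants whose rs ID (column 2) occurs in all four files.
set -eu

# Work in a scratch directory so the demo files do not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Demo input. The two rs IDs come from the post; rsDEMO1/rsDEMO2 are invented.
cat > x2014 <<'EOF'
1 rs3094315 0 752566 G A
1 rs3131972 0 752721 G A
1 rsDEMO1 0 800000 C T
EOF
cat > x2015 <<'EOF'
1 rs3094315 0 752566 G A
1 rs3131972 0 752721 G A
1 rsDEMO2 0 810000 C T
EOF
cat > x2016 <<'EOF'
1 rs3131972 0 752721 G A
1 rs3094315 0 752566 G A
1 rsDEMO1 0 800000 C T
EOF
cat > x2017 <<'EOF'
1 rs3094315 0 752566 G A

1 rs3131972 0 752721 G A
EOF

awk -v n=4 '
    # Skip blank lines, if any.
    NF == 0 { next }
    # Count each rs ID at most once per file.
    !seen[FILENAME, $2]++ { count[$2]++ }
    # Remember the row from the first file in which the ID appears.
    !($2 in row) { row[$2] = $0 }
    # Keep only the IDs that were seen in all n files.
    END { for (id in count) if (count[id] == n) print row[id] }
' x2014 x2015 x2016 x2017 | sort -k1,1n -k4,4n > common.bim   # awk array order is unspecified, so sort by chromosome and position

cat common.bim
```

With the demo data above, common.bim ends up holding only the rs3094315 and rs3131972 rows. For real data, drop the demo setup, pass the four actual file names to awk, and set n to the number of files.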



Moderator's Comments:
Please wrap all code, files, input & output/errors in CODE tags.
It makes it easier to read and preserves spacing for indenting or fixed-width data.

Last edited by rbatte1; 01-07-2020 at 01:57 PM..
 
