Calculate ratios for each pair in a given file


 
Old 07-03-2019

Hello,
My input file looks like this
Code:
#CHROM        POS        ID        REF        ALT        QUAL        FILTER        INFO        FORMAT        Individual1        Individual2        Individual3        Individual4        Individual5        Individual6
22        10000        ID1        A        G        0        PASS                GT        0|1        0|1        0|1        1|1        1|1        1|1
22        10001        ID2        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10002        ID3        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10003        ID4        G        A        0        PASS                GT        1|0        1|0        1|0        1|0        1|1        1|1
22        10004        ID5        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10005        ID6        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10006        ID7        A        G        0        PASS                GT        0|1        0|1        0|1        1|1        1|1        1|1
22        10007        ID8        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10008        ID9        C        T        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10009        ID10        C        T        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10010        ID11        A        G        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10011        ID12        C        T        0        PASS                GT        0|1        0|1        0|1        1|0        1|1        1|1
22        10012        ID13        C        A        0        PASS                GT        1|0        1|0        1|0        1|0        1|1        1|1
22        10013        ID14        T        G        0        PASS                GT        1|0        1|0        1|0        1|0        1|1        1|1
22        10014        ID15        G        A        0        PASS                GT        1|0        1|0        1|0        1|0        1|1        1|1

I need to calculate the average fraction of 'ID' that are identical, for all possible pairings of individuals: (Individual1,Individual2), (Individual1,Individual3), (Individual1,Individual4), and so on.

The formula to calculate this is as follows:


Code:
Ratio(Ind1,Ind2;IDk)= (a|b, c|d)= [(a,c) + (a,d) + (b,c) + (b,d)]/4

For example, for ID1 the values for (Individual1,Individual2) are (0|1,0|1), so the ratio is calculated like this:
Code:
Ratio(Individual1,Individual2;ID1)=(0|1,0|1)=[(0,0) + (0,1) + (1,0) + (1,1)]/4
 =(0+1+1+2)/4
 =1

....

For ID15
Code:
Ratio(Individual1,Individual2;ID15)=(1|0,1|0)=[(1,1) + (1,0) + (0,1) + (0,0)]/4
 =(2+1+1+0)/4
 =1
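
The per-ID formula can be sketched in Python as below. This is only a minimal sketch, assuming that (x,y) means the sum x + y, which is what the two worked examples imply (0,0)=0, (0,1)=1, (1,1)=2. The names `site_ratio` and `score` are hypothetical, not part of any library. If (x,y) is instead meant as "1 when the two alleles are identical, else 0", pass a different `score` function.

```python
from itertools import product

def site_ratio(gt1, gt2, score=lambda x, y: x + y):
    """Ratio for one ID and one pair of phased genotypes such as '0|1'.

    score(x, y) defaults to x + y, which reproduces the worked examples;
    pass lambda x, y: int(x == y) to count identical alleles instead.
    """
    a, b = (int(x) for x in gt1.split("|"))
    c, d = (int(x) for x in gt2.split("|"))
    # The four cross-individual comparisons: (a,c), (a,d), (b,c), (b,d).
    return sum(score(x, y) for x, y in product((a, b), (c, d))) / 4

print(site_ratio("0|1", "0|1"))  # ID1,  Individual1 vs Individual2
print(site_ratio("1|0", "1|0"))  # ID15, Individual1 vs Individual2
```

Both calls give 1.0, matching the hand calculations for ID1 and ID15.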

This is averaged over all IDs and reported for every pair of individuals. My file has over 2000 IDs and several individuals, and I am currently doing this in Excel. Is it possible to code this formula in bash or Python so that it accepts a tab-delimited text file?


The output I am looking for looks like this:


Code:
ID-Pair Avg-ratio-across-all-IDs

Individual1-Individual2 1.00
Individual1-Individual6 0.5
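
A rough sketch of the whole calculation in Python could look like the following. It makes several assumptions: the header line starts with `#CHROM`, the individuals are the columns after `FORMAT`, every genotype is phased (`a|b`), and (x,y) means x + y as in the worked examples. The function names `site_ratio` and `pair_averages` are made up for illustration, and `SAMPLE` is just the first two rows of the input above. With the x + y reading the Individual1-Individual6 average will not come out as 0.5, so the scoring line may need to be changed to whatever (x,y) is really meant to be.

```python
from itertools import combinations, product

def site_ratio(gt1, gt2):
    # Assumption: (x,y) means x + y, as in the worked examples.
    a, b = map(int, gt1.split("|"))
    c, d = map(int, gt2.split("|"))
    return sum(x + y for x, y in product((a, b), (c, d))) / 4

def pair_averages(lines):
    """Return {(name1, name2): ratio averaged over all IDs} for every pair."""
    names, totals, n_sites = [], {}, 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#CHROM":
            # Individuals are the columns that follow FORMAT.
            names = fields[fields.index("FORMAT") + 1:]
            totals = {pair: 0.0 for pair in combinations(names, 2)}
            continue
        if not names or fields[0].startswith("#"):
            continue
        # Take genotypes from the end of the row, so an empty INFO column is harmless.
        gts = dict(zip(names, fields[-len(names):]))
        for n1, n2 in totals:
            totals[(n1, n2)] += site_ratio(gts[n1], gts[n2])
        n_sites += 1
    if n_sites == 0:
        return {}
    return {pair: total / n_sites for pair, total in totals.items()}

# First two rows of the input file, tab-delimited, INFO left empty.
SAMPLE = "\n".join([
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
              + [f"Individual{i}" for i in range(1, 7)]),
    "\t".join(["22", "10000", "ID1", "A", "G", "0", "PASS", "", "GT",
               "0|1", "0|1", "0|1", "1|1", "1|1", "1|1"]),
    "\t".join(["22", "10001", "ID2", "A", "G", "0", "PASS", "", "GT",
               "0|1", "0|1", "0|1", "1|0", "1|1", "1|1"]),
])

averages = pair_averages(SAMPLE.splitlines())
print("ID-Pair\tAvg-ratio-across-all-IDs")
for (n1, n2), avg in averages.items():
    print(f"{n1}-{n2}\t{avg:.2f}")
```

To run it on a real file, something like `with open("input.txt") as fh: averages = pair_averages(fh)` should work, where `input.txt` is a placeholder for the actual file name.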


Last edited by Scrutinizer; 07-03-2019 at 11:05 AM.. Reason: quote tags -> code tags