UNIX command to select the best edge values from a network file

03-10-2020


I have tab-delimited data representing an undirected network. Among the duplicated edges, I want to keep the edge with the highest absolute log value.
I have written code in Python, but it is taking a lot of time. I would be grateful if someone could help me with an awk command. Please note that the network is undirected, i.e. A--B and B--A are duplicate edges. My original file has a large number of columns; I have given simplified test data below.

Test data

Code:

     Gene1    Gene2    Log
    AT1G01020    AT1G01010    1.682708
    AT1G01020    AT1G01010    -1.90043
    AT1G01020    AT1G01010    -1.832192
    AT1G01070    AT1G01060    -0.591932
    AT1G01070    AT1G01060    -1.204241
    AT1G01073    AT1G01070    0.790549
    AT1G01060    AT1G01070    1.214972

Expected Output

Code:

    AT1G01020    AT1G01010    -1.90043
    AT1G01070    AT1G01060    1.214972
    AT1G01073    AT1G01070    0.790549

Code:

gene_table=file1.readlines() # In the real file, j[12]=Gene1, j[13]=Gene2 and j[27]=log value
lfc=[]
for j in gene_table:
    j=j.split("\t")
    j[12]=j[12].strip()
    j[13]=j[13].strip()
    lfc=[]
    int_list=[]
    lfc.append(float(j[27]))
    int_list.append(j[0])
    dict_int={}
    for k in gene_table:
        k=k.split("\t")
        k[12]=k[12].strip()
        k[13]=k[13].strip()
        # same edge in either orientation: A--B matches both A--B and B--A
        if (j[0]!=k[0]) and ((j[12]==k[12] and j[13]==k[13]) or (j[12]==k[13] and j[13]==k[12])):
            lfc.append(float(k[27]))
    dict_int=dict(zip(int_list, lfc))
    x=max(lfc, key=abs)
    #print x
    listOfKeys = [key  for (key, value) in dict_int.items() if value == x]
    print(listOfKeys)


Sanchari


03-11-2020


Hi, @Sanchari
Could you check whether there is an error in your example?

Quote:

Originally Posted by Sanchari

Test data

Code:

     Gene1    Gene2    Log
    AT1G01020    AT1G01010    1.682708
    AT1G01020    AT1G01010    -1.90043
    AT1G01020    AT1G01010    -1.832192
    AT1G01070    AT1G01060    -0.591932
    AT1G01070    AT1G01060    -1.204241
    AT1G01073    AT1G01070    0.790549
    AT1G01060    AT1G01070    1.214972

Expected Output

Code:

    AT1G01020    AT1G01010    -1.90043
    AT1G01070    AT1G01060    1.214972
    AT1G01073    AT1G01070    0.790549

If rows that occur only once should also be displayed,
then the result should be

Code:

AT1G01070 AT1G01060 -1.204241
AT1G01060 AT1G01070 1.214972
AT1G01020 AT1G01010 -1.90043
AT1G01073 AT1G01070 0.790549

and if not

Code:

AT1G01070 AT1G01060 -1.204241
AT1G01020 AT1G01010 -1.90043

Would a solution using awk suit you?

--- Post updated at 17:40 ---

Code:

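# Variant 1: keep only rows that occur more than once
# (GNU uniq; -w 26 is intended to compare just the two gene columns)
uniq -Dw 26 file |
awk '
NR==1 {next}
{if(abs(A[$1 FS $2]) < abs($3)) A[$1 FS $2] = $3}
END {for(i in A) print i, A[i]}
function abs(x) { return (x<0) ? x*-1 : x }'

# Variant 2: also keep rows that occur only once
awk '
NR==1 {next}
{if(abs(A[$1 FS $2]) < abs($3)) A[$1 FS $2] = $3}
END {for(i in A) print i, A[i]}
function abs(x) { return (x<0) ? x*-1 : x }' file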
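
Note that the key `$1 FS $2` depends on column order, so A--B and B--A are stored as two separate edges; that is why AT1G01060/AT1G01070 appears as its own row in the output above. As a rough illustration only, an orientation-independent key could look like this in Python (the helper name `edge_key` is made up for this sketch and is not part of the thread):

```python
def edge_key(gene1, gene2):
    """Return the same key for A--B and B--A by sorting the two endpoints."""
    return tuple(sorted((gene1, gene2)))


# Both orientations of the same undirected edge map to one key.
print(edge_key("AT1G01070", "AT1G01060") == edge_key("AT1G01060", "AT1G01070"))
```

Using such a key as the array index is what collapses the two orientations into a single edge.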

nezabudka


03-11-2020


How about this (a bit verbose):
awk -f san.awk myInputFile, where san.awk is:

Code:

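BEGIN {
  FS=OFS="\t"
  i1=1    # column of the first gene
  i2=2    # column of the second gene
  v=3     # column of the log value
}
function abs(x)    { return x < 0 ? -x : x }

# skip the header line
FNR>1 {
   # order the two genes so that A--B and B--A share one key
   idx=($i1 > $i2)? $i1 OFS $i2 : $i2 OFS $i1
   # keep the value with the largest absolute value seen so far
   if (abs(a[idx])<abs($v))
      a[idx]=$v
}
END {
  for (i in a)
    print i,a[i]
}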

results in:

Code:

AT1G01070       AT1G01060       1.214972
AT1G01020       AT1G01010       -1.90043
AT1G01073       AT1G01070       0.790549
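
Since the original attempt was in Python, the same single-pass idea can be sketched there as well. This is only a rough equivalent of the awk script above, not a tested drop-in for the real file: the function name `best_edges` and its parameters are invented for this sketch, and the column positions default to the simplified test data. It reads each line once and keeps one dictionary entry per undirected edge, instead of comparing every line with every other line.

```python
import io


def best_edges(lines, i1=0, i2=1, v=2, sep="\t"):
    """Keep, per undirected edge, the value with the largest absolute value.

    i1, i2 and v are 0-based column positions (hypothetical parameter names).
    Returns a dict mapping (gene, gene) to (value as text, value as float).
    """
    best = {}
    for n, line in enumerate(lines):
        if n == 0:
            continue  # skip the header line
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= max(i1, i2, v):
            continue  # skip blank or short lines
        a, b = fields[i1].strip(), fields[i2].strip()
        # Larger name first, mirroring the idx expression in san.awk,
        # so that A--B and B--A end up under the same key.
        key = (a, b) if a > b else (b, a)
        value = float(fields[v])
        if key not in best or abs(value) > abs(best[key][1]):
            best[key] = (fields[v].strip(), value)
    return best


sample = """Gene1\tGene2\tLog
AT1G01020\tAT1G01010\t1.682708
AT1G01020\tAT1G01010\t-1.90043
AT1G01020\tAT1G01010\t-1.832192
AT1G01070\tAT1G01060\t-0.591932
AT1G01070\tAT1G01060\t-1.204241
AT1G01073\tAT1G01070\t0.790549
AT1G01060\tAT1G01070\t1.214972
"""

result = best_edges(io.StringIO(sample))
for (g1, g2), (text, _) in result.items():
    print(g1, g2, text, sep="\t")
```

For the real file, where the first post gives j[12]=Gene1, j[13]=Gene2 and j[27]=log value, the call would presumably be `best_edges(open_file, i1=12, i2=13, v=27)`.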


vgersh99
