Shortest path for each query from a csv file


 
# 1  
Old 11-25-2013

Hi all, I have this file
Code:
__DATA__
 child,    Parent,    probability,
 M7,    Q,    P,
 M7,      M28,     E,
 M28,     M6,      E,
 M6,      Q,      Pl,
 & several hundred lines.....

Legend: P (= Probable) > Pl (= Plausible) > E (= Equivocal). For each child, I want to trace it back to Q (the query), but I need only the shortest path that leads to Q, along with the probabilities along that path. For example, for the input data shown above, the output should be:
Code:
__OUTPUT_1__
 M7:  M7<-Q = P
 M28: M28<-M6<-Q = Pl.E
 M6:  M6<-Q = Pl

But as the second row of the input data shows, M7 has another, longer path back to Q: M7<-M28<-M6<-Q = Pl.E.E. The code should have an option either to ignore the longer path and show only the shortest one, or to show all of them, i.e.
Code:
__OUTPUT_2__
 M7:  M7<-Q = P
 M7:  M7<-M28<-M6<-Q = Pl.E.E
 M28: M28<-M6<-Q = Pl.E
 M6:  M6<-Q = Pl

Thus the second output prints a path tracing back to Q for each row, so N input rows produce N corresponding output rows.

The code that I have is not working on the data. Please help.
Code:
#! /usr/bin/perl
 my %DEF = (
 I   => [qw( P Pl P.P P.Pl Pl.P Pl.Pl P.P.P P.P.Pl P.Pl.P P.Pl.Pl Pl.P.P Pl.P.Pl Pl.Pl.P Pl.Pl.Pl )],
 II  => [qw( E P.E Pl.E P.P.E P.Pl.E Pl.P.E Pl.Pl.E )],
 III => [qw( E.P E.Pl P.E.P P.E.Pl Pl.E.P Pl.E.Pl E.P.P E.P.Pl E.Pl.P E.Pl.Pl )],
 IV  => [qw( E.E P.E.E Pl.E.E E.P.E E.Pl.E E.E.P E.E.Pl E.E.E )] );
 my @rank = map @$_, @DEF{qw(I II III IV)};
 my %rank = map {$rank[$_-1] => $_} 1..@rank;
 my @group = map {($_) x @{$DEF{$_}}} qw(I II III IV);
 my %group = map {$rank[$_-1] => $group[$_-1]."_".$_} 1..@group;
 sub rank { $rank{$a->[2]} <=> $rank{$b->[2]} }
 my %T;

 sub oh { map values %$_, @_ }
 sub ab {   my ($b, $a) = @_;   [$b->[0], $a->[1], qq($a->[2].$b->[2]), qq($b->[3]<-$a->[3])] 
}
 sub xtend {
 my $a = shift;
 map {ab $_, $a} oh @{$T{$a->[0]}}{@_} }
 sub ins { $T{$_[3] //= $_[1]}{$_[2]}{$_[0]} = \@_ }

 ins split /,\s*/ for <DATA>;
 ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T);
 ins @$_ for map {xtend $_, qw(P E Pl)} (oh oh oh \%T);

 for (sort {rank} grep {$_->[1] eq 'Q'} (oh oh oh \%T)) {
 printf "%-4s: %20s,  %-8s %6s\n",
     $_->[0], qq($_->[0]<-$_->[3]), $_->[2], $group{$_->[2]};
 }  

 __DATA__
 M7    Q    P
 M54    M7    Pl
 M213    M54    E
 M206    M54    E
 M194    M54    E
 M53    M7    Pl
 M186    M53    Pl
 M194    M53    Pl
 M187    M53    E
 M204    M53    E
 M201    M53    E
 M202    M53    E
 M179    M53    E
 M173    M53    E
 M205    M53    E
 M195    M53    E
 M196    M53    E
 M197    M53    E
 M198    M53    E
 M57    M7    E
 M44    M7    E
 M61    M7    E
 M13    M7    E
 M50    M7    E
 M158    M50    P
 M157    M50    P
 M153    M50    Pl
 M162    M50    E
 M164    M50    E
 M165    M50    E
 M147    M50    E
 M159    M50    E


# 2  
Old 11-25-2013
If you're happy with an awk solution, you could try:

Code:
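awk -F', *' -v shortest=1 '
# follow(i): return the chain from row i back to Q, e.g. "M54<-M7<-Q = P.Pl".
# j, v, b and r are extra parameters used as local variables.
function follow(i,j,v,b,r)
{
  b=999
  if(N[i] == "Q") return C[i]"<-Q = "P[i];

  # U[N[i]] lists the row numbers in which the parent of row i is the child.
  # The explicit " " separator is needed because split defaults to FS.
  for(j=split(U[N[i]],V," ");j;j--) {
     # V is global and the recursive call reuses it, so rebuild it each time.
     split(U[N[i]],V," ")
     v=C[i] "<-" follow(V[j]) "." P[i]
     # Keep the candidate with the fewest dot-separated probabilities.
     if(split(v,H,".") < b) {
        r=v
        b=split(v,H,".")
     }
  }
  return r;
}
# Index every row by its number: child, parent and probability.
{ U[$1] = (U[$1]?U[$1] " ":"")NR
  C[NR]=$1
  N[NR]=$2
  P[NR]=$3
}

END {
   for(i=1;i<=NR;i++) {
       if (shortest) {
          # Remember only the shortest chain seen for each child.
          v=follow(i)
          if(!(C[i] in A) || split(v,H,".") < split(A[C[i]],H,"."))
             A[C[i]]=v
       } else printf "%s: %s\n",C[i],follow(i)
    }
    # Print each child once, in input order.
    if (shortest)
       for(i=1;i<=NR;i++) {
             if(C[i] in A) {
                 printf "%s: %s\n",C[i],A[C[i]];
                 delete A[C[i]];
            }
        }
}' infile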

Change shortest=1 on the first line to shortest=0 for the full list.
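For a quick check against the four-row example from the first post, the same approach can be wrapped in a shell function. This is only a sketch under a few assumptions: the name shortest_paths is made up for illustration, every chain is assumed to reach Q without cycles, and node names are assumed not to contain dots. It declares V and H as extra function parameters so they are local to each call, and it passes an explicit " " separator to split, which would otherwise fall back to FS.

```shell
# Hypothetical helper: reads "child, parent, probability," rows from the
# named file(s), or from standard input when no file is given.
shortest_paths() {
  awk -F', *' '
  # The names after i are extra parameters, so they are local to each call;
  # a recursive call therefore cannot overwrite V while it is looped over.
  function follow(i,    j, n, v, b, r, V, H) {
    if (N[i] == "Q") return C[i] "<-Q = " P[i]
    b = 999
    # U[N[i]] holds the row numbers whose child is the parent of row i.
    for (j = split(U[N[i]], V, " "); j; j--) {
      v = C[i] "<-" follow(V[j]) "." P[i]
      n = split(v, H, ".")          # number of probabilities in the chain
      if (n < b) { r = v; b = n }
    }
    return r
  }
  NF >= 3 {
    U[$1] = (U[$1] ? U[$1] " " : "") NR
    C[NR] = $1; N[NR] = $2; P[NR] = $3
  }
  END {
    # Keep the shortest chain per child, then print in input order.
    for (i = 1; i <= NR; i++) {
      if (!(i in C)) continue
      v = follow(i)
      if (!(C[i] in A) || split(v, H, ".") < split(A[C[i]], H, "."))
        A[C[i]] = v
    }
    for (i = 1; i <= NR; i++)
      if ((i in C) && (C[i] in A)) { print C[i] ": " A[C[i]]; delete A[C[i]] }
  }' "$@"
}
```

Called as shortest_paths infile on the four rows M7,Q,P / M7,M28,E / M28,M6,E / M6,Q,Pl, it should print the three lines of __OUTPUT_1__ from the first post.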

Output(2) for your data:
Code:
M7: M7<-Q = P
M54: M54<-M7<-Q = P.Pl
M213: M213<-M54<-M7<-Q = P.Pl.E
M206: M206<-M54<-M7<-Q = P.Pl.E
M194: M194<-M54<-M7<-Q = P.Pl.E
M53: M53<-M7<-Q = P.Pl
M186: M186<-M53<-M7<-Q = P.Pl.Pl
M187: M187<-M53<-M7<-Q = P.Pl.E
M204: M204<-M53<-M7<-Q = P.Pl.E
M201: M201<-M53<-M7<-Q = P.Pl.E
M202: M202<-M53<-M7<-Q = P.Pl.E
M179: M179<-M53<-M7<-Q = P.Pl.E
M173: M173<-M53<-M7<-Q = P.Pl.E
M205: M205<-M53<-M7<-Q = P.Pl.E
M195: M195<-M53<-M7<-Q = P.Pl.E
M196: M196<-M53<-M7<-Q = P.Pl.E
M197: M197<-M53<-M7<-Q = P.Pl.E
M198: M198<-M53<-M7<-Q = P.Pl.E
M57: M57<-M7<-Q = P.E
M44: M44<-M7<-Q = P.E
M61: M61<-M7<-Q = P.E
M13: M13<-M7<-Q = P.E
M50: M50<-M7<-Q = P.E
M158: M158<-M50<-M7<-Q = P.E.P
M157: M157<-M50<-M7<-Q = P.E.P
M153: M153<-M50<-M7<-Q = P.E.Pl
M162: M162<-M50<-M7<-Q = P.E.E
M164: M164<-M50<-M7<-Q = P.E.E
M165: M165<-M50<-M7<-Q = P.E.E
M147: M147<-M50<-M7<-Q = P.E.E
M159: M159<-M50<-M7<-Q = P.E.E


Last edited by Chubler_XL; 11-25-2013 at 06:05 PM.. Reason: Simplified answer
# 3  
Old 11-26-2013
Thanks Chubler,
But how do I run this script of yours? I mean, how do I give it a file to process from the command line? Could you please suggest something?

I tried this, but I am getting errors:
Code:
awk -f check_26nov_pred.awk input.csv
awk: check_26nov_pred.awk:1: awk -F', *' -v shortest=1 '
awk: check_26nov_pred.awk:1:       ^ invalid char ''' in expression

# 4  
Old 11-26-2013
If you save the program as a .awk file, you don't need the first line (the awk command itself), and the last line should be replaced with a plain }


Call it like this:
Code:
$ awk -F ', *' -v shortest=1 -f check_26nov_pred.awk input.csv
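The error above comes from the shell command line being saved inside the program file. As a small illustration (the file names demo.awk and demo.csv are made up here, and the one-line program is only a stand-in for the real script), a file passed with -f holds nothing but awk program text, while -F and -v stay on the command line:

```shell
# Hypothetical file names, created in a temporary directory for illustration.
tmp=$(mktemp -d)

# The program file contains only awk code: no leading "awk ..." command
# and no surrounding shell quotes.
cat > "$tmp/demo.awk" <<'EOF'
{ print $1 " has parent " $2 }
EOF

printf 'M7, Q, P,\n' > "$tmp/demo.csv"

# Options such as -F and -v are given on the command line, before -f.
demo_out=$(awk -F ', *' -v shortest=1 -f "$tmp/demo.awk" "$tmp/demo.csv")
echo "$demo_out"

rm -rf "$tmp"
```

The same layout applies to check_26nov_pred.awk: the file starts at the function definition and ends at the closing brace of the END block.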

# 5  
Old 11-26-2013
I modified it as you suggested.
So your code now starts with
Code:
function follow(i,j,v,b,r)
{
  b=999
  if(N[i] == "Q") return C[i]"<-Q = "P[i];
.........

and ends with
Code:
    if (shortest)
       for(i=1;i<=NR;i++) {
             if(C[i] in A) {
                 printf "%s: %s\n",C[i],A[C[i]];
                 delete A[C[i]];
            }
        }
}

When I run it on a sample file, Sample_input.csv, as
awk -F ', *' -v shortest=0 -f check_26nov_pred.awk prediction/Sample_input.csv
with this input:
Code:
M7,    Q,    P,
M54,    M7,    Pl,
M213,    M54,    E,
M206,    M54,    E,
M194,    M54,    E,
M53,    M7,    Pl,
M186,    M53,    Pl,
M194,    M53,    Pl,
M187,    M53,    E,
M204,    M53,    E,
M201,    M53,    E,
M202,    M53,    E,
M179,    M53,    E,
M13,    M53,    E,
M157,    M53,    E,
M173,    M53,    E,
M205,    M53,    E,
M195,    M53,    E,

I get empty fields as output:
Code:
M7:
M54:
M213:
M206:
M194:
M53:
M186:
M194:
M187:
M204:
M201:
M202:
M179:
M13:
M157:
M173:
M205:
M195:

# 6  
Old 11-26-2013
Seems to be working fine for me. If you are running this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Edit: I think those are tab characters after the commas in your data file. Try running the script like this:

Code:
$ awk -F ',\\s*' -v shortest=0 -f check_26nov_pred.awk prediction/Sample_input.csv
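To see why tabs would give empty output, here is a small sketch with a made-up one-row sample. With ', *' as the separator, a tab after the comma stays attached to the second field, so the comparison with "Q" in follow never succeeds. Note that \s is a GNU awk extension rather than POSIX, so on other awks a bracket expression listing space and tab is a safer assumption:

```shell
# Sample row with a tab after each comma, as suspected above.
row=$(printf 'M7,\tQ,\tP,')

# With FS=", *" the tab remains part of field 2, so it never equals "Q".
first=$(printf '%s\n' "$row" |
  awk -F ', *' '{ print (($2 == "Q") ? "match" : "no match") }')

# Listing space and tab explicitly strips both, without relying on \s.
second=$(printf '%s\n' "$row" |
  awk -F ',[ \t]*' '{ print (($2 == "Q") ? "match" : "no match") }')

echo "$first"    # no match
echo "$second"   # match
```

Running od -c on the first few lines of the data file is one way to confirm whether the separators really are tabs.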
