Reduce redundant file


 
# 1  
03-21-2016

Dear All,
I need to remove the redundancy from a file that looks like this:

Code:
a b 0
a c 0
a f 1
b a 1
b a 0
b c 1
d f 0 
g h 1
f d 1

Basically, this file describes a network with its nodes and edges.
The nodes are the letters, and each line is an edge whose direction is given by the number in the third column (in particular, 0 means the edge goes from left to right, and 1 means vice versa).
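As a minimal sketch of that encoding, each row can be rewritten as an explicit directed edge. The helper name `show_edges` is hypothetical, and it assumes whitespace-separated columns with only 0 or 1 in the third one:

```shell
# show_edges (hypothetical helper): print each row of the edge list as an
# explicit arrow. Column 3: 0 means $1 --> $2, 1 means $2 --> $1.
show_edges() {
    awk '{
        if ($3 == 1) print $2 " --> " $1
        else         print $1 " --> " $2
    }'
}
```

For example, feeding it the rows `a b 0`, `b a 1` and `b a 0` prints `a --> b` twice and then `b --> a`, which is exactly the distinction the rest of the thread relies on.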


As you may notice, some interactions are duplicated. For example, the interactions:

Code:
a b 0
b a 1

a-->b
b<--a

are exactly the same. In the first line the interaction goes from a to b (0 means the interaction goes from left to right); in the second line the interaction still goes from a to b (1 means the interaction goes from right to left).


What I would like is to filter the file above and output a file like this:

Code:
a b 0
a c 0
a f 1
b a 0
b c 1
d f 0 
g h 1

So all the duplicated interactions are removed.
Note, however, that the interactions

Code:
a b 0
b a 0

are not the same! Both go from left to right, but the starting node is different.
a-->b
b-->a
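The two rules above (same interaction when the nodes are swapped and the flag is flipped; different interaction when only the nodes are swapped) can be checked with a small sketch. `same_interaction` is a hypothetical helper, not something from the thread; it normalises each row to a "from to" pair and compares the pairs:

```shell
# same_interaction (hypothetical helper): succeed (exit 0) if two rows,
# given as strings such as "a b 0", describe the same directed edge.
same_interaction() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Normalise each row to "from to" using the direction flag in $3
        { key[NR] = ($3 == 1) ? $2 " " $1 : $1 " " $2 }
        # Exit status 0 (success) when both rows give the same key
        END { exit !(key[1] == key[2]) }'
}
```

With this, `same_interaction "a b 0" "b a 1"` succeeds, while `same_interaction "a b 0" "b a 0"` fails.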

I hope this is clear.

Best

Giuliano
# 2  
03-21-2016
Any attempts/thoughts/ideas from your side?

---------- Post updated at 15:08 ---------- Previous update was at 15:00 ----------

Anyhow, try this:
Code:
awk '
($2,$1) in B &&
B[$2,$1] != $3  {next
                }
!(($1,$2) in B) {B[$1,$2] = $3
                }
END     {for (b in B)   {split (b, C, SUBSEP)
                         print C[1], C[2], B[b]
                        }
        }
' file
a b 0
a c 0
a f 1
b a 0
b c 1
d f 0
g h 1

The order of the output lines cannot be guaranteed.
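If a stable order is needed, one option is to keep the same logic but remember the order in which keys are first stored, then print from that list in the END block instead of iterating with `for (b in B)`. This is a sketch, not part of the original reply, and the function name `dedup_in_order` is hypothetical:

```shell
# dedup_in_order (hypothetical name): the filter above, but it records the
# order in which keys are first stored and prints them in that order.
dedup_in_order() {
    awk '
    # Same two rules as in the program above
    ($2,$1) in B && B[$2,$1] != $3 { next }
    !(($1,$2) in B) { B[$1,$2] = $3; order[++n] = $1 SUBSEP $2 }
    # Print in first-seen order instead of "for (b in B)" order
    END {
        for (i = 1; i <= n; i++) {
            split(order[i], C, SUBSEP)
            print C[1], C[2], B[order[i]]
        }
    }'
}
```

Run as `dedup_in_order < file`, it prints the sample in input order, matching the requested output. It inherits the same logic, so a row such as `a b 1` arriving after `a b 0` is still dropped.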
# 3  
03-21-2016
Well, basically I can filter the file and exclude the single (one-way) interactions.

Code:

a b 0
a c 0 
a f 1 
b a 1 
b a 0 
b c 1 
d f 0  
g h 1 
f d 1

In this case, for each row I could check whether the value in column 1 is present in column 2 and vice versa. If so, the interaction is bidirectional.
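A sketch of that first step, assuming the check is meant per node pair (columns 1 and 2 of one row appearing swapped in another row). `find_reversed_pairs` is a hypothetical name; it reads the file twice, first to collect every (column 1, column 2) pair and then to print the rows whose reversed pair exists:

```shell
# find_reversed_pairs (hypothetical helper): list the rows whose node pair
# also occurs in the opposite column order somewhere in the file.
# It reads the same file twice, so it takes a filename rather than stdin.
find_reversed_pairs() {
    awk '
    NR == FNR { pair[$1,$2]; next }   # pass 1: remember every (col1, col2) pair
    ($2,$1) in pair                   # pass 2: print rows whose reverse exists
    ' "$1" "$1"
}
```

On the sample, `find_reversed_pairs file` prints `a b 0`, `b a 1`, `b a 0`, `d f 0` and `f d 1`. Note that `b a 0` is listed too even though it is a genuinely different edge, so the direction flag still has to be compared afterwards.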

But I still have no idea how to apply the subsequent filter, which is the most important part.

I am trying to concatenate the columns and sort them, but I really can't figure it out.
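The concatenate-and-sort idea can be made to work by prefixing every row with a normalised "from to" key and letting `sort -u` keep one row per key. This is a sketch that assumes GNU `sort` (where `-s` keeps lines with equal keys in input order, so `-u` retains the first occurrence); `dedup_by_sort` is a hypothetical name:

```shell
# dedup_by_sort (hypothetical name): prefix each row with its oriented
# "from to" key, keep the first row per key, then strip the prefix again.
dedup_by_sort() {
    awk '{
        # Oriented key: 0 means $1 -> $2, 1 means $2 -> $1
        key = ($3 == 1) ? $2 " " $1 : $1 " " $2
        print key, $0
    }' |
    LC_ALL=C sort -s -u -k1,2 |
    cut -d' ' -f3-
}
```

Used as `dedup_by_sort < file`, it keeps the same seven rows as the requested output, but ordered by the normalised key rather than by input line, so `a f 1` (key `f a`) comes after `d f 0`.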

Best

Giuliano
# 4  
03-21-2016
Although the above solution works for the given sample, it will fail for others, e.g. the sequence a b 0 followed by a b 1. Try this instead:
Code:
awk '
($2,$1,!$3) in B        {next
                        }

                        {B[$0]
                        }
END                     {for (b in B)   {split (b, C)
                         print C[1], C[2], C[3]
                        }
        }
' SUBSEP=" " file
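To see the failure case mentioned above, both programs can be wrapped in shell functions (the names `filter_post2` and `filter_post4` are hypothetical) and fed the two lines `a b 0` and `a b 1`, which describe two distinct edges (a-->b and b-->a):

```shell
# filter_post2 (hypothetical wrapper): the program from post #2, reading stdin
filter_post2() {
    awk '
    ($2,$1) in B && B[$2,$1] != $3 { next }
    !(($1,$2) in B) { B[$1,$2] = $3 }
    END { for (b in B) { split(b, C, SUBSEP); print C[1], C[2], B[b] } }'
}

# filter_post4 (hypothetical wrapper): the program from post #4, reading stdin
filter_post4() {
    awk '
    ($2,$1,!$3) in B { next }
    { B[$0] }
    END { for (b in B) { split(b, C); print C[1], C[2], C[3] } }
    ' SUBSEP=" "
}
```

`printf 'a b 0\na b 1\n' | filter_post2` prints only `a b 0`, because for `a b 1` the key `a,b` is already stored, so the row is never recorded; `filter_post4` prints both lines (in unspecified order).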

This User Gave Thanks to RudiC For This Post:
# 5  
03-21-2016
try also:
Code:
awk '!a[($3) ? $1 : $2, ($3) ? $2 : $1]++' infile

This User Gave Thanks to rdrtx1 For This Post:
# 6  
03-21-2016
Quote:
Originally Posted by rdrtx1
try also:
Code:
awk '!a[($3) ? $1 : $2, ($3) ? $2 : $1]++' infile

You could also write that so it only evaluates field 3 once:
Code:
awk '!($3 ? A[$2,$1]++ : A[$1,$2]++)' infile
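For readers less familiar with awk, here is the same idea written out with comments. It is a sketch intended to be equivalent to the one-liner above (the function name `dedup_edges` is hypothetical): the direction flag is used to build an oriented (from, to) key, and a row is printed only the first time its key appears, so the input order is preserved:

```shell
# dedup_edges (hypothetical name): commented long form of the one-liner.
dedup_edges() {
    awk '
    {
        # Orient the row as (from, to): 0 means $1 -> $2, 1 means $2 -> $1
        if ($3) key = $2 SUBSEP $1
        else    key = $1 SUBSEP $2
        # Print the row only the first time this directed edge is seen
        if (!(key in seen)) { seen[key]; print }
    }'
}
```

Run as `dedup_edges < infile`, it prints the requested output in the original order, and it keeps both `a b 0` and `a b 1` because they are different edges.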

These 2 Users Gave Thanks to Don Cragun For This Post:
# 7  
03-21-2016
Code:
perl -lane '!$seen{$F[2]?"$F[1] $F[0] 0":$_}++ and print' giuliangiuseppe.input

Code:
a b 0
a c 0
a f 1
b a 0
b c 1
d f 0
g h 1

This User Gave Thanks to Aia For This Post: