Delete data blocks based on missing combinations


 
# 1  
11-05-2014

Hello masters,

I am filtering data based on completeness. A (Name, Group) combination in File2 is complete only when it has data for every subgroup that File1 lists for that group.
Incomplete (Name, Group) combinations must not appear in the output.

For example, Name1 Group 1 in File2 is incomplete because it has data for subgroups a and b but not c, so this Name/Group combination does not appear in the output.

File1
Code:
Group Subgroup
1 a
1 b
1 c
2 a
2 b
2 c
3 a

File2
Code:
Name Group Subgroup Data
Name1 1 a d1
Name1 1 b d2
Name1 2 a d1
Name1 2 b d2
Name1 2 c d3
Name2 1 a d1
Name2 1 b d2
Name2 1 c d5
Name2 3 a d2
Name3 1 a d1
Name3 1 b d2
Name3 2 c d1
Name4 1 a f1
Name4 1 b f2
Name4 1 c f5
Name4 2 a f1
Name4 2 b f2
Name4 2 c f5

Output
Code:
Name1 2 a d1
Name1 2 b d2
Name1 2 c d3
Name2 1 a d1
Name2 1 b d2
Name2 1 c d5
Name2 3 a d2
Name4 1 a f1
Name4 1 b f2
Name4 1 c f5
Name4 2 a f1
Name4 2 b f2
Name4 2 c f5


Last edited by senhia83; 11-05-2014 at 12:36 PM.
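To make the completeness rule concrete, here is a small sketch (not from the thread) that reports, for each (Name, Group) combination in File2, which subgroups required by File1 are missing; only combinations with nothing missing belong in the output. The function name `report_missing` is hypothetical, and it assumes whitespace-separated columns with exactly one header line in each file, as in the samples.

```shell
# Hypothetical helper: report_missing FILE1 FILE2
# For each (Name, Group) combination in FILE2, prints the subgroups that
# FILE1 requires for that Group but FILE2 lacks. Complete combinations
# print nothing. Assumes whitespace-separated columns and exactly one
# header line per file (skipped with FNR == 1).
report_missing() {
  awk 'FNR == 1 { next }                        # skip each header line
       FNR == NR { req[$1] = req[$1] " " $2     # File1: required subgroups per Group, in file order
                   next }
       { have[$1, $2, $3]; combo[$1, $2] }      # File2: record which rows exist
       END {
         for (c in combo) {
           split(c, nk, SUBSEP)                 # nk[1] = Name, nk[2] = Group
           n = split(req[nk[2]], subs, " ")
           miss = ""
           for (i = 1; i <= n; i++)
             if (!((nk[1], nk[2], subs[i]) in have))
               miss = miss " " subs[i]
           if (miss != "") print nk[1], nk[2], "missing:" miss
         }
       }' "$1" "$2" | sort                      # "for (c in combo)" order is unspecified
}
```

Run as `report_missing file1 file2`. On the samples it lists Name1 1 (missing c), Name3 1 (missing c) and Name3 2 (missing a and b), which are exactly the combinations absent from the expected output.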
# 2  
11-05-2014
Any attempt from your side?
# 3  
11-05-2014
Quote:
Originally Posted by RudiC
Any attempt from your side?
Yes, I have tried, but it's coming up with syntax errors:

Code:
awk -F'\t' 'NR==FNR{a[$1"\t"$2]=$0;next} $[$2"\t"$3] in a {b[$2"\t"$3]=$0; if ($0=="") else print b[$0]}1' OFS='\t' file1 file2

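The two syntax errors in that attempt are `$[$2"\t"$3] in a`, which is not valid awk (a membership test is written `(key) in array`, with no `$`), and `if ($0=="") else ...`, which has no statement before `else`. Below is a syntactically valid sketch of the same lookup idea; the name `lookup_only` is hypothetical, and it assumes awk's default whitespace field splitting rather than `-F'\t'`. It runs, but it only checks each row on its own: every row of the sample File2 has a (Group, Subgroup) pair that exists in File1, so nothing is filtered out. Completeness has to be decided per (Name, Group) block, not per row.

```shell
# Hypothetical helper: lookup_only FILE1 FILE2
# Syntactically valid form of the lookup attempted above:
#   - File1 pass: remember every Group,Subgroup pair as an array key
#   - File2 pass: print rows whose Group,Subgroup pair is a known key
# Assumes whitespace-separated columns (default FS), not tabs.
lookup_only() {
  awk 'NR == FNR { a[$1, $2]; next }   # referencing a[...] creates the key
       ($2, $3) in a                   # pattern only: the default action prints the row
      ' "$1" "$2"
}
```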
# 4  
11-05-2014
Based on exactly your simple samples, try
Code:
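awk     'FNR==NR          {GRP[$1]=GRP[$1]$2; next}      # file1: concatenate the subgroups of each group, e.g. GRP[1]="abc"
         $1!=NM || $2!=GR {if (ST != "" && GRP[GR] == TMPGR) print ST   # a block ended: print it if complete
                           ST=DL=TMPGR=""}                # reset the block buffers
                          {ST=ST DL $0                    # append the current line to the block
                           DL=RS
                           NM=$1                          # Name and Group of the current block
                           GR=$2
                           TMPGR=TMPGR $3                 # subgroups seen so far in this block
                          }
         END              {if (ST != "" && GRP[GR] == TMPGR) print ST}   # flush the last block
        ' file1 file2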
Name Group Subgroup Data
Name1 2 a d1
Name1 2 b d2
Name1 2 c d3
Name2 1 a d1
Name2 1 b d2
Name2 1 c d5
Name2 3 a d2
Name4 1 a f1
Name4 1 b f2
Name4 1 c f5
Name4 2 a f1
Name4 2 b f2
Name4 2 c f5

This User Gave Thanks to RudiC For This Post:
# 5  
11-05-2014
Does the data need to be sorted in any way?
# 6  
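11-05-2014
Yes, it depends on the order given: the lines of each (Name, Group) block must be adjacent, and their subgroups must appear in the same order as in file1.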
This User Gave Thanks to RudiC For This Post:
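If the real data cannot be guaranteed to be ordered like that, one order-independent alternative (a sketch, not taken from the thread) is to count instead of comparing concatenated strings: read File1 once to count the distinct subgroups each Group requires, read File2 once to count the distinct required subgroups each (Name, Group) actually has, then read File2 again and print only the rows whose counts match. The name `keep_complete` is hypothetical; it assumes whitespace-separated columns and non-empty input files.

```shell
# Hypothetical helper: keep_complete FILE1 FILE2
# Prints the rows of FILE2 whose (Name, Group) combination has every
# subgroup that FILE1 lists for that Group. Input order does not matter.
# Assumes whitespace-separated columns and non-empty files. Header lines
# are treated as data: the File1 line "Group Subgroup" makes the File2
# header complete, so it is printed, as in the output of post #4.
keep_complete() {
  awk 'FNR == 1 { pass++ }                      # 1 = File1, 2 and 3 = File2
       pass == 1 {                              # File1: distinct subgroups per Group
         if (!(($1, $2) in req)) { req[$1, $2]; need[$1]++ }
         next
       }
       pass == 2 {                              # File2, first read: count what each Name,Group has
         if ((($2, $3) in req) && !(($1, $2, $3) in seen)) {
           seen[$1, $2, $3]; got[$1, $2]++
         }
         next
       }
       ($2 in need) && got[$1, $2] == need[$2]  # File2, second read: print rows of complete blocks
      ' "$1" "$2" "$2"
}
```

The trade-off is a second read of File2 and memory proportional to the number of distinct (Name, Group, Subgroup) triples, whereas the streaming script in post #4 only buffers one block at a time but relies on the input order.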
# 7  
11-05-2014
It will take a while for me to try to break the code with my huge dataset. I will get back to you. Thank you!
 