Detecting subset of a word

# 1  
01-29-2013

Each line of the file contains some words made of exactly the same letters as the first word, but with zero or more "_+" sequences inserted. I want to keep those words (along with the first word) and remove all the other words.
Example:
Code:
abcde abcd_+e abcd_+de
fghig  fghigi fghi_+g  
klmn klmn

I want to get this:
Code:
abcde abcd_+e 
fghig fghi_+g  
klmn klmn

# 2  
01-29-2013
Is there only ever one such word per line?
# 3  
01-29-2013
You might want to try
Code:
$ awk '{for (i=2; i<=NF; i++) {tmp=$i; gsub("_\+","", tmp); if (tmp!=$1) $i=""}; $0=$0; $1=$1}1' file
abcde abcd_+e 
fghig fghi_+g
klmn klmn

(the $0=$0; $1=$1 part is stolen from Scrutinizer: https://www.unix.com/302761935-post7.html)
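For anyone wondering what the $0=$0; $1=$1 idiom actually does, here is a small illustration (a sketch; the function name squeeze_blanks is made up for the demo). Setting a field to "" leaves its separators behind, $0=$0 re-splits the record (with the default FS, a run of blanks counts as one separator, so the empty field disappears), and $1=$1 forces awk to rebuild $0 with a single OFS between the remaining fields.

```shell
# squeeze_blanks is a hypothetical demo name, not from the thread.
squeeze_blanks() {
  awk '{
    $2 = ""   # empty field 2; $0 is rebuilt as "a  c" (two blanks in a row)
    $0 = $0   # re-split with the default FS: the blank run is one separator, NF drops to 2
    $1 = $1   # assigning any field rebuilds $0 with one OFS between fields
  } 1'
}

printf 'a b c\n' | squeeze_blanks   # prints: a c
```

Without the $0=$0 step, $1=$1 alone would still give "a  c", because the empty $2 is still counted as a field.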
# 4  
01-29-2013
Another one:
Code:
awk '{split($0,F); gsub(/_\+/,x); w=$1; for(i=2;i<=NF;i++) if(w==$i) $1=$1 OFS F[i]}NF=1' file

(will work in most awks, but not all)

Code:
awk '{s=$1; split($0,F); gsub(/_\+/,x); for(i=2;i<=NF;i++) if($1==$i) s=s OFS F[i]; print s}' file


--
@RudiC, you would need to use /_\+/ instead of "_\+", or "_\\+"
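To illustrate the point: a regex literal such as /_\+/ needs only one backslash, whereas a string used as a regex goes through string-escape processing first, so it has to be written "_\\+" to reach the regex engine as _\+. POSIX leaves an unknown string escape such as "\+" undefined (some awks drop the backslash, others keep it), which is why "_\+" is unreliable. A minimal sketch comparing the two working forms; the function name strip_both is hypothetical.

```shell
# strip_both is a hypothetical demo name.
# Removes every literal "_+" twice: once with a regex literal, once with
# a string that awk converts to a regex at run time.
strip_both() {
  awk '{
    a = $0; gsub(/_\+/, "", a)     # regex literal: \+ matches a literal "+"
    b = $0; gsub("_\\+", "", b)    # string: "\\" becomes one backslash before the regex is built
    print a, b
  }'
}

printf 'abcd_+e\n' | strip_both   # prints: abcde abcde
```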

Last edited by Scrutinizer; 01-29-2013 at 07:50 PM.
# 5  
01-30-2013
No, there could be more than one duplicate on a line.
# 6  
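01-30-2013
Can three or more words on a line have the same net value (the word with "_+" removed), or are there only unique pairs (plus unmatched words)?

If there are two pairs on one line, can they be intermixed, or does one pair always come before the other?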
# 7  
01-30-2013
DGPickett, I'm not sure I understand your first question.

For the second one, the similar words should contain the same letters as the first word in the same order.
Ex. abs ab_+s a_+bs

Also, similar words can appear anywhere on the line; they are not necessarily consecutive.
Ex. abs abas ab_+s


Does that answer your question?
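A minimal sketch that covers the clarified requirements (any number of matches per line, anywhere on the line, each compared only against the first word), using the same per-field normalization as the answers above. It assumes "_+" is the only insertion to ignore and that the first word itself contains no "_+"; the function name keep_similar is hypothetical.

```shell
# keep_similar is a hypothetical helper name.
# For each input line, print the first word followed by every later word
# that equals it once all "_+" sequences are removed, preserving order.
keep_similar() {
  awk '{
    out = $1
    for (i = 2; i <= NF; i++) {
      w = $i
      gsub(/_\+/, "", w)      # normalize a copy: drop every literal "_+"
      if (w == $1)
        out = out OFS $i      # keep the original, un-normalized word
    }
    print out
  }' "$@"
}

# Demo on the sample lines from this thread.
printf '%s\n' 'abcde abcd_+e abcd_+de' 'fghig  fghigi fghi_+g' 'klmn klmn' \
  'abs ab_+s a_+bs' 'abs abas ab_+s' | keep_similar
```

Because the loop builds a separate output string instead of blanking fields, it needs neither the $0=$0; $1=$1 step nor an assignment to NF; pass file names as arguments (keep_similar file) or pipe data in.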