Joining ends of strings in certain order with repeated ID's


 
# 1  
Old 03-24-2014

I posted this a few days ago and got some help (Putting together substrings if pattern is matched - Page 2 | Unix Linux Forums | Shell Programming and Scripting)

But I am now stuck on an issue that is similar but not quite the same. I want to join parts of one line with parts of other lines that share the same ID. The ID can repeat many times. Below is an example with 4 repeats of the same ID. The combinations I want to generate are the following:

Code:
end of 4 with beginning of 4
            with beginning of 3
            with beginning of 2
end of 3 with beginning of 3
            with beginning of 2
end of 2 with beginning of 2
end of 1 with beginning of 1

The substrings to be joined are each 9 characters long. Here is the input, with the same ID repeated 4 times:

Code:
%dog  aaaaaaaaaaAAAAAAAAA
%dog  bbbbbbbbbbBBBBBBBBB
%dog  cccccccccccCCCCCCCCC
%dog  xxxxxxxxxxXXXXXXXXX

The output should look like this:

Code:
%dog  XXXXXXXXXxxxxxxxxx
%dog  XXXXXXXXXccccccccc
%dog  XXXXXXXXXbbbbbbbbb
%dog  CCCCCCCCCccccccccc
%dog  CCCCCCCCCbbbbbbbbb
%dog  BBBBBBBBBbbbbbbbbb
%dog  AAAAAAAAAaaaaaaaaa
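
For readers who want to experiment, here is a minimal Python sketch of the pairing rule as I read it from the combination list above. The function name `join_ends` and the `width` parameter are made up for illustration, and the rule itself is an assumption: occurrence 1 pairs only with itself, while occurrence i (for i >= 2) pairs with occurrences i, i-1, ..., 2.

```python
def join_ends(strings, width=9):
    """Join the last `width` chars of one occurrence with the first `width` of another.

    Assumed rule (read off the combination list in the post): occurrence 1
    pairs only with itself; occurrence i >= 2 pairs with i, i-1, ..., 2.
    Strings are assumed to be at least `width` characters long.
    """
    n = len(strings)
    out = []
    for i in range(n, 1, -1):        # end of occurrence i, from the last down to 2
        for j in range(i, 1, -1):    # beginning of occurrences i down to 2
            out.append(strings[i - 1][-width:] + strings[j - 1][:width])
    if n >= 1:                       # occurrence 1 is joined with itself only
        out.append(strings[0][-width:] + strings[0][:width])
    return out


# The four strings from the example input, in file order.
sample = [
    "aaaaaaaaaaAAAAAAAAA",
    "bbbbbbbbbbBBBBBBBBB",
    "cccccccccccCCCCCCCCC",
    "xxxxxxxxxxXXXXXXXXX",
]

for joined in join_ends(sample):
    print("%dog\t" + joined)
```

With the four sample strings this prints seven lines, each with an 18-character second column.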


How can this be done?
# 2  
Old 03-24-2014
Does the file have just a few records, or hundreds/thousands or more?

If it has hundreds/thousands or more records, are these the correct rules for writing the new file?

Code:
f1    f2        f3
%dog  aaaaaaaaaaAAAAAAAAA
%dog  bbbbbbbbbbBBBBBBBBB
%dog  cccccccccccCCCCCCCCC
%dog  xxxxxxxxxxXXXXXXXXX

if f3 = XXXXXXXXX
 then f1, f2 from record where f2 = xxxxxxxxx or ccccccccc or bbbbbbbbb
 then f3

if f3 = CCCCCCCCC
 then f1, f2 from record where f2 = ccccccccc or bbbbbbbbb
 then f3

if f3 = BBBBBBBBB
 then f1, f2 from record where f2 = bbbbbbbbb
 then f3

if f3 = AAAAAAAAA
 then f1, f2 from record where f2 = aaaaaaaaa
 then f3

# 3  
Old 03-24-2014
Hi spacebar, the rules you wrote out are correct. I should just mention that the file only has two columns (ID and string). I made the last segments of the strings capital letters for clarity in the post, but the real strings are all lowercase.

I should also mention that an ID can repeat more than 4 times, and that there are multiple IDs, each with a varying number of repetitions.
# 4  
Old 03-24-2014
Does the ID always have whitespace following it?
How long is the "string" field? Is it always 19 characters?
Will the last part of the string always be 9 positions and the first part 10?
# 5  
Old 03-24-2014
After the ID there is a tab. The string field varies in length, but the parts to be joined are always 9-character substrings taken from either the beginning or the end. In total, each output string should be 18 characters long.
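
To make the file layout concrete, here is a rough Python sketch that splits each record on the tab, collects the strings per ID in file order, and then joins 9-character ends and beginnings. The function name `join_by_id` and the demo records (including the `%cat` ID) are invented for illustration. It also assumes the pairing rule from the first post, that repeated IDs need not be adjacent in the file, and that every string has at least 9 characters.

```python
def join_by_id(lines, width=9):
    """Group tab-separated 'ID<TAB>string' records by ID, then join ends with beginnings."""
    groups = {}  # dicts keep insertion order, so IDs come out in first-seen order
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, string = line.partition("\t")  # split on the first tab only
        groups.setdefault(key, []).append(string)

    out = []
    for key, strings in groups.items():
        n = len(strings)
        # Assumed rule, 0-based: last..second occurrence pair downwards
        # (never with the first); the first occurrence pairs only with itself.
        pairs = [(i, j) for i in range(n - 1, 0, -1) for j in range(i, 0, -1)]
        pairs.append((0, 0))
        for end_idx, begin_idx in pairs:
            joined = strings[end_idx][-width:] + strings[begin_idx][:width]
            out.append(key + "\t" + joined)
    return out


# Hypothetical records: two IDs with different numbers of repetitions.
demo = [
    "%dog\t" + "a" * 10 + "A" * 9,
    "%cat\t" + "m" * 12 + "M" * 9,
    "%dog\t" + "b" * 10 + "B" * 9,
]

for row in join_by_id(demo):
    print(row)
```

To run it on a real file, the `demo` list could be replaced by an open file handle, since the function only iterates over lines.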
# 6  
Old 03-25-2014
Can you post a sample of the input file and the desired output? From your post it isn't clear what exactly you want to do.
# 7  
Old 03-25-2014
Here's a sample input with 3 occurrences of the ID. I want to generate specific combinations of 9-character substrings, for example the end of the last occurrence of the ID %dog (line 3) joined with the beginning of the second occurrence of the ID (line 2).

input:
Code:
%dog  CATCAToooooooooooooooDOGDOG
%dog  HIHIoooooooooooooooBYEBYE
%dog  hellooooooooooooooooGOGOGO

output:
Code:
%dog   oooGOGOGOHIHIooooo

This example shows the output for just one of the rules. Notice how the last 9-character substring from the third line is now at the beginning of column 2, followed by the first 9-character substring from the second line with the ID %dog. Two more combinations exist: the end of the last line (line 3) joined with its own beginning, and the end of the first line (line 1) joined with its own beginning.

output:
Code:
%dog   oooGOGOGOhellooooo
%dog   oooDOGDOGCATCATooo


The final output for this example should have three lines, based on the following rules:

Code:
1) last substring of line 3 with beginning substring of line 3
2) last substring of line 3 with beginning substring of line 2
3) last substring of line 1 with beginning substring of line 1

final output:
Code:
%dog   oooGOGOGOhellooooo
%dog   oooGOGOGOHIHIooooo
%dog   oooDOGDOGCATCATooo
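
As a quick check of the three rules listed above, here is a small Python sketch that applies them literally to the three sample lines. The names `apply_rules` and `rules` are made up for illustration, and the rule list is taken exactly as written in this post (it has no "line 2 with line 2" entry), so this is not a general solution.

```python
def apply_rules(lines, rules, width=9):
    """For each (end_line, begin_line) pair, 1-based, join the last and first `width` chars."""
    return [lines[end - 1][-width:] + lines[begin - 1][:width]
            for end, begin in rules]


# The three %dog strings from the sample input, in file order.
lines = [
    "CATCAToooooooooooooooDOGDOG",
    "HIHIoooooooooooooooBYEBYE",
    "hellooooooooooooooooGOGOGO",
]

# (end of line N, beginning of line M), copied from the three rules above.
rules = [(3, 3), (3, 2), (1, 1)]

for joined in apply_rules(lines, rules):
    print("%dog\t" + joined)
```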

Does this help?
 