Post 302895628 by yifangt on Tuesday 1st of April 2014 01:52:10 PM
Merge two strings by overlapped region

Hello, I am trying to concatenate two strings while merging the region where they overlap, e.g.:
Code:
Seq1=ACGTGCCC
Seq2=CCCCCGTGTGTGT
Seq_merged=ACGTGCCCCCGTGTGTGT

The standard function strcat(char *dest, const char *src) simply appends the src string to the dest string and pays no attention to the overlapping part (a suffix of dest that is also a prefix of src). After googling for a while, this seems to be related to computing the longest common substring, which is too big a problem for me.
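To make the starting point concrete, here is a minimal sketch of what plain concatenation gives for the two example sequences. The helper name plain_concat and its buffer-size check are assumptions made for illustration; only strcpy and strcat are standard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: plain concatenation of a and b into a caller-supplied
   buffer. Returns 0 on success, -1 if the buffer is too small. */
int plain_concat(char *out, size_t outsz, const char *a, const char *b)
{
    assert(out != NULL && a != NULL && b != NULL);
    if (strlen(a) + strlen(b) + 1 > outsz)
        return -1;
    strcpy(out, a);      /* copy a, including its terminating NUL */
    strcat(out, b);      /* append b; any overlap is written twice */
    return 0;
}
```

For Seq1=ACGTGCCC and Seq2=CCCCCGTGTGTGT this yields ACGTGCCCCCCCCGTGTGTGT (eight C's), because the shared "CCC" is written once from each string.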
I have tried the following code, but it always gives the wrong result: Seq_merged=ACGTGCCCCCCGTGTGTGT, which has an extra "C". What did I miss?
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 4096

//strmerg was from: http://effprog.wordpress.com/2010/11/18/concatenation-of-two-strings-omitting-overlapping-string/
char *strmerg(char *dst, const char *src)
{
    size_t dstLen = strlen(dst);
    size_t srcLen = strlen(src);

    char *p = dst + dstLen + srcLen;            /* Pointer to the end of the concatenated string */
    const char *q = src + srcLen - 1;            /* Pointer to the last character of the src */
    char *r = dst + dstLen - 1;                    /* Temp Pointer to the last character of the dst */
    char *end = r;                                /* Permanent Pointer to the last character of the dst */
    *p = '\0';                                    /* Terminate the concatenated string with a NUL character */

    while (q >= src) {                            /* Copy src in reverse */
        if (*r == *q) {                           /* Till it matches with the src, decrement r */
            r--;
        } else {
            r = end;
            if (*r == *q) {
                r--;
            }
        }

        *p-- = *q--;
    }

    while (r >= dst)                              /* Copy dst, ending with r */
        *p-- = *r--;

    return p + 1;
}

int main(int argc, char **argv)
{
    char *str1, *str2;        //Original two strings
    char *str3;                //resulting string

    str1 = malloc(sizeof(char) * MAXLEN);    //allocate memory
    str2 = malloc(sizeof(char) * MAXLEN);    //allocate memory

    str3 = malloc(sizeof(char) * MAXLEN * 2);    //allocate memory, maximum space needed is the sum of the two original string lengths

    if (argc != 3) {
        printf("Error! \nUsage: ./arg[0]=program argv[1]=string1 argv[2]=string2\n");
        exit(EXIT_FAILURE);
    }

    strcpy(str1, argv[1]);
    strcpy(str2, argv[2]);

    printf("Input strings are: \nSeq1=%s\nSeq2=%s\n", str1, str2);

    str3=strmerg(str1, str2);
    printf("\nConcatenated string is: Seq_merged=%s\n", str3);
/*Some problem with these free(), do not know why?
free(str1);
free(str2);
free(str3);
*/
    return 0;
}
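
On the commented-out free() calls: reading strmerg, its return value appears to be either dst itself or a pointer inside dst's buffer, never a separate allocation. If so, after str3 = strmerg(str1, str2) the block originally malloc'd for str3 is leaked, and free(str3) is handed a pointer that malloc did not return (or one that free(str1) has already released), which is undefined behaviour. The sketch below illustrates the rule with hypothetical names (tail_view, interior_pointer_demo); it is not part of the posted program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a pointer n bytes into buf, much like
   strmerg's "return p + 1" can point inside dst. */
char *tail_view(char *buf, size_t n)
{
    assert(buf != NULL && n <= strlen(buf));
    return buf + n;
}

/* Returns 1 if the view reads as expected, 0 if not, -1 if malloc fails. */
int interior_pointer_demo(void)
{
    char *base = malloc(16);          /* the only pointer free() may receive */
    char *view;
    int ok;

    if (base == NULL)
        return -1;
    strcpy(base, "xxxxACGT");         /* 8 characters plus NUL fit in 16 bytes */
    view = tail_view(base, 4);        /* points inside base's block */
    ok = (strcmp(view, "ACGT") == 0);

    free(base);                       /* correct: free the pointer malloc returned */
    /* free(view) would be undefined behaviour: malloc never returned view */
    return ok;
}
```

Applied to the posted main, this would suggest not allocating str3 at all and freeing only str1 and str2.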

I tried more cases; the problem seems to occur when the overlapping region is repetitive.
Code:
./prog ACGTGCCC CCCCCGTGTGTGT
Seq1=ACGTGCCC
Seq2=CCCCCGTGTGTGT
Seq_merged=ACGTGCCCCCCGTGTGTGT
./prog ACGTGatcg atcgCCGTGTGTGT
Seq1=ACGTGatcg
Seq2=atcgCCGTGTGTGT
Seq_merged=ACGTGatcgCCGTGTGTGT
./prog ACGTGatatat atatCCGTGTGTGT
Seq1=ACGTGatatat
Seq2=atatCCGTGTGTGT
Seq_merged=ACGTGatatatatatCCGTGTGTGT
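
One possible way to get the intended result, sketched under the assumption that "overlap" means the longest suffix of Seq1 that is also a prefix of Seq2: try candidate lengths from longest to shortest and compare whole blocks, instead of matching character by character in reverse. Tracing the first case through the posted loop suggests that it resets on the fourth "C" and ends up counting an overlap of 2 rather than 3, which would explain the extra "C". The names overlap_len and merge_overlap are made up for this sketch, and the result is returned in a fresh buffer rather than built inside dst.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest suffix of a that is also a prefix of b. */
size_t overlap_len(const char *a, const char *b)
{
    size_t alen, blen, k;

    assert(a != NULL && b != NULL);
    alen = strlen(a);
    blen = strlen(b);
    k = alen < blen ? alen : blen;        /* longest overlap that could fit */

    for (; k > 0; k--) {
        /* compare the last k characters of a with the first k of b */
        if (memcmp(a + alen - k, b, k) == 0)
            return k;
    }
    return 0;
}

/* Returns a newly malloc'd string: a followed by b with the overlap written
   only once. The caller frees it. Returns NULL if malloc fails. */
char *merge_overlap(const char *a, const char *b)
{
    size_t alen = strlen(a);
    size_t blen = strlen(b);
    size_t k = overlap_len(a, b);
    char *out = malloc(alen + blen - k + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, a, alen);
    memcpy(out + alen, b + k, blen - k + 1);   /* +1 also copies the NUL */
    return out;
}
```

With the three inputs above, merge_overlap returns ACGTGCCCCCGTGTGTGT, ACGTGatcgCCGTGTGTGT and ACGTGatatatCCGTGTGTGT. The scan is quadratic in the worst case, which should be acceptable for strings up to MAXLEN; a KMP-style prefix function would be the usual choice for much longer sequences.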

Could anyone have a look at it for me? Thanks a lot!

Last edited by yifangt; 04-01-2014 at 04:29 PM..
 
