Merge two strings by overlapped region

# 15  
04-02-2014
When I saw your answers, I did not know how to reply. My feeling is a mixture of embarrassment (I know so little about C), disappointment (buggy code), discouragement (so much to learn) and maybe hopelessness (code that does not work). I am so far from grasping the spirit of the C language compared with you!
From your code I need to go back to the bitwise operations and other stuff, not my string merge anymore. I wish I could have my code corrected side by side with comments, so I know the correct way in that situation.
Thank you very much for your time and effort, Corona688!
Yifangt
# 16  
04-02-2014
Quote:
Originally Posted by yifangt
From your code I need to go back to the bitwise operations and other stuff
Don't worry, you can ignore everything but main() in my code like I said. They're not relevant to your problem -- they're for debugging, they print memory, I whipped them up to make those charts.

If you can start using 'char * const', that would really help I think. The compiler would catch that mistake -- just wouldn't let you do it. (Which is mostly what const is for, FYI -- a label to inform the programmer what they can and cannot do to a variable.)
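As a rough sketch of what the two placements of const allow: in the posted main(), str1 is first set to argv[1] and then overwritten by malloc(), and a 'char * const' pointer turns that kind of reassignment into a compile error. The helper name copy_arg() is hypothetical, not something from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string (e.g. argv[1]) into a buffer we own.
 * 'arg' is 'const char *': its characters cannot be changed through it.
 * 'buf' is 'char * const': the pointer cannot be reassigned after its
 * initialization, but the characters it points to can be written. */
char *copy_arg(const char *arg)
{
    assert(arg != NULL);
    char * const buf = malloc(strlen(arg) + 1);
    if (buf == NULL)
        return NULL;
    strcpy(buf, arg);         /* OK: writing through buf                  */
    /* buf = malloc(10);         would not compile: buf is a const pointer */
    /* arg[0] = 'x';             would not compile: *arg is const          */
    return buf;               /* caller must free() the copy               */
}
```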
Quote:
I wish I could have my code corrected side by side with comments, so I know the correct way in that situation.
You post such large programs that fixing them pretty much means rewriting them. I can update the first two lines, sure -- but if all the lines after it are written on false assumptions, that's not much help.

It'd help us if you showed your entire program again when you made changes.

Last edited by Corona688; 04-02-2014 at 07:21 PM.
# 17  
04-02-2014
I wonder if learning assembly language would help. The way pointers work is mostly because of how the CPU works. Seeing that can be revealing.
# 18  
04-02-2014
What I want is to concatenate two strings by joining the overlapping suffix of string1 with the prefix of string2, so that the overlapping region appears only once. Here is the entire program.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 4096                            //Line 5, at this moment no string longer than 4096 is allowed.

//strmerg was from: http://effprog.wordpress.com/2010/11/18/concatenation-of-two-strings-omitting-overlapping-string/
char *strmerg(char *dst, char *src)                    //Line 8
{
    size_t dstLen = strlen(dst);
    size_t srcLen = strlen(src);

    char *p = dst + dstLen + srcLen;            /* Line 13, Pointer to the end of the concatenated string */
    char *q = src + srcLen - 1;                    /* Line 14, Pointer to the last character of the src */
    char *r = dst + dstLen - 1;                    /* Line 15, Temp Pointer to the last character of the dst */
    char *end = r;                                /* Line 16, Permanent Pointer to the last character of the dst */
    *p = '\0';                                  /* terminating the concatenated string with the null character */

    while (q >= src)
    {                                           /* Copy src in reverse */
        if (*r == *q) {                         /* Till it matches with the src, decrement r */
            r--;
        } else {
            r = end;
            if (*r == *q) {
                r--;
            }
        }

        *p-- = *q--;
    }

    while (r >= dst)                            /* Copy dst, ending with r */
        *p-- = *r--;

    return p + 1;                                //pointer of string, i.e. the start of string
}

int main(int argc, char **argv)                    //Line 39
{
    const char *str1 = argv[1];                 //Line 41, Original two strings
    const char *str2 = argv[2];                    //Line 42, Original two strings
    char *str3;                                    //Line 43, resulting string

    str1 = malloc(sizeof(char) * MAXLEN);        //Line 45, allocate memory
    str2 = malloc(sizeof(char) * MAXLEN);        //Line 46, allocate memory

    if (argc != 3) {
    printf("Error! \nUsage: ./arg[0]=program argv[1]=string1 argv[2]=string2\n");
    exit(EXIT_FAILURE);
    }

    strcpy(str1, argv[1]);                        //Line 53, 
    strcpy(str2, argv[2]);                        //Line 54, 

    str3=strmerg(str1, str2);                    //Line 56,
    
    printf("Input strings are: \nSeq1=%s\nSeq2=%s\n", str1, str2);
    printf("\nConcatenated string is: Seq_merged=%s\n", str3);
    printf("\nConcatenated string is: Seq_merged=%s\n", strmerg(str1, str2));
    return 0;
}

Two questions:
1a) Syntax: according to the replies, what is the right way to write Lines 8, 41, 42, and 43 (and Lines 13, 14, and 15 inside strmerg() probably need to change too)?
Code:
./prog ACGTGatatat  atatGTGTGTGT  
Input strings are: 
Seq1=ACGTGACGTGatatatGTGTGTGT       //Extra ACGTG
Seq2=atatGTGTGTGT

Concatenated string is: Seq_merged=ACGTGatatatGTGTGTGT
Concatenated string is: Seq_merged=ACGTGACGTGatatatGTGTGTGT           //Extra ACGTG

1b) Similar to 1a), syntax: how should Lines 53, 54, and 56 change accordingly, especially Line 56?
2) Algorithm: if the repeated overlapping region in src is longer than the one in dst, the merged string is NOT correct!
Code:
./prog ACGTGatat   atatatGTGTGTGT  
Input strings are: 
Seq1=ACGTGACGTGatatatGTGTGTGT       //Extra ACGTG
Seq2=atatGTGTGTGT

Concatenated string is: Seq_merged=ACGTGatatatatGTGTGTGT
Concatenated string is: Seq_merged=ACGTGACGTGatatatatGTGTGTGT           //Extra ACGTG

This is a bug in the strmerg() function.
The extra ACGTG part in Seq1 seems related to the const char *str1 declaration, but I am not sure.
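For questions 1a and 2, here is a minimal sketch under stated assumptions, not a repair of the effprog strmerg() itself. merge_overlap() is a hypothetical name; it assumes the longest suffix/prefix overlap is wanted and that comparison is case-sensitive, as in the original. It builds the result in a fresh malloc'd buffer, so neither input is modified and both parameters can be const char *. That is also where the extra ACGTG comes from: strmerg() writes its result inside str1's own buffer, starting a few bytes in, so printing str1 afterwards shows its leftover first bytes followed by the merged string. Separately, strmerg() stores '\0' at *p and then immediately overwrites it with the first *p-- = *q--, so where its result ends depends on whatever byte happens to follow in memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: merge a and b by their longest overlap, i.e. the longest suffix
 * of a that equals a prefix of b.  Returns a newly malloc'd string (the
 * caller must free() it), or NULL if allocation fails.  Neither input is
 * modified.  Comparison is case-sensitive ('a' != 'A'). */
char *merge_overlap(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t alen = strlen(a);
    size_t blen = strlen(b);
    size_t k = alen < blen ? alen : blen;   /* largest possible overlap */

    /* Try the longest candidate first and stop at the first match;
     * k == 0 (no overlap) always matches. */
    while (k > 0 && memcmp(a + alen - k, b, k) != 0)
        k--;

    char *out = malloc(alen + (blen - k) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, alen);                       /* all of a                 */
    memcpy(out + alen, b + k, blen - k + 1);    /* rest of b, plus its '\0' */
    return out;
}
```

In main() this would replace the two malloc'd copies and both strmerg() calls with a single `char *str3 = merge_overlap(argv[1], argv[2]);`, a NULL check, one printf, and `free(str3);`.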

Thank you again!
# 19  
04-03-2014
I repeat: You've written your program around assumptions which don't hold water -- like the very idea of "str3" being a separate entity from "str1" and "str2" just because you threw pointers to the same memory into a different function. You're still modifying the memory pointed to by str1 whenever you do that!

Also, you are calling strmerg() twice -- considering str1 gets modified each time, I'm not surprised the second output is doubly strange.
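A toy illustration of both points, assuming a hypothetical append_in_place() helper (not from the thread) that, like strmerg(), writes into the caller's buffer and returns a pointer into that same buffer rather than a new string:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends suffix to the string in buf and returns buf.
 * buf must have room for the result.  The returned pointer is NOT a copy:
 * it is the caller's own buffer, which has just been changed. */
char *append_in_place(char *buf, const char *suffix)
{
    assert(buf != NULL && suffix != NULL);
    strcat(buf, suffix);
    return buf;
}

/* Usage:
 *   char str1[64] = "ACGTGatat";
 *   char *str3 = append_in_place(str1, "GT");
 *     -> str3 == str1; both now read "ACGTGatatGT"
 *   append_in_place(str1, "GT");
 *     -> a second call changes it again: "ACGTGatatGTGT"
 */
```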

I suggest rewriting it from scratch with this new knowledge in mind, using that pointer suggestion I told you about so you're forced to break these habits.

Last edited by Corona688; 04-03-2014 at 01:40 AM.
# 20  
04-03-2014
Since all you need to reconstruct the last try is the overlap length you used (how much of str1's tail you matched), it is easy to rebuild the winning merge even if you have to go one step too far to find the limit. You might even use bisection to find the right number! If str1 has 6 bytes, try 3, then 1 or 5, then 0, 2, 4, or 6 to find the longest substring that works. Use the shorter of the lengths of str1 and str2: the maximum overlap is that length, and the minimum is always 0.
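The length search can be sketched as follows (overlaps() and longest_overlap() are hypothetical names). One caution about bisection: whether an overlap of length k works is not monotonic in k. For "aba" and "aba", lengths 1 and 3 work but 2 does not ("ba" vs "ab"), so a bisection that tests 2 and, on failure, only searches below it would settle on 1. Scanning every candidate from the longest down avoids that, and, as noted above, the returned length is all that is needed to build the merged string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the last k chars of a equal the first k chars of b.
 * Requires k <= strlen(a) and k <= strlen(b). */
bool overlaps(const char *a, const char *b, size_t k)
{
    size_t alen = strlen(a);
    assert(k <= alen && k <= strlen(b));
    return memcmp(a + alen - k, b, k) == 0;
}

/* Longest k for which overlaps(a, b, k) holds.  Checks every candidate
 * from the shorter length down to 0; k == 0 always succeeds. */
size_t longest_overlap(const char *a, const char *b)
{
    size_t alen = strlen(a), blen = strlen(b);
    size_t k = alen < blen ? alen : blen;
    while (k > 0 && !overlaps(a, b, k))
        k--;
    return k;
}
```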

It helps in C to know what is going on, at least in a sensible model. Imagine a char* on a 32-bit system is a 4-byte unsigned integer offset from the bottom of memory.

Values in the environment are in the heap below free memory. The heap grows up like stalagmites. Subroutine arguments and automatic variables are on the stack at the top of memory, growing down like stalactites. When you call, automatics may hold whatever old RAM content was there, so write before you read. When you return from a subroutine, the stack pointer rises above the automatics and passed arguments, and the space is reused/overwritten on the next call.

When you create a static or global variable, the compiler/linker allocates it on the heap. When you malloc(), that is on the heap, too. Since you can free(), someone has to keep track of the holes and try to reuse them. Fancy items like structs may be allocated on a mod 4 or mod 8 address, for speed. Some CPUs need aligned variables: 4-byte integers have to have mod-4 addresses. So space can be wasted when little and big items are mixed.

If you mmap() a file region, address space is allocated, but rather than being backed by swap space, it is tied to the file. Some people do not like calling everything at the bottom 'the heap'. Dynamically linked code is mmap'd into several places. It might have initialized variables and constants with it, which are put in different areas, sometimes because code is on executable pages while data may be non-executable, or even non-writable. If you run a command under truss, tusc or strace, you can see all this going on -- very educational.
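To look at this model from inside a program, here is a small sketch (show_regions() and the demo_* variables are hypothetical names) that prints the addresses of a static, a global, a malloc() block, an automatic variable, and an environment string. The numbers vary between systems and between runs (address-space layout randomization), and a real system will not necessarily match the simple up/down picture above, so treat the output as something to compare with what strace/truss shows rather than as a fixed layout.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int demo_static = 1;   /* static storage, initialized  */
int demo_global;              /* global, zero-initialized     */

/* Sketch: print where different kinds of objects ended up. */
void show_regions(void)
{
    int an_automatic = 0;               /* automatic variable (stack)   */
    char *block = malloc(16);           /* dynamic storage (malloc)     */
    assert(block != NULL);

    printf("static   : %p\n", (void *)&demo_static);
    printf("global   : %p\n", (void *)&demo_global);
    printf("malloc   : %p\n", (void *)block);
    printf("automatic: %p\n", (void *)&an_automatic);
    /* getenv() may return NULL if PATH is unset; %p still prints it. */
    printf("env PATH : %p\n", (void *)getenv("PATH"));
    free(block);
}
```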
# 21  
04-03-2014
Thanks DGPickett!
That's too comprehensive for me to integrate your ideas into my code. What I lack is practice, and I was wondering if anyone could correct my code. I like to learn by example, and I am not sure whether I should take some theory courses, or whether that would help.
For sure I have not yet grasped the whole picture of "stack" and "heap" in theory, and I draw a blank when dealing with them in real C code.
Thank you again.
yifangt