Merge two strings by overlapped region
Post 302896189 by Don Cragun on Saturday 5th of April 2014 05:30:33 AM
Here is a strmerg() similar to Corona688's. Where his version checks every trailing substring of string 1 for a match, this version starts with the longest possible match (bounded by the length of the shorter string) and stops as soon as it finds one. It also verifies that malloc() succeeded before copying data into the allocated buffer. With multi-megabyte input strings of similar length and short matches, the speed difference is likely to be unnoticeable. But if relatively long matches are frequent, or if string 1 is longer than string 2, this version might be significantly faster.

This won't work on some older (pre-C99) systems because it uses <inttypes.h> to produce relatively portable printf() conversion specifications for objects of type size_t. Also, Corona688's code makes much better use of the const qualifier than this code does.

Code:
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *
strmerg(const char *str1, const char *str2) {
	size_t	str1_l = strlen(str1),	// length of string1
		str2_l = strlen(str2),	// length of string2
		// match_l is longest possible match (length of shortest input)
		match_l = str1_l < str2_l ? str1_l : str2_l;
		// base is ptr to start of longest possible match in string1
	char	*base = (char *)str1 + str1_l - match_l,
		c1,		// 1st char in string1 possible match
		c2 = *str2,	// 1st char in string2
		*out;
	// At this point, match_l is the longest possible match (based on the
	// lengths of string1 and string2) and base points to the spot in
	// string1 where the longest possible match could start.

	// While there is a mismatch...
	while((c1 = *base) && ((c1 != c2) || strncmp(base, str2, match_l))) {
		// Decrement the longest possible match, and increment the
		// spot in string1 where the longest possible match could
		// still start.  Note that even though we aren't checking
		// the value of match_l in this loop, we are guaranteed that
		// the loop will end with match_l greater than or equal to
		// zero.  (If it is zero, the only match is the terminating
		// null byte in string1.)
		match_l--;
		base++;
	}
	// When we get to this point, match_l is the length of the longest
	// substring of the tail of string1 that matches the head of string2.

	// Allocate space to hold the resulting string:	length of string1 +
	// length of string2 - length of matched substring + 1 for the
	// terminating null byte.  NOTE:  The caller is responsible for
	// freeing the allocated space when it is no longer needed.
	out = malloc(str1_l + str2_l - match_l + 1);
	if(out) {
		// Copy string1 and unmatched portion of string2 to out.
		strcpy(out, str1);
		strcpy(out + str1_l, str2 + match_l);
	}
	return(out);
}

int
main(int argc, char *argv[]) {
	char *result;	// Space for the pointer that strmerg() will return

	if (argc != 3) {
		fprintf(stderr, "Error! Bad arg count\nUsage: %s %s\n",
			argv[0], "string1 string2");
		exit(EXIT_FAILURE);
	}

	printf("string1: \"%s\" (%" PRIuMAX " bytes)\n",
		argv[1], (uintmax_t)strlen(argv[1]));
	printf("string2: \"%s\" (%" PRIuMAX " bytes)\n",
		argv[2], (uintmax_t)strlen(argv[2]));

	result = strmerg(argv[1], argv[2]);
	if(result == NULL) {
		perror("strmerg() failed");
		exit(EXIT_FAILURE);
	}
	printf("strmerg(string1, string2) output: \"%s\" (%" PRIuMAX " bytes)\n",
		result, (uintmax_t)strlen(result));
	free(result);
	result = strmerg(argv[2], argv[1]);
	if(result == NULL) {
		perror("strmerg() failed");
		exit(EXIT_FAILURE);
	}
	printf("strmerg(string2, string1) output: \"%s\" (%" PRIuMAX " bytes)\n",
		result, (uintmax_t)strlen(result));
	free(result);
	return(0);
}

When compiled and linked into a.out, the command:
Code:
./a.out ACGTGCCC CCCCCGTGTGTGT

produces:
Code:
string1: "ACGTGCCC" (8 bytes)
string2: "CCCCCGTGTGTGT" (13 bytes)
strmerg(string1, string2) output: "ACGTGCCCCCGTGTGTGT" (18 bytes)
strmerg(string2, string1) output: "CCCCCGTGTGTGTACGTGCCC" (21 bytes)

and the command:
Code:
./a.out aaaaaaaaaa1 1aaaaaaaaaaaa

produces the output:
Code:
string1: "aaaaaaaaaa1" (11 bytes)
string2: "1aaaaaaaaaaaa" (13 bytes)
strmerg(string1, string2) output: "aaaaaaaaaa1aaaaaaaaaaaa" (23 bytes)
strmerg(string2, string1) output: "1aaaaaaaaaaaa1" (14 bytes)
