Traversing member of structure of vector C++


 
Thread Tools Search this Thread
Top Forums Programming Traversing member of structure of vector C++
# 1  
Old 10-24-2014
Traversing member of structure of vector C++

Hello,
I want to loop thru a vector composed of many entries as structure, which contains sequenceID and sequence. At looping, delete any structure if the sequence is a perfect-match substring of another sequence of any other structure, so that the resulted vector contains only unique sequences.
I have difficulty for the parts to access the sequence as vector member by iterator, and to compare the sequences as structure member thru iterator.
Code:
#include <iostream>
#include <fstream>
#include <string.h>		//Needed for strtok and strcpy
#include <string>
#include <cstring>
#include <vector>
#include <map>
#include <algorithm>

/****************************************************************************
* Try to remove all the redundant sequences in a multiple fasta file
* Redundant sequences are those including any substring of perfect-match 
* from both the forward strand and the reverse complementary strand
*****************************************************************************/
using namespace std;

//Read the sequences into structure containing both strands and the length

typedef struct fasta {
    string seqID;
    string seq;
//  string *seq_rc;
    int seqLength;
} SEQ;

bool compareByLength(const SEQ & a, const SEQ & b)
{
	if 	(a.seqLength == b.seqLength) {
		return a.seq < b.seq;
} 
	else
	    return a.seqLength < b.seqLength;
//   return (a.seq).size() < (b.seq).size();                    //This is alternative way
}

int main(int argc, char *argv[])
{
    ifstream inFILE;
    inFILE.open(argv[1]);

    SEQ entry;
    vector < SEQ > seqSet;

    string entryID;
    map < string, string > FastaSeq;

    if (inFILE.fail()) {
	cout << "Error! input file failed to open!" << endl;
	cout << "Usage: ./prog fasta.file" << endl;
	return 1;
    } else {
	string line;		// Get a single line to parse
	char *sPtr;
	while (inFILE.good()) {
	    getline(inFILE, line);

	    char *sArray = new char[line.length() + 1];
	    strcpy(sArray, line.c_str());

	    if (sArray[0] == '>') {	/*For head row of each entry */
		sPtr = strtok(sArray, " ");
		entryID = sPtr + 1;	//Skip the ">" symbol

		entry.seqID = entryID;	// copy the entryID to structure entry->seqID

		continue;	//Only need the first token for sequence ID
	    }			//End of the header row  
	    else {		/* For sequence rows */

		sPtr = strtok(sArray, " ");	//Start to tokenize the line by space 

		while (sPtr != NULL) {	/* starts a single row of sequence part */
		    FastaSeq[entryID] += sPtr;
		    sPtr = strtok(NULL, " ");
		}		//Ends a single row of sequence part

		entry.seq = FastaSeq[entryID];
// cout << " ******  " << FastaSeq[entryID] << endl;
//              seq_tmp = revcomp(sPtr);
//              strcpy(entry->seq_rc, sPtr);
		entry.seqLength = strlen(FastaSeq[entryID].c_str());
// cout << entry.seqLength << endl;
seqSet.push_back(entry);	// push the new sequence info into vector

}			//Ends all the sequence part of each entry
delete[]sArray;	//Lambda? to release the memory
}			//End the file
}

sort(seqSet.begin(), seqSet.end(), compareByLength);

vector < SEQ >::iterator itr1, itr2, itr3;

for (itr1 = seqSet.begin(); itr1 < seqSet.end(); itr1++) {
for (itr2 = itr1 + 1; itr2 < seqSet.end(); itr2++) {
if ((itr2->seq).find(itr1->seq)) {                          //Line 90
//		cout << "Found substring!" << "\t\t";             //Line 90a
//		cout << itr1->seqID << "\t";	                  //Line 90b
//		cout << itr1->seq << "\t vs. \t";	          //Line 90c
//		cout << itr2->seqID << "\t";	                  //Line 90d
//		cout << (*itr2).seq << endl;	                  //Line 90e
     seqSet.erase(itr1);                                   //Line 91
	    }
	}
    }

//Printing the vector with iterator
    for (itr3 = seqSet.begin(); itr3 != seqSet.end(); itr3++) {
	cout << (*itr3).seqID << "\t";	//Or, itr3->seqID
	cout << itr3->seq << "\t";
	cout << itr3->seqLength << endl;
    }

    inFILE.close();
    return 0;
}

Debugging for a while, finally code was compiled without error!
Code:
test.fasta
>seq1
ATCGATCGATATATATATATATAT
>seq2 sub of seq1
CGATCGATATATATATAT
>seq3 sub of seq1
ATCGATATATAT
>seq4 new 
ATATATATCGATCG
>seq5 new
ATCGATCGATCGATCGTAGTCGCG
>seq6 new
ATCGATCGCGCGCGCGCGCGCGCGC
>seq7 sub of seq6
CGCGCGCGCGCGCG

Code:
  $ g++ -Wall  rm_redundantseq.cpp  
  $ ./a.out  test.fasta 
seq2	CGATCGATATATATATAT	18
seq5	ATCGATCGATCGATCGTAGTCGCG	24
seq6	ATCGATCGCGCGCGCGCGCGCGCGC	25

But I am expecting
Code:
seq1 ATCGATCGATATATATATATATAT    23
seq4 ATATATATCGATCG    14
seq5	ATCGATCGATCGATCGTAGTCGCG	    24
seq6	ATCGATCGCGCGCGCGCGCGCGCGC	25

Likely Line 90 & 91 are of problem, but not quite sure.
Two questions I have here:
1) Line 90 Find substring from masterstring, as not all the entry are substring of the others. string.find() function seems return a pointer, but there is no error here. Why?
2) Line 91 remove the vector member (structure) thru iterator , is this the right way?
Google's for a while, could not get a clear answer.
Thanks a lot!

Last edited by yifangt; 10-28-2014 at 08:05 PM.. Reason: Modifies code
# 2  
Old 10-27-2014
If I turned on Line 90a~e, I got output:
Code:
Found substring!		seq3	ATCGATATATAT	 vs. 	seq4	ATATATATCGATCG
Found substring!		seq4	ATATATATCGATCG	 vs. 	seq7	CGCGCGCGCGCGCG
Found substring!		seq7	CGCGCGCGCGCGCG	 vs. 	seq1	ATCGATCGATATATATATATATAT
Found substring!		seq7	CGCGCGCGCGCGCG	 vs. 	seq6	ATCGATCGCGCGCGCGCGCGCGCGC
Found substring!		seq1	ATCGATCGATATATATATATATAT	 vs. 	seq5	ATCGATCGATCGATCGTAGTCGCG

So there is a bug, as seq7 was duplicated !

Last edited by yifangt; 10-27-2014 at 08:48 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

traversing a string

I am writing a script which will read a word and say how many vowels and consonants does the word contain. but i dont know how to traverse a string in shell scripting. if it was in C i'd have done something like this: cout<<"plz enter the word"<<endl; cin>>word; int consonants, vowels;... (4 Replies)
Discussion started by: nishrestha
4 Replies

2. Homework & Coursework Questions

Problem while traversing directories

I was given to create a backup of all files in a given directory(command line argument) into say /home/vishal/back and the back up files must be accordingly to the extension of the file i.e pdf files are saved in back/pdf doc files back/doc etc . I gave a recursive function to traverse through the... (1 Reply)
Discussion started by: davis7son
1 Replies

3. Programming

Traversing in Array of pointers

Please find the below program. the requirement and description of the program also given: ganesh@ubuntu:~/my_programs/c/letusc/chap9$ cat fa.c.old /* Program : write a program to count the number of 'e' in thefollowing array of pointers to strings: char *s = { "We will teach you how... (12 Replies)
Discussion started by: ramkrix
12 Replies

4. Shell Programming and Scripting

Find directory name while traversing subdirectories

Hi, I have a parent directory in which I have sub directories of different depth /usr/usr1/user2/671 /usr/usr1/672 /usr/user2/user1/673 /usr/user2/user3/user4/674 And I need the names of all the directories that which starts only with 6 in a file. Thanks, (12 Replies)
Discussion started by: arun_maffy
12 Replies

5. UNIX for Dummies Questions & Answers

script for traversing directory

hi please suggest a korn script which will traverse all subdirectory in the current directory? (2 Replies)
Discussion started by: ilayans
2 Replies

6. Programming

Vector Traversing

Hi i have the following structure struct S { char Mod_num; char val; char chr_nm_cd; } I am reading a 2GB file and inserting into the structure and writing into a vector. I feel like only vector will be a right option. I tried with multimap but it is memory intensive and hence i... (1 Reply)
Discussion started by: dhanamurthy
1 Replies

7. Programming

Search attributes in one structure using the values from another structure

Hello Groups I am trying to find out ways of comparing a value from a 'c' structure to a value in another 'C' structure. the 'C' structure can be a List or liked list as it contains lot many records. if we loop it in both the structures it is going to consume time. I am looking for a simple... (3 Replies)
Discussion started by: dhanamurthy
3 Replies

8. Programming

Accesing structure member:Error:dereferencing pointer to incomplete type

$ gcc -Wall -Werror struct.c struct.c: In function `main': struct.c:18: error: dereferencing pointer to incomplete type $ cat struct.c #include <stdio.h> #include <stdlib.h> #include <string.h> /*Declaration of structure*/ struct human { char *first; char gender; int age; } man,... (3 Replies)
Discussion started by: amit4g
3 Replies

9. UNIX for Dummies Questions & Answers

Traversing a file system

I'm pretty new at this UNIX stuff, and this may be a simple question but I'm kind of stuck :confused: Let's say I have a large directory structure of .essay files, where I saved all of the essays that I did over the last few years. Not all of the .essay files are in the same directory (all... (1 Reply)
Discussion started by: hooj
1 Replies

10. UNIX for Dummies Questions & Answers

Copying a Directory Structure to a new structure

Hi all Is it possible to copy a structure of a directory only. e.g. I have a file with the following entries that is a result of a find :- /dir1/dir2/file.dbf /dir1/dir2/dir3/file1.dbf /dir1/file.dbf I want to copy these to a directory and keep the structure however starting at a new dir... (8 Replies)
Discussion started by: jhansrod
8 Replies
Login or Register to Ask a Question