Post 302922429 by yifangt on Friday 24th of October 2014 01:55:33 PM
Traversing member of structure of vector C++

Hello,
I want to loop through a vector of structures, each containing a sequence ID and a sequence. While looping, I want to delete any structure whose sequence is a perfect-match substring of the sequence in any other structure, so that the resulting vector contains only unique sequences.
I am having difficulty accessing the sequence member of each vector element through an iterator, and comparing the sequences of two structures through iterators.
Code:
#include <iostream>
#include <fstream>
#include <string.h>		//Needed for strtok and strcpy
#include <string>
#include <cstring>
#include <vector>
#include <map>
#include <algorithm>

/****************************************************************************
* Try to remove all the redundant sequences in a multiple fasta file
* Redundant sequences are those including any substring of perfect-match 
* from both the forward strand and the reverse complementary strand
*****************************************************************************/
using namespace std;

//Read the sequences into structure containing both strands and the length

typedef struct fasta {
    string seqID;
    string seq;
//  string *seq_rc;
    int seqLength;
} SEQ;

bool compareByLength(const SEQ & a, const SEQ & b)
{
    if (a.seqLength == b.seqLength) {
	return a.seq < b.seq;
    } else
	return a.seqLength < b.seqLength;
//   return (a.seq).size() < (b.seq).size();                    //This is alternative way
}

int main(int argc, char *argv[])
{
    ifstream inFILE;
    inFILE.open(argv[1]);

    SEQ entry;
    vector < SEQ > seqSet;

    string entryID;
    map < string, string > FastaSeq;

    if (inFILE.fail()) {
	cout << "Error! input file failed to open!" << endl;
	cout << "Usage: ./prog fasta.file" << endl;
	return 1;
    } else {
	string line;		// Get a single line to parse
	char *sPtr;
	while (inFILE.good()) {
	    getline(inFILE, line);

	    char *sArray = new char[line.length() + 1];
	    strcpy(sArray, line.c_str());

	    if (sArray[0] == '>') {	/*For head row of each entry */
		sPtr = strtok(sArray, " ");
		entryID = sPtr + 1;	//Skip the ">" symbol

		entry.seqID = entryID;	// copy the entryID to structure entry->seqID

		continue;	//Only need the first token for sequence ID
	    }			//End of the header row  
	    else {		/* For sequence rows */

		sPtr = strtok(sArray, " ");	//Start to tokenize the line by space 

		while (sPtr != NULL) {	/* starts a single row of sequence part */
		    FastaSeq[entryID] += sPtr;
		    sPtr = strtok(NULL, " ");
		}		//Ends a single row of sequence part

		entry.seq = FastaSeq[entryID];
// cout << " ******  " << FastaSeq[entryID] << endl;
//              seq_tmp = revcomp(sPtr);
//              strcpy(entry->seq_rc, sPtr);
		entry.seqLength = strlen(FastaSeq[entryID].c_str());
// cout << entry.seqLength << endl;
		seqSet.push_back(entry);	// push the new sequence info into vector

	    }			//Ends all the sequence part of each entry
	    delete[]sArray;	//Lambda? to release the memory
	}			//End the file
    }

    sort(seqSet.begin(), seqSet.end(), compareByLength);

    vector < SEQ >::iterator itr1, itr2, itr3;

    for (itr1 = seqSet.begin(); itr1 < seqSet.end(); itr1++) {
	for (itr2 = itr1 + 1; itr2 < seqSet.end(); itr2++) {
	    if ((itr2->seq).find(itr1->seq)) {	//Line 90
//		cout << "Found substring!" << "\t\t";             //Line 90a
//		cout << itr1->seqID << "\t";	                  //Line 90b
//		cout << itr1->seq << "\t vs. \t";	          //Line 90c
//		cout << itr2->seqID << "\t";	                  //Line 90d
//		cout << (*itr2).seq << endl;	                  //Line 90e
		seqSet.erase(itr1);	//Line 91
	    }
	}
    }

//Printing the vector with iterator
    for (itr3 = seqSet.begin(); itr3 != seqSet.end(); itr3++) {
	cout << (*itr3).seqID << "\t";	//Or, itr3->seqID
	cout << itr3->seq << "\t";
	cout << itr3->seqLength << endl;
    }

    inFILE.close();
    return 0;
}

After debugging for a while, the code finally compiled without errors!
Code:
test.fasta
>seq1
ATCGATCGATATATATATATATAT
>seq2 sub of seq1
CGATCGATATATATATAT
>seq3 sub of seq1
ATCGATATATAT
>seq4 new 
ATATATATCGATCG
>seq5 new
ATCGATCGATCGATCGTAGTCGCG
>seq6 new
ATCGATCGCGCGCGCGCGCGCGCGC
>seq7 sub of seq6
CGCGCGCGCGCGCG

Code:
  $ g++ -Wall  rm_redundantseq.cpp  
  $ ./a.out  test.fasta 
seq2	CGATCGATATATATATAT	18
seq5	ATCGATCGATCGATCGTAGTCGCG	24
seq6	ATCGATCGCGCGCGCGCGCGCGCGC	25

But I was expecting:
Code:
seq1	ATCGATCGATATATATATATATAT	24
seq4	ATATATATCGATCG	14
seq5	ATCGATCGATCGATCGTAGTCGCG	24
seq6	ATCGATCGCGCGCGCGCGCGCGCGC	25

Lines 90 and 91 are likely the problem, but I am not quite sure.
I have two questions:
1) Line 90 looks for a substring in the master string, since not every entry is a substring of another. The string.find() function seems to return a pointer, but there is no error here. Why?
2) Line 91 removes the vector member (a structure) through the iterator. Is this the right way?
I Googled for a while but could not get a clear answer.
Thanks a lot!

Last edited by yifangt; 10-28-2014 at 08:05 PM.. Reason: Modifies code
 
