C++ getline, parse and take first tokens by condition
Post 302917775 by yifangt on Thursday 18th of September 2014 01:01:21 PM
I thought of combining a map<string, string> container with the program and storing all the combined sequence entries in it, which would be:
1) easier to print, and would avoid problems such as the extra blank line before the first entry;
2) convenient for retrieving a subset of the sequences by sequence ID (i.e. the key of the map).
Here is my modified code. It compiles cleanly but crashes with a segmentation fault when run.
Code:
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>   //for strcpy and strtok
#include <map>

using namespace std;

int main()
{
    ifstream inFILE("infile.fasta");
    int inGuard = 1;                    //using a guard variable

    map<string, string> FastaSeq;       //Declare a map to hold each sequence entry

    while (inFILE.good()) {
        string line;                    //declare string for each line
        string entryID, sequence;       //declare two strings for key and value for map
        getline(inFILE, line);          //Read the whole line
        char *sPtr;                     //Declare char pointer sPtr for tokens

        //Initialize char pointer sArray for conversion of the string to char*
        char *sArray = new char[line.length() + 1];
        strcpy(sArray, line.c_str());

        if (sArray[0] == '>') {
            sPtr = strtok(sArray, " "); //Using space as delimiter get the first token.
            cout << sPtr << " ";        //Print the first token only
            entryID = sPtr;             //assign the first token as key for the map
            continue;
        }
        else {
            sPtr = strtok(sArray, " "); //Get all the tokens with " " as delimiter.
            FastaSeq[entryID] += sPtr;  //assign first part of sequence to map

            while (sPtr != NULL) {      //For all tokens
                cout << sPtr;
                FastaSeq[entryID] += sPtr;  //assign more tokens to sequence
                sPtr = strtok(NULL, " ");
            }
        }
    }
    cout << endl;
    inFILE.close();

    //print the map
    map<string, string>::const_iterator seq_itr;
    if (seq_itr != FastaSeq.end()) {
        cout << seq_itr->first << " ";
        cout << seq_itr->second << endl;
    }

    return 0;
}

The part I am not sure about is appending the third and later parsed tokens to the second token to build the sequence (the value of the map), i.e. the statement FastaSeq[entryID] += sPtr;, which may be the cause of the problem. Thanks a lot!

Last edited by yifangt; 09-19-2014 at 04:23 PM..
 

GETLINE(3)                 Linux Programmer's Manual                GETLINE(3)

NAME
       getline, getdelim - delimited string input

SYNOPSIS
       #include <stdio.h>

       ssize_t getline(char **lineptr, size_t *n, FILE *stream);
       ssize_t getdelim(char **lineptr, size_t *n, int delim, FILE *stream);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       getline(), getdelim():
           Since glibc 2.10:
               _POSIX_C_SOURCE >= 200809L || _XOPEN_SOURCE >= 700
           Before glibc 2.10:
               _GNU_SOURCE

DESCRIPTION
       getline() reads an entire line from stream, storing the address of the
       buffer containing the text into *lineptr. The buffer is null-terminated
       and includes the newline character, if one was found.

       If *lineptr is NULL, then getline() will allocate a buffer for storing
       the line, which should be freed by the user program. (In this case, the
       value in *n is ignored.)

       Alternatively, before calling getline(), *lineptr can contain a pointer
       to a malloc(3)-allocated buffer *n bytes in size. If the buffer is not
       large enough to hold the line, getline() resizes it with realloc(3),
       updating *lineptr and *n as necessary.

       In either case, on a successful call, *lineptr and *n will be updated
       to reflect the buffer address and allocated size respectively.

       getdelim() works like getline(), except that a line delimiter other
       than newline can be specified as the delimiter argument. As with
       getline(), a delimiter character is not added if one was not present
       in the input before end of file was reached.

RETURN VALUE
       On success, getline() and getdelim() return the number of characters
       read, including the delimiter character, but not including the
       terminating null byte. This value can be used to handle embedded null
       bytes in the line read.

       Both functions return -1 on failure to read a line (including
       end-of-file condition).

ERRORS
       EINVAL Bad arguments (n or lineptr is NULL, or stream is not valid).

VERSIONS
       These functions are available since libc 4.6.27.

CONFORMING TO
       Both getline() and getdelim() were originally GNU extensions. They
       were standardized in POSIX.1-2008.

EXAMPLE
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <stdlib.h>

       int
       main(void)
       {
           FILE *fp;
           char *line = NULL;
           size_t len = 0;
           ssize_t read;

           fp = fopen("/etc/motd", "r");
           if (fp == NULL)
               exit(EXIT_FAILURE);

           while ((read = getline(&line, &len, fp)) != -1) {
               printf("Retrieved line of length %zu : ", read);
               printf("%s", line);
           }

           free(line);
           exit(EXIT_SUCCESS);
       }

SEE ALSO
       read(2), fgets(3), fopen(3), fread(3), gets(3), scanf(3),
       feature_test_macros(7)

COLOPHON
       This page is part of release 3.27 of the Linux man-pages project. A
       description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.

GNU                               2010-06-12                        GETLINE(3)