Format specifier for sscanf() in C


 
# 8  
Old 11-04-2019
last field for sscanf() is incorrectly parsed

Changing the RECORD increment i++; according to your correction fixed the problem!
Thank you so much!
---------------------------------------------------------------------------
Wait! There is still something I must have missed.
1) The last member of each struct RECORD is not parsed correctly;
2) Samples 10 and later are not parsed correctly when there are 10 or more samples.
This brings me back to my original exercise of learning sscanf() with fgets() and the format string I used.
Here is the reformatted code, which prints a re-arranged table of the input file.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// practice with fgets() + sscanf() to read multiple lines into a struct
typedef struct {
    char ID[32];
    char SNPs[8];
    char MNPs[8];
    char Insertion[8];
    char Deletion[8];
    char Indels[8];                     
    char SameRef[8];            
    char MissingGT[8];             
    char SNPTransTranv[8];
    char TotalHetHomRatio[8];
    char SNPHetHomRatio[8];
    char MNPHetHomRatio[8];
    char InsertionHetHomRatio[8];
    char DeletionHetHomRatio[8];
    char IndelHetHomRatio[8];
    char InsertDeletionRatio[8];
    char Indel_SNPMNPRatio[8];
} RECORD;

int    main (int argc, char *argv[])
{
    char line[256];     //for row read from file
    char name[32];      //1st part (key part) parsed from each line[]
    char str1[8];       //2nd part (value part) parsed from each line[]
//    char tail[128];     //rest behind 2nd part

    FILE* fPtr = fopen(argv[1], "r");
    RECORD record[16];            //test file may have 288 ~ 306 rows including blank lines for 16 RECORD
    static int i = -1;            //initialize counter

    while (fgets(line, sizeof(line), fPtr) != NULL) {
        str1[0]='0';
        if ( line[0] == '\n' ) continue;        //skip "blank" lines with e.g. empty or invisible spaces(to be improved!).

        //scan in two parts delimited by ":", in the 2nd part, only take the first part delimited by space
        sscanf(line, "%[^:] : %s", name, str1);   
        if (strstr(name, "Sample Name") != NULL) {
            i++;
            strcpy(record[i].ID, str1); 
            printf("%s ", str1);
        }
        else if (strstr(name, "SNPs") != NULL) { 
            strcpy(record[i].SNPs, str1);
            printf("%s ", str1);}
        else if (strstr(name, "MNPs") != NULL) {
            strcpy(record[i].MNPs, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Insertions") != NULL) {
            strcpy(record[i].Insertion, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Deletions") != NULL) {        //changed the variable line -> name in the original post, and all the rest after this line
            strcpy(record[i].Deletion, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Indels") != NULL) {  
            strcpy(record[i].Indels, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Same as reference") != NULL) {
            strcpy(record[i].SameRef, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Missing Genotype") != NULL) {
            strcpy(record[i].MissingGT, str1);
            printf("%s ", str1);}
        else if (strstr(name, "SNP Transitions") != NULL) {
            strcpy(record[i].SNPTransTranv, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Total Het/Hom") != NULL) {
            strcpy(record[i].TotalHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "SNP Het/Hom ratio") != NULL) {
            strcpy(record[i].SNPHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "MNP Het/Hom ratio") != NULL) {
            strcpy(record[i].MNPHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Insertion Het/Hom ratio") != NULL) {
            strcpy(record[i].InsertionHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Deletion Het/Hom ratio") != NULL) {
            strcpy(record[i].DeletionHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Indel Het/Hom ratio") != NULL) {
            strcpy(record[i].IndelHetHomRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Insertion/Deletion ratio") != NULL) {
            strcpy(record[i].InsertDeletionRatio, str1);
            printf("%s ", str1);}
        else if (strstr(name, "Inde/SNP+MNP ratio") != NULL) {
            strcpy(record[i].Indel_SNPMNPRatio, str1);
            printf("%s ", str1); }
        else printf("Sthwrong!\n"); 
        // printf("%s: %s\n", name, str1);        
    }
    puts("END");        //For debug. puts() always adds newline at the end of the string
    fclose(fPtr);
    return 0;
}
./prog infile

The attached infile simply repeats the first 4 RECORDs with unique sample IDs as a trial.
And the wrong output is:
Code:
sample1 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 Sthwrong!
Sthwrong!
sample2 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
sample3 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 Sthwrong!
Sthwrong!
sample4 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
sample5 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 Sthwrong!
Sthwrong!
sample6 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
sample7 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 Sthwrong!
Sthwrong!
sample8 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
sample9 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 Sthwrong!
Sthwrong!
Sthwrong!
73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
Sthwrong!
87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 Sthwrong!
Sthwrong!
Sthwrong!
83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
Sthwrong!
91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 Sthwrong!
Sthwrong!
Sthwrong!
73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
Sthwrong!
87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 Sthwrong!
Sthwrong!
Sthwrong!
83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 Sthwrong!
Sthwrong!
END

But, what is expected is:
Code:
sample1 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 0.08
sample2 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample3 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 0.08
sample4 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample5 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 0.08
sample6 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample7 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 0.08
sample8 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample9 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 0.08
sample10 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample11 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 0.08
sample12 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample13 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 0.08
sample14 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 0.07
sample15 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 0.08
sample16 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 0.07
 END

I believe the string size is correct, as the last member record[i].Indel_SNPMNPRatio is never more than 3 digits. Any help is greatly appreciated.

Last edited by yifangt; 11-05-2019 at 03:16 PM.. Reason: new bug
This User Gave Thanks to yifangt For This Post:
# 9  
Old 11-05-2019
The str1 variable is too small. "sample10" is 8 characters, so str1 must be at least 8+1=9 bytes wide.
Code:
    char name[256];      //1st part (key part) parsed from each line[]
    char str1[12];       //2nd part (value part) parsed from each line[]
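
A field width in each conversion is a further safeguard: it stops sscanf() from writing past the end of a buffer even if a later file contains a longer value. The following is only a sketch, assuming a 32-byte key buffer and a 12-byte value buffer; parse_pair() is a hypothetical helper name, not something from the original program.

```c
#include <stdio.h>

/* Hypothetical helper: split "key : value" into name (at most 31 chars)
   and value (at most 11 chars). Returns the number of fields sscanf()
   converted: 2 on success, fewer if the line does not fit the pattern. */
int parse_pair(const char *line, char name[32], char value[12])
{
    name[0] = '\0';
    value[0] = '\0';
    /* %31[^:] stops at ':' or after 31 chars; %11s stops at whitespace
       or after 11 chars, so neither array can overflow. */
    return sscanf(line, "%31[^:] : %11s", name, value);
}
```

Called with the line "Sample Name : sample10", this should return 2 and leave "sample10" in value. Note that name keeps its trailing space, because the scanset reads everything up to the colon.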

Also, I was wrong in my previous post; it should have been
Code:
typedef struct {
    char ID[12];
    char SNPs[8];
...

Because ID only needs to store str1.
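
On the first problem in post #8 (the last member of each RECORD), one thing worth checking is the key in the last branch of the if/else chain, which is spelled "Inde/SNP+MNP ratio". The input file is not shown in this thread, so this is only a guess: if its label reads "Indel/SNP+MNP ratio", strstr() cannot find the misspelled key in it and the line falls through to the final else. A minimal sketch, where matches_key() is a hypothetical helper that performs the same substring test:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if key occurs anywhere in name, else 0.
   This is the same substring test the if/else chain performs. */
int matches_key(const char *name, const char *key)
{
    return strstr(name, key) != NULL;
}
```

With the assumed label, matches_key("Indel/SNP+MNP ratio ", "Inde/SNP+MNP ratio") is 0, while matches_key("Indel/SNP+MNP ratio ", "Indel/SNP+MNP ratio") is 1.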
This User Gave Thanks to MadeInGermany For This Post:
# 10  
Old 11-06-2019
last variable gets printed twice

Thanks!
Changing the char array sizes to ID[12] and str1[12] resolved the ID bug!

Last bug (hopefully!): the last variable is printed twice, which I really cannot understand.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// practice with fgets() + sscanf() to read multiple lines into a struct
typedef struct {
    char ID[12];
   /*omit other members for this post */
    char Indel_SNPMNPRatio[8];
} RECORD;

int main (int argc, char *argv[])
{
    char line[512];      //for row read from file
    char name[32];      //1st part (key part) parsed from each line[]
    char str1[12];       //2nd part (value part) parsed from each line[]

    FILE* fPtr = fopen(argv[1], "r");
    RECORD record[16];            //test file may have 288 ~ 306 rows including blank lines
    static int i = -1;               //initialize counter

    while (fgets(line, sizeof(line), fPtr) != NULL) {
        str1[0]='0';
        if ( line[0] == '\n' ) continue;        //skip "blank" lines with e.g. empty or invisible spaces(to be improved!).

        sscanf(line, "%[^:] : %s", name, str1);   
        if (strstr(name, "Sample Name") != NULL) {
            i++;
            strcpy(record[i].ID, str1);
            printf("\n");
        }
        printf("%s ", str1);
    }

    puts("\nEND");      //For debug.
    fclose(fPtr);
    return 0;
}

./prog vcfstats.txt
Code:
sample1 91 1 5 2 0 1 44 1.74 2.96 2.79 - 4.00 - - 2.50 0.08 0.08 
sample2 73 2 2 3 0 1 63 1.87 2.59 2.50 1.00 - - - 0.67 0.07 0.07 
...... 
sample15 87 1 4 2 0 1 42 1.74 2.96 2.79 - 2.00 - - 1.25 0.08 0.08 
sample16 83 1 2 3 0 4 65 1.87 2.59 2.50 1.00 - - - 0.67 0.07 0.07 
END

I want to make sure I understand all the details of the bugs.
Why does the last variable get printed twice?
Thanks again.

Last edited by yifangt; 11-06-2019 at 04:30 PM..
# 11  
Old 11-07-2019
The empty line prints str1.
Because nothing was matched for it in sscanf(), str1 still holds the value from the previous line. That is what the str1 assignment is for:
it should set str1 to "" with one of
str1[0]=0 or str1[0]='\0' or, with a bit more overhead, strcpy(str1, "").
But you have str1[0]='0', which overwrites the first character of the previous value with the character '0'.
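
The effect can be reproduced in isolation. This is a sketch under one assumption: that the "empty" line in the file actually contains a space before the newline, since lines starting with '\n' are already skipped by the continue. Both helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parse one line into the caller's buffers WITHOUT
   clearing them first, to show what a failed match leaves behind.
   Returns the number of fields sscanf() converted. */
int parse_keep_old(const char *line, char name[32], char str1[12])
{
    return sscanf(line, "%31[^:] : %11s", name, str1);
}

/* Hypothetical helper: same parse, but str1 is emptied first, so a
   failed match leaves "" instead of stale data. */
int parse_reset(const char *line, char name[32], char str1[12])
{
    str1[0] = '\0';   /* NUL terminator, not the digit '0' */
    return sscanf(line, "%31[^:] : %11s", name, str1);
}
```

For a line holding only a space and a newline, both functions return 1 rather than 2, because only the key part was converted. After parse_keep_old() str1 still holds the previous value; after parse_reset() it is empty. Comparing the return value with 2 before using str1 is another way to avoid printing a stale value.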
This User Gave Thanks to MadeInGermany For This Post:
# 12  
Old 11-07-2019
I was so stupid: when you first pointed out that I should add a line to empty str1 with str1[0]=0; I thought you had made a typo, so I simply added single quotes: str1[0]='0'; Last night I actually tried str1[0]='\0'; without realizing that it was the right way, and I did not understand why until your last reply. It finally works now and I understand it better.
Thank you so much for all your time and patience with me!

Last edited by yifangt; 11-07-2019 at 05:57 PM..
# 13  
Old 11-07-2019
BTW, in many cases a trim function comes in handy.
For example, the fuzzy strstr(), which matches any portion of the string, can be replaced with an exact strcmp():
Code:
...
// http://www.martinbroadhurst.com/trim-a-string-in-c.html
char *rtrim(char *str, const char *seps)
{
    int i;
    if (seps == NULL) {
        seps = "\t\n\v\f\r ";
    }
    i = strlen(str) - 1;
    while (i >= 0 && strchr(seps, str[i]) != NULL) {
        str[i] = '\0';
        i--;
    }
    return str;
}

int main (int argc, char *argv[])
{
    char line[512];      //for row read from file
    char name[32];      //1st part (key part) parsed from each line[]
    char str1[12];       //2nd part (value part) parsed from each line[]

    FILE* fPtr = fopen(argv[1], "r");
    RECORD record[16];            //test file may have 288 ~ 306 rows including blank lines
    static int i = -1;               //initialize counter

    while (fgets(line, sizeof(line), fPtr) != NULL) {
        if ( line[0] == '\n' ) continue;        //skip "blank" lines with e.g. empty or invisible spaces(to be improved!).
        str1[0]='\0';
        sscanf(line, "%[^:] : %s", name, str1);   
        rtrim(name, NULL);             //strip the trailing spaces
        if (strcmp(name, "Sample Name") == 0) {
            i++;
            strcpy(record[i].ID, str1);
            printf("\n");
        }
...

Note that the variable i in rtrim() is local to the function and does not conflict with the variable i in main().
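
To see why the exact comparison is safer, consider the keys "SNPs" and "SNP Het/Hom ratio" from the earlier code: a substring search for "SNP" would accept both. The sketch below is a self-contained variant of the trimming idea, using size_t for the length and a fixed separator set for brevity; rtrim_ws() and key_equals() are hypothetical names.

```c
#include <string.h>

/* Strip trailing whitespace in place (a size_t variant of the rtrim()
   idea above, with the separator set fixed for brevity). */
char *rtrim_ws(char *str)
{
    size_t len = strlen(str);
    while (len > 0 && strchr("\t\n\v\f\r ", str[len - 1]) != NULL) {
        str[--len] = '\0';
    }
    return str;
}

/* Hypothetical helper: 1 if name equals key exactly after trimming. */
int key_equals(char *name, const char *key)
{
    return strcmp(rtrim_ws(name), key) == 0;
}
```

With a buffer holding "SNPs ", key_equals(buf, "SNPs") is 1. With a buffer holding "SNP Het/Hom ratio ", key_equals(buf, "SNP") is 0 even though strstr(buf, "SNP") would have matched.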
These 2 Users Gave Thanks to MadeInGermany For This Post:
# 14  
Old 11-07-2019
Thanks!
Corona688 in this forum had similar code, but I skipped that part so as not to make my question too branchy.