Format specifier for sscanf() in C


 
# 1  
Old 11-02-2019

Hello, I have formatted lines delimited by a colon ":", and I need to split each line into two parts using sscanf() format specifiers.
infile.txt:
Code:
Sample Name: sample1
SNPs                         : 91
MNPs                         : 1
Insertions                   : 5
Deletions                    : 2
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 44
SNP Transitions/Transversions: 1.74 (73/42)
Total Het/Hom ratio          : 2.96 (74/25)

And here is my code fragment:
Code:
char line[256];     //for row read from file
char name[32];      //1st part (key part) parsed from each line[]
char str1[8];       //2nd part (value part) parsed from each line[]

sscanf(line, "%[^:] : %s", name, str1);      //scan in two halves delimited by ":", in the 2nd half, only take the first part delimited by space/tab
                                             //e.g. line1: Sample Name-> name; sample1 -> str1 
                                             //line10: Total Het/Hom ratio -> name; 2.96-> str1,  discard (74/25)

There may be no space/tab before or after the colon. I am ignoring the float/integer data type of the numbers for the moment.
For each line I could not get the first half into name and the first token after ":" into str1. The problem seems to be that sscanf() does not stop at the end of each line. I spent some time reading the sscanf() manpage and my old post, but could not figure it out myself.
What is wrong with my sscanf() line? Thanks a lot.

Last edited by yifangt; 11-02-2019 at 01:54 PM.. Reason: typo and add color
# 2  
Old 11-03-2019
Maybe add a newline at the end of your format string?
# 3  
Old 11-04-2019
For me it works: the key lands in name and the first token after the colon lands in str1.
But there can be trailing spaces in name, because %[^:] matches everything up to the : (including the padding spaces).
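The trailing spaces can be stripped after the scan. A minimal sketch, assuming a hypothetical helper rtrim() (not part of the original program) that cuts trailing whitespace in place:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove trailing whitespace from s in place. */
static void rtrim(char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    /* The cast avoids undefined behaviour for negative char values. */
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
}
```

Calling rtrim(name) right after the sscanf() turns a padded key such as "SNPs" followed by spaces into plain "SNPs". Also note that %s stops at the first whitespace, so the newline that fgets() leaves in line never ends up in str1.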
# 4  
Old 11-04-2019
whole program buggy code

Thanks!
Then there must be something else wrong with my program. I decided to post the whole program here seeking more help to debug. A test file is also attached for trial.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// practice with fgets() + sscanf() to read multiple lines into a struct
typedef struct {
    char ID[16];
    char SNPs[8];
    char MNPs[8];
    char Insertion[8];
    char Deletion[8];
    char Indels[8];                     
    char SameRef[8];            
    char MissingGT[8];             
    char SNPTransTranv[8];
    char TotalHetHomRatio[8];
    char SNPHetHomRatio[8];
    char MNPHetHomRatio[8];
    char InsertionHetHomRatio[8];
    char DeletionHetHomRatio[8];
    char IndelHetHomRatio[8];
    char InsertDeletionRatio[8];
    char Indel_SNPMNPRatio[8];
} RECORD;

int main (int argc, char *argv[])
{
    char line[256];     //for row read from file
    char name[32];      //1st part (key part) parsed from each line[]
    char str1[8];       //2nd part (value part) parsed from each line[]

    FILE* fPtr = fopen(argv[1], "r");
    RECORD record[12];           //test file only has ~ 70 rows
    static int i = 0;            //initialize counter

    while (fgets(line, 256, fPtr) != NULL) {
        //if (!sscanf(line, "%[^\r\n]", name)) continue;      //skip blank line. This line may have problem???
        
    sscanf(line, "%[^:] : %s", name, str1);    //scan in two parts delimited by ":", 
                                             //in the 2nd part, only take the first part delimited by space
    if (strstr(name, "Sample Name") != NULL)
            strcpy(record[i].ID, name);
    else if (strstr(name, "SNPs") != NULL) 
            strcpy(record[i].SNPs, str1);
    else if (strstr(name, "MNPs") != NULL)
            strcpy(record[i].MNPs, str1);
    else if (strstr(name, "Insertions") != NULL)
            strcpy(record[i].Insertion, str1);
    else if (strstr(line, "Deletions") != NULL)
            strcpy(record[i].Deletion, str1);
    else if (strstr(line, "Indels") != NULL) 
            strcpy(record[i].Indels, str1);
    else if (strstr(line, "Same as reference") != NULL)
            strcpy(record[i].SameRef, str1);
    else if (strstr(line, "Missing Genotype") != NULL)
            strcpy(record[i].MissingGT, str1);
    else if (strstr(line, "SNP Transitions") != NULL)
            strcpy(record[i].SNPTransTranv, str1);
    else if (strstr(line, "Total Het/Hom") != NULL)
            strcpy(record[i].TotalHetHomRatio, str1);
    else if (strstr(line, "SNP Het/Hom ratio") != NULL)
            strcpy(record[i].SNPHetHomRatio, str1);
    else if (strstr(line, "MNP Het/Hom ratio") != NULL)
            strcpy(record[i].MNPHetHomRatio, str1);
    else if (strstr(line, "Insertion Het/Hom ratio") != NULL)
            strcpy(record[i].InsertionHetHomRatio, str1);
    else if (strstr(line, "Deletion Het/Hom ratio") != NULL)
            strcpy(record[i].DeletionHetHomRatio, str1);
    else if (strstr(line, "Indel Het/Hom ratio") != NULL)
            strcpy(record[i].IndelHetHomRatio, str1);
    else if (strstr(line, "Insertion/Deletion ratio") != NULL)
            strcpy(record[i].InsertDeletionRatio, str1);
    else if (strstr(line, "Inde/SNP+MNP ratio") != NULL)
            strcpy(record[i].Indel_SNPMNPRatio, str1);
   
    printf("%s: %s\n", name, str1);         //puts() always adds newline at the end of the string
        //printf("%d\n", i); 
    i++;                                    //increment of record count
    if (i > 8) exit (EXIT_FAILURE);       //truncate the input file, need improved
	}
    fclose(fPtr);
    return 0;
}

My code compiled without problems, but it only handled the first RECORD correctly and then crashed with a segmentation fault.
Code:
./myprog vcfstats.txt

Sample Name: sample1
SNPs                         : 91
MNPs                         : 1
Insertions                   : 5
Deletions                    : 2
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 44
SNP Transitions/Transversions: 1.74
Total Het/Hom ratio          : 2.96
SNP Het/Hom ratio            : 2.79
MNP Het/Hom ratio            : -
Insertion Het/Hom ratio      : 4.00
Deletion Het/Hom ratio       : -
Indel Het/Hom ratio          : -
Insertion/Deletion ratio     : 2.50
Indel/SNP+MNP ratio          : 0.08
Sample Name: sample2
SNPs                         : 73
MNPs                         : 2
Insertions                   : 2
Deletions                    : 3
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 63
SNP Transitions/Transversions: 1.87
Segmentation fault: 11

Thanks a lot again.
# 5  
Old 11-04-2019
Again it worked for me, with the infile.txt from your post #1.
Perhaps it helps to make your strings more robust:
Code:
    char line[256];     //for row read from file
    char name[256];     //1st part (key part) parsed from each line[],
                        // entire line if there is no : separator
    char str1[8];       //2nd part (value part) parsed from each line[]
...
    while (fgets(line, sizeof(line), fPtr) != NULL) {
    str1[0]=0;    // clear str1 in case it won't be set
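Along the same lines, sscanf() can be told the buffer sizes, and its return value can be checked so that blank lines and lines without a colon are skipped. A sketch, assuming a hypothetical wrapper parse_stat_line() and the buffer sizes shown above (name[256], str1[8]); the field widths are one less than the buffer sizes to leave room for the terminating '\0':

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { NAME_SIZE = 256, VALUE_SIZE = 8 };   /* sizes assumed from the snippet above */

/*
 * Hypothetical wrapper: parse one "key : value" line.
 * name must hold NAME_SIZE bytes, value must hold VALUE_SIZE bytes.
 * Returns 1 if both a key and a value were found, 0 otherwise.
 * A value longer than 7 characters is cut off after 7 characters.
 */
static int parse_stat_line(const char *line, char *name, char *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    name[0] = '\0';             /* clear both in case they are not set */
    value[0] = '\0';
    /* Leading blank skips whitespace; [^:\n] never runs past the line end. */
    return sscanf(line, " %255[^:\n] : %7s", name, value) == 2;
}
```

In the loop this would read: if (!parse_stat_line(line, name, str1)) continue; so the if/else chain only ever sees lines where both parts were set.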

# 6  
Old 11-04-2019
segment fault

Thanks, did you try the attached file with 4 RECORDs?
I always get a segmentation fault at the same spot of the input, i.e. at the 10th variable, TotalHetHomRatio[8], within the 2nd RECORD, sample2.
Also I suspect my stupid if-else-if chain is wrong.
# 7  
Old 11-04-2019
The code that you provided stops after reading 9 lines with exit 1 (i is incremented for every line and the program exits as soon as i > 8):
Code:
...
    printf("%s: %s\n", name, str1);         //puts() always adds newline at the end of the string
        //printf("%d\n", i); 
    i++;                                    //increment of record count
    if (i > 8) exit (EXIT_FAILURE);       //truncate the input file, need improved

If I remove that then I get a segmentation fault.
In the code that you provided there are two mistakes; here is one correction:
Code:
typedef struct {
    char ID[32];
...

Further, you mix up line numbers with record numbers.
If variable i is supposed to increment with every record starting with "Sample Name" then you can have
Code:
    if (strstr(name, "Sample Name") != NULL) {
        i++;                                    //increment of record count
        strcpy(record[i].ID, name);
    }

and no increment for every line!
And in order to start with index 0 the initialization should be
Code:
    static int i = -1;           //initialize counter
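Putting the pieces together, here is a sketch of the reading loop with one record per "Sample Name" line and a bounds check on the array. The function read_records(), the constant MAX_RECORDS and the shortened RECORD (three fields only) are assumptions for illustration. Storing str1 (e.g. sample1) rather than name in ID is also an assumption about what was intended:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 12          /* assumed array size, as in record[12] */

/* Shortened, hypothetical version of the RECORD struct from the program. */
typedef struct {
    char ID[32];
    char SNPs[8];
    char MNPs[8];
} RECORD;

/*
 * Read "key : value" lines from fp into records[0..max-1].
 * A new record starts at every "Sample Name" line.
 * Returns the number of records filled.
 */
static int read_records(FILE *fp, RECORD *records, int max)
{
    char line[256], name[256], str1[8];
    int i = -1;                 /* index of the current record, none yet */

    assert(fp != NULL && records != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, " %255[^:\n] : %7s", name, str1) != 2)
            continue;           /* blank line or no "key : value" pair */
        if (strstr(name, "Sample Name") != NULL) {
            if (i + 1 >= max)
                break;          /* array full: stop instead of overflowing */
            i++;
            memset(&records[i], 0, sizeof records[i]);
            /* strcpy is safe here: str1 holds at most 7 chars plus '\0'. */
            strcpy(records[i].ID, str1);
        } else if (i < 0) {
            continue;           /* data before the first "Sample Name" */
        } else if (strstr(name, "SNPs") != NULL) {
            strcpy(records[i].SNPs, str1);
        } else if (strstr(name, "MNPs") != NULL) {
            strcpy(records[i].MNPs, str1);
        }
    }
    return i + 1;
}
```

It would be called as int n = read_records(fPtr, record, MAX_RECORDS); after checking that fopen() did not return NULL. The remaining fields would be added to the else-if chain in the same way.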


Last edited by MadeInGermany; 11-04-2019 at 06:44 PM..