Correct way to read data of different formats into same struct

04-06-2015

Registered User

564, 13

Join Date: Sep 2009

Last Activity: 26 May 2021, 8:59 AM EDT

Location: Saskatchewan, Canada

Posts: 564

Thanks Given: 376

Thanked 13 Times in 12 Posts


What is the correct way to read data in "one-part-per-line" format, as opposed to "one-record-per-line" format, into the same structure in C?

Code:

format1.dat:

Zacker  244.00  244.00  542.00
Lee     265.00  265.00  456.00
Walter  235.00  235.00  212.00
Zena    323.00  215.45  332.50

Code:

format2.dat:

Zacker  
244.00 
244.00 
542.00
Lee    
265.00    
265.00    
456.00
Walter    
235.00    
235.00    
212.00
Zena    
323.00    
215.45    
332.50
Mira    
285.00    
285.00    
415.00

Using the same structure as:

Code:

typedef struct info
{
  char name[20];
  double test;
  double quiz;
  double English;
} Info;

To process data in format1.dat, I have:

Code:

int main ()
{
  int n = 0;
  Info record[N];

  FILE *INFILE = fopen ("format1.dat", "r");
  while (fscanf (INFILE, "%s %lf %lf %lf", 
               record[n].name, &record[n].test,
               &record[n].quiz, &record[n].English) == 4)
    {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
              record[n].name, record[n].test,
              record[n].quiz, record[n].English);
      ++n;
    }
  fclose (INFILE);

  return 0;
}

How do I read the data in format2.dat into the same struct? In particular, what should the while () condition be?

Code:

while (    ...   )  {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
               record[n].name, record[n].test,
               record[n].quiz, record[n].English);
      ++n;
    }

I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when fields are on different rows.
Thanks a lot!

yifangt


04-06-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

I guess I don't see your problem.

Changing your program slightly so it will compile, read from standard input instead of from a hardcoded filename, and not overwrite memory following your array if you overflow the array:

Code:

#include <stdio.h>

#define	NMAX	7

typedef struct info
{
  char name[20];
  double test;
  double quiz;
  double English;
} Info;

int main()
{
  int	n = 0;
  int	ret;
  Info	record[NMAX];

  while (n < NMAX && (ret = fscanf (stdin, "%s %lf %lf %lf", 
               record[n].name, &record[n].test,
               &record[n].quiz, &record[n].English)) == 4)
    {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
              record[n].name, record[n].test,
              record[n].quiz, record[n].English);
      ++n;
    }
  printf("%d records processed.\n", n);
  printf("return code from last fscanf() call: %d\n", ret);

  return 0;
}

and running it in a directory containing format1.dat and format2.dat from post #1 in this thread as follows:

Code:

$ ./a.out < *1.dat
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
4 records processed.
return code from last fscanf() call: -1
$ ./a.out < *2.dat
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
Mira	285.00	285.00	415.00
5 records processed.
return code from last fscanf() call: -1
$ cat *.dat | ./a.out
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
7 records processed.
return code from last fscanf() call: 4

Don Cragun

04-06-2015

Registered User

316, 33

Join Date: Sep 2008

Last Activity: 13 September 2020, 12:21 AM EDT

Location: US

Posts: 316

Thanks Given: 66

Thanked 33 Times in 31 Posts

Quote:

Originally Posted by yifangt

I was wondering what is the correct way to read in data "one-part-per-line" as compared with "one-record-per-line" formats into the same structure in C?

I think the question of how to read a set of data is not quite the right one here; the better question is how to write those files. If you can choose how the data will be given to you, I'd much prefer one-record-per-line, because the end of line then serves naturally as a record separator. With the one-part-per-line approach there is an extra burden on the reader program to keep track of which field is being processed, which record is current, and so on. Not to mention that line-per-record is much more readable than the alternative, which is always helpful.

migurus

04-07-2015


Thank you!

Quote:

......plain readability of line-per-record is much better than the other one, which is always helpful

The main reason I posted this question is to understand the "flow of the data" being processed. I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when fields are on different rows. In practice I come across the one-part-per-line format more often than one-record-per-line, especially in output from other equipment, and it is not unusual to have more than 50 million records (more than 200 million lines).
Another case where this format makes sense is when each field is a string that contains spaces. It is better to put such fields on separate lines. For example:

Code:

struct bookInfo {
  char book_name[100];
  char author[100];
  int pulish_year;                    //Corrected publish_year
  char press_name[60];
};

Blank lines are inserted between records only to make them easier to view.

Code:

data.txt

C Programming Language  
B. W. Kernighan & D. M. Ritchie
1988
Prentice Hall

C Programming: A Modern Approach 
K. N. King
2008
W. W. Norton & Company

Absolute Beginner’s Guide To C 
Greg Perry
1994
Sams Publishing

C Primer Plus 
Stephen Prata
2004
Sams Publishing

Expert C Programming: Deep C Secrets 
Peter V. Linden 
1994
Prentice Hall

I can't imagine handling them on a single line with a mixture of spaces, tabs, quotes, etc. A clear delimiter is needed, but that will be a new thread, as I do have difficulty reading this type of data into a structure in C.

Coming back to the code in my first post: I did not realize the issue was related to fscanf(), which I was not sure about, but I AM SURE that my problem is related to how the data "stream" is read from STDIN or a FILE; that's why I used awk's RS/FS as a reference.
So, my question is: how does fscanf() process the second scenario (format2.dat), i.e. when the members of a record are spread across different lines instead of being on a single line?

Last edited by yifangt; 04-07-2015 at 05:28 PM..

yifangt


04-07-2015


Using fscanf() is great when you have machine-produced data with any type of whitespace as a delimiter. But if your data might be missing a field, it is sometimes hard to detect where the error happened and resync to a known good record boundary.

If you're going to be dealing with data that is delimited by <newline>s, it is usually safer reading into a line buffer (e.g., fgets()) and parsing data from the appropriate lines into the appropriate fields in your structures (strncpy() and sscanf()). It is then easy to detect empty lines as record boundaries, guarantee that you don't overflow the ends of fields in your structures, and verify the data is the correct type for each field as you gather it.

P.S.: I strongly encourage you to change the name of the third field in your bookInfo structure. Many programmers would misspell references to that field as publish_year.

Don Cragun

04-07-2015


Thanks Don:
I understand what you mean by using fgets() + sscanf() + strncpy() to parse input. At the moment I am working with fscanf() and assuming the data has no missing fields.
According to this reference:

Code:

int fscanf ( FILE * stream, const char * format, ... );
Reads data from the stream and stores them according to the parameter format into the locations pointed by the additional arguments.
......
On success, the function returns the number of items of the argument list successfully filled.

The part that is unclear to me: can the items in the argument list be separated by newlines, i.e. be on different rows? In other words, does fscanf() keep looking for the requested items until all of them are found, no matter whether they are separated by spaces, tabs or newlines?
To illustrate my confusion, consider this code fragment:

Code:

  rewind (pFile);                  //Line 11
  fscanf (pFile, "%f", &f);        //Line 12
  fscanf (pFile, "%s", str);       //Line 13

Line 11: set position of pFile stream to the beginning
Line 12: Look for a float number in one line (e.g. line 1)
Line 13: Look for a string in next line(line 2), or the same line(line 1)?

Code:

  rewind (pFile);                          //Line 11
  fscanf (pFile, "%f %s", &f, str);        //Line 12

This time, would Line 12 look for a float number AND a string on the same line (line 1)?
Sorry for this naive question, which has been bugging me. Thanks a lot!

Last edited by yifangt; 04-07-2015 at 06:10 PM..

yifangt


04-07-2015


Isn't this obvious from the code I presented in post #2 in this thread where the fscanf() format string "%s %lf %lf %lf" was able to read four values from one or two lines from format1.dat and able to read four values from four or five lines from format2.dat? And the format string "%s%lf%lf%lf" would have produced the same results but isn't as easy for some people to read.

Have you read the man page for fscanf() recently? Look at it closely. (Characters that are classified as space characters by isspace() are ignored between strings being matched against conversion specifications other than for conversions with a conversion specifier [, c, C, or n.)

Did you try your fscanf() calls in a program? Or, are you just looking at the statements and wondering what they would do? (You could have easily answered this question yourself by putting your code in a program and trying it. And, you could probably have had the results in less time than it took you to type in your post.)

Don Cragun
