Correct way to read data of different formats into same struct


 
# 1  
04-06-2015

I was wondering what the correct way is in C to read data in "one-part-per-line" format, as opposed to "one-record-per-line" format, into the same structure.
Code:
format1.dat:

Zacker  244.00  244.00  542.00
Lee     265.00  265.00  456.00
Walter  235.00  235.00  212.00
Zena    323.00  215.45  332.50

Code:
format2.dat:

Zacker  
244.00 
244.00 
542.00
Lee    
265.00    
265.00    
456.00
Walter    
235.00    
235.00    
212.00
Zena    
323.00    
215.45    
332.50
Mira    
285.00    
285.00    
415.00

I want to read both into the same structure:
Code:
typedef struct info
{
  char name[20];
  double test;
  double quiz;
  double English;
} Info;

To process data in format1.dat, I have:
Code:
#include <stdio.h>

#define N 100   /* assumed capacity; N was left undefined in the original fragment */

int main ()
{
  int n = 0;
  Info record[N];

  FILE *INFILE = fopen ("format1.dat", "r");
  if (INFILE == NULL)
    {
      perror ("format1.dat");
      return 1;
    }
  while (fscanf (INFILE, "%s %lf %lf %lf", 
               record[n].name, &record[n].test,
               &record[n].quiz, &record[n].English) == 4)
    {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
              record[n].name, record[n].test,
              record[n].quiz, record[n].English);
      ++n;
    }
  fclose (INFILE);

  return 0;
}

How do I read the data in format2.dat into the same struct? In particular, what goes in the while () condition:
Code:
while (    ...   )  {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
               record[n].name, record[n].test,
               record[n].quiz, record[n].English);
      ++n;
    }

I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when the fields are on different rows.
Thanks a lot!
# 2  
04-06-2015
I guess I don't see your problem.

Changing your program slightly so it will compile, read from standard input instead of from a hardcoded filename, and not overwrite memory following your array if you overflow the array:
Code:
#include <stdio.h>

#define	NMAX	7

typedef struct info
{
  char name[20];
  double test;
  double quiz;
  double English;
} Info;

int main()
{
  int	n = 0;
  int	ret;
  Info	record[NMAX];

  while (n < NMAX && (ret = fscanf (stdin, "%s %lf %lf %lf", 
               record[n].name, &record[n].test,
               &record[n].quiz, &record[n].English)) == 4)
    {
      printf ("%s\t%.2lf\t%.2lf\t%.2lf\n", 
              record[n].name, record[n].test,
              record[n].quiz, record[n].English);
      ++n;
    }
  printf("%d records processed.\n", n);
  printf("return code from last fscanf() call: %d\n", ret);

  return 0;
}

and running it in a directory containing format1.dat and format2.dat from post #1 in this thread as follows:
Code:
$ ./a.out < *1.dat
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
4 records processed.
return code from last fscanf() call: -1
$ ./a.out < *2.dat
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
Mira	285.00	285.00	415.00
5 records processed.
return code from last fscanf() call: -1
$ cat *.dat | ./a.out
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
Zena	323.00	215.45	332.50
Zacker	244.00	244.00	542.00
Lee	265.00	265.00	456.00
Walter	235.00	235.00	212.00
7 records processed.
return code from last fscanf() call: 4

This User Gave Thanks to Don Cragun For This Post:
# 3  
04-06-2015
Quote:
Originally Posted by yifangt
I was wondering what is the correct way to read in data "one-part-per-line" as compared with "one-record-per-line" formats into the same structure in C?
I think the real question here is not how to read such data but how those files should be written. If you have any say in how the data is given to you, I'd much prefer one-record-per-line, because the end of the line then serves naturally as a record separator. In the one-part-per-line approach there is an extra burden on the reader program to always keep track of which field is being processed, which record is current, etc. Not to mention that line-per-record is much more readable than the alternative, which is always helpful.
This User Gave Thanks to migurus For This Post:
# 4  
04-07-2015
Thank you!
Quote:
......plain readability of line-per-record is much better than the other one, which is always helpful
The main reason I posted this question is to understand how the data "flows" as it is processed. I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when fields are on different rows. In practice I come across the one-part-per-line format more often than one-record-per-line, especially in output from other equipment, and it is not unusual to have more than 50 million records (i.e. more than 200 million lines).
Another case I can think of for this format is when each field is a string that may contain spaces; then it is better to put each field on its own line. For example:
Code:
struct bookInfo {
  char book_name[100];
  char author[100];
  int pulish_year;                    // should be publish_year
  char press_name[60];
};

Blank lines are inserted between records only to make them easier to read:
Code:
data.txt

C Programming Language  
B. W. Kernighan & D. M. Ritchie
1988
Prentice Hall

C Programming: A Modern Approach 
K. N. King
2008
W. W. Norton & Company

Absolute Beginner’s Guide To C 
Greg Perry
1994
Sams Publishing

C Primer Plus 
Stephen Prata
2004
Sams Publishing

Expert C Programming: Deep C Secrets 
Peter V. Linden 
1994
Prentice Hall

I can't imagine having them on a single line with a mixture of spaces, tabs, quotes, etc.; a clear delimiter would be needed. That is a problem for a new thread, though, as I do have difficulty reading this type of data into a structure in C.

Coming back to the code in my first post: I did not realize the issue was related to fscanf(), which I was not sure about, but I AM SURE my problem is related to the data "stream" from STDIN or a FILE, which is why I used awk's RS/FS as a reference.
So, my question is: how does fscanf() process the second scenario (format2.dat), i.e. when the members of a record are split across different lines instead of being on a single line?

Last edited by yifangt; 04-07-2015 at 05:28 PM..
# 5  
04-07-2015
Using fscanf() is great when the data is machine-produced and any type of whitespace can serve as a delimiter. But if your data might be missing a field, it is sometimes hard to detect where the error happened and to resync to a known good record boundary.

If you're going to be dealing with data that is delimited by <newline>s, it is usually safer to read into a line buffer (e.g., with fgets()) and then parse data from the appropriate lines into the appropriate fields in your structures (with strncpy() and sscanf()). It is then easy to detect empty lines as record boundaries, guarantee that you don't overflow the ends of fields in your structures, and verify that the data is the correct type for each field as you gather it.

P.S.: I strongly encourage you to change the name of the third field in your bookInfo structure. Many programmers would misspell references to that field as publish_year.
This User Gave Thanks to Don Cragun For This Post:
# 6  
04-07-2015
Thanks, Don.
I understand what you mean about using fgets() + sscanf() + strncpy() to parse input. For now I am working with fscanf() and assuming no field is missing from the data.
According to this reference:
Code:
int fscanf ( FILE * stream, const char * format, ... );
Reads data from the stream and stores them according to the parameter format into the locations pointed by the additional arguments.
......
On success, the function returns the number of items of the argument list successfully filled.

What is unclear to me is: can the items in the argument list be separated by newlines, i.e. be on different rows? In other words, does fscanf() keep looking for the specified items until all of them are found, regardless of whether they are separated by spaces, tabs, or newlines?
To illustrate my confusion, consider this code fragment:
Code:
  rewind (pFile);                  //Line 11
  fscanf (pFile, "%f", &f);        //Line 12
  fscanf (pFile, "%s", str);       //Line 13

Line 11: set the position of the pFile stream to the beginning
Line 12: look for a float number on one line (e.g. line 1)
Line 13: look for a string on the next line (line 2), or on the same line (line 1)?
Code:
  rewind (pFile);                          //Line 11
  fscanf (pFile, "%f %s", &f, str);        //Line 12

This time, would Line 12 look for a float number AND a string on the same line (line 1)?
Sorry for this naive question, which bugs me. Thanks a lot!

Last edited by yifangt; 04-07-2015 at 06:10 PM..
# 7  
04-07-2015
Isn't this obvious from the code I presented in post #2 in this thread where the fscanf() format string "%s %lf %lf %lf" was able to read four values from one or two lines from format1.dat and able to read four values from four or five lines from format2.dat? And the format string "%s%lf%lf%lf" would have produced the same results but isn't as easy for some people to read.

Have you read the man page for fscanf() recently? Look at it closely. (Characters that are classified as space characters by isspace() are skipped before input is matched against a conversion specification, except for conversions with a conversion specifier of [, c, C, or n.)

Did you try your fscanf() calls in a program? Or, are you just looking at the statements and wondering what they would do? (You could have easily answered this question yourself by putting your code in a program and trying it. And, you could probably have had the results in less time than it took you to type in your post.)
This User Gave Thanks to Don Cragun For This Post: