Correct way to read data of different formats into the same struct
What is the correct way to read data in a "one-part-per-line" format, as opposed to a "one-record-per-line" format, into the same structure in C?
Using the same structure as:
To process data in format1.dat, I have:
How should I read the data in format2.dat into the same struct, especially in the while () block:
I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when fields are on different rows.
Thanks a lot!
Changing your program slightly so it will compile, read from standard input instead of from a hardcoded filename, and not overwrite memory following your array if you overflow the array:
and running it in a directory containing format1.dat and format2.dat from post #1 in this thread as follows:
This User Gave Thanks to Don Cragun For This Post:
What is the correct way to read data in a "one-part-per-line" format, as opposed to a "one-record-per-line" format, into the same structure in C?
I think the real question here is not how to read the data, but how to write those files. If you can choose the way the data is given to you, I'd much prefer one-record-per-line, since the end of line then serves naturally as a record separator. With the one-part-per-line approach, the reader program carries the extra burden of always keeping track of which field is being processed, which record is current, and so on. Not to mention that plain readability of line-per-record is much better than the other one, which is always helpful.
......plain readability of line-per-record is much better than the other one, which is always helpful
The major reason I posted this question is to understand the "flow of the data" being processed. I was comparing this situation with awk, which by default processes a file row by row, or lets you set the RS/FS separators when fields are on different rows. In practice I come across the one-part-per-line layout more often than one-record-per-line, especially in the output from other equipment, and it is not unusual to have more than 50 million records (200 million lines).
Another case for this data format that I can think of is when each variable is a string containing spaces. It is better to have them on different lines. For example:
Blank lines are inserted between records only to make them easier to view.
I can't imagine them on a single line with a mixture of spaces, tabs, quotes, etc. A clear delimiter is needed, but that would be a new thread, as I do have difficulty reading this type of data into a structure in C.
Coming back to the code in my first post of this thread: I did not realize it was related to fscanf(), which I was not sure about, but I AM SURE that my problem is related to the data "stream" from STDIN or a FILE; that's why I used awk RS/FS as a reference.
So, my question is: how does fscanf() process the second scenario (format2.dat), i.e. when the members of a record are spread over different lines instead of being on a single line?
Using fscanf() is great when you have any type of whitespace as a delimiter and machine-produced data. But if you have data that might be missing a field, it is sometimes hard to detect where the error happened and resync to a known good record boundary.
If you're going to be dealing with data that is delimited by <newline>s, it is usually safer to read into a line buffer (e.g., with fgets()) and parse data from the appropriate lines into the appropriate fields in your structures (with strncpy() and sscanf()). It is then easy to detect empty lines as record boundaries, guarantee that you don't overflow the ends of fields in your structures, and verify that the data is the correct type for each field as you gather it.
P.S.: I strongly encourage you to change the name of the third field in your bookInfo structure. Many programmers would misspell references to that field as publish_year.
Thanks Don:
I understand what you mean by using fgets() + sscanf() + strncpy() to parse the input. At the moment I am working with fscanf() and assuming that no fields are missing from the data.
According to this reference:
The part that is unclear to me is: can the items in the argument list be separated by newlines, i.e. be on different rows? In other words, does fscanf() keep looking for the specified items until all of them are found, no matter whether those items are separated by spaces, tabs, or newlines?
To illustrate my confusion with this code fragment:
Line 11: set the position of the pFile stream to the beginning
Line 12: look for a float number in one line (e.g. line 1)
Line 13: look for a string in the next line (line 2), or in the same line (line 1)?
This time, would Line 12 be looking for a float number AND a string in the same line (line 1)?
Sorry for this naive question, which has been bugging me. Thanks a lot!
Isn't this obvious from the code I presented in post #2 in this thread where the fscanf() format string "%s %lf %lf %lf" was able to read four values from one or two lines from format1.dat and able to read four values from four or five lines from format2.dat? And the format string "%s%lf%lf%lf" would have produced the same results but isn't as easy for some people to read.
Have you read the man page for fscanf() recently? Look at it closely. (Characters that are classified as space characters by isspace() are skipped before the input matched by each conversion specification, except for conversions with the conversion specifier [, c, C, or n.)
Did you try your fscanf() calls in a program? Or, are you just looking at the statements and wondering what they would do? (You could have easily answered this question yourself by putting your code in a program and trying it. And, you could probably have had the results in less time than it took you to type in your post.)