2. Now, in the 1st log you can see the 5th column is invalid (none11111), so I have to look up the actual 5th column value. The 'id' column will help you find it: you have to run another grep based on the 'id' value so that you can find the actual 5th column in the same log file.
3. If you look at the second log, it has the exact matching 'id' value. So what I have to do is take the 5th column (UD3BSAp8appncXlZ) from the second log instead of the invalid one (none11111).
Output:-
20120829001415, UD3BSAp8appncXlZ, linkId=1ddoic
Note:- I have a bunch of log files on which I have to perform the above procedure, but I have to produce a single output file after grepping through all the log files.
They are named like abc-2012-10-01_00000, abc-2012-10-01_00001, ... etc.
I hope it is clear this time.
Thanks for looking into it.
This is a big improvement over what you have posted before, but there are still some ambiguities.
You say that you're showing the log file structure and that you need fields 2, 5, and 14, and then show two lines from one or two log files. Note that the second record has 13 fields (and the last field appears to be incomplete), not 14. To determine what is supposed to happen, we need to know whether field 14 in both lines has the same value (linkId=1ddoic). (I.e., do both of these lines appear in the output of your first grep?)
And, PLEASE USE CODE TAGS when presenting file contents!
Let me try restating the problem to determine if I understand what you want done:
In some places you say there are two log files, in other places you say there is one log file but you grep it twice. Which is it?
1. If there is a single log file, both greps and the conversion to the desired output can be done by reading the log file just once with awk, if the output order doesn't matter.
2. The first time you read the log file, you look for entries whose column 14 matches a given value (linkId=xxx) and ignore anything that doesn't match.
3. For lines that were selected in step 2, if column 5 is not "none11111", skip to step 5.
4. Read the same log file again (or read the second log file) looking for a line in the log where field 12 (id=yyy) matches field 12 in the line matched in step 3 AND column 5 is not "none11111". Use the value found in column 5 of this line as a replacement for field 5 in the line matched in step 3.
5. Print column 2 (from the line matched in step 3), a comma, column 5 (from the line matched in step 3, updated by the line found in the second reading of the log file if the original contained "none11111"), a comma, and column 12 (from the line matched in step 3, with "id=" at the start of the field removed).
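That one-pass reading might be sketched roughly as follows. The field layout (comma-separated; field 2 = timestamp, field 5 = value, field 12 = id=..., field 14 = linkId=...), the none11111 marker, and the two-line stand-in file are all assumptions based on the samples in this thread:

```shell
# Stand-in log file in the assumed layout; real logs have more variety.
cat > sample.log <<'EOF'
a,20120829001415,c,d,none11111,f,g,h,i,j,k,id=7Ax,m,linkId=1ddoic
a,20120829001420,c,d,UD3BSAp8appncXlZ,f,g,h,i,j,k,id=7Ax,m
EOF

awk -F, -v link='linkId=1ddoic' '
    # remember a valid 5th field for every id seen anywhere in the file
    $5 != "none11111" { seen[$12] = $5 }
    # collect the lines whose 14th field matches the wanted linkId
    $14 == link       { n++; ts[n] = $2; val[n] = $5; id[n] = $12 }
    END {
        for (i = 1; i <= n; i++) {
            v = val[i]
            if (v == "none11111" && (id[i] in seen)) v = seen[id[i]]
            key = id[i]; sub(/^id=/, "", key)
            print ts[i] "," v "," key
        }
    }
' sample.log
```

Resolving the replacements in the END block means it doesn't matter whether the valid line appears before or after the invalid one in the file.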
Is this algorithm correct?
Is there one input log file or two? What is its (or are their) name(s)?
Is step 4 only supposed to be performed for lines that have the same contents in field 14? Or, is any field 14 value OK as long as the contents of field 12 match field 12 in a line whose field 5 doesn't contain "none11111"?
Note that this is three comma-separated values, not the four values specified in the first several messages in this thread. Is this correct? If not, where does the other output field come from?
Note also that the early messages specified "," as the separator between fields, but in the latest messages you specify ", " instead of ",". Is "," the correct separator?
Does the order of the output lines matter?
The Note in your message:
Quote:
Note:- I have bunch of log files where I have to perform the above procedure. But I have to come up with a single file as output after grepping through all the log files.
It has a format like abc-2012-10-01_00000,abc-2012-10-01_00001.... etc.
doesn't make things clear at all. We have not seen anything like this list of values in any of the samples you have shown us. Are you saying you have to create a file with a single line containing a comma-separated list of an unspecified number of entries, each a string created with the printf format string "abc-%s_%05d", where the %s is filled by a date utility format string %Y-%m-%d run on the first day of next month and the %05d prints a sequence number? Please explain what the entries in this list mean, how many of them there are, and why this list is useful!
I am trying my best.
This is my first grep command.
1. Yes. The initial fields (up to the 10th column) are constant across all types of log entries, but the others will vary. The two example logs I gave are two different types of log generated for two different events, so you will not get the linkId attribute in the 2nd log entry. You do not need to bother about that, because you just need to pick the 5th column from the 2nd log and put it into the 1st log after checking that the id field matches. In a real scenario, you have to grep through the entire log file to look for the id value found in the 1st log.
Quote:
In some places you say there are two log files, in other places you say there is one log file but you grep it twice. Which is it?
If there is a single log file, both greps and the conversion to the desired output can be done by reading the log file just once with awk if the output order doesn't matter.
i) I have multiple log files to grep through (named like abc-2012-10-01_00000, abc-2012-10-01_00001, ... etc.) and output the 2nd, 5th and 14th columns.
ii) While grepping through all the log files, invalid 2nd columns will appear, which is not intended. In the same file where an invalid 2nd column was found, the valid 2nd column can be found by looking for the matching 'id' attribute value. It is up to you whether you can achieve my goal in a single grep.
My algorithm:-
i) for each file, run the above grep
ii) for each row returned by that grep, if the 2nd column is invalid (none11111), run another grep on the same file and replace the invalid 2nd column with the valid one.
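Written out literally, that loop might look like this. It is only a sketch: the glob pattern, the linkId value and the none11111 marker come from this thread, while the comma-separated layout (id in field 12, linkId in field 14) is assumed, and the two-line stand-in file is only there to make it runnable:

```shell
# tiny stand-in log file in the assumed layout
printf '%s\n' \
  'a,20120829001415,c,d,none11111,f,g,h,i,j,k,id=7Ax,m,linkId=1ddoic' \
  'a,20120829001420,c,d,UD3BSAp8appncXlZ,f,g,h,i,j,k,id=7Ax,m' \
  > abc-2012-10-01_00000

for f in abc-2012-10-01_000*; do
    grep 'linkId=1ddoic' "$f" |               # first grep
    while IFS=, read -r f1 f2 f3 f4 f5 rest; do
        if [ "$f5" = "none11111" ]; then
            id=$(printf '%s\n' "$rest" | cut -d, -f7)   # field 12 overall
            # second grep on the same file: same id, usable 5th column
            f5=$(grep -F "$id" "$f" |
                 awk -F, '$5 != "none11111" { print $5; exit }')
        fi
        link=${rest##*,}                      # field 14 (last field here)
        printf '%s,%s,%s\n' "$f2" "$f5" "$link"
    done
done
```

Appending `> combined_output` after `done` would collect all files' results into the single output file mentioned above.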
If the two sample lines from your log files are as you have shown in past posts, the command line you specify above is equivalent to the command:
There are no tab characters in your input files, so the cut command in your pipeline is a no-op. This command line therefore throws away duplicate lines found in your log files and sorts the remaining lines on the first field. It does NOT limit the output to columns 2, 5, and 14 of your input files, and does NOT produce a CSV file with three fields (and if it did, it wouldn't contain the id=value fields that you say are to be used in a second grep to look up the invalid values found while processing the output of your first grep).
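For comparison, cut can select columns 2, 5, and 14 from comma-separated lines only if it is given an explicit delimiter (the sample line here is a stand-in in the assumed layout):

```shell
# cut defaults to the tab delimiter; -d, makes it split on commas
echo 'a,20120829001415,c,d,none11111,f,g,h,i,j,k,id=7Ax,m,linkId=1ddoic' |
cut -d, -f2,5,14
```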
In message #8 in this thread, I asked nine questions. You partially answered some of the questions although, as noted above, the answer doesn't match the other statements you've made.
I want to help you solve this problem, but if you won't answer the questions (and give answers that match your data), it is obvious that I'm wasting my time. If you would like us to try to give you a working solution please answer ALL of these questions:
What are the actual commands you execute to convert your log files into the CSV file that you want processed?
Does abc-2012-10-01_000* match the names of all of the log files (and only those log files) that you want to process?
When you find none11111 in your CSV file, will the id=xxx field ever match more than one line (not containing none11111) in your log files that aren't exact duplicates of other lines?
Am I correct in assuming that the line matching the id=xxx field with the value needed to replace none11111 in your CSV file will not be a line that was selected by the grep on the linkId field you're processing?
Is the field separator you want in your output file "," or ", "?
Does the order of lines in your output file matter?
What is the purpose of having an additional single-line output file containing a comma separated list of all of your log files? If you need a file containing a list of the log files processed, wouldn't it be better to have the filenames on separate lines instead of separated by commas on a single line?
Will the linkId=zzz field ever appear in any log file that isn't exactly of the same form as the following example line from one of your log files?
Thank you for your cooperation. Here I am listing the answers to your questions.
Quote:
What are the actual commands you execute to convert your log files into the CSV file that you want processed?
Quote:
Does abc-2012-10-01_000* match the names of all of the log files (and only those log files) that you want to process?
Yes.
Quote:
When you find none11111 in your CSV file, will the id=xxx field ever match more than one line (not containing none11111 ) in your log files that aren't exact duplicates of other lines?
Yes, it matches more than one line.
Quote:
Am I correct in assuming that the line matching the id=xxx field with the value needed to replace none11111 in your CSV file will not be on a line that was selected by a grep on the linkId field you're processing?
Yes, of course. It will never be on the same line.
Quote:
Is the field separator you want in your output file "," or ", "?
Only comma. No space.
Quote:
Does the order of lines in your output file matter?
Yes, it matters. It has to be in sorted order of timestamp.
Quote:
What is the purpose of having an additional single-line output file containing a comma separated list of all of your log files? If you need a file containing a list of the log files processed, wouldn't it be better to have the filenames on separate lines instead of separated by commas on a single line?
This is not an additional file. This is the output file that I use as input to generate the final output. After I create the output file (replacing the invalid 'none11111' field), I read that file, make some database calls on top of those values, and then create a report.
Quote:
Will the linkId=zzz field ever appear in any log file that isn't exactly of the same form as the following example line from one of your log files?
Yes, it will appear. To resolve that problem we have to run the second grep as below.
The purpose of running the above grep is this: the id can appear in two different events, either 'page' or 'clk'. We can take the 5th column from either of those log entries. Also, this log line will be found in the same file where 'none11111' was found.
Suppose for linkId=1ddoic we found one invalid 'none11111' value in the 5th column in log file abc-2012-10-01_00002; then the corresponding id, with a proper 5th column, should be found in abc-2012-10-01_00002 only.
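That same-file lookup could be expressed as a second grep piped into awk (id=7Ax and the two-line file contents are hypothetical stand-ins):

```shell
# stand-in for abc-2012-10-01_00002 in the assumed layout
printf '%s\n' \
  'a,t1,c,d,none11111,f,g,h,i,j,k,id=7Ax,m,linkId=1ddoic' \
  'a,t2,c,d,UD3BSAp8appncXlZ,f,g,h,i,j,k,id=7Ax,m' \
  > abc-2012-10-01_00002

# take the first usable 5th column from any line carrying the same id
grep 'id=7Ax' abc-2012-10-01_00002 |
awk -F, '$5 != "none11111" { print $5; exit }'
```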
The command lines you have shown with the two sample lines you have shown from your log files don't come close to providing the data that you say they will. I also note that your last post (message #11 in this thread) is the first time you mention anything about log file field #10 being used to determine the final report.
I have tried to interpret your requirements and come up with a script that should come close to what you have said you need. Given that you have only let us see one complete log file line and one abbreviated log file line, I have low confidence that this will actually do what you want, but I believe it meets the requirements you've been willing to share.
To try it out, save the following script in a file named match2:
Make it executable by running the command:
and invoke it with:
to produce a report containing log file entries found in the log files you specified for linkId=1ddoic sorted by timestamp in the file named output_file.
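The match2 script itself is not reproduced here; a minimal sketch of a script meeting those requirements (same-file replacement keyed on the id field, output sorted by timestamp, field layout assumed from the earlier samples, with a tiny stand-in log file just to exercise it) might look like:

```shell
# NOT the original match2 -- an illustrative sketch only
cat > match2 <<'EOF'
#!/bin/sh
# usage: match2 linkId=value output_file logfile...
link=$1 out=$2
shift 2
awk -F, -v link="$link" '
    # remember a valid 5th field for each (file, id) pair
    $5 != "none11111" { seen[FILENAME "," $12] = $5 }
    # collect lines whose 14th field matches the wanted linkId
    $14 == link { n++; ts[n] = $2; val[n] = $5; key[n] = FILENAME "," $12 }
    END {
        for (i = 1; i <= n; i++) {
            v = val[i]
            if (v == "none11111" && (key[i] in seen)) v = seen[key[i]]
            print ts[i] "," v "," link
        }
    }
' "$@" | sort -t, -k1,1 > "$out"
EOF
chmod +x match2

# stand-in log file so the sketch can be exercised
printf '%s\n' \
  'a,20120829001415,c,d,none11111,f,g,h,i,j,k,id=7Ax,m,linkId=1ddoic' \
  'a,20120829001420,c,d,UD3BSAp8appncXlZ,f,g,h,i,j,k,id=7Ax,m' \
  > abc-2012-10-01_00000

./match2 linkId=1ddoic output_file abc-2012-10-01_000*
```

Keying the lookup array on FILENAME as well as the id field enforces the requirement that the replacement value come from the same log file in which none11111 was found.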
Although this script specifies ksh, it should also work with sh and bash. (It won't work with csh or tcsh.)