08-15-2016
Okay, I apologize, Don; I am having a hard time getting this across. Hopefully I can answer some of the questions.
1. Will you guarantee that all timestamps in both of the files that will ever be processed are on the same calendar date? Or, are there two date fields that have to be processed? Yes, both files will always be processed with the same calendar date because they are run almost simultaneously. And yes, there is more than one date field, but only because fields 16 through 21 (some fields are duplicated because of the RAW field) hold the time (epoch) that our telemetry extractor converts to create the field 1 timestamp. My example stopped at field 18 because my real files have about 600 fields and data points. So technically there are 3 time fields: day of year, time in milliseconds, and time in microseconds. I said field 21 in an earlier post because there were 3 added fields in newer files. These particular files don't have the extra fields, but as long as I choose the correct msec field, my sort works properly.
CCSDS_DOY,CCSDS_DOY(RAW),CCSDS_MSEC
20550,20550,67522104
2. If there are two date fields (presumably fields 17 and 18 in the sample files in post #29), are those two date fields always adjacent in the input files? Yes, all the date fields 16-21 are always adjacent in both files.
3. And, repeating a question that has already been asked twice: Will the date field (or fields) used in file1 be the same as the field (or fields) used in file2? Yes, both files use the same time fields.
4. Will the milliseconds field in your files be set to the string 3600000 corresponding to exactly 1:00:00am or to the string 03600000 (i.e., are all values leading 0 padded to 8 digits, or are the values just the decimal number of milliseconds since midnight with no leading 0 fill)? (Note that the sort you were using in your examples sorting on field 21 would not work if that field does not have leading 0 fill.) It is an 8-digit decimal number.
5. Will you supply the field number(s) as parameters to your script, or are the field headings for the date field(s) in the two files constants that the script is supposed to find when reading the header lines? I only used the field number when I sorted on the “msec” field (i.e., sort -t, -k18,18 file1 file2), and that provided an accurate sort.
6. And, since at least one of the date fields is the last field in all of your sample input files, I will ask again: Are your input files in UNIX text file format or DOS text file format? (This might not matter on your system, but it does matter on the system I'm using to test my code.) These files are .csv files processed on a Linux platform.
7. If your input files are in DOS text file format, do you want output in DOS format or UNIX format? (DOS, UNIX, and don't care are valid answers to this question.) Unix format, but they will end up being .csv files after processing (not DOS).
8. And, obviously, supply us with the complete contents of your latest sample files (including some with different dates if the data in your real files won't always be for a single date) along with the expected output from those sample inputs. I will provide more when I get to a PC later today or tomorrow. A couple of days ago, I tried sending my “real” files, but this site kept giving me errors when uploading.
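To make the sort from answer 5 concrete, here is a minimal sketch of merging two comma-separated files on the msec field while keeping a single header line. The file contents, the TIME heading, and the field number 3 are made up for illustration (the real files have about 600 fields, with the msec field at 18); it also assumes both files share the same header and the same calendar date, as stated in answers 1 and 3.

```shell
#!/bin/sh
# Hypothetical sample data: 3 fields instead of ~600, msec in field 3.
tmp=$(mktemp -d)
printf '%s\n' 'TIME,CCSDS_DOY,CCSDS_MSEC' \
              'a1,20550,67522104' \
              'a2,20550,67522304' > "$tmp/file1"
printf '%s\n' 'TIME,CCSDS_DOY,CCSDS_MSEC' \
              'b1,20550,03600000' \
              'b2,20550,67522204' > "$tmp/file2"

key=3   # field number of the msec column (assumed; 18 in the thread's samples)

merged=$(
  # Print the header once, taken from the first file.
  head -n 1 "$tmp/file1"
  # Drop the header line of each file (FNR restarts at 1 for every input
  # file), then sort the data lines numerically on the msec field only.
  awk 'FNR > 1' "$tmp/file1" "$tmp/file2" | sort -t, -k"$key,$key"n
)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

The n modifier on the key makes sort compare the field as a number, so the order does not depend on the values being zero-padded to 8 digits; a plain -k18,18 compares the field as text and only gives the right order when every value has the same width.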