Compare first column from two csv files with greater than or equal, and less than

 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Compare first column from two csv files with greater than or equal, and less than
# 1  
Old 07-19-2016
Compare first column from two csv files with greater than or equal, and less than

I have two csv files of different sizes. The output file needs to have file1 contents on top of file2 contents where file2 col1 is >= to file1 col1, and file2 col1(same value) is < file1 col1 (next value). So basically, some file2 rows will be matched to the same file1 row because it is the closet match that is not exact. Thank you!

file1:
Code:
20160525162519,SN1
20160525162524,SN2
20160525162529,SN3
20160525162534,SN4
20160525162539,SN5
20160525162544,SN6

file2:
Code:
20160525162522,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162523,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162524,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162525,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162526,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162527,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162528,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162529,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162530,464374526,1464193527,206,0,0,0,0,1,1,544,544
20160525162531,464374526,1464193527,206,0,0,0,0,1,1,544,544

Desired output:

Code:
20160525162519,SN1
20160525162522,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162519,SN1
20160525162523,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162524,SN2
20160525162524,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162524,SN2
20160525162525,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162524,SN2
20160525162526,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162524,SN2
20160525162527,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162524,SN2
20160525162528,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162529,SN3
20160525162529,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162529,SN3
20160525162530,464374526,1464193527,206,0,0,0,0,1,1,544,544

20160525162529,SN3
20160525162531,464374526,1464193527,206,0,0,0,0,1,1,544,544


This code almost works, but only displays exact matches.

Code:
awk -F, 'NR==FNR { a[$1]=$0;next}; 
(FNR != NR  ) || ( ($1 in a) && ($0 >= a[$1]) && ($0 < a[$1])){print a[$1]; print}' file1.csv file2.csv > temp






Moderator's Comments:
Mod Comment Please use code tags as required by forum rules!

Last edited by RudiC; 07-19-2016 at 04:27 PM.. Reason: Added / corrected code tags.
# 2  
Old 07-19-2016
Not sure I understood your problem correctly, but this at least works on the samples given:
Code:
awk -F, '
BEGIN           {CNT+=2
                }
NR == FNR       {a[NR] = $0
                 b[NR] = $1
                 next
                }
$1 >= b[CNT]    {CNT++
                }
$1 <  b[CNT]    {print a[CNT-1]
                 print $0, RS
                }
' file1 file2
20160525162519,SN1
20160525162522,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162519,SN1
20160525162523,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162524,SN2
20160525162524,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162524,SN2
20160525162525,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162524,SN2
20160525162526,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162524,SN2
20160525162527,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162524,SN2
20160525162528,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162529,SN3
20160525162529,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162529,SN3
20160525162530,464374526,1464193527,206,0,0,0,0,1,1,544,544 

20160525162529,SN3
20160525162531,464374526,1464193527,206,0,0,0,0,1,1,544,544

# 3  
Old 07-19-2016
Thank you so much! That worked with my previous example files, however, it needs to be adaptable to these new files where now file1 is larger than file2 and it has 2 or more matching time stamps. It still needs to check for >= and < times, even though there may not be any of one of the scenarios.
It doesn’t seem to work with these column 1 time stamps (format issue with “:” and “/” maybe?).

file1:

Code:
TIMEFORMATTED,G_TP01_OPER_ID,G_TP01_OPER_ID(RAW)
2016/05/25 16:23:04,0,0
2016/05/25 16:23:05,0,0
2016/05/25 16:23:06,0,0
2016/05/25 16:23:07,0,0
2016/05/25 16:23:08,0,0
2016/05/25 16:23:09,0,0
2016/05/25 16:23:10,0,0

2016/05/25 16:24:01,0,0
2016/05/25 16:24:02,0,0
2016/05/25 16:24:03,0,0
2016/05/25 16:24:04,0,0
2016/05/25 16:25:05,0,0

2016/05/25 16:25:01,0,0
2016/05/25 16:25:02,0,0
2016/05/25 16:25:03,0,0
2016/05/25 16:25:04,0,0
2016/05/25 16:25:05,0,0

file2:

Code:
TIMEFORMATTED,HDR_SYNC,HDR_SEC,HDR_MSEC,CCSDS_VERSION
2016/05/25 16:24:02,464374526,1464193527,206,0
2016/05/25 16:25:05,464374526,1464193527,206,0

Desired output:

Code:
2016/05/25 16:24:02,0,0
2016/05/25 16:24:02,464374526,1464193527,206,0

2016/05/25 16:25:05,0,0
2016/05/25 16:25:05,464374526,1464193527,206,0



Moderator's Comments:
Mod Comment Please use code tags not icode tags for data as well!

Last edited by RudiC; 07-19-2016 at 05:54 PM.. Reason: Changed ICODE tags for CODE tags.
# 4  
Old 07-21-2016
Does anyone have any ideas how to get this code to work with my previous file examples? RudiC got it to work with my original post, but Iv'e tried many scenarios to have it work more universal on different size files with unpredictable time stamps in column 1. Sometimes file 1 will be larger with more time stamped data and sometimes file 2 may have more time stamped data - either way I still need the >= and < logic to cover all bases. Thank you!!!

Code:
wk -F, '
BEGIN           {CNT+=2
                }
NR == FNR       {a[NR] = $0
                 b[NR] = $1
                 next
                }
$1 >= b[CNT]    {CNT++
                }
$1 <  b[CNT]    {print a[CNT-1]
                 print $0, RS
                }
' file1 file2

# 5  
Old 07-21-2016
Code:
awk 'NR==FNR && NR > 1 {l=$0; s=FS; $0=$0; a[$1]=l; next}
a[$1] {print a[$1]; print $0; print ""}
' FS=, file1 FS=, file2

# 6  
Old 07-21-2016
Code:
awk 'NR==FNR && NR > 1 {l=$0; s=FS; $0=$0; a[$1]=l; next}
a[$1] {print a[$1]; print $0; print ""}
' FS=, file1 FS=, file2

This works with the last file examples (#3 post), but not the first file examples (#1 post). The previous code from RudiC works with the first file examples (#1 post), but not the last (#3 post). I need to find something that handles both scenarios. I'll keep tying - thanks!!
# 7  
Old 07-22-2016
The first rule for producing code that works is to get a clear description of the input format(s) and the desired output format(s). Do not expect us to write code for you that magically guesses correctly at inconsistent input file formats. Note that:
  1. there are no blank lines nor any header lines in the files in post #1, and
  2. there are blank lines in file1 and headers in both input files in post #4.
We might be able to make adjustments for blank or empty lines in your input files (if we can determine whether or not a blank or empty line is a header).

If you give us a clear description of your input file formats that can be used in ALL cases for your various input file formats such that we could be sure programmatically whether or not a line in an input file is a header or data, we might be able to handle that as well. But, I'm not going to attempt to guess at what might appear as data or as headers, how many lines of headers might be present in each input file, what characters might appear in headers, nor what characters might appear in data in other input files you might throw at us later.

Note also that character string comparisons of field 1 in both sets of sample input files happens to work with the date formats used in those files. That might not be true for other date and time formats. (And, if the format used in file1 is not the same format as the date and time format in used in file2 in any pair of input files, the code needed to normalize date and time strings for comparison might be a significant research project unless you pass your script a clear description of the date and time formats used in each input file.)
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare every column from one csv file to another csv file

1.csv contains following column- Empid code loc port 101 A xy 01 102 B zx 78 103 A cg 12 104 G xy 78 2.csv contains follwing data- Empid code loc port 101 A gf 01 102 B zx 78 103 C cg 32 104 ... (1 Reply)
Discussion started by: rishabh
1 Replies

2. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

3. Shell Programming and Scripting

Compare two csv's with column based

Hi, I am having below two CSV's col_1,col_2,col_3 1,2,4 1,3,6 col_1,col_3,col2,col_5,col_6 1,2,3,4,5 1,6,3,,, I need to compare based on the columns where the mismatch is expected output col_1,col_2,col_3 1,2,4 (3 Replies)
Discussion started by: rohit_shinez
3 Replies

4. Shell Programming and Scripting

Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script. I need your help to solve my problem. Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file. File1: SourceFile,DateTimeOriginal /home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07... (8 Replies)
Discussion started by: refrain
8 Replies

5. Shell Programming and Scripting

Compare 2 csv files in ksh and o/p the difference in a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 The... (7 Replies)
Discussion started by: Naresh101
7 Replies

6. UNIX for Dummies Questions & Answers

Use sed on column (csv) file if data in colmns is greater > than?

I have a data file that has 14 columns. I cannot use awk or perl but sed is installed on my host. I would like to delete a line if fields 10, 11 or twelve is greater than 999.99. How is this done using sed? :wall: sed '/^*,*,*,*,*,*,*,*,*,*,*,*,*,*,/d' infile 1 2 3 4 ... (2 Replies)
Discussion started by: Chris Eagleson
2 Replies

7. Shell Programming and Scripting

compare files and remove a line from a file if first column is greater than 25

my files are as follows fileA sepearated by tab /t 00 lieferungen 00 attractiop 01 done 02 forness 03 rasp 04 alwaysisng 04 funny 05 done1 fileB funnymou120112 funnymou234470 mou3raspnhdhv rddfgmoudone1438748 so all those record which are greater than 3 and which are not... (4 Replies)
Discussion started by: rajniman
4 Replies

8. Shell Programming and Scripting

Problem with Greater Than Or Equal To

BASH problem with IS GREATER THAN OR EQUAL TO. I have tried a dozen variations for this IF statement to work with IS GREATER THAN OR EQUAL TO. My code below WORKS. array=( $( /usr/bin/sar -q 1 30 |grep Average |awk '{print $2,$3}' ) ) nthreads="${array}" avproc="${array}" if && ; then ... (6 Replies)
Discussion started by: diex
6 Replies

9. Shell Programming and Scripting

Trying to find files equal to and greater than

Hi Guys and Gals, I'm having some difficulty putting this check into a shell script. I would like to search a particular directory for a number of files. The logic I have is pretty simple: Find file named *.txt that are newer than <this file> and count them If the number of files is equal to... (4 Replies)
Discussion started by: bbbngowc
4 Replies

10. Shell Programming and Scripting

Need to compare two csv files values and write into another csv file

Hi all, Am new to scripting. So i just need your ideas to help me out. Here goes my requirement. I have two csv files 1.csv 2.csv abc,1.24 abc,1 def,2.13 def,1 I need to compare the first column of 1.csv with 2.csv and if matches then need to compare... (2 Replies)
Discussion started by: chinnahyd
2 Replies
Login or Register to Ask a Question