Sponsored Content
Full Discussion: Merge CSV files
Top Forums Shell Programming and Scripting Merge CSV files Post 302945890 by Scrutinizer on Thursday 4th of June 2015 10:18:16 AM
Old 06-04-2015
Quote:
Originally Posted by Don Cragun
[..]
I haven't tried a performance comparison, but the following uses fewer processes and pushes less data through the pipes so it could be faster. [..]
The number of processes between the various non-find solutions does not differ much, but I never realized that the amount of data through the pipes could matter that much. Thank you for that insight..

I did some tests with 10.000 input file and it was up to a factor 5 when selection is done before the pipe (obviously depending on the amount of data that can be saved by selecting before the pipe)..

--
My system does not have a lot of line length limitations , so I was able to use this script (since the header and footer do not matter):
Code:
awk '/[0-9]"$/ && !A[$0]++' *.csv

Which is a little bit faster still, since it doesn't even use a pipe..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merge 2 csv files with awk

I have 2 files pipe delimted and want to merge them based on a key e.g file 1 123$aaa$yyy$zzz 345$xab$yzy$zyz 456$sss$ttt$foo 799$aaa$ggg$dee file 2 123$hhh 345$ddd 456$xxx 888$zzz so if the key is the first field, and the result should be the common key between file 1 and 2 (6 Replies)
Discussion started by: loloAix
6 Replies

2. Shell Programming and Scripting

Merge 2 CSV files using sed

Help in writing a script using sed which updates fileOne with the contents from fileTwo Example: Contents of fileOne 1,111111 2,897823 3,235473 4,222222 Contents of fileTwo 1,111111,A,1,2 4,222222,A,2,2 5,374632,A,3,2 6,374654,A,4,2 Final File should be: 1,111111,A,1,2... (9 Replies)
Discussion started by: NewToSed
9 Replies

3. Shell Programming and Scripting

Merge CSV files and create a column with the filename from the original file

Hello everyone!! I am not completely new to shell script but I havent been able to find the answer to my problem and I'm sure there are some smart brains here up for the challenge :D. I have several CSV files that I need to combine into one, but I also need to know where each row came from.... (7 Replies)
Discussion started by: fransanchezoria
7 Replies

4. UNIX for Dummies Questions & Answers

Merge all csv files in one folder considering only 1 header row and ignoring header of all others

Friends, I need help with the following in UNIX. Merge all csv files in one folder considering only 1 header row and ignoring header of all other files. FYI - All files are in same format and contains same headers. Thank you (4 Replies)
Discussion started by: Shiny_Roy
4 Replies

5. Shell Programming and Scripting

Merge *.csv files, each in separate sheets

Does anyone know how to Merge *.csv files, each in seperate sheets? (7 Replies)
Discussion started by: frhling
7 Replies

6. UNIX for Dummies Questions & Answers

Need help combining txt files w/ multiple lines into csv single cell - also need data merge

:confused:Hello -- i just joined the forums. I am a complete noob -- only about 1 week into learning how to program anything... and starting with linux. I am working in Linux terminal. I have a folder with a bunch of txt files. Each file has several lines of html code. I want to combine... (2 Replies)
Discussion started by: jetsetter
2 Replies

7. UNIX for Dummies Questions & Answers

Merge two csv files using column name

Hi all, I have two separate csv files(comma delimited) file 1 and file 2. File 1 contains PAN,NAME,Salary AAAAA5467D,Raj,50000 AAFAC5467D,Ram,60000 BDCFA5677D,Kumar,90000 File 2 contains PAN,NAME,Dept,Salary ASDFG6756T,Karthik,ABC,450000 QWERT8765Y,JAX,CDR,780000... (5 Replies)
Discussion started by: Nivas
5 Replies

8. Shell Programming and Scripting

I am trying to merge all csv files from source path into 1 file

I am trying to merge all csv files from source path into one single csv file in target. but getting error message: hadoop fs -cat /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/TCH-charge_*.csv > /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/final/TCH_pb_charge.csv getting error message:... (0 Replies)
Discussion started by: cplusplus1
0 Replies

9. Shell Programming and Scripting

Compare and merge two big CSV files

Hi all, i need help. I have two csv files with a huge amount of data. I need the first column of the first file, to be compared with the data of the second, to have at the end a file with the data not present in the second file. Example File1: (only one column) profile_id 57036226... (11 Replies)
Discussion started by: SirMannu
11 Replies

10. UNIX for Beginners Questions & Answers

Merge the three csv files as one according to first coloumn.

I have three files with similar pattern i need to merge all the coloumns side by side from all three files according to the first coloumn example as shown below I mentioned 5 coloumns only in example but i have around 15 coloumns in each file. file1: Name,Samples,Error,95RT,90RT... (4 Replies)
Discussion started by: Raghuram717
4 Replies
PIPE(2) 							System Calls Manual							   PIPE(2)

NAME
pipe - create an interprocess channel SYNOPSIS
pipe(fildes) int fildes[2]; DESCRIPTION
The pipe system call creates an I/O mechanism called a pipe. The file descriptors returned can be used in read and write operations. When the pipe is written using the descriptor fildes[1] up to 4096 bytes of data are buffered before the writing process is suspended. A read using the descriptor fildes[0] will pick up the data. Writes with a count of 4096 bytes or less are atomic; no other process can inter- sperse data. It is assumed that after the pipe has been set up, two (or more) cooperating processes (created by subsequent fork calls) will pass data through the pipe with read and write calls. The Shell has a syntax to set up a linear array of processes connected by pipes. Read calls on an empty pipe (no buffered data) with only one end (all write file descriptors closed) returns an end-of-file. SEE ALSO
sh(1), read(2), write(2), fork(2) DIAGNOSTICS
The function value zero is returned if the pipe was created; -1 if too many files are already open. A signal is generated if a write on a pipe with only one end is attempted. BUGS
Should more than 4096 bytes be necessary in any pipe among a loop of processes, deadlock will occur. ASSEMBLER
(pipe = 42.) sys pipe (read file descriptor in r0) (write file descriptor in r1) PIPE(2)
All times are GMT -4. The time now is 01:31 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy