Sponsored Content
Full Discussion: Compare - 1st col of file
Top Forums Shell Programming and Scripting Compare - 1st col of file Post 302353474 by ripat on Tuesday 15th of September 2009 11:57:31 AM
Old 09-15-2009
Quote:
Originally Posted by gch
Except syntax is simpler without awk. Mine was also one line of code and faster. You can test that with command "time". As you noticed I did not have to explain syntax to user.
Well, you will be disapointed. I just ran a benchmark on files with 13000 lines each and here are the results:
Code:
jeanluc@ibm:~/scripts/test$ time nawk -F'|' 'FNR==NR {f1[$1];next} !($1 in f1)' file1 file2 > /dev/null

real	0m0.261s
user	0m0.248s
sys	0m0.008s


jeanluc@ibm:~/scripts/test$ time mawk -F'|' 'FNR==NR {f1[$1];next} !($1 in f1)' file1 file2 > /dev/null

real	0m0.093s
user	0m0.080s
sys	0m0.008s


jeanluc@ibm:~/scripts/test$ time cat file1 file2 | cut -f1 -d \| | sort | uniq -u > /dev/null

real	0m0.943s
user	0m0.888s
sys	0m0.052s
jeanluc@ibm:~/scripts/test$

In your solution you are using three different external programs: cat, sort and uniq which, BTW, is useless as sort can handle that with the -u switch. The penalty for your system (memory and CPU wise) is higher than with a simple awk run.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

compare two col from 2 files, and output uniq from file 1

Hi, I can't find how to achive such thing, please help. I have try with uniq and comm but those command can't compare columns just whole lines, I think awk will be the best but awk is magic for me as of now. file a a1~a2~a3~a4~a6~a7~a8 file b b1~b2~b3~b4~b6~b7~b8 output 1: compare... (2 Replies)
Discussion started by: pp56825
2 Replies

2. Shell Programming and Scripting

compare two files and make 1st file same as 2nd file

I am trying to compare two file and make changes where ever its different. for example: Contents of file1 IP=192.165.89.11 NM=255.255.0.0 GW=192.165.89.1 Contents of file2 IP=192.165.89.11 NM=255.255.255.255 GW=192.165.89.1 NOTE HERE THAT NM IS DIFFERENT So i want the changes... (6 Replies)
Discussion started by: pradeepreddy
6 Replies

3. Ubuntu

Match col 1 of File 1 with col 1 File 2 and create a 3rd file

Hello, I have a 1.6 GB file that I would like to modify by matching some ids in col1 with the ids in col 1 of file2.txt and save the results into a 3rd file. For example: File 1 has 1411 rows, I ignore how many columns it has (thousands) File 2 has 311 rows, 1 column Would like to... (7 Replies)
Discussion started by: sogi
7 Replies

4. Shell Programming and Scripting

Get columns from another file for match in col 2 in 1st file

Hi, My first file has 592155 9 rs16916098 1 592156 19 rs7249604 1 592157 4 rs885156 1 592158 5 rs350067 12nd file has 9 rs16916098 0 113228129 2 4 19 rs7249604 0 58709070 4 2 2 rs17042833 0 113558750 4 2... (2 Replies)
Discussion started by: genehunter
2 Replies

5. Shell Programming and Scripting

how to add new col in a file

Hi, Experts, I have a requirement as following: my source file: a a a b b c c c c I need add one more colume as following: 1 a 2 a 3 a 1 b 2 b 1 c 2 c (4 Replies)
Discussion started by: ken002
4 Replies

6. UNIX for Advanced & Expert Users

Print line based on highest value of col (B) and repetion of values in col (A)

Hello everyone, I am writing a script to process data from the ATP world tour. I have a file which contains: t=540 y=2011 r=1 p=N409 t=540 y=2011 r=2 p=N409 t=540 y=2011 r=3 p=N409 t=540 y=2011 r=4 p=N409 t=520 y=2011 r=1 p=N409 t=520 y=2011 r=2 p=N409 t=520 y=2011 r=3 p=N409 The... (4 Replies)
Discussion started by: imahmoud
4 Replies

7. Shell Programming and Scripting

Printing from col x to end of line, except last col

Hello, I have some tab delimited data and I need to move the last col. I could hard code it, awk '{ print $1,$NF,$2,$3,$4,etc }' infile > outfile but it would be nice to know the syntax to print a range cols. I know in cut you can do, cut -f 1,4-8,11- to print fields 1,... (8 Replies)
Discussion started by: LMHmedchem
8 Replies

8. Shell Programming and Scripting

Run a program-print parameters to output file-replace op file contents with max 4th col

Hi Friends, This is the only solution to my task. So, any help is highly appreciated. I have a file cat input1.bed chr1 100 200 abc chr1 120 300 def chr1 145 226 ghi chr2 567 600 unix Now, I have another file by name input2.bed (This file is a binary file not readable by the... (7 Replies)
Discussion started by: jacobs.smith
7 Replies

9. Shell Programming and Scripting

Modifying col values based on another col

Hi, Please help with this. I have several excel files (with and .xlsx format) with 10-15 columns each. They all have the same type of data but the columns are not ordered in the same way. Here is a 3 column example. What I want to do add the alphabet from column 2 to column 3, provided... (9 Replies)
Discussion started by: newbie83
9 Replies

10. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

hi I have 2 file with more than 10 columns for both 1st file apple,0,0,0...... orange,1,2,3..... mango,2,4,5..... 2nd file apple,2,3,4,5,6,7... orange,2,3,4,5,6,8... watermerlon,2,3,4,5,6,abc... mango,5,6,7,4,6,def.... (1 Reply)
Discussion started by: tententen
1 Replies
SG_RBUF(8)							     SG3_UTILS								SG_RBUF(8)

NAME
sg_rbuf - reads data using SCSI READ BUFFER command SYNOPSIS
sg_rbuf [--buffer=EACH] [--dio] [--help] [--mmap] [--quick] [--size=OVERALL] [--test] [--verbose] [--version] DEVICE sg_rbuf [-b=EACH_KIB] [-d] [-m] [-q] [-s=OVERALL_MIB] [-t] [-v] [-V] DEVICE DESCRIPTION
This command reads data with the SCSI READ BUFFER command and then discards it. Typically the data being read is from a disk's memory cache. It is assumed that the data is sourced quickly (although this is not guaranteed by the SCSI standards) so that it is faster than reading data from the media. This command is designed for timing transfer speeds across a SCSI transport. To fetch the data with a SCSI READ BUFFER command and optionally decode it see the sg_read_buffer utility. There is also a sg_write_buffer utility useful for downloading firmware amongst other things. This utility supports two command line syntaxes, the preferred one is shown first in the synopsis and explained in this section. A later section on the old command line syntax outlines the second group of options. OPTIONS
Arguments to long options are mandatory for short options as well. -b, --buffer=EACH where EACH is the number of bytes to be transferred by each READ BUFFER command. The default is the actual available buffer size returned by the READ BUFFER (descriptor) command. The maximum is the same as the default, hence this argument can only be used to reduce the size of each transfer to less than the device's actual available buffer size. -d, --dio use direct IO if available. This option is only available if the DEVICE is a sg driver device node (e.g. /dev/sg1). In this case the sg driver will attempt to configure the DMA from the SCSI adapter to transfer directly into user memory. This will eliminate the copy via kernel buffers. If not available then this will be reported and indirect IO will be done instead. -h, --help print usage message then exit. -m, --mmap use memory mapped IO if available. This option is only available if the DEVICE is a sg driver device node (e.g. /dev/sg1). In this case the sg driver will attempt to configure the DMA from the SCSI adapter to transfer directly into user memory. This will elimi- nate the copy via kernel buffers. -O, --old switch to older style options. -q, --quick only transfer the data into kernel buffers (typically by DMA from the SCSI adapter card) and do not move it into the user space. This option is only available if the DEVICE is a sg driver device node (e.g. /dev/sg1). -s, --size=OVERALL where OVERALL is the size of total transfer in bytes. The default is 200 MiB (200*1024*1024 bytes). The actual number of bytes transferred may be slightly less than requested since all transfers are the same size (and an integer division is involved rounding towards zero). -t, --time times the bulk data transfer component of this command. The elapsed time is printed out plus a MB/sec calculation. In this case "MB" is 1,000,000 bytes. The gettimeofday() system call is used internally for the time calculation. -v, --verbose increase level of verbosity. Can be used multiple times. -V, --version print out version string then exit. NOTES
This command is typically used on modern SCSI disks which have a RAM cache in their drive electronics. If no IO to the magnetic media, or slower devices like flash RAM, is involved then the disk may be able to source data fast enough to saturate the bandwidth of the SCSI transport. The bottleneck may then be the DMA element in the HBA, the Linux drivers or the host machine's hardware (e.g. speed of RAM). Various numeric arguments (e.g. OVERALL) may include multiplicative suffixes or be given in hexadecimal. See the "NUMERIC ARGUMENTS" sec- tion in the sg3_utils(8) man page. EXAMPLES
On the test system /dev/sg0 corresponds to a fast disk on a U2W SCSI bus (max 80 MB/sec). The disk specifications state that its cache is 4 MB. $ time ./sg_rbuf /dev/sg0 READ BUFFER reports: buffer capacity=3434944, offset boundary=6 Read 200 MiB (actual 199 MiB, 209531584 bytes), buffer size=3354 KiB real 0m5.072s, user 0m0.000s, sys 0m2.280s So that is approximately 40 MB/sec at 40 % utilization. Now with the addition of the "-q" option this throughput improves and the utiliza- tion drops to 0%. $ time ./sg_rbuf -q /dev/sg0 READ BUFFER reports: buffer capacity=3434944, offset boundary=6 Read 200 MiB (actual 199 MiB, 209531584 bytes), buffer size=3354 KiB real 0m2.784s, user 0m0.000s, sys 0m0.000s EXIT STATUS
The exit status of sg_rbuf is 0 when it is successful. Otherwise see the sg3_utils(8) man page. OLDER COMMAND LINE OPTIONS
The options in this section were the only ones available prior to sg3_utils version 1.23 . In sg3_utils version 1.23 and later these older options can be selected by either setting the SG3_UTILS_OLD_OPTS environment variable or using '--old' (or '-O) as the first option. -b=EACH_KIB where EACH_KIB is the number of Kilobytes (i.e. 1024 byte units) to be transferred by each READ BUFFER command. Similar to the --buffer=EACH option in the main description but the units are different. -d use direct IO if available. Equivalent to the --dio option in the main description. -m use memory mapped IO if available. Equivalent to the --mmap option in the main description. -N switch to the newer style options. -q only transfer the data into kernel buffers (typically by DMA from the SCSI adapter card) and do not move it into the user space. Equivalent to the --quick option in the main description. -s=OVERALL_MIB where OVERALL_MIB is the size of total transfer in Megabytes (1048576 bytes). Similar to the --size=OVERALL option in the main description but the units are different. -t times the bulk data transfer component of this command. Equivalent to the --time option in the main description. -v increase level of verbosity. Can be used multiple times. -V print out version string then exit. AUTHOR
Written by Douglas Gilbert REPORTING BUGS
Report bugs to <dgilbert at interlog dot com>. COPYRIGHT
Copyright (C) 2000-2007 Douglas Gilbert This software is distributed under the GPL version 2. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PUR- POSE. SEE ALSO
sg_read_buffer, sg_write_buffer, sg_test_rwbuf(all in sg3_utils) sg3_utils-1.23 January 2007 SG_RBUF(8)
All times are GMT -4. The time now is 12:43 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy