Sponsored Content
Top Forums Shell Programming and Scripting Get common lines from multiple files Post 302437926 by radoulov on Friday 16th of July 2010 03:59:07 PM
Old 07-16-2010
Quote:
Originally Posted by genehunter
[...]
I would like to know if this script is extensible for say a hundred such files?
Also if the files are names differently; not file[a-d], how will the code change.
Could you also give a brief explanation if time permits.
Yes,
in this case the number of arguments (input files) is limited only by your system (MAX_ARGS kernel limit).

As far as the script is concerned, the filenames are irrelevant,
just pass the input files as arguments:

Code:
awk 'END {
  for (R in rec) {
    n = split(rec[R], t, "/")
    if (n > 1) 
      dup[n] = dup[n] ? dup[n] RS sprintf("\t%-20s -->\t%s", rec[R], R) : \
        sprintf("\t%-20s -->\t%s", rec[R], R)
    }
  for (D in dup) {
    printf "records found in %d files:\n\n", D
    printf "%s\n\n", dup[D]
    }  
  }
{  
  rec[$0] = rec[$0] ? rec[$0] "/" FILENAME : FILENAME
  }' <any_filename_1> <any_filename_2> ... <any_filename_n>

I'll post the explanation later.
These 2 Users Gave Thanks to radoulov For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To find all common lines from 'n' no. of files

Hi, I have one situation. I have some 6-7 no. of files in one directory & I have to extract all the lines which exist in all these files. means I need to extract all common lines from all these files & put them in a separate file. Please help. I know it could be done with the help of... (11 Replies)
Discussion started by: The Observer
11 Replies

2. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files with a criteria that if the first two columns match then I keep the maximum value of the 3rd column.(tab separated columns) Sample input: file1: 111 222 0.1 333 444 0.5 555 666 0.4 file 2: 111 222 0.7 555 666... (5 Replies)
Discussion started by: jaysean
5 Replies

3. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files with a criteria that if the first two columns match then I keep the maximum value of the 5th column.(tab separated columns) . 3rd and 4th columns corresponds to the row which has highest value for the 5th column. Sample... (2 Replies)
Discussion started by: jaysean
2 Replies

4. Shell Programming and Scripting

Merge multiple lines in same file with common key using awk

I've been a Unix admin for nearly 30 years and never learned AWK. I've seen several similar posts here, but haven't been able to adapt the answers to my situation. AWK is so damn cryptic! ;) I have a single file with ~900 lines (CSV list). Each line starts with an ID, but with different stuff... (6 Replies)
Discussion started by: protosd
6 Replies

5. Shell Programming and Scripting

Find common lines between multiple files

Hello everyone A few years Ago the user radoulov posted a fancy solution for a problem, which was about finding common lines (gene variation names) between multiple samples (files). The code was: awk 'END { for (R in rec) { n = split(rec, t, "/") if (n > 1) dup = dup ?... (5 Replies)
Discussion started by: bibb
5 Replies

6. Shell Programming and Scripting

Compare multiple files, and extract items that are common to ALL files only

I have this code awk 'NR==FNR{a=$1;next} a' file1 file2 which does what I need it to do, but for only two files. I want to make it so that I can have multiple files (for example 30) and the code will return only the items that are in every single one of those files and ignore the ones... (7 Replies)
Discussion started by: castrojc
7 Replies

7. UNIX for Dummies Questions & Answers

Filter lines common in two files

Thanks everyone. I got that problem solved. I require one more help here. (Yes, UNIX definitely seems to be fun and useful, and I WILL eventually learn it for myself. But I am now on a different project and don't really have time to go through all the basics. So, I will really appreciate some... (6 Replies)
Discussion started by: latsyrc
6 Replies

8. Shell Programming and Scripting

Join common patterns in multiple lines into one line

Hi I have a file like 1 2 1 2 3 1 5 6 11 12 10 2 7 5 17 12 I would like to have an output as 1 2 3 5 6 10 7 11 12 17 any help would be highly appreciated Thanks (4 Replies)
Discussion started by: Harrisham
4 Replies

9. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

10. Shell Programming and Scripting

Find common lines between all of the files in one folder

Could it be possible to find common lines between all of the files in one folder? Just like comm -12 . So all of the files two at a time. I would like all of the outcomes to be written to a different files, and the file names could be simply numbers - 1 , 2 , 3 etc. All of the file names contain... (19 Replies)
Discussion started by: Eve
19 Replies
XML::SAX::ByRecord(3pm) 				User Contributed Perl Documentation				   XML::SAX::ByRecord(3pm)

NAME
XML::SAX::ByRecord - Record oriented processing of (data) documents SYNOPSIS
use XML::SAX::Machines qw( ByRecord ) ; my $m = ByRecord( "My::RecordFilter1", "My::RecordFilter2", ... { Handler => $h, ## optional } ); $m->parse_uri( "foo.xml" ); DESCRIPTION
XML::SAX::ByRecord is a SAX machine that treats a document as a series of records. Everything before and after the records is emitted as- is while the records are excerpted in to little mini-documents and run one at a time through the filter pipeline contained in ByRecord. The output is a document that has the same exact things before, after, and between the records that the input document did, but which has run each record through a filter. So if a document has 10 records in it, the per-record filter pipeline will see 10 sets of ( start_document, body of record, end_document ) events. An example is below. This has several use cases: o Big, record oriented documents Big documents can be treated a record at a time with various DOM oriented processors like XML::Filter::XSLT. o Streaming XML Small sections of an XML stream can be run through a document processor without holding up the stream. o Record oriented style sheets / processors Sometimes it's just plain easier to write a style sheet or SAX filter that applies to a single record at at time, rather than having to run through a series of records. Topology Here's how the innards look: +-----------------------------------------------------------+ | An XML:SAX::ByRecord | | Intake | | +----------+ +---------+ +--------+ Exhaust | --+-->| Splitter |--->| Stage_1 |-->...-->| Merger |----------+-----> | +----------+ +---------+ +--------+ | | ^ | | | | | +---------->---------------+ | | Events not in any records | | | +-----------------------------------------------------------+ The "Splitter" is an XML::Filter::DocSplitter by default, and the "Merger" is an XML::Filter::Merger by default. The line that bypasses the "Stage_1 ..." filter pipeline is used for all events that do not occur in a record. All events that occur in a record pass through the filter pipeline. Example Here's a quick little filter to uppercase text content: package My::Filter::Uc; use vars qw( @ISA ); @ISA = qw( XML::SAX::Base ); use XML::SAX::Base; sub characters { my $self = shift; my ( $data ) = @_; $data->{Data} = uc $data->{Data}; $self->SUPER::characters( @_ ); } And here's a little machine that uses it: $m = Pipeline( ByRecord( "My::Filter::Uc" ), $out, ); When fed a document like: <root> a <rec>b</rec> c <rec>d</rec> e <rec>f</rec> g </root> the output looks like: <root> a <rec>B</rec> c <rec>C</rec> e <rec>D</rec> g </root> and the My::Filter::Uc got three sets of events like: start_document start_element: <rec> characters: 'b' end_element: </rec> end_document start_document start_element: <rec> characters: 'd' end_element: </rec> end_document start_document start_element: <rec> characters: 'f' end_element: </rec> end_document METHODS
new my $d = XML::SAX::ByRecord->new( @channels, \%options ); Longhand for calling the ByRecord function exported by XML::SAX::Machines. CREDIT
Proposed by Matt Sergeant, with advise by Kip Hampton and Robin Berjon. Writing an aggregator. To be written. Pretty much just that "start_manifold_processing" and "end_manifold_processing" need to be provided. See XML::Filter::Merger and it's source code for a starter. perl v5.10.0 2009-06-11 XML::SAX::ByRecord(3pm)
All times are GMT -4. The time now is 05:45 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy