Full Discussion: Find common entries
Post 302727509 by Don Cragun on Tuesday 6th of November 2012 10:49:06 AM
Old 11-06-2012
Quote:
Originally Posted by manigrover
Hi Don,

Sorry for any inconvenience. Basically both files are in text format in Unix system. I myself put it in doc format to easily upload on the website. So, I am attaching the txt files this time.

Yes, there is spacing between the columns. I am sure that the entries are in different columns, but the spacing is not fixed, as I checked after pasting into an Excel file.

Sorry for any inconvenience. But I always upload files here in doc format.


Mani
First. Please NEVER upload files in .doc format. Doing so only makes it impossible to figure out what the real format of your input data and desired output data actually looks like.

Second. Your files do not look like UNIX files at all. They look like oddly formatted DOS files. I say oddly because lines in both files end with <space><carriage-return><newline>. Furthermore, your fields are not consistently <tab> separated and they are not <space> separated; the separator between the 1st and 2nd fields in Secondfile.txt seems to be <space><tab>. So using a line from "first file.txt" as an entry to be found in "Secondfile.txt" will never match anything (since the space and carriage return at the end of the lines in the 1st file do not appear in the even fields in the 2nd file).
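
As a rough sketch of how trailing characters like these can be made visible: od -c prints every byte of its input, so a carriage return shows up as \r and a newline as \n. The sample_line function and its contents below are made up for illustration; they are not taken from the attached files.

```shell
# Hypothetical sample line ending in <space><carriage-return><newline>,
# mimicking the line endings described above (not real data from the thread).
sample_line() {
        printf 'entry1 \r\n'
}

# od -c dumps the input one byte at a time; the trailing space, the
# carriage return (\r), and the newline (\n) all become visible.
sample_line | od -c
```

Running od -c directly on a data file (for example, od -c 'first file.txt' | head) gives the same kind of dump for the real input.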

Third. The last line in Secondfile.txt contains 9 tab characters, while the other lines in that file contain 77 tab characters.
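
One possible way to get per-line tab counts like these is sketched below. The count_tabs name and the two-line sample input are hypothetical. The trick is that awk's gsub() returns the number of substitutions it made, so replacing each <tab> with itself counts the tabs on the line.

```shell
# Hypothetical helper: print "line-number tab-count" for each input line.
# Reads the files named as arguments, or standard input if none are given.
count_tabs() {
        awk '{  # Replacing every <tab> with a <tab> leaves the line unchanged;
                # the return value of gsub() is the number of tabs found.
                n = gsub(/\t/, "\t")
                print NR, n
        }' "$@"
}

# Made-up input: the first line has two tabs, the second has one.
printf 'a\tb\tc\nd\te\n' | count_tabs
```

Lines whose count differs from the rest (such as the last line of Secondfile.txt) stand out immediately in this kind of listing.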

Fourth. Even after cleaning up "first file.txt", there are no entries in that file that match any field in "Secondfile.txt".

So, I added some code to the awk script to clean up both input files and used tab as the input and output field separators. The updated script is:
Code:
awk -F "\t" 'BEGIN {OFS = "\t"}
{       # Fix input oddities: globally change "<space><tab>" to "<tab>" and
        # delete any combination of <space>s, <tab>s, and <carriage-return>s at
        # the end of each input line.  Changing $0 makes awk re-split the fields.
        gsub(/ \t/, "\t")
        sub(/[ \t\r]+$/, "")
}
FNR == NR {
        # Save the "approved" list from the 1st file.
        c[$0]
        next
}
{       for(i = 2; i <= NF; i += 2)
                if($i in c)
                        # An entry in an even field in the 2nd file matched an
                        # item in the approved list; mark it approved.
                        $i = $i " (approved)"
        print
}' 'first file.txt' 'Secondfile.txt'

This script seems to do what you need, but (other than changing "<space><tab>" to "<tab>" as the field separator between all fields and throwing away all trailing <space>, <tab>, and <carriage-return> characters) the output is unchanged because no entries in "first file.txt" appear anywhere in "Secondfile.txt".
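
Since the real files produce no matches, here is a small sketch of what the script does when a match exists. It wraps the same awk program in a hypothetical shell function, mark_approved, and runs it on made-up data that copies the oddities described above (<space><tab> separators and <space><carriage-return> line endings). The file names and entries are assumptions for illustration only.

```shell
# Hypothetical wrapper around the awk program shown above.
# $1 is the "approved" list; $2 is the file whose even fields are checked.
mark_approved() {
        awk -F '\t' 'BEGIN { OFS = "\t" }
        {       # Normalize separators and strip trailing junk on every line.
                gsub(/ \t/, "\t")
                sub(/[ \t\r]+$/, "")
        }
        FNR == NR {
                # 1st file: remember each cleaned-up line as an approved entry.
                c[$0]
                next
        }
        {       # 2nd file: tag every even-numbered field found in the list.
                for (i = 2; i <= NF; i += 2)
                        if ($i in c)
                                $i = $i " (approved)"
                print
        }' "$1" "$2"
}

# Made-up sample data (not the files from this thread).
dir=$(mktemp -d)
printf 'B2 \r\nD4 \r\n' > "$dir/first.txt"
printf 'id1 \tB2\tid2\tC3 \r\n' > "$dir/second.txt"
mark_approved "$dir/first.txt" "$dir/second.txt"
rm -rf "$dir"
```

With this sample, field 2 (B2) is in the approved list and gets " (approved)" appended, while field 4 (C3) is not in the list and is printed unchanged.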
 
