Match first column entries precisely and fetch whatever in front of it


 
# 1  
Old 11-17-2012

Hi all,

I have two files.

The first file is:
Code:
AABC
TTYP
JKBH
CVBN
NHJK
KJHM

The second file is:

Code:
AABC,XCYU,JUHD      Alllele1        GACXT  It is approved study
TTYP,JKBH               Allele2         PPRD      It is clinical trial study   
JKBH                         Allele2         PPRD      It is clinical trial study 
CVBN                      Allele23        PKHGN     It is clinical trial study 
NHJK,CVBN                Allele2         PPRD      It is clinical trial study 
KJHM,CVBN,GHCY,BVCHJ             Allele5         PPRD      It is approved/clinical trial study


If an entry in the first file matches any entry in the first column of the second file, including entries that come after a comma, I want to fetch the entries in the remaining columns of that line. (This is the problem: my code only checks the first entry, not the entries after the commas.) The output should look like this:


Code:
AABC    Alllele1        GACXT  It is approved study
TTYP              Allele2         PPRD      It is clinical trial study
JKBH               Allele2         PPRD      It is clinical trial study   
JKBH                Allele2         PPRD      It is clinical trial study 
CVBN               Allele23        PKHGN     It is clinical trial study 
NHJK               Allele2         PPRD      It is clinical trial study 
CVBN                Allele2         PPRD      It is clinical trial study 
KJHM                Allele5         PPRD      It is approved/clinical trial study 
CVBN                Allele5         PPRD      It is approved/clinical trial study

The code I was using is:
Code:
awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' file1 FS="\t" file2

It matches only the first entry in the first column of the second file, not the entries after the commas.

I am also attaching samples from the original files (first and second). Please check them.
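
A likely reason the command above misses the entries after a comma is that split() is called with a space as the separator, so a first field such as TTYP,JKBH stays in one piece and is looked up as a whole. A minimal sketch of the difference, using a made-up field value rather than the attached files:

```shell
# Hypothetical first-column value with two comma-separated entries.
field='TTYP,JKBH'

# split() returns the number of pieces it produced.
with_space=$(printf '%s\n' "$field" | awk '{ print split($0, P, " ") }')
with_comma=$(printf '%s\n' "$field" | awk '{ print split($0, P, ",") }')

echo "split on space: $with_space piece(s)"
echo "split on comma: $with_comma piece(s)"
```

Only the comma version yields separate entries that can each be compared with the first file.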
# 2  
Old 11-17-2012
If first.txt didn't have trailing spaces on most lines, my first guess would have worked. But, with the files you attached, the awk program:
Code:
awk 'FNR == NR {f[$1]; next}
{       for(n = split($1, f1, ","); n > 0; n--)
                if(f1[n] in f) {
                        $1 = f1[n]
                        print
                }
}' first.txt FS="\t" OFS="\t" second.txt

produces the following output:
Code:
CTSD    Insulin recombinant     Novolin R (Novo Nordisk)        Approved        For treatment of Type I and II diabetes mellitus.
CTSD    Insulin, porcine        Iletin II       Approved        For the treatment of type I and II diabetes mellitus.
LCT     Vitamin C       Adenex  Approved        Used to treat vitamin C deficiency, scurvy, delayed wound and bone healing, urine acidification, and in general as an antioxidant. It has also been suggested to be an effective antiviral agent.
B4GALT1 N-Acetyl-D-glucosamine  Aflexa  Approved        For the treatment and prevention of osteoarthritis, by itself or in combination with chondroitin sulfate.
PPARD   Icosapent       Not Available   Approved        EPA can be used for lowering elevated triglycerides in those who are hyperglyceridemic. In addition, EPA may play a therapeutic role in patients with cystic fibrosis by reducing disease severity and may play a similar role in type 2 diabetics in slowing the progression of diabetic nephropathy.
NR3C1   Flunisolide     Aerobid Approved        For the maintenance treatment of asthma as a prophylactic therapy.
CHRM1   Cevimeline      Evoxac  Approved        For the treatment of symptoms of dry mouth in patients with Sjögren's Syndrome.
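
To see what the program above does with comma-separated keys, here is a small sketch on made-up data modelled on the sample in post #1. The temporary file names and the tab separators in the second file are assumptions, since the sample in the post is shown with spaces.

```shell
# Hypothetical sample data; the tab-separated layout is an assumption.
tmp=$(mktemp -d)
printf '%s\n' AABC TTYP JKBH > "$tmp/first.txt"
printf 'AABC,XCYU,JUHD\tAlllele1\tGACXT\tIt is approved study\n' > "$tmp/second.txt"
printf 'TTYP,JKBH\tAllele2\tPPRD\tIt is clinical trial study\n' >> "$tmp/second.txt"
printf 'ZZZZ\tAllele9\tNONE\tNo key matches this line\n' >> "$tmp/second.txt"

out=$(awk 'FNR == NR { f[$1]; next }    # first file: remember each key
{   # second file: split the comma-separated first field into f1[1..n]
    for (n = split($1, f1, ","); n > 0; n--)
        if (f1[n] in f) {                # is this entry listed in first.txt?
            $1 = f1[n]                   # keep only the matching entry
            print                        # record is rebuilt with OFS (tab)
        }
}' "$tmp/first.txt" FS="\t" OFS="\t" "$tmp/second.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the loop counts down from n, entries that share a line are printed last to first, so the TTYP,JKBH line produces the JKBH row before the TTYP row. The ZZZZ line produces nothing.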

# 3  
Old 11-17-2012
Hi Don

It's great that it works for my attached files, but when I used two other files with different data and applied the same condition, it doesn't work (the output is blank).

I have attached samples of those two as well.

Please explain if possible. Thanks a lot!
# 4  
Old 11-17-2012
Hi Don


The second file again has commas in its first column, just like my second input file the first time, but there is still no output.
# 5  
Old 11-17-2012
Quote:
Originally Posted by Priyanka Chopra
Hi Don

It's great that it works for my attached files, but when I used two other files with different data and applied the same condition, it doesn't work (the output is blank).

I have attached samples of those two as well.

Please explain if possible. Thanks a lot!
You need to specify your input file formats and stick to them.

In your first set of input files, the first file has the value you want to match followed by a <space> and a <newline>; and the second file has fields (some of which contain <space> characters) separated by <tab> characters.

In your second set of input files, the first file has the value you want to match followed by a <carriage-return> and a <newline>; and the second file has fields (all of which contain <space> characters, including some trailing <space> characters) separated by zero or more <space> characters followed by a <tab> character.
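
Differences like these are invisible in most editors. One way to check for them, sketched here with made-up one-line files rather than the attachments, is to dump the bytes with od -c:

```shell
# Hypothetical one-line files reproducing the two line endings described above.
tmp=$(mktemp -d)
printf 'CTSD \n'  > "$tmp/set1_first.txt"   # value, <space>, <newline>
printf 'CTSD\r\n' > "$tmp/set2_first.txt"   # value, <carriage-return>, <newline>

# od -c prints every byte, so the trailing space and the \r become visible.
set1_dump=$(od -c "$tmp/set1_first.txt")
set2_dump=$(od -c "$tmp/set2_first.txt")
printf '%s\n\n%s\n' "$set1_dump" "$set2_dump"
rm -rf "$tmp"
```

A \r in the dump means the file has DOS-style line endings, and the carriage return can end up as part of the key being matched.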

The following script works for either set of inputs and could probably be updated to accept other garbage in your input (but if that needs to be done I will leave it as an exercise for you to do yourself):
Code:
awk '{  gsub(/\r/, "")
        gsub(/ *\t/, "\t")}
FNR == NR {f[$1]
        next}
{       for(n = split($1, f1, ","); n > 0; n--)
                if(f1[n] in f) {
                        $1 = f1[n]
                        print
                }
}' pHARMGKBT2D.txt FS="\t" OFS="\t" second.txt

Note that the second file in both tests is named second.txt, but they are not the same file. You supplied two files to download named second.txt, but they are not even close to containing the same data and do not contain data in the same format.
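
The clean-up in the script above is done by the two gsub() calls that run on every line of both files. A minimal sketch of their effect on a single made-up record (not taken from the attachments):

```shell
# Hypothetical messy record: spaces before each tab, carriage return at the end.
messy=$(printf 'TTYP,JKBH  \tAllele2 \tPPRD\r')

# Same two gsub() calls as in the script: drop CRs, then drop spaces before tabs.
clean=$(printf '%s\n' "$messy" | awk '{ gsub(/\r/, ""); gsub(/ *\t/, "\t"); print }')

# Dump the result byte by byte to confirm what is left.
printf '%s\n' "$clean" | od -c
```

After this step the first field is exactly TTYP,JKBH, so splitting it on commas gives keys that compare equal to the cleaned keys from the first file.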
This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 11-17-2012
Sorry for the inconvenience, Don!

It's working.

I will now check it with all my other files!