Post 302924381 by iori on Saturday 8th of November 2014 11:44:38 PM
awk print matching records and occurrences of each record

Hi all, I have two files: dblp.xml, which contains DBLP records, and itu1.txt, which contains faculty member names. I need to find out how many DBLP records are related to the faculty members. More specifically, I need to find out which names from itu1.txt have a match in dblp.xml, print them, and show how many times each one occurs in DBLP (as publisher, editor, coauthor, author, etc.), that is, exactly how many DBLP records each of them has. In dblp.xml the names appear only as authors and editors.

This is how the files look:

dblp.xml
------------
Code:
<incollection mdate="2010-04-20" key="series/sci/GorissenCCD09">
<author>Mathias Gruttner</author>
<author>Tobias Grundtvig</author>
<author>Erik Gronvall</author>
<author>Tom Dhaene</author>
<title>Automatic Approximation of Expensive Functions with Active Learning.</title>
<pages>35-62</pages>
<year>2009</year>
<booktitle>Foundations of Computational Intelligence (1)</booktitle>
<ee>http://dx.doi.org/10.1007/978-3-642-01082-8_2</ee>
<crossref>series/sci/2009-201</crossref>
<url>db/series/sci/sci201.html#GorissenCCD09</url>
</incollection>
....

-----------

itu1.txt
---------------------
Code:
Mathias Gruttner
Tobias Grundtvig
Erik Gronvall
Sigurd Trolle Gronemann
Dominik  Grondziowski
.....

Now I have this awk script, which does display the authors' names, but it doesn't show the correct result, and I don't know how to print the number of occurrences for each author.

Code:
awk -F, '\
BEGIN {
while ((getline < "dblp.xml") > 0)
   file2[$2]=$3
}

{longest=0
 for (name in file2)
    if (name == substr($1,1,length(name)))
       if (length(name)>longest)
          {holdname=name
           longest=length(name)}
 if (longest>0)
     loc=file2[holdname]
 else
     loc=""
 print $0 "," loc
}' itu1.txt

Desired output:
Code:
25 <author>Mathias Gruttner</author>
34 <author>Tobias Grundtvig</author>
3 <editor>Erik Gronvall</editor>

.....

Could someone tell me what I am doing wrong? Any help is appreciated. Thanks a lot.
Moderator's Comments:
Please use CODE tags for sample input and output as well as for sample code.

Last edited by Don Cragun; 11-09-2014 at 03:22 AM. Reason: Add missing CODE tags.
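
A note on why the posted script prints no counts, followed by one possible approach. This is a sketch under stated assumptions, not a reply from the thread. The script sets the field separator to a comma with -F, but neither sample file contains commas, so $2 and $3 are empty for every line read from dblp.xml. The array file2 therefore ends up with a single empty key, longest never rises above 0, and each line of itu1.txt is printed followed by a bare comma. The sketch below instead loads the names from itu1.txt first, then counts exact matches inside <author> and <editor> elements of dblp.xml. The function name count_dblp_names and the helper norm are hypothetical. It assumes each <author> or <editor> element sits on its own line, as in the sample, and that names match once runs of blanks are squeezed (itu1.txt contains "Dominik  Grondziowski" with two spaces).

```shell
#!/bin/sh
# count_dblp_names NAMES_FILE XML_FILE
# Hypothetical helper (not from the original post): for every name in
# NAMES_FILE that occurs in an <author> or <editor> element of XML_FILE,
# print the number of occurrences followed by the tagged name.
count_dblp_names() {
    awk '
    # Squeeze runs of blanks and trim both ends so that names with
    # irregular spacing still compare equal.
    function norm(s) {
        gsub(/[ \t\r]+/, " ", s)
        sub(/^ /, "", s)
        sub(/ $/, "", s)
        return s
    }
    # First file (NAMES_FILE): remember every faculty name.
    NR == FNR {
        name = norm($0)
        if (name != "") wanted[name] = 1
        next
    }
    # Second file (XML_FILE): only lines with an <author> or <editor> element.
    # Assumption: one such element per line.
    match($0, /<(author|editor)[^>]*>/) {
        tag = ($0 ~ /<author/) ? "author" : "editor"
        rest = substr($0, RSTART + RLENGTH)   # text after the opening tag
        sub(/<\/.*/, "", rest)                # drop the closing tag onwards
        name = norm(rest)
        if (name in wanted) count[tag SUBSEP name]++
    }
    # Print one line per (tag, name) pair that was seen at least once.
    END {
        for (key in count) {
            split(key, part, SUBSEP)
            printf "%d <%s>%s</%s>\n", count[key], part[1], part[2], part[1]
        }
    }' "$1" "$2"
}
```

With the function defined in the current shell, count_dblp_names itu1.txt dblp.xml | sort -rn would list the most frequent names first. The sort is needed because awk does not guarantee any order for a for-in loop. Names that never occur in dblp.xml are simply not printed, and a name that appears both as author and as editor gets one line per role.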
 
