awk print matching records and occurences of each record
Post 302924385 by Don Cragun on Sunday 9th of November 2014 03:49:51 AM
Given that you're using comma as your field separator and there are no commas in either of your sample files, I have no idea how your current script is producing a list of authors for you. And since there are no entries for editor and no more than 1 entry for any author in your input file, I have no idea why you would expect to get that output for the given input. I am also surprised that the spacing in and around names in your input and output files is inconsistent.

Maybe something like the following would come closer to what you said you wanted:
Code:
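awk -F '</?author>|</?editor>|</?publisher>|</?coauthor>|</?illustrator>' '
# First file (itu1.txt): remember each name; no tags, so $1 is the whole line.
FNR == NR {
	faculty[$1]
	next
}
# Second file (dblp.xml): $2 is the text between a matched pair of tags.
$2 in faculty {
	count[$0]++
}
END {
	# Print each matching line with its count (order is unspecified).
	for(i in count)
		printf("%d\t%s\n", count[i], i)
}' itu1.txt dblp.xml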

You didn't say what OS you're using. If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
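
If it isn't obvious why the script tests $2 rather than $1, a quick check like the following may help. It is only a sketch: the two sample lines and the shell variable name fields are made up for illustration, while the field separator is the one used in the script. Since a tagged line starts and ends with a separator, the first and last fields come out empty and the name lands in $2; a line with none of the listed tags stays a single field.

```shell
# Two illustrative input lines: one with a listed tag, one without.
fields=$(printf '%s\n' '<coauthor>Erik Gronvall</coauthor>' '<pages>1-20</pages>' |
awk -F '</?author>|</?editor>|</?publisher>|</?coauthor>|</?illustrator>' '
{
	# Show the field count, then every field wrapped in brackets.
	printf("NF=%d", NF)
	for (i = 1; i <= NF; i++)
		printf(" [%s]", $i)
	printf("\n")
}')
printf '%s\n' "$fields"
```

For the first line this should report three fields with the name in the middle one; for the second it should report a single field holding the whole line, so $2 is empty and the line is skipped by the main script (as long as itu1.txt contains no empty lines).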

With input files:
dblp.xml:
Code:
<incollection mdate="2010-04-20" key="series/sci/GorissenCCD09">
<author>Mathias Gruttner</author>
<coauthor>Erik Gronvall</coauthor>
<coauthor>Tom Dhaene</coauthor>
<title>Automatic Approximation of Cheap Functions with Active Learning.</title>
<pages>1-20</pages>
<author>Dominik Grondziowski</author>
<title>Automatic Approximation of Inexpensive Functions with Active Learning.</title>
<pages>21-24</pages>
<author>Mathias Gruttner</author>
<coauthor>Tobias Grundtvig</coauthor>
<coauthor>Erik Gronvall</coauthor>
<title>Automatic Approximation of Expensive Functions with Active Learning.</title>
<pages>25-34</pages>
<author>Mathias Gruttner</author>
<illustrator>Sigurd Trolle Gronemann</illustrator>
<author>Erik Gronvall</author>
<author>Tom Dhaene</author>
<title>Automatic Approximation of Expensive Functions with Inactive Learning.</title>
<pages>35-62</pages>
<year>2009</year>
<booktitle>Foundations of Computational Intelligence (1)</booktitle>
<publisher>Sigurd Trolle Gronemann</publisher>
<editor>Dominik Grondziowski</editor>
<ee>http://dx.doi.org/10.1007/978-3-642-01082-8_2</ee>
<crossref>series/sci/2009-201</crossref>
<url>db/series/sci/sci201.html#GorissenCCD09</url>
</incollection>

and itu1.txt:
Code:
Mathias Gruttner
Tobias Grundtvig
Erik Gronvall
Sigurd Trolle Gronemann
Dominik Grondziowski

it produces the output:
Code:
2	<coauthor>Erik Gronvall</coauthor>
1	<editor>Dominik Grondziowski</editor>
1	<author>Erik Gronvall</author>
1	<illustrator>Sigurd Trolle Gronemann</illustrator>
1	<publisher>Sigurd Trolle Gronemann</publisher>
3	<author>Mathias Gruttner</author>
1	<author>Dominik Grondziowski</author>
1	<coauthor>Tobias Grundtvig</coauthor>

Is this what you were trying to do?
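
Note that the order in which for(i in count) visits the array is unspecified, so the lines may come out in a different order on your system. If you want a stable order, one possibility is to pipe the output through sort. The following is a minimal sketch, assuming mktemp -d is available on your system; the scratch directory, the shell variable names, and the cut-down input files are only there to make the example self-contained.

```shell
# Build small stand-ins for the two input files in a scratch directory
# (a subset of the sample data shown above).
tmp=$(mktemp -d)
printf '%s\n' 'Mathias Gruttner' 'Erik Gronvall' > "$tmp/itu1.txt"
printf '%s\n' \
	'<author>Mathias Gruttner</author>' \
	'<coauthor>Erik Gronvall</coauthor>' \
	'<author>Tom Dhaene</author>' \
	'<author>Mathias Gruttner</author>' > "$tmp/dblp.xml"

# Same awk logic as above; sort by count (highest first), then by the text.
sorted=$(awk -F '</?author>|</?editor>|</?publisher>|</?coauthor>|</?illustrator>' '
FNR == NR { faculty[$1]; next }
$2 in faculty { count[$0]++ }
END { for (i in count) printf("%d\t%s\n", count[i], i) }
' "$tmp/itu1.txt" "$tmp/dblp.xml" | sort -k1,1nr -k2)
printf '%s\n' "$sorted"
rm -rf "$tmp"
```

With your real files you would only need to add | sort -k1,1nr -k2 after dblp.xml in the original command.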
 
