Finding Authors in Common Across Dozens of Lists


 
# 1  
03-29-2009

I currently have publication lists for ~3 dozen faculty members. I need to find out how many publications are in common across all faculty members - person 1 with person 2, person 1 with person 3, person 2 with person 3, person 1 with both person 2 and person 3, etc.
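A minimal sketch of the counting step in Python. It assumes each person's list has already been reduced to a set of comparable publication keys; building those keys is the hard part discussed further down the thread. `overlaps` is a hypothetical helper that counts, for every group of 2 up to `max_group` people, the publications all of them share. The counts are not exclusive: a paper shared by persons 1, 2 and 3 is also counted for the pair (1, 2).

```python
from itertools import combinations


def overlaps(pubs_by_person, max_group=3):
    """Return {(person, ...): count} for every group of 2..max_group people
    that shares at least one publication key."""
    people = sorted(pubs_by_person)
    counts = {}
    for size in range(2, max_group + 1):
        for group in combinations(people, size):
            # Publications present in every member's list.
            shared = set.intersection(*(pubs_by_person[p] for p in group))
            if shared:
                counts[group] = len(shared)
    return counts


if __name__ == "__main__":
    # Hypothetical keys standing in for real publications.
    pubs = {
        "person1": {"a", "b", "c"},
        "person2": {"b", "c", "d"},
        "person3": {"c", "e"},
    }
    for group, n in overlaps(pubs).items():
        print(" + ".join(group), n)
```

With about 36 people there are 630 pairs and 7,140 triples, so group sizes up to 3 or 4 are cheap to enumerate; every possible subset (2^36, roughly 69 billion) is not.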

One person may have
Last1, F1., Last2, F2., with "et al." after the first 2 or 3 authors.

Another person may have
Last1, F1, Last2, F2, and list 15 or 20 authors.

And another person may have
Last1 F1, Last2 F2, and so on.
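Because these three layouts differ mostly in punctuation, one option is to reduce every author list to (surname, first initial) pairs. Below is a rough Python sketch; `parse_authors` is a hypothetical helper, and it assumes initials are written as capital letters (with or without periods) and that "and" and "et al." are the only connecting words. Keeping only the first initial is a deliberate choice so that "P.P." in one list still matches "P." in another. Names outside those assumptions (for example "Last P. P." with spaced, dotted initials) will be misread.

```python
import re
import unicodedata

# A run of capital initials, e.g. "P.P.", "PP", "S", "J.-P."
INITIALS = re.compile(r"^(?:[A-Z]\.?\s*-?)+$")


def fold(text):
    """Drop accents so that "Zöllner" and "Zollner" compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def parse_authors(text):
    """Return [(surname, first_initial), ...] for an author string written as
    "Last, F., Last, F." or "Last, F, Last, F" or "Last F, Last F"."""
    text = fold(text)
    text = re.sub(r"\bet al\b\.?", "", text)  # drop "et al."
    text = re.sub(r"\band\b", ",", text)      # "S, and Last" -> "S, , Last"
    tokens = [t.strip() for t in text.split(",") if t.strip()]
    authors = []
    i = 0
    while i < len(tokens):
        head, _, tail = tokens[i].rpartition(" ")
        if head and INITIALS.match(tail):
            # "Last F" layout: surname and initials share one field.
            surname, initials = head, tail
            i += 1
        elif i + 1 < len(tokens) and INITIALS.match(tokens[i + 1]):
            # "Last, F." layout: initials sit in the next field.
            surname, initials = tokens[i], tokens[i + 1]
            i += 2
        else:
            # No recognizable initials; keep the surname alone.
            surname, initials = tokens[i], ""
            i += 1
        first = re.sub(r"[^A-Z]", "", initials)[:1]
        authors.append((surname.lower(), first.lower()))
    return authors
```

The author list alone is a weak key, since the "et al." style truncates it; combining the first author's surname with a normalized title is one possible alternative.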

Some people have (YYYY) after the authors and before the title of the paper. Some people have (YYYY) at the very end (after journal, volume, and page numbers).
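One way to neutralize where the year sits is to pull it out before comparing anything else. A small Python sketch, assuming years fall between 1800 and 2099 and appear either as "(YYYY)" anywhere in the citation or as ", YYYY" at the very end; `split_year` is a hypothetical name. The year is probably best treated as a hint rather than part of the key, because a paper listed as an e-pub ahead of print may carry a different year in another list.

```python
import re

# "(2007)" anywhere in the citation.
PAREN_YEAR = re.compile(r"\s*\((1[89]\d\d|20\d\d)\)")
# ", 2008" (optionally followed by a period) at the very end.
TRAILING_YEAR = re.compile(r",?\s*\b(1[89]\d\d|20\d\d)\.?\s*$")


def split_year(citation):
    """Return (year or None, citation with the year removed)."""
    match = PAREN_YEAR.search(citation) or TRAILING_YEAR.search(citation)
    if match is None:
        return None, citation.strip()
    rest = citation[:match.start()] + citation[match.end():]
    return int(match.group(1)), rest.strip()
```

A page range that happens to end in a four-digit number in that range would be mistaken for a year, so spot-check the output.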

Most people I've talked to have said to bite the bullet and do a lot of manual work: copy the article titles into a new document, sort them, then look them up in the original publication lists. There will be hundreds of pages of publications, so I'm not too anxious to do this!
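The copy/sort/look-up routine amounts to building an index from each publication key to the people who list it, which a script can do directly. A minimal Python sketch; `shared_publications` is a hypothetical helper and `key_func` stands in for whatever normalization you settle on (plain `str.lower` in the toy usage is only a placeholder).

```python
from collections import defaultdict


def shared_publications(citations_by_person, key_func):
    """Map each publication key to the sorted list of people who list it,
    keeping only keys that appear in more than one person's list."""
    owners = defaultdict(set)
    for person, citations in citations_by_person.items():
        for citation in citations:
            owners[key_func(citation)].add(person)
    return {key: sorted(people) for key, people in owners.items() if len(people) > 1}


if __name__ == "__main__":
    data = {
        "person1": ["Title A", "Title B"],
        "person2": ["title a"],
        "person3": ["Title C"],
    }
    for key, people in shared_publications(data, str.lower).items():
        print(key, "->", ", ".join(people))
```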

Any ideas/hints will be much appreciated. Thanks, Peggy 3/29
# 2  
03-29-2009

A sample of the file formats would help.
# 3  
03-29-2009
... and your Operating System and preferred Shell.
# 4  
03-29-2009
Quote:
Originally Posted by methyl
... and your Operating System and preferred Shell.

Why? It will be solved using a POSIX shell and standard commands.
# 5  
03-29-2009
This is an analytical problem where we need to define a process. Once the process is defined, we decide whether to work manually or program a computer to do the task. The core of this process involves defining a common index key. If you cannot define a common index key, the task is impossible.
You may find that one user is more organised than the rest and has a good index key which can be used to index all the publications. Failing that, it is up to you to define a universal index key.
Once all documents can be indexed by the same index key the task is feasible.
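One candidate for a universal key is the article title itself, reduced to a canonical form. A minimal Python sketch under that assumption; `title_key` is a hypothetical helper that strips accents, lowercases, and keeps only a-z and 0-9, so "Family - based" and "Family-based" (as in the two sample citations later in the thread) produce the same key. Characters with no ASCII decomposition are simply dropped.

```python
import re
import unicodedata


def title_key(title):
    """Strip accents, lowercase, and keep only a-z and 0-9, so that case,
    punctuation and spacing differences disappear."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", "", no_accents.lower())


if __name__ == "__main__":
    print(title_key("Family - based SNP Association Study on 8q24 in Bipolar Disorder."))
```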

The choice of software is a secondary issue (cfajohnson please note).

Echoing previous correspondents, please do provide sample data.
# 6  
03-30-2009

I apologize for not including more information! Here are 2 examples of the same citation in different formats. It is probably impossible to find a standard key across all of these CVs; I thought maybe I could insert a tab before and after the date where the date is after the authors, dump everything into a SQL table, and then see what sorting on the journal article did.
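The SQL route can work once each row holds a person and a normalized key. A hedged sketch using Python's built-in `sqlite3` module with an in-memory database; the table name `pubs`, the column names, and `pairwise_shared` are made up for illustration. A self-join on the key, restricted to `a.person < b.person` so each pair appears once, counts the shared publications for every pair of people.

```python
import sqlite3


def pairwise_shared(rows):
    """rows: iterable of (person, pubkey) pairs.
    Returns [(person_a, person_b, shared_count), ...] for every pair
    that shares at least one publication key."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE pubs (person TEXT, pubkey TEXT, UNIQUE (person, pubkey))"
        )
        # A person listing the same paper twice is counted once.
        con.executemany("INSERT OR IGNORE INTO pubs VALUES (?, ?)", rows)
        return con.execute(
            """
            SELECT a.person, b.person, COUNT(*)
            FROM pubs AS a
            JOIN pubs AS b ON a.pubkey = b.pubkey AND a.person < b.person
            GROUP BY a.person, b.person
            ORDER BY a.person, b.person
            """
        ).fetchall()
    finally:
        con.close()
```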

Examples:
Zandi, P.P., Zöllner, S,, Avramopoulos, D., Willour, V.L., Qin, Z.S., Burmeister, M., Miao, K., Gopalakrishnan, S., Potash, J.B., DePaulo, J.R., McInnis, M.G. Family - based SNP Association Study on 8q24 in Bipolar Disorder. Am J Med Genet B Neuropsychatr Genet. 147B(5): 612-618, 2008

versus

Zandi PP, Zöllner S, Avramopoulos D, Willour VL, Qin ZS, Burmeister M, Miao K, Gopalakrishnan S, Potash JB, DePaulo JR, McInnis MG. (2007) Family-based SNP association study on 8q24 in bipolar disorder. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. E Pub ahead of publication.
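In these two samples the years (2008 vs 2007) and the journal names differ, so the title looks like the most promising key. Exact matching can still miss papers when someone's list contains a typo (the first sample has "Neuropsychatr" and a stray double comma), so one swapped-in option is approximate matching with Python's standard `difflib.SequenceMatcher`. In this sketch, `same_paper` is a hypothetical helper and the 0.9 threshold is an assumption to tune on real data.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def same_paper(title_a, title_b, threshold=0.9):
    """Treat two titles as the same paper if their normalized forms are at
    least `threshold` similar (SequenceMatcher ratio, 0.0 to 1.0)."""
    ratio = SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    a = "Family - based SNP Association Study on 8q24 in Bipolar Disorder."
    b = "Family-based SNP association study on 8q24 in bipolar disorder."
    print(same_paper(a, b))
```

Comparing every title with every other grows quadratically, so a reasonable workflow is to match exact normalized keys first and use the fuzzy comparison only on the leftovers.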

9 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Finding common entries between 10 columns

Hello, I need to find the intersection across 10 columns. Kindly help. my file (INPUT.csv) looks like this 4_R 4_S 8_R 8_S 12_R 12_S 24_R 24_S LOC_Os01g01010 LOC_Os01g01010 LOC_Os01g01010 LOC_Os04g48290 LOC_Os01g01010 LOC_Os01g01010... (1 Reply)
Discussion started by: Sanchari

2. Shell Programming and Scripting

Finding most common substrings

Hello, I would like to know what is the three most abundant substrings of length 6 from col2. The file is quite large and looks like this col1 col2 EN03 typehellobyedogcatcatdog EN09 typehellobyebyebyebye EN08 dogcatcatdogbyebyebyebye EN09 catcattypehellobyebyebyebye... (9 Replies)
Discussion started by: verse123

3. Shell Programming and Scripting

Finding out the common lines in two files using 4 fields with the help of awk and UNIX

Dear All, I have 2 files. If field 1, 2, 4 and 5 matches in both file1 and file2, I want to print the whole line of file1 and file2 one after another in my output file. File1: sc2/80 20 . A T 86 F=5;U=4 sc2/60 55 . G T ... (1 Reply)
Discussion started by: NamS

4. Shell Programming and Scripting

get the lists

I expert, I may cross post something similar but I dirtyed my quesion somehow to be clear in the thread #cat file1 88dee gcc: Grok for callconvention-hard to enable hard float a2ad2 eglibc: package mtrace separately 61487 python: bump PR of packages after update of distutils.bbclass... (1 Reply)
Discussion started by: yanglei_fage

5. Shell Programming and Scripting

finding common numbers (contents) across 2 or 3 files

I have 3 files which are tab delimited and have numbers in it. file 1 1 2 3 4 5 6 7 File 2 3 5 7 8 File 3 1 (4 Replies)
Discussion started by: Lucky Ali

6. Shell Programming and Scripting

Shell Script to Create non-duplicate lists from two lists

File_A contains Strings: a b c d File_B contains Strings: a c z Need to have script written in either sh or ksh. Derive resultant files (File_New_A and File_New_B) from lists File_A and File_B where string elements in File_New_A and File_New_B are listed below. Resultant... (7 Replies)
Discussion started by: mlv_99

7. Shell Programming and Scripting

Finding longest common substring among filenames

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention: YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT What I would like to do is automatically discover the part of the filenames that are common to all... (1 Reply)
Discussion started by: cmcnorgan

8. Shell Programming and Scripting

Finding the most common entry in a column

Hi, I have a file with 3 columns in it that are comma separated and it has about 5000 lines. What I want to do is find the most common value in column 3 using awk or a shell script or whatever works! I'm totally stuck on how to do this. e.g. value1,value2,bob value1,value2,bob... (12 Replies)
Discussion started by: Donkey25

9. Shell Programming and Scripting

finding duplicate files by size and finding pattern matching and its count

Hi, I have a challenging task,in which i have to find the duplicate files by its name and size,then i need to take anyone of the file.Then i need to open the file and find for more than one pattern and count of that pattern. Note:These are the samples of two files,but i can have more... (2 Replies)
Discussion started by: jerome Sukumar