Grepping summaries of academic citations


 
# 1  
Old 11-09-2012

Hello friends,
I'm trying to grep sentences out of a PDF: specifically, the sentences that come just before an academic citation. The goal is to get summaries of citable work.

Here is what I tried after reading the man page.

Code:
pdftotext foo.pdf | grep -A 5 ***choose regular expression below***


pdftotext BioPsych10.pdf | grep -A 5 \([A-Z]*[a-z]\,[1-2][0-9][0-9][0-9]\)

It pauses, but doesn't produce anything. Also, it would be nice if the output stopped at the start of the desired sentence instead of always printing 5 lines.


These are the citation formats I want to match, each followed by the regular expression I plan to use for it.
(Daviis, 2004)
Code:
\([A-Z]*[a-z]\,[1-2][0-9][0-9][0-9]\)

(Schultz, 2000) and (White, 1989)
Code:
\([A-Z]*[a-z]\,[1-2][0-9][0-9][0-9]\) and \(, [A-Z]*[a-z]\,[1-2][0-9][0-9][0-9]\)

(Sutter, 1987; Reid and Shapley, 1992)
Code:
\([A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\; [A-Z]*[a-z] and [A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\)

(Enroth-Cugell and Robson, 1966)
Code:
\([A-Z]*[a-z]\-[A-Z]*[a-z] and [A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\)

(Barlow, 1961, 1989; Atick and Redlich, 1990; Atick, 1992)
Code:
\([A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\, [1-2][0-9][0-9][0-9]\; [A-Z]*[a-z] and [A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\; [A-Z]*[a-z]\, [1-2][0-9][0-9][0-9]\)

(Dong and Atick, 1995a)
Code:
\([A-Z]*[a-z] and [A-Z]*[a-z]\, [1-2][0-9][0-9][0-9][a-z]\)

Thank you for taking the time to read this. Please let me know if you have any ideas.
# 2  
Old 11-09-2012
Can you post some of the output from the command below, along with the output you want?

Code:
 
pdftotext BioPsych10.pdf

# 3  
Old 11-09-2012
You need to use single quotes around your regular expression to protect it from the shell.
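
A minimal sketch of the quoting fix, run against a made-up two-line sample rather than the real PDF. Two things to keep in mind: in grep's default (basic) syntax a bare ( is a literal parenthesis while \( opens a group, so once the pattern is quoted the backslashes before the parentheses have to go; and the pattern needs the space after the comma. Also, as far as I know, pdftotext writes to a .txt file next to the PDF unless you give it - as the output name, which may be why the pipe stayed empty.

```shell
# Sample text standing in for the pdftotext output (made up for illustration).
sample='Contrast adapts quickly (Daviis, 2004).
No citation on this line.'

# Single quotes stop the shell from stripping backslashes or expanding [..] and *.
# Basic regex: a bare ( or ) matches a literal parenthesis.
quoted_hits=$(printf '%s\n' "$sample" | grep '([A-Z][a-z]*, [12][0-9][0-9][0-9])')
printf '%s\n' "$quoted_hits"

# With the real file, something along these lines (assumes pdftotext accepts
# "-" as the output name to write to stdout):
#   pdftotext BioPsych10.pdf - | grep '([A-Z][a-z]*, [12][0-9][0-9][0-9])'
```

Only the first sample line should come back, since the second has no citation.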
# 4  
Old 11-09-2012
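And you're searching for zero or more uppercase letters followed by a single lowercase letter: [A-Z]*[a-z]. For a capitalized surname you probably want the reverse, one uppercase letter followed by lowercase letters: [A-Z][a-z]*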
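
To see what that quantifier placement does, here is a small sketch using grep -o, which prints only the matched text (-o is not in POSIX, but GNU and BSD grep both have it). The sample name is just an illustration.

```shell
name='Schultz'

# [A-Z]*[a-z]: zero or more capitals, then exactly ONE lowercase letter.
# grep -o prints every match on its own line; keep only the first one.
loose=$(printf '%s\n' "$name" | grep -o '[A-Z]*[a-z]' | head -n 1)

# [A-Z][a-z]*: one capital, then any number of lowercase letters.
whole=$(printf '%s\n' "$name" | grep -o '[A-Z][a-z]*')

printf 'loose: %s\nwhole: %s\n' "$loose" "$whole"
```

The first pattern stops after "Sc", so a comma can never follow it directly in "Schultz, 2000"; the second one covers the whole name.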
# 5  
Old 11-09-2012
And if you are using pdftotext to produce Unicode output and preserve accented characters, it is best to use [[:upper:]] instead of [A-Z], [[:lower:]] instead of [a-z], [[:alpha:]] instead of [A-Za-z], etc.
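
A rough sketch of the character-class version, with one ASCII and one accented surname as made-up sample data. How many lines match depends on your locale settings, so treat the count as something to check on your own system rather than a fixed result.

```shell
# Made-up sample: one plain ASCII surname, one with an accented letter.
sample='(Reid, 1992)
(Müller, 2004)'

# -E selects extended regex: literal parentheses are escaped, {4} is a repeat count.
pattern='\([[:upper:]][[:lower:]]+, [0-9]{4}\)'

# In a UTF-8 locale the accented letter should count as [[:lower:]];
# in the plain C locale only the ASCII line is expected to match.
class_hits=$(printf '%s\n' "$sample" | grep -cE "$pattern")
printf 'matching lines: %s\n' "$class_hits"
```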

Last edited by Scrutinizer; 11-09-2012 at 05:13 AM..
# 6  
Old 11-09-2012
This yields every line from your sample above while filtering out most other text. If it matches too much, narrow it down by being more specific, e.g. about the year numbers:
Code:
$ grep -E "([A-Za-z]+, [0-9]{4})" file
(Daviis, 2004)
(Schultz, 2000) and (White, 1989)
(Sutter, 1987; Reid and Shapley, 1992)
(Enroth-Cugell and Robson, 1966)
(Barlow, 1961, 1989; Atick and Redlich, 1990; Atick, 1992)
(Dong and Atick, 1995a)

And, yes, as Scrutinizer proposes, you may want to use the [[:upper:]] and [[:lower:]] classes.
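
One caveat, as far as I can tell: with -E, unescaped parentheses are grouping operators, so the pattern above matches "Name, 2004" anywhere on a line, inside brackets or not. Below is a sketch of a narrower variant that escapes the parentheses, limits the year to 1000-2999 and allows a trailing letter as in 1995a. The first two sample lines are from this thread; the last two are made up to show the difference.

```shell
sample='(Barlow, 1961, 1989; Atick and Redlich, 1990; Atick, 1992)
(Dong and Atick, 1995a)
Founded in Boston, 1850 and later moved.
(see Figure 3, 12)'

# \( and \) are literal in extended regex; [^()]* allows anything except
# parentheses between the opening bracket and the last "Name, year" pair.
strict='\([^()]*[A-Za-z-]+, [12][0-9]{3}[a-z]?\)'

# grep -c counts matching lines instead of printing them.
loose_count=$(printf '%s\n' "$sample" | grep -cE '([A-Za-z]+, [0-9]{4})')
strict_count=$(printf '%s\n' "$sample" | grep -cE "$strict")
printf 'loose: %s lines, strict: %s lines\n' "$loose_count" "$strict_count"
```

The loose pattern also picks up the "Boston, 1850" line; the strict one keeps only the two bracketed citations.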
This User Gave Thanks to RudiC For This Post:
# 7  
Old 11-10-2012
Code:
pdftotext BioPsych10.pdf

dl.dropbox. C O M /u/4235339/BioPsych10.txt

It won't let me post urls until I do 5 posts. It's a 2.4 MB file. Connect the .com to see it.
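
On the earlier wish to print only the sentence leading up to a citation rather than a fixed number of lines: grep -A prints lines after a match, so it cannot do that by itself. A rough sketch of one alternative, assuming sentences end in a full stop and the text wraps mid-sentence the way pdftotext output usually does. The sample text is made up, and abbreviations such as "et al." would cut a sentence short.

```shell
# Made-up stand-in for pdftotext output; the sentence wraps across two lines.
sample='Earlier work is unrelated. Retinal coding removes
redundancy from natural scenes (Barlow, 1961, 1989; Atick, 1992). More text.'

# 1. tr joins the wrapped lines so a sentence is no longer split.
# 2. grep -o prints only the match: everything since the previous full stop
#    ([^.]*) up to and including the bracketed citation.
# 3. sed trims the leading space left over after the previous full stop.
summaries=$(printf '%s\n' "$sample" | tr '\n' ' ' |
  grep -oE '[^.]*\([^()]*[12][0-9]{3}[a-z]?\)' | sed 's/^ *//')
printf '%s\n' "$summaries"
```

Each output line is then one sentence ending in its citation.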