Using awk to isolate specific rows

10-09-2011

Thanks, nice explanation.

kvmreddy

10-09-2011

Hey, thanks a lot for your help. It works like a charm now. I'm now learning how to put a specific list of lecturers in an array and loop over it, using the code you provided to extract each one. I'll post it here if it works, or yelp for help if it doesn't.

The second input file contains data in the same format as savedrecs.txt. The ISI Web of Science only allows 500 articles per exported text file, so the export had to be split in two, since there are almost 900 articles.

sidiqmk

10-31-2011

Update

Hi, sorry for the late reply. I couldn't finish what I set out to do, so I took the easy way out and wrote a clunky script. I'm posting it anyway so the thread is reasonably complete.

A. I create a directory called nominal, merge the two input files into it, strip the AU tag from the start of the author lines, and delete the lines starting with ED and BE. These tags are inconsequential here and would create problems later if left in.

Code:

rm -rf nominal
mkdir nominal
cd nominal
# Merge the two Web of Science exports into one file
cat ../savedrecs.txt ../savedrecs2.txt > savedrecs.txt
# Blank out the AU tag and delete the ED and BE lines. The patterns are
# anchored with ^ so only the field tags at the start of a line are affected.
sed -e 's/^AU/ /' -e '/^ED /d' -e '/^BE /d' savedrecs.txt > savedrecs.tmp
mv savedrecs.tmp savedrecs.txt
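To check what the anchored patterns do, here is the same clean-up wrapped in a small filter and run on a few made-up lines. The function name clean_wos and the sample lines are hypothetical, and it assumes the Web of Science plain-text layout of a two-letter tag at the start of the line followed by a space. The SO line survives; an unanchored /ED/d would have deleted it, because APPLIED contains ED.

```shell
# Hypothetical filter: blank out the AU tag and drop ED/BE lines.
# Reads the files given as arguments, or standard input if there are none.
# Assumes WoS plain-text layout: two-letter tag at column 1, then a space.
clean_wos() {
    cat "$@" | sed -e 's/^AU/ /' -e '/^ED /d' -e '/^BE /d'
}

# Made-up sample lines
printf 'AU Arof, AK\n   Kwek, KH\nED Smith, J\nSO JOURNAL OF APPLIED PHYSICS\nBE Doe, J\nPY 2005\n' |
    clean_wos
```

This prints the two author lines (the first with its tag blanked), the SO line and the PY line. In the script it could be used as `clean_wos ../savedrecs.txt ../savedrecs2.txt > savedrecs.txt`.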

B. I make a directory for each professor and dump every article that contains their name into the file AllPubs. You have to be careful with the regex syntax, though.

Code:

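mkdir Arof.AK
cd Arof.AK
# RS= puts awk in paragraph mode: each blank-line-separated record is one
# awk record. Both spellings go in one pattern, so a record that contains
# both is written to AllPubs only once. [^\n]* keeps the match on a single
# line, because in paragraph mode . can also match a newline.
awk '/Arof, A[^\n]*K|AROF, A[^\n]*K/ { print > "AllPubs" }' RS= ORS='\n\n' ../savedrecs.txt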
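In paragraph mode a pattern such as A.*K is matched against the whole record, so it can span several of its lines and pick up a different author. A minimal sketch that tests the pattern against each line of the record separately (so it behaves as it would with grep) and still writes the whole record, once. The function name extract_pubs is hypothetical; it assumes records are separated by blank lines, and it uses `-v RS=` instead of the `RS=` operand, which also selects paragraph mode.

```shell
# Hypothetical helper: print each blank-line-separated record that has at
# least one line matching the extended regex in $1. Files follow; with no
# files, standard input is read.
extract_pubs() {
    pattern=$1
    shift
    awk -v RS= -v ORS='\n\n' -v pat="$pattern" '
        {
            n = split($0, line, "\n")          # the lines of this record
            for (i = 1; i <= n; i++)
                if (line[i] ~ pat) {           # match one line at a time
                    print                      # write the whole record once
                    next
                }
        }
    ' "$@"
}

# Usage, inside the professor directory:
#   extract_pubs 'Arof, A.*K|AROF, A.*K' ../savedrecs.txt > AllPubs
```

With this, a record listing "Arof, AH" followed by a co-author whose initials start with K is not picked up, and a record with both spellings of the name appears once.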

C. From AllPubs, split the articles into one file per publication year; each file is named after the year.

Code:

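gawk '
  /^PY/ { y = $2 }              # remember the publication year
  { r = r ? r RS $0 : $0 }      # add the current line to the record
  /^ER/ { print r > y; r = "" } # end of record: write it to the file named after the year
' AllPubs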
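The splitting step can also be written as a reusable function. This is a sketch assuming every record has a PY line and ends with an ER line; the function name split_by_year is hypothetical. It skips a record that has no PY line instead of handing print an empty file name, and it resets the year after each record so a missing PY line cannot reuse the previous record's year.

```shell
# Hypothetical helper: split Web of Science records into one file per
# publication year (e.g. ./2005) in the current directory.
# Assumes each record has a "PY yyyy" line and ends with an "ER" line.
split_by_year() {
    awk '
        /^PY/ { year = $2 }                           # publication year
        { rec = (rec == "") ? $0 : (rec "\n" $0) }    # collect the lines
        /^ER/ {                                       # end of the record
            if (year != "") print rec > year          # skip records with no PY
            rec = ""
            year = ""
        }
    ' "$@"
}

# Usage, inside the professor directory:
#   split_by_year AllPubs
```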

D. For each year, count the publications by counting how many times the professor's name appears in that year's file.

Code:

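i=1980
while [ "$i" -lt 2012 ]
do
    # Collect the lines of this year's file that name the professor
    # (-s: no error message for years that have no file)
    grep -s "Arof, A.*K.*$" "$i" > names
    grep -s "AROF, A.*K.*$" "$i" >> names
    # Tally each spelling of the name (fields joined without spaces)
    gawk '{nama[$1 $2 $3]++}
    END {for (name in nama) print name, nama[name]}
    ' names > counted
    rm names

    # Add the tallies up into a single "Arof, AK total" line
    gawk '
    BEGIN   {
        secondcol=0;
        }
        {
        secondcol+=$2;
        }
    END     {
        printf "Arof, AK %d\n",secondcol;
        }
    ' counted > "counted$i"
    rm counted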

E. Append that year's count to a file called ByYear, then move on to the next year.

Code:

    # Append the year and its total to ByYear
    gawk -v year="$i" '{print year, $3}' "counted$i" >> ByYear
    rm "counted$i"
    ((i+=1))
done

cd ..
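Steps D and E amount to counting, for each year file, the lines that mention the professor. A shorter sketch of that count uses grep -c with one extended regex for both spellings. One difference: a line that matches both spellings counts once here, whereas the two greps in step D would count it twice. The function name count_by_year and its arguments are hypothetical.

```shell
# Hypothetical helper: for each year from $2 to $3, print the year and the
# number of lines in the file named after that year that match the
# extended regex $1. Years without a file get a count of 0.
count_by_year() {
    pattern=$1
    year=$2
    last=$3
    while [ "$year" -le "$last" ]; do
        count=0
        if [ -f "$year" ]; then
            # grep -c prints 0 but exits 1 when nothing matches
            count=$(grep -c -E -e "$pattern" "$year" || true)
        fi
        printf '%s %s\n' "$year" "$count"
        year=$((year + 1))
    done
}

# Usage, inside the professor directory (same years as the loop above):
#   count_by_year 'Arof, A.*K|AROF, A.*K' 1980 2011 > ByYear
```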

F. And so on for the other professors.

Code:

mkdir Shrivastava.KN
cd Shrivastava.KN
awk '/Shrivastava, K[^\n]*N/ { print > "AllPubs" }' RS= ORS='\n\n' ../savedrecs.txt

gawk '{ 
  /^PY/ && y=$2
  r = r ? r RS $0 : $0  
 }
/^ER/ {
  print r > y
  r = x;
  }' AllPubs

i=1980
while [ "$i" -lt 2012 ]
do
    grep -s "Shrivastava, K.*N.*$" "$i" > names
    gawk '{nama[$1 $2 $3]++}
    END {for (name in nama) print name, nama[name]}
    ' names > counted
    rm names
    
    gawk '
    BEGIN   {
        secondcol=0;
        }
        {
        secondcol+=$2;
        }
    END     {
        printf "Shrivastava, KN %d\n",secondcol;
        }
    ' counted > "counted$i"
    rm counted
    
    gawk -v year="$i" '{print year, $3}' "counted$i" >> ByYear
    rm "counted$i"
    ((i+=1))
done

cd ..


mkdir Kwek.KH
cd Kwek.KH
awk '/Kwek, K[^\n]*H|KWEK, K[^\n]*H/ { print > "AllPubs" }' RS= ORS='\n\n' ../savedrecs.txt

gawk '{ 
  /^PY/ && y=$2
  r = r ? r RS $0 : $0  
 }
/^ER/ {
  print r > y
  r = x;
  }' AllPubs

i=1980
while [ "$i" -lt 2012 ]
do
    grep -s "Kwek, K.*H.*$" "$i" > names
    grep -s "KWEK, K.*H.*$" "$i" >> names
    gawk '{nama[$1 $2 $3]++}
    END {for (name in nama) print name, nama[name]}
    ' names > counted
    rm names

    gawk '
    BEGIN   {
        secondcol=0;
        }
        {
        secondcol+=$2;
        }
    END     {
        printf "Kwek, KH %d\n",secondcol;
        }
    ' counted > "counted$i"
    rm counted
    
    gawk -v year="$i" '{print year, $3}' "counted$i" >> ByYear
    rm "counted$i"
    ((i+=1))
done

cd ..

I guess it would be much more elegant to put all the professors' names and initials in an array and loop over it. I'll try to learn how to do this later. Thanks.
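One way to do that, as a sketch only: keep each professor's directory name and name pattern in two parallel bash arrays and wrap steps B to E in a loop, so adding a professor means adding one entry to each array. The array names and the function process_all are hypothetical. It assumes it is run from the nominal directory with the cleaned savedrecs.txt from step A, it matches the name line by line within each record, and it uses grep -c in place of the two gawk tallies, so a line matching both spellings counts once.

```shell
#!/usr/bin/env bash
# Sketch only: repeat steps B to E for every professor in the arrays below.
# Hypothetical setup: run from the "nominal" directory, where savedrecs.txt
# is the merged, cleaned file from step A. Needs bash (arrays, for ((...))).

# Parallel arrays: output directory and an extended regex covering the
# spellings of each name (add entries here instead of copying code)
dirs=(Arof.AK Shrivastava.KN Kwek.KH)
patterns=('Arof, A.*K|AROF, A.*K' 'Shrivastava, K.*N' 'Kwek, K.*H|KWEK, K.*H')

process_all() {
    local k dir pat year count
    for ((k = 0; k < ${#dirs[@]}; k++)); do
        dir=${dirs[k]}
        pat=${patterns[k]}
        mkdir -p "$dir"
        (
            cd "$dir" || exit 1
            # B: whole records with at least one line matching the pattern
            awk -v RS= -v ORS='\n\n' -v pat="$pat" '
                { n = split($0, line, "\n")
                  for (i = 1; i <= n; i++)
                      if (line[i] ~ pat) { print; next } }
            ' ../savedrecs.txt > AllPubs
            # C: one file per publication year, named after the year
            awk '
                /^PY/ { year = $2 }
                { rec = (rec == "") ? $0 : (rec "\n" $0) }
                /^ER/ { if (year != "") print rec > year; rec = ""; year = "" }
            ' AllPubs
            # D + E: one "year count" line per year, 1980 to 2011
            : > ByYear
            for ((year = 1980; year < 2012; year++)); do
                count=0
                if [ -f "$year" ]; then
                    count=$(grep -c -E -e "$pat" "$year" || true)
                fi
                echo "$year $count" >> ByYear
            done
        )
    done
}

# Usage, from the nominal directory:
#   process_all
```

Each professor's directory then ends up with AllPubs, one file per year, and a ByYear file in the same "year count" format as the script above.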

sidiqmk

10-31-2011

Code:

...
/Yousefi.*$/ && y = "yousefi"
...

Here y is the output filename. Since you are now extracting the data by author, you can either hard-code the filename as shown above, or extract the author's name dynamically and use it as the filename, in which case each author's entries end up in a separate file named after that author.
--ahamed
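A sketch of the second option, extracting the author dynamically. It assumes the raw export (before step A blanks the AU tag), where the author list starts with AU and continues on indented lines. Each record is appended to one file per author; names are upper-cased and punctuation becomes underscores, so "Arof, AK" and "AROF, AK" both go to AROF_AK. The function name split_by_author and this naming scheme are hypothetical. Because it appends with >> and closes each file after writing (to stay under the open-file limit), it should be run in an empty directory.

```shell
# Hypothetical helper: append each record to one file per author, named
# after the author (e.g. AROF_AK), in the current directory.
# Assumes the raw export: the author list starts with "AU " and continues
# on lines indented with spaces; records end with an "ER" line.
split_by_author() {
    awk '
        /^[A-Z0-9][A-Z0-9] / { inau = ($1 == "AU") }  # a new field starts
        /^ER/ { inau = 0 }
        inau {
            name = $0
            sub(/^(AU)? +/, "", name)            # drop the tag or indent
            name = toupper(name)                 # Arof and AROF: same file
            gsub(/[^A-Z0-9]+/, "_", name)        # "AROF, AK" -> "AROF_AK"
            gsub(/^_+|_+$/, "", name)            # trim stray underscores
            if (name != "") authors[name] = 1
        }
        { rec = (rec == "") ? $0 : (rec "\n" $0) }
        /^ER/ {
            for (a in authors) {
                print rec >> a
                close(a)                         # keep few files open
            }
            split("", authors)                   # forget the authors
            rec = ""
        }
    ' "$@"
}

# Usage, in an empty directory:
#   split_by_author ../savedrecs.txt ../savedrecs2.txt
```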

ahamed101
