Parsing file data


 
# 1  
Old 02-16-2013

Hey Guys,

I'm a novice at shell scripts and I need some help parsing file data.

Basically, I want to write a script that retrieves URLs.

Here is what I have so far.

Code:
#!/bin/bash

printf "Please enter start date (format: yyyy-mm-dd): "
read -r STARTDATE
printf "Please enter end date (format: yyyy-mm-dd): "
read -r ENDDATE
wget -O filename download_location

So this downloads a page from download_location and saves it as filename. I need to parse the downloaded page and retrieve the URLs. Below is a small snippet of the data I'm parsing:

Code:
<a title='http://149.47.192.185/b6caba9f46bef1d14f/w.php'<nobr><center>2013-02-16 21:56:52</center></nobr></td><td align='center'><b>0 / 2</b></td><td><a title='http://199.204.210.238/eda89353bf8789202d999ee8e832c/w.php'

The url is after "a title=" and is enclosed in single quotes ('url_here'). I want to grab the data enclosed in the quotes and discard the rest.

Thanks for your help, I'm really bad at this stuff.

Last edited by silverdust; 02-16-2013 at 07:40 PM.. Reason: looked like crap
# 2  
Old 02-16-2013
Code:
awk -F'=' ' {
                for(i=1;i<=NF;i++) {
                        if($i ~ /a title/) {
                                url=$(i+1);
                                gsub(/'\''| .*/,x,url);
                                print url;
                        }
                }
}' filename

# 3  
Old 02-16-2013
hi bipinajith,

Thanks for your quick response. I don't suppose you could explain this a bit to me so I could understand what's going on here? Sorry, I'm new to scripting.

Also:
What is NF?
What does the ~ represent?

Thank you.
# 4  
Old 02-16-2013
Here is the explanation of code:
Code:
awk -F'=' '                                     # Set = sign as field separator.
{
        for(i=1;i<=NF;i++)                      # for i <= NF ( NF is a special variable in awk and it means number of fields in the current record )
        {
                if($i ~ /a title/)              # if $i ~ /a title/ ( ~ operator matches a pattern or regex )
                {
                        url=$(i+1);             # Set variable: url = $(i+1) which is next field value
                        gsub(/'\''| .*/,x,url); # Remove single quotes and everything from the first space onward (x is unset, i.e. an empty string)
                        print url;              # Print value of variable: url
                }
        }
}' filename

Check awk manual pages for further reference:
Code:
man awk
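
To see NF and the ~ operator in isolation, a couple of one-liners may help. This is only a minimal sketch; the input strings are made up for illustration and are not taken from the downloaded page.

```shell
# NF is the number of fields in the current record.
# With = as the separator, "a=b=c" splits into three fields.
echo "a=b=c" | awk -F'=' '{ print NF }'

# ~ is true when the string on the left matches the regex on the right.
echo "<a title" | awk '{ if ($0 ~ /a title/) print "match"; else print "no match" }'
```

The first command prints 3 and the second prints match.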

# 5  
Old 02-17-2013
Thanks a lot!

---------- Post updated 02-17-13 at 01:11 AM ---------- Previous update was 02-16-13 at 08:04 PM ----------

Sorry, one more question for you.

Suppose that some of my URLs have parameters that include an '='. Since the previous code splits fields on the equals sign (=), those URLs come out wrong.

How could I fix this?
# 6  
Old 02-17-2013
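Set the single quote ' as the field separator instead of the equals sign = and see if it works.

Replace: awk -F'=' with awk -F\'

Also replace the existing gsub call with sub(/ .*/,x,url); since the quotes are now consumed as field separators, only the text after a space still needs removing.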
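
Putting the two replacements together, the full command would look roughly like the sketch below. The function name extract_urls, the temporary sample file and the ?id=7&ref=abc query string are made up for illustration; the awk body follows the script from the earlier post.

```shell
#!/bin/bash

# Hypothetical helper: print every URL that follows "a title=" in single quotes.
extract_urls() {
    awk -F\' '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /a title/) {
                url = $(i + 1)        # the field between the two quotes
                sub(/ .*/, x, url)    # drop anything after a space (x is unset, so empty)
                print url
            }
        }
    }' "$1"
}

# Sample input based on the snippet in post #1, with an invented query string.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a title='http://149.47.192.185/b6caba9f46bef1d14f/w.php?id=7&ref=abc'<nobr><center>2013-02-16 21:56:52</center></nobr></td><td align='center'><b>0 / 2</b></td><td><a title='http://199.204.210.238/eda89353bf8789202d999ee8e832c/w.php'
EOF

extract_urls "$sample"
```

On this sample it prints both URLs, one per line, and the first keeps its full query string because = is no longer a separator.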
# 7  
Old 02-17-2013
Works great, you taught me a lot. Thanks again.