Noob trying to improve


 
# 1  
Old 01-12-2017
Back from the dead

Hey RudiC!

It's been a while, I know, but as I said, I was busy learning bash.

I'm not saying I've got it all; I still have a long way to go...
I just wanted to post what I've been able to do on my own so far.
It will definitely seem barbaric to you, and less elegant than what you did earlier with the awk command, but as I'm not sure how to control that one, I'm taking another road:

Code:
#!/bin/bash

#Setting the variable for the link construction: the part appended to
#www.dotmed.com ($link) in the second curl
link=""

#Setting the index for the while loop. The limit (the constant in the while condition) defines how many listings to "crawl"
i=1

#Setting the offset variable that steps from one listing to the next. This variable is used in the first curl link
offset=0

#Starting the loop for the crawl
while [ $i -lt 5 ]
do

#Getting the listing and assigning each listing to the variable "link"
        link=$(curl "https://www.dotmed.com/equipment/2/5/2693/all/offset/$offset/all?key=&limit=1&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" | egrep "href.*view more" | sed -n 's/.*href="\([^"]*\).*/\1/p')

#Getting information from each listing
        curl "https://www.dotmed.com$link" | fgrep -e "id=\"price"

#Resetting for next iteration
        unset link
        (( i++ ))
        (( offset++ ))
done

The great thing is that I can run it on any Linux machine, and with this script I'm getting into each listing to pull information from there...
Now I have to learn more about sed and grep to extract the information I need automatically, and I'll be done.
Easy, right? Hopefully I'll be able to do it soon.

If you have any comments on the script, please be my guest! I'm still trying to learn!

All the best!

Last edited by Ardzii; 01-12-2017 at 02:29 PM.. Reason: English
# 2  
Old 01-12-2017
Even if I am not RudiC: you are doing quite fine.

Quote:
Originally Posted by Ardzii
It will definitely seem barbaric to you, and less elegant than what you did earlier with the awk command, but as I'm not sure how to control that one, I'm taking another road:
In (German) medicine there is a proverb: he who heals is right. In programming the same holds true: as long as a program does what it is supposed to do, it is kind of hard to argue ... ;-)

A few suggestions, though:

Code:
#setting variable for the link construction.
declare link=""

There is a difference between an unset variable and one that has a value of "" (an empty string) or zero (for numbers). What you want is to declare the variable, so that you can later give some (meaningful) value to it, which is - as long as that value is yet to be determined - an empty value. In bash the keyword to declare variables is "declare" (or "typeset", perhaps in an effort to be compatible with the Korn shell); inside functions you can also use "local".
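You can see the difference right in the shell: the `${var+word}` expansion produces "word" only if the variable is set at all, even if it is set to an empty string.

```shell
#!/bin/bash
# ${var+set} expands to "set" only if var is declared (even as empty)

unset var
echo "unset: '${var+set}'"      # prints: unset: ''

var=""
echo "empty: '${var+set}'"      # prints: empty: 'set'
```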

Code:
declare -i i=1
declare -i offset=0

See above. As a suggestion: always give variables meaningful names. Once your script grows to some length and you juggle several indexes at the same time, you will want to have, e.g., a "fooidx" and a "baridx" instead of an "i" and a "j".

Code:
#Resetting for next iteration
        link=""
        (( i++ ))
        (( offset++ ))

You don't want to unset the variable (that is: the opposite of defining it), just clear its content. So, as in the declaration, you simply assign an empty string instead of unsetting it.

As a suggestion: I put comments on the same line as the code they belong to, and always at a fixed horizontal position. Hence, instead of your loop, I'd write:

Code:
#Starting the loop for the crawl
while [ $i -lt 5 ] ; do                     # crawling loop
                                            # getting the link
     link=$( curl "your-link-here" |\
             egrep "href.*view more" |\
             sed -n 's/.*href="\([^"]*\).*/\1/p' \
           )
                                            # extract link
     curl "https://www.dotmed.com$link" | fgrep -e "id=\"price"

     link=""                                # Reset for next iteration
     (( i++ ))
     (( offset++ ))
done

To my eyes this is easier to read, but again: whatever helps you, you should do. In the pipeline:

Code:
     link=$( curl "your-link-here" |\
             egrep "href.*view more" |\
             sed -n 's/.*href="\([^"]*\).*/\1/p' \
           )

You can do it all in sed, without the additional egrep:

Code:
     link=$( curl "your-link-here" |\
             sed -n '/href.*view more/ s/.*href="\([^"]*\).*/\1/p' \
           )

As a rule of thumb: grep/sed/awk | grep/sed/awk is almost always wrong, because whatever the extra stage does can be done inside the one tool chosen.
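You can check the rule offline; the sample HTML line here is made up, not real DOTmed output, but the sed expression is the one from above:

```shell
#!/bin/bash
sample='<td><a href="/listing/2299124">view more</a></td>'

# two processes: egrep filters, then sed extracts
two=$(printf '%s\n' "$sample" | egrep "href.*view more" |
      sed -n 's/.*href="\([^"]*\).*/\1/p')

# one process: sed filters with an address and extracts in the same call
one=$(printf '%s\n' "$sample" |
      sed -n '/href.*view more/ s/.*href="\([^"]*\).*/\1/p')

echo "$two"     # /listing/2299124
echo "$one"     # /listing/2299124 - same result, one process fewer
```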

I hope this helps and have (more) fun programming.

bakunin
# 3  
Old 12-26-2016
This may serve as a starting point (file contains the web content downloaded before):

Code:
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
' file | sh | awk '
/<\/*title>/ ||
/id=\"price/ ||
/id=\"condition/ ||
/id=\"date_updated/     {gsub (/<[^>]*>/, _)
                         if (length) print
                        }
' 

 
Used GE Lunar DPX Bone Densitometer For Sale - DOTmed Listing #2299124: 
			Price:$20,000.00 USD [convert]
			Condition:Used - Excellent
			Date updated:December  18, 2016
 
New OSTEOSYS DEXXUM T Bone Densitometer For Sale - DOTmed Listing #2299556: 
			Price:$19,990.00 USD [convert]
			Condition:New
			Date updated:December  09, 2016
 
Used HOLOGIC DISCOVERY C Bone Densitometer For Sale - DOTmed Listing #1184884: 
			Price:$19,000.00 USD [convert]
			Condition:Used - Good
			Date updated:December  07, 2016
.
.
.
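The tag stripping in the second awk can be tried offline on a made-up line (only the `gsub` pattern is taken from the script above):

```shell
#!/bin/bash
# gsub(/<[^>]*>/, "") deletes every HTML tag, leaving only the text content;
# "if (length)" suppresses lines that were nothing but tags
printf '<h2 id="price">Price:$20,000.00 USD</h2>\n' |
awk '/id="price/ { gsub (/<[^>]*>/, ""); if (length) print }'
# prints: Price:$20,000.00 USD
```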


Last edited by RudiC; 12-26-2016 at 02:41 PM..
# 4  
Old 12-26-2016
Hey RudiC!

Thanks for that! It looks really great! I have a lot of work ahead of me to understand your lines, though.

I'll let you know once I'm able to get the results as you did!

Best!

Ardzii
# 5  
Old 12-26-2016
A wee bit improved, so you can add the search words at the end as parameters separated by pipe symbols:


Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=20&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" |
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
' |
sh |
awk '
match ($0, "id=\"(" IDS ")\"")  ||
/<\/*title>/    {gsub (/<[^>]*>/, _)
                 print
                }
' IDS="price|condition|date_updated" 

Used GE Lunar DPX Bone Densitometer For Sale - DOTmed Listing #2299124: 
			Price:$20,000.00 USD [convert]
			Condition:Used - Excellent
			Date updated:December  18, 2016
 
New OSTEOSYS DEXXUM T Bone Densitometer For Sale - DOTmed Listing #2299556: 
			Price:$19,990.00 USD [convert]
			Condition:New
			Date updated:December  09, 2016
 
Used HOLOGIC DISCOVERY C Bone Densitometer For Sale - DOTmed Listing #1184884: 
			Price:$19,000.00 USD [convert]
			Condition:Used - Good
			Date updated:December  07, 2016
.
.
.
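The `match ($0, "id=\"(" IDS ")\"")` trick builds the regular expression from the IDS variable at run time, so one awk call can filter any set of ids passed in from the command line. An offline illustration (the sample spans are made up):

```shell
#!/bin/bash
# match() with a string regex: IDS is concatenated into the pattern,
# so only lines whose id is in the IDS alternation survive
printf '<span id="condition">Condition:New</span>\n<span id="vendor">Acme</span>\n' |
awk 'match ($0, "id=\"(" IDS ")\"") { gsub (/<[^>]*>/, ""); print }' \
    IDS="price|condition|date_updated"
# prints only: Condition:New
```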


Last edited by RudiC; 12-26-2016 at 03:01 PM..
# 6  
Old 12-26-2016
OK! I got most of it...

In the first step you get the listing:
Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=5&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" |
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
'
In the second, you created a variable IDS that looks for the price, condition and date_updated, and prints the results.
Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=20&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" |
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
' |
sh |
awk '
match ($0, "id=\"(" IDS ")\"")  ||
/<\/*title>/    {gsub (/<[^>]*>/, _)
                 print
                }
' IDS="price|condition|date_updated"

I added a >> "/Users/myuser/Desktop/test.csv" to get the printed output exported to a CSV file.
I've been looking around for the past hour and I can't find how to put each listing on a single line, with a ";" dividing the "description" (or "title") from the price, condition and date_updated, instead of having 4 lines created per entry.

I know that something has to change between the "||" after the match and the print, but I have no idea where and how...
Could you help me once more?

Thanks as usual!

Ardzii
# 7  
Old 12-27-2016
If you'd accept a trailing comma (removing it would need additional measures), set the output record separator to a comma: ORS=",". As ALL the info would then come on one long line, we need a way to separate one machine's data from the next. I used the beginning of an HTML doc for this. Try adding the following to your script:
Code:
.
.
.
/^<!DOCTYPE/    {printf RS
                }
END             {printf RS
                }
' IDS="price|condition|date_updated|in_stock" ORS=","

Please be aware that any comma INSIDE a field will lead to misinterpretation if the result is later read elsewhere as comma-separated fields.
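The effect of ORS can be seen offline with made-up records (the field names only mimic the real output): with ORS="," every print ends in a comma instead of a newline, and a final printf adds the line break, like the printf RS above.

```shell
#!/bin/bash
# ORS replaces the newline that print normally appends;
# printf is not affected by ORS, so it supplies the final newline
printf 'Title X\nPrice:1\nCondition:New\n' |
awk '{ print } END { printf "\n" }' ORS=","
# prints: Title X,Price:1,Condition:New,
```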