Noob trying to improve

Tags
beginners, faq, regexp, sed, solved

# 8  
Old 12-26-2016
This may serve as a starting point (file contains the web content downloaded before):

Code:
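awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")   # turn the link into a curl command
                        sub (/">.*$/, "")                                            # drop everything after the URL path
                        print}
' file | sh | awk '
/<\/*title>/ ||
/id=\"price/ ||
/id=\"condition/ ||
/id=\"date_updated/     {gsub (/<[^>]*>/, _)    # strip HTML tags; _ is an unset variable, i.e. the empty string
                         if (length) print      # skip lines that are empty after stripping
                        }
' 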

 
Used GE Lunar DPX Bone Densitometer For Sale - DOTmed Listing #2299124: 
			Price:$20,000.00 USD [convert]
			Condition:Used - Excellent
			Date updated:December  18, 2016
 
New OSTEOSYS DEXXUM T Bone Densitometer For Sale - DOTmed Listing #2299556: 
			Price:$19,990.00 USD [convert]
			Condition:New
			Date updated:December  09, 2016
 
Used HOLOGIC DISCOVERY C Bone Densitometer For Sale - DOTmed Listing #1184884: 
			Price:$19,000.00 USD [convert]
			Condition:Used - Good
			Date updated:December  07, 2016
.
.
.
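
To see what the first awk stage does before anything gets piped into sh, it can help to run it on a small sample. The sketch below assumes a made-up page fragment (the real dotmed.com markup and the listing path are hypothetical here) and only prints the generated curl commands instead of executing them:

```shell
# Hypothetical listing-page fragment; the real markup may differ.
sample='<div class="row">
   <a href="/listing/bone-densitometer/2299124">view more</a>
<a href="/about">about us</a>'

# Same two sub() calls as in the post, but the result is only printed, not run.
cmds=$(printf '%s\n' "$sample" |
awk '/href.*view more/ {
        sub(/^[^<]*<a href="/, "curl -s https://www.dotmed.com")  # replace everything up to the URL path
        sub(/">.*$/, "")                                           # drop everything after the URL path
        print
     }')

printf '%s\n' "$cmds"
# prints: curl -s https://www.dotmed.com/listing/bone-densitometer/2299124
```

Only the line containing both href and "view more" survives. Inspecting the commands like this first is a reasonable habit, because whatever this stage prints is executed verbatim once "| sh" is appended.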


Last edited by RudiC; 12-26-2016 at 01:41 PM..
# 9  
Old 12-26-2016
Hey RudiC!

Thanks for that! It looks really great! I have a lot of work ahead to understand your lines, though.

I'll let you know once I'm able to get the results as you did!

Best!

Ardzii
# 10  
Old 12-26-2016
A wee bit improved so you can add the search words at the end as parameters separated by pipe symbols:


Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=20&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" |
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
' |
sh |
awk '
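match ($0, "id=\"(" IDS ")\"")  ||              # regex built at run time from IDS
/<\/*title>/    {gsub (/<[^>]*>/, _)            # strip HTML tags; _ is an unset variable, i.e. the empty string
                 print
                }
' IDS="price|condition|date_updated" 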

Used GE Lunar DPX Bone Densitometer For Sale - DOTmed Listing #2299124: 
			Price:$20,000.00 USD [convert]
			Condition:Used - Excellent
			Date updated:December  18, 2016
 
New OSTEOSYS DEXXUM T Bone Densitometer For Sale - DOTmed Listing #2299556: 
			Price:$19,990.00 USD [convert]
			Condition:New
			Date updated:December  09, 2016
 
Used HOLOGIC DISCOVERY C Bone Densitometer For Sale - DOTmed Listing #1184884: 
			Price:$19,000.00 USD [convert]
			Condition:Used - Good
			Date updated:December  07, 2016
.
.
.
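
The IDS trick can be tried offline. This is a minimal sketch on an invented detail page (the li/span markup and the "seller" id are assumptions, not taken from the real site). IDS is given as an assignment operand after the awk program, so it is set before standard input is read, and the regular expression id="(price|condition)" is assembled from it by string concatenation:

```shell
# Hypothetical detail-page fragment; the real markup may differ.
page='<!DOCTYPE html>
<title>Used GE Lunar DPX Bone Densitometer For Sale</title>
<li id="price">Price:<span>$20,000.00 USD</span></li>
<li id="seller">Seller:<span>Someone</span></li>
<li id="condition">Condition:<span>Used - Excellent</span></li>'

fields=$(printf '%s\n' "$page" |
awk '
match($0, "id=\"(" IDS ")\"") ||
/<\/*title>/ { gsub(/<[^>]*>/, "")   # strip all HTML tags from the line
               print
             }
' IDS="price|condition")

printf '%s\n' "$fields"
```

The "seller" line is dropped because its id is not in IDS; adding "|seller" to the assignment would include it without touching the awk program.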


Last edited by RudiC; 12-26-2016 at 02:01 PM..
# 11  
Old 12-26-2016
OK! I got most of it...

For the first step you get the listing:
Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=5&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" | awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}'

In the second one you created a variable IDS that matches the price, condition and date_updated lines and prints the results.
Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=20&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" |
awk '/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                        sub (/">.*$/, "")
                        print}
' |
sh |
awk '
match ($0, "id=\"(" IDS ")\"")  ||
/<\/*title>/    {gsub (/<[^>]*>/, _)
                 print
                }
' IDS="price|condition|date_updated"

I added a >> "/Users/myuser/Desktop/test.csv" to get the output exported to a CSV file.
I've been looking around for the past hour and I can't find how to put each listing on one line, with a ";" separating the "description" (or "title") from the price, condition and date_updated, instead of having 4 lines created per entry.

I know that something has to change between the "||" after the match and the print, but I have no idea where or how...
Could you help me once more?

Thanks as usual!!

Ardzii
# 12  
Old 12-27-2016
If you'd accept a trailing comma (removing it would need additional measures), set the output record separator to a comma: ORS=",". Since ALL the info would then come out as one long line, we need a way to separate one machine's data from the next. I used the beginning of each HTML document for this. Try adding the following to your script:
Code:
.
.
.
/^<!DOCTYPE/    {printf RS
                }
END             {printf RS
                }
' IDS="price|condition|date_updated|in_stock" ORS=","

Please be aware that any comma INSIDE fields will lead to misinterpretation if the result is read somewhere else based on comma separated fields.
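
Here is a small offline run of that idea, as a sketch only: two invented pages are concatenated the way the output of "sh" would deliver them (the li markup and the listing names are assumptions). With ORS="," every print ends in a comma, and printf RS emits a newline at each <!DOCTYPE line and once more at the end:

```shell
# Two hypothetical detail pages back to back; the real markup may differ.
pages='<!DOCTYPE html>
<title>Listing A</title>
<li id="price">Price:$100 USD</li>
<li id="condition">Condition:New</li>
<!DOCTYPE html>
<title>Listing B</title>
<li id="price">Price:$200 USD</li>
<li id="condition">Condition:Used</li>'

csv=$(printf '%s\n' "$pages" |
awk '
match($0, "id=\"(" IDS ")\"") ||
/<\/*title>/    { gsub(/<[^>]*>/, "")   # strip HTML tags
                  print                 # ends with ORS, i.e. a comma
                }
/^<!DOCTYPE/    { printf RS }           # new page: start a new output line
END             { printf RS }           # terminate the last line
' IDS="price|condition" ORS=",")

printf '%s\n' "$csv"
```

Each listing ends up on one line with a trailing comma. Note that the very first <!DOCTYPE line also prints a newline, so the output starts with one empty line.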
# 13  
Old 12-27-2016
Hey RudiC!

Thanks a lot for your followup! I'm sorry but I think that you're too advanced for me...
I have no idea on how to combine bash commands with HTML and where to insert the new code into my script.

For now I have this:
Code:
curl -s "https://www.dotmed.com/equipment/2/92/1209/all/offset/0/all?key=&limit=20&price_sort=descending&cond=all&continent_filter=0&zip=&distance=5&att_1=0&att_row_num=1&additionalkeywords=&country=ES" | 
awk '
/href.*view more/ {sub (/^[^<]*<a href="/, "curl -s https://www.dotmed.com")
                         sub (/">.*$/, "")
                         print} ' | 
sh | 
awk ' match ($0, "id=\"(" IDS ")\"")  ||
 /<\/*title>/    {gsub (/<[^>]*>/, _)
                  print >> "/Users/MyUser/Desktop/test.txt"
                 } ' IDS="price|condition|date_updated"

I already tried replacing this portion:
Code:
/<\/*title>/    {gsub (/<[^>]*>/, _)
                  print >> "/Users/MyUser/Desktop/test.txt"
                 } ' IDS="price|condition|date_updated"

with your new code:
Code:
/^<!DOCTYPE/    {printf RS
                 }
END             {printf RS
                 } ' IDS="price|condition|date_updated|in_stock" ORS=","

But it yielded a blank "page" on my terminal. Also, I'm not sure how to export the result to a file.
Again, I'm truly sorry to be such a burden and would totally understand if you weren't able to help me further!

Oh! And one last thing: having a comma is perfect, I'll try to deal with the "inner" commas afterwards. Using Excel, I can still fine-tune that pretty easily, I guess...

Thanks anyways and all the best,

Ardzii
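
One possible way around both the trailing separator and the commas inside prices is to collect each listing in a variable and join the fields with ";" as originally asked. This is a sketch under assumptions, not what was posted in the thread: the helper emit(), the variables row and SEP, and the sample pages are all invented for illustration.

```shell
# Two hypothetical detail pages; the real markup may differ.
pages='<!DOCTYPE html>
<title>Listing A</title>
   <li id="price">Price:$20,000.00 USD</li>
<li id="condition">Condition:New</li>
<!DOCTYPE html>
<title>Listing B</title>
<li id="price">Price:$19,990.00 USD</li>
<li id="condition">Condition:Used - Good</li>'

rows=$(printf '%s\n' "$pages" |
awk '
function emit() { if (row != "") print row; row = "" }   # print the collected listing, if any
/^<!DOCTYPE/ { emit() }                                   # a new page starts: flush the previous listing
match($0, "id=\"(" IDS ")\"") ||
/<\/*title>/ { gsub(/<[^>]*>/, "")                        # strip HTML tags
               gsub(/^[ \t]+|[ \t]+$/, "")                # trim leading and trailing blanks
               row = (row == "" ? $0 : row SEP $0)        # join fields without a trailing separator
             }
END          { emit() }                                   # flush the last listing
' IDS="price|condition" SEP=";")

printf '%s\n' "$rows"
```

Because ";" is the separator, the commas in the prices stay inside their field. A ";" occurring inside a title would still break the columns, so this only moves the problem to a less common character.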

Last edited by RudiC; 12-27-2016 at 10:18 AM.. Reason: Corrected ICODE tags.
# 14  
Old 12-27-2016
Well, I said "add", not "replace". Add the lines after the print statement.

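EDIT: And, yes, do replace this one line, the closing line of your script (keep the closing brace of the print block), with the new closing line that sets IDS and ORS: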
Code:
                 } ' IDS="price|condition|date_updated"
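
Put together, the last awk stage might look like the sketch below. It runs on one invented page instead of live curl output, and the destination is a temporary file created with mktemp rather than the Desktop path from the earlier post. The export is done with ordinary shell redirection after the closing quote, so plain print is enough inside awk:

```shell
outfile=$(mktemp)    # hypothetical destination; any writable path works

printf '%s\n' '<!DOCTYPE html>' '<title>Listing A</title>' '<li id="price">Price:$100 USD</li>' |
awk '
match($0, "id=\"(" IDS ")\"") ||
/<\/*title>/    { gsub(/<[^>]*>/, "")
                  print                 # plain print; the shell handles the file
                }
/^<!DOCTYPE/    { printf RS }           # added after the print block
END             { printf RS }
' IDS="price|condition|date_updated|in_stock" ORS="," > "$outfile"

cat "$outfile"
```

With ">" the file is overwritten on every run; ">>" would append instead, which is what the print >> "file" form inside awk does as well.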


Last edited by RudiC; 12-27-2016 at 10:56 AM..