Thanks, Birei; I will report the output in a few hours. Too bad some urgent meetings are coming up and I don't have time to test this right now, but I can't wait to try it!
Thanks for the quick reply; I shall get back on this shortly!
Cheers,
Andy
---------- Post updated 02-10-11 at 11:36 AM ---------- Previous update was 02-09-11 at 06:24 PM ----------
OK, the following is what worked for me, after much-needed help from Franklin!
I used the script provided by Franklin to filter my URLs:
PHP Code:
sed -n 's!.*service=\(http://[^/]*/[^/]*/\).*!\1!p' file
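For illustration, here is how that sed line behaves on a hypothetical log line (the surrounding `GET`/`ticket` text is made up for the example; only the `service=` pattern comes from the thread):

```shell
# Hypothetical input line; only the "service=..." portion is assumed
# to match the real data.
line='GET /cas/login?service=http://56.555.72.69/crm_ababcdves/index.php&ticket=ST-1 HTTP/1.1'
url=$(printf '%s\n' "$line" |
  sed -n 's!.*service=\(http://[^/]*/[^/]*/\).*!\1!p')
# Prints the scheme, host, and first path segment only:
printf '%s\n' "$url"   # http://56.555.72.69/crm_ababcdves/
```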
However, I haven't tested the script portion Franklin provided yet; I will post the output later.
Another problem I faced while filtering the output of Franklin's script was that a few URLs contained long strings with special characters. I used the following to strip everything from the first &, ?, or space onward, and to sort the results uniquely:
PHP Code:
awk -F '&' '{print $1}' old.file | awk -F '?' '{print $1}' | awk -F ' ' '{print $1}' | sort -u > output.txt
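If it helps, the three awk calls can likely be collapsed into one by giving `-F` a character class that matches any of the three separators; a minimal sketch on a made-up URL (the query string and trailing text are assumptions about the input):

```shell
# Made-up messy URL; everything after the first & / ? / space is noise.
url='http://56.555.72.69/crm_ababcdves/?ticket=ST-12345&renew=true trailing'
# A multi-character FS is treated as a regex, so [&? ] splits on any
# of the three characters; field 1 is everything before the first one.
cleaned=$(printf '%s\n' "$url" | awk -F '[&? ]' '{print $1}')
printf '%s\n' "$cleaned"   # http://56.555.72.69/crm_ababcdves/
```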
One more problem: when removing duplicate lines, I need to treat lower- and upper-case letters in the URLs as equivalent, as below.
For the following duplicate lines, I need to keep only one URL per case-insensitive path, since currently they are all being treated as UNIQUE URLs.
(Note: the differing IPs don't matter; I am only concerned with the lower- and upper-case letters.)
PHP Code:
56.555.72.69/crm_ababcdves/
81.745.42.59/CRM_Ababcdves/
38.475.62.19/squitv3/
92.625.42.89/Squitv3/
37.288.30.12/cview/
63.598.30.89/Cview/
85.048.30.52/CView/
So the final output should be:
PHP Code:
56.555.72.69/crm_ababcdves/
38.475.62.19/squitv3/
37.288.30.12/cview/
Now, if someone can help me with this, that would be really great; I am still a newbie at these things, though I've been getting better over the past few days.
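One possible approach (a sketch, assuming the lines always look like `IP/path/` and that keeping the first occurrence of each case-insensitive path is acceptable) is to build a lowercased key from everything after the first slash:

```shell
# Sample data from the post; dedupe case-insensitively on the path part.
dedup=$(printf '%s\n' \
  '56.555.72.69/crm_ababcdves/' \
  '81.745.42.59/CRM_Ababcdves/' \
  '38.475.62.19/squitv3/' \
  '92.625.42.89/Squitv3/' \
  '37.288.30.12/cview/' \
  '63.598.30.89/Cview/' \
  '85.048.30.52/CView/' |
awk '{
  key = $0
  sub(/^[^\/]*\//, "", key)   # drop the leading IP and slash
  key = tolower(key)          # case-insensitive comparison key
  if (!seen[key]++) print     # keep only the first occurrence
}')
printf '%s\n' "$dedup"
```

On the sample above this keeps exactly one line per path, matching the desired output (`56.555.72.69/crm_ababcdves/`, `38.475.62.19/squitv3/`, `37.288.30.12/cview/`).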
Cheers,
Andy