Tuesday 20 March 2012

Extract Parts Of A URL

Sometimes we need to extract different parts of a URL.

For example

http://www.blogger.com/post-create.g

Basically, we want to extract:

1. Protocol, i.e. http://
2. Server, i.e. www.
3. Domain name, i.e. blogger.com

Here we extract these parts with a regex in Perl, assuming that the server name is only "www", "m" or "ftp". If you want to allow more server names, add them to the alternation with the "|" operator.

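use strict;
use warnings;

my $url = "http://www.blogger.com/post-create.g";

# $1 = protocol, $2 = server (with trailing dot), $4 = domain name.
# The match is anchored at the start so "://" later in the URL is not mistaken for the protocol.
if ($url =~ m{^([^:/]*://)?((www|ftp|m)\.)?([^/]*)}) {
    # $1 and $2 are undefined when the URL has no protocol or server part.
    print "Protocol = ", $1 // '', "\n";
    print "Server = ", $2 // '', "\n";
    print "Domain name = $4\n";
}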
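The same pattern can be wrapped in a small function so that the list of server names is passed in instead of being hard-coded. This is only a sketch: the name extract_url_parts, the returned hash keys and the extra "mail" prefix are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL into protocol, server and domain name.
# @prefixes lists the server names to recognise; defaults to www, ftp and m.
sub extract_url_parts {
    my ($url, @prefixes) = @_;
    @prefixes = qw(www ftp m) unless @prefixes;

    # Build the alternation (e.g. "www|ftp|m"), escaping regex metacharacters.
    my $alt = join '|', map { quotemeta($_) } @prefixes;

    # In list context the match returns the three capture groups;
    # a group that did not take part in the match comes back undefined.
    my ($protocol, $server, $domain) =
        $url =~ m{^([^:/]*://)?((?:$alt)\.)?([^/]*)};

    return {
        protocol => defined $protocol ? $protocol : '',
        server   => defined $server   ? $server   : '',
        domain   => defined $domain   ? $domain   : '',
    };
}

# Example with one extra server name, "mail", added to the list.
my $parts = extract_url_parts("http://mail.blogger.com/post-create.g", qw(www ftp m mail));
print "Protocol = $parts->{protocol}\n";
print "Server = $parts->{server}\n";
print "Domain name = $parts->{domain}\n";
```

Returning empty strings for missing parts avoids "uninitialized value" warnings when a URL has no protocol or no recognised server name.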

I wrote this regex to block access to sites that some users are not authorised to visit. The lists of these sites come in files of two types:

1st file type contains both URLs and domain names, and
2nd file type contains only domain names.

So I extract the domain part from each URL and look it up with an ILIKE (case-insensitive) search in the database.
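The lookup idea can be sketched without a database. The example below swaps the ILIKE search for an in-memory hash with lower-cased keys, which gives the same case-insensitive exact match; the sample entries and the names domain_of and is_blocked are hypothetical, and in practice the entries would be read from the two kinds of files.

```perl
use strict;
use warnings;

# Reduce a URL or a bare domain name to its lower-cased domain,
# stripping the protocol and a leading www., ftp. or m. if present.
sub domain_of {
    my ($entry) = @_;
    my ($domain) = $entry =~ m{^(?:[^:/]*://)?(?:(?:www|ftp|m)\.)?([^/]*)}i;
    return lc $domain;
}

# Hypothetical sample entries: a full URL and a bare domain name mixed.
my @entries = (
    "http://www.blogger.com/post-create.g",
    "Example.org",
);

# Lower-casing both sides makes the lookup case-insensitive, like ILIKE
# without wildcards (an assumption about how the database search was used).
my %blocked;
$blocked{ domain_of($_) } = 1 for @entries;

sub is_blocked {
    my ($url) = @_;
    return exists $blocked{ domain_of($url) } ? 1 : 0;
}

print is_blocked("http://m.blogger.com/home") ? "blocked\n" : "allowed\n";
print is_blocked("http://www.perl.org/")      ? "blocked\n" : "allowed\n";
```

Because every entry is normalised through the same function, it does not matter whether a line in the list was a full URL or just a domain name.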

Hope it helps!

Neha goel

