This code gets all the links from a page (example). I developed it as part of a simple spider I'm working on.
This is what I'm using it for. Obviously it's not finished, but I think it's a pretty good (if strange) idea. Needs JavaScript. Only tested in Opera.
PHP:
<pre><?php
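// Only fetch absolute http(s) URLs: file_get_contents() would happily read local files too.
$url = isset($_GET['url']) ? $_GET['url'] : '';
$parsed = parse_url($url);
if (empty($parsed['scheme']) || empty($parsed['host']) || !in_array(strtolower($parsed['scheme']), array('http', 'https'), true))
    die('Please supply an absolute http(s) URL.');
$html = file_get_contents($url);
if ($html === false)
    die('Could not fetch the page.');
$preg = array();
$base = array();
$links = array();
// Double- and single-quoted hrefs are matched separately; group 4 is the URL, group 6 the link text.
preg_match_all("/\<a(\s*)href(\s*)=(\s*)\"(.*?)\"(.*?)\>(.*?)\<\/a\>/i", $html, $preg[0]);
preg_match_all("/\<a(\s*)href(\s*)=(\s*)'(.*?)'(.*?)\>(.*?)\<\/a\>/i", $html, $preg[1]);
// Honour <base href="..." /> if the page declares one.
preg_match("/\<base(\s*)href(\s*)=(\s*)\"(.*?)\"(\s*)\/\>/i", $html, $base);
$title = array_merge($preg[0][6], $preg[1][6]);
$href = array_merge($preg[0][4], $preg[1][4]);
$base = isset($base[4]) ? $base[4] : '';
// Otherwise fall back to scheme://[user:pass@]host.
if (empty($base))
    $base = (!empty($parsed['user'])) ? "{$parsed['scheme']}://{$parsed['user']}:{$parsed['pass']}@{$parsed['host']}" : "{$parsed['scheme']}://{$parsed['host']}";
for ($i = 0; $i < count($href); $i++) {
    // Root-relative link: prefix the base.
    if (substr($href[$i], 0, 1) == '/')
        $href[$i] = "{$base}{$href[$i]}";
    // Query-only or fragment-only link: append it to the page URL.
    if (substr($href[$i], 0, 1) == '?' || substr($href[$i], 0, 1) == '#')
        $href[$i] = "{$url}{$href[$i]}";
    $links[$i] = array("title" => htmlentities($title[$i]), "url" => htmlentities($href[$i]));
}
print_r($links);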
?></pre>
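The regexes above only catch links where href comes straight after the <a, so something like <a class="nav" href="..."> gets missed. If that turns out to matter for a spider, one possible alternative is to let PHP's DOM extension do the parsing. This is only a rough sketch, not the code from the post: extract_links() and the $sample markup are names made up for illustration, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper (not part of the original snippet): collect links with
// DOMDocument instead of regular expressions.
function extract_links($html)
{
    $links = array();
    if ($html === '') {
        return $links; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parse warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // named anchors such as <a name="top"> carry no link
        }
        $links[] = array(
            'title' => trim($a->textContent), // link text with inner tags stripped
            'url'   => $a->getAttribute('href'),
        );
    }
    return $links;
}

// Made-up sample markup: an extra attribute before href and a single-quoted href.
$sample = '<p><a class="nav" href="/about">About <b>us</b></a> <a href=\'?page=2\'>Next</a></p>';
print_r(extract_links($sample));
```

The returned URLs are still raw, so the same base and relative-link handling from the loop above would need to be applied to them afterwards, and the values should be passed through htmlentities() before being echoed into a page.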