Get all links from a page

This code gets all the links from a page (example). I developed it as part of a simple spider I'm working on.

This is what I'm using it for. Obviously it's not finished, but I think it's a pretty good (if strange) idea. It needs JavaScript and has only been tested in Opera.

PHP:
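<pre><?php

// Only fetch http(s) URLs, so file_get_contents() can't be pointed at local files.
$url = isset($_GET['url']) ? $_GET['url'] : '';
$parsed = parse_url($url);
if(empty($parsed['scheme']) || empty($parsed['host']) || !in_array(strtolower($parsed['scheme']), array('http', 'https')))
	die('Please pass a valid http(s) URL in ?url=');

$html = file_get_contents($url);
if($html === false)
	die('Could not fetch the page.');

$preg = array();
$base = array();
$links = array();

// Match anchors with double-quoted, then single-quoted, href values.
// Group 4 is the href, group 6 is the link text. Other attributes may precede href.
preg_match_all("/<a\b([^>]*?\s)href(\s*)=(\s*)\"(.*?)\"(.*?)>(.*?)<\/a>/is", $html, $preg[0]);
preg_match_all("/<a\b([^>]*?\s)href(\s*)=(\s*)'(.*?)'(.*?)>(.*?)<\/a>/is", $html, $preg[1]);
// An optional <base href="..."> replaces the base used for root-relative links.
preg_match("/<base\b([^>]*?\s)href(\s*)=(\s*)\"(.*?)\"([^>]*)>/is", $html, $base);

$title = array_merge($preg[0][6], $preg[1][6]);
$href = array_merge($preg[0][4], $preg[1][4]);
$base = isset($base[4]) ? rtrim($base[4], '/') : '';

// No <base> tag: fall back to scheme://[user[:pass]@]host[:port] of the requested URL.
if(empty($base)){
	$auth = !empty($parsed['user']) ? $parsed['user'] . (isset($parsed['pass']) ? ":{$parsed['pass']}" : '') . '@' : '';
	$port = !empty($parsed['port']) ? ":{$parsed['port']}" : '';
	$base = "{$parsed['scheme']}://{$auth}{$parsed['host']}{$port}";
}

for($i = 0; $i < count($href); $i ++){
	// Root-relative link (but not protocol-relative "//host/..."): prepend the base.
	if(substr($href[$i], 0, 1) == '/' && substr($href[$i], 0, 2) != '//')
		$href[$i] = "{$base}{$href[$i]}";
	// Query-only or fragment-only link: append to the requested URL
	// (assumes $url has no query string or fragment of its own).
	if(substr($href[$i], 0, 1) == '?' || substr($href[$i], 0, 1) == '#')
		$href[$i] = "{$url}{$href[$i]}";
	$links[$i] = array("title" => htmlentities($title[$i]), "url" => htmlentities($href[$i]));
}

print_r($links);

?></pre>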
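
For comparison, here is a minimal sketch of the same link extraction done with PHP's bundled DOM extension instead of regular expressions. It is not part of the script above: the function name `extract_links` and the sample markup are made up for illustration, and it assumes the DOM extension is enabled. It only collects the raw `href` values and does not resolve relative links.

```php
<?php

// Hypothetical helper (not from the original script): collects every
// <a href> in an HTML string using DOMDocument rather than regexes.
function extract_links($html)
{
    $links = array();
    if (trim($html) === '') {
        return $links; // loadHTML() does not accept an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // named anchors such as <a name="top"> have no target
        }
        $links[] = array(
            'title' => trim($a->textContent), // link text with inner tags stripped
            'url'   => $a->getAttribute('href'),
        );
    }
    return $links;
}

// Sample markup (made up): extra attributes, single quotes, and an anchor without href.
$sample = '<p><a class="nav" href="/about">About <b>us</b></a> '
        . "<a href='?page=2'>Next</a> <a name=\"top\">Top</a></p>";
print_r(extract_links($sample));
```

Because the parser handles attribute order and quoting itself, both quote styles are covered in one pass and the links come back in document order.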
 
I like it!

Very good potential on this script, thanks for posting.

You don't suppose you could post/zip the other files (.css, .js, although I think it's inline, and .php)? We could of course view the source, but it's polite to ask.

Thanks,

Dan
 
Is this open source, unrestricted code? I have a commercial use for this. I can send you a finished script with resell rights in exchange for full reseller rights to the code.

Thanks,

Dan.

(P.S. Please don't say no :()
 
Danltn said:
Is this open source, unrestricted code? I have a commercial use for this. I can send you a finished script with resell rights in exchange for full reseller rights to the code.

Thanks,

Dan.

(P.S. Please don't say no :()

Of course.
 