Balderick

Check if a URL has a page with content


I wrote a piece of code to check whether the page at a URL exists.

This is part of the get_headers() routine I'm using:

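    <?php

    // check with checkdnsrr
    // validate with FILTER_VALIDATE_URL

    ///////////////////////////////////////////////////

    // get_headers part

    $url = 'https://www.example.com/'; // placeholder: the URL being checked

    // get_headers() returns false if the request fails completely
    // (DNS failure, refused connection, SSL handshake error, ...)
    $array = get_headers($url);

    var_dump($array);

    // $array[0] is the status line, e.g. "HTTP/1.1 200 OK".
    // Extract the 3-digit status code instead of using strpos(), which would
    // also match "200" or "301" anywhere else in the line.
    $status = 0;
    if ($array !== false && preg_match('#^HTTP/\S+\s+(\d{3})#', $array[0], $m)) {
        $status = (int) $m[1];
    }

    if (in_array($status, [200, 301, 302, 403], true)) {
        var_dump($url);
    } else {
        echo '<br><br> this site is insecure<br>';
        echo '<br> use http instead of https <br><br>';
    }

    ?>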
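The two pre-checks named in the comments at the top of that routine (FILTER_VALIDATE_URL and checkdnsrr) could be sketched roughly as below. The function name preCheckUrl is hypothetical, and looking up only A and AAAA records is an assumption about what "check with checkdnsrr" is meant to cover; a host given as a bare IP address would fail this DNS check.

```php
<?php
// Hypothetical helper sketching the two pre-checks named in the routine's comments.
// Returns true only if the URL is syntactically valid and its host has an address record.
function preCheckUrl(string $url): bool
{
    // validate with FILTER_VALIDATE_URL (syntax only, no network access)
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Some valid URLs (e.g. mailto:) have no host part at all
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // check with checkdnsrr: assumption, look for an IPv4 or IPv6 address record
    return checkdnsrr($host, 'A') || checkdnsrr($host, 'AAAA');
}
```

Running this before get_headers() avoids making a request at all for malformed input or hosts that do not resolve.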


The problem is that a server may answer on http or https (so get_headers() succeeds) even though there is no actual page content.

How can I determine whether there is an index.php, an index.html or some other index page?

The options I'm considering are file_exists(), file_get_contents() or glob(). On the other hand, var_dump($array) in several cases shows a Content-Type header with the value text/html. Can this be used to tell whether there is a page?

What would you recommend, and can you give an example of how it is used?
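Of the functions mentioned, file_exists() and glob() work on the local file system; PHP's http:// wrapper does not support stat(), so file_exists() cannot test a remote URL. The Content-Type header is a reasonable signal, but a server can also send text/html for an empty or error page, so a sketch might combine the status code, the Content-Type and the body. This assumes the header lines come from get_headers($url) and the body from file_get_contents($url); the function name looksLikeHtmlPage and the "non-empty after strip_tags()" rule are illustrative choices, not an established test.

```php
<?php
// Hypothetical helper: decide whether a response looks like a real HTML page.
// $headers: header lines as returned by get_headers($url) (numeric array)
// $body:    the response body, e.g. fetched with file_get_contents($url)
function looksLikeHtmlPage(array $headers, string $body): bool
{
    $status = 0;
    $contentType = '';

    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A new status line (e.g. after a redirect) starts a new response,
            // so the loop ends up with the values of the last one.
            $status = (int) $m[1];
            $contentType = '';
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = strtolower(trim(substr($line, strlen('Content-Type:'))));
        }
    }

    return $status === 200
        && strpos($contentType, 'text/html') === 0   // allows "; charset=..."
        && trim(strip_tags($body)) !== '';           // some text left after removing tags
}
```

Usage would be along the lines of `looksLikeHtmlPage(get_headers($url), (string) file_get_contents($url))`, which makes two requests; a single request with a stream context (and reading $http_response_header) would avoid that.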


If you get a 301 or 302 response, check the Location header in the array and make another get_headers() call to that URL to see if it exists.
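Reading the Location header out of the numeric get_headers() array could be sketched like this. The name redirectTarget is hypothetical; it returns the Location of the first 301/302 response it sees and does not resolve a relative value such as /home, which would still need to be joined onto the original URL before the next get_headers() call. Depending on the stream context, get_headers() may already have followed the redirect itself, in which case the later status lines in the same array belong to the target page.

```php
<?php
// Hypothetical helper: return the Location header of the first 301/302
// response in a get_headers() array, or null if there is none.
function redirectTarget(array $headers): ?string
{
    $inRedirect = false;

    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // Track whether the current block of headers belongs to a redirect
            $inRedirect = in_array((int) $m[1], [301, 302], true);
        } elseif ($inRedirect && stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }

    return null;
}
```

A follow-up check would then be something like `$next = redirectTarget($array); if ($next !== null) { $array = get_headers($next); }`.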


The error looks like this:

Warning: get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 unrecognized name in .....

It is raised on this line:

$array = get_headers($url);


The URL input is:



If I change https to http, I don't get any error messages.

I would like to check in advance what kind of URL was entered, so that I can avoid these error messages.

Does anyone have an idea?


You can use parse_url() to break up the URL and then check the scheme to see whether it is https.
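A minimal sketch of that check, using parse_url() with the PHP_URL_SCHEME component. The function name isHttps is hypothetical; note that a URL typed without a scheme (e.g. example.com) has no scheme component at all, so it is reported as not https here.

```php
<?php
// Hypothetical helper: true if the URL's scheme is https (case-insensitive).
function isHttps(string $url): bool
{
    // parse_url() returns null when there is no scheme and false for badly malformed URLs
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme) && strtolower($scheme) === 'https';
}
```

The result can decide whether to call get_headers() directly or to handle the https case separately, for example with a stream context.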

If you want to skip the certificate validation that is causing that error, you can do that too:

https://stackoverflow.com/questions/37274206/get-headers-ssl-operation-failed-with-code-1
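Skipping verification is usually done with the ssl stream context options verify_peer and verify_peer_name, passed to get_headers() as its third argument (available since PHP 7.1). This is a sketch with hypothetical function names; it is not certain that it removes the "unrecognized name" warning specifically, since that TLS alert concerns the SNI host name rather than the certificate chain. Turning verification off also removes protection against man-in-the-middle attacks.

```php
<?php
// Hypothetical helper: a stream context with TLS peer/name verification disabled.
// WARNING: only for diagnostics on hosts you trust.
function insecureSslContext()
{
    return stream_context_create([
        'ssl' => [
            'verify_peer'      => false, // do not verify the certificate chain
            'verify_peer_name' => false, // do not check the certificate's host name
        ],
    ]);
}

// Hypothetical wrapper: get_headers() with the context above (PHP 7.1+ signature).
function headersWithoutVerification(string $url)
{
    return get_headers($url, false, insecureSslContext());
}
```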
