Search HTML document contents with PHP | PHP Search Engine


DarkxPunk

I am looking for a PHP search engine that searches the contents of HTML documents. I am not expecting someone to simply hand me some code, but I also don't want something I need to buy or install. I found one post on the net with a great tutorial, but I lost it and can't find it. I will keep searching, but if anyone has anything, let me know. Thanks!

If I had a better understanding of PHP, I bet I could, but sadly I am a bit of a noob in this regard... I need a tutorial or something.

The manual has lots of examples. Start with the DOMDocument constructor and check the examples there. Those examples use other functions to do a certain job, so keep following them. By studying each example recursively, you will cover the most common methods and usage of DOMDocument. Each function's page also has a list at the bottom of similar methods, which is helpful. The PHP manual is the most thorough and easy-to-use manual I have ever seen. DOMDocument uses the common DOM API, which is very similar to the JavaScript DOM. The link DD provided also lists the methods and their usage on one page; I use those pages as a summary cheat sheet. Reading those pages alone gives you an idea of what the class and its functions can do. If you know what you are doing, finding the appropriate function(s) on that page to solve the problem should not be hard. This is not only true for DOMDocument; it applies to other classes too, because the PHP manual is very consistent. If everything fails, or you still need other tutorials or examples, you can Google them. There are plenty of them around the web.
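A minimal sketch of the approach described above, assuming PHP 8 with the bundled DOM extension. It parses an HTML string with `DOMDocument::loadHTML()` and reads text through `getElementsByTagName()` and `textContent`, the same names you would use with the JavaScript DOM. The function name `paragraphTexts` and the sample markup are made up for illustration. Note that `loadHTML()` assumes ISO-8859-1 unless the document declares its charset, so UTF-8 pages without a `<meta charset>` may need extra handling.

```php
<?php
// Hypothetical helper: parse an HTML string and return the text of every <p>,
// using DOM calls that mirror the JavaScript DOM (getElementsByTagName, textContent).
function paragraphTexts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent includes text from nested elements such as <b>.
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Sample markup, invented for this example.
$html = '<html><body><h1>Blog</h1><p>First post</p><p>Second <b>post</b></p></body></html>';
print_r(paragraphTexts($html));
```
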

We could save time if you explain exactly what you want to search for and how much information you already have about its location. For example, I once helped someone who wanted to extract her game score from a third-party source. She knew her username, the id of an ancestor div, and the fact that her score would be in a p element. With that information, it was very easy to find her score.
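Here is how that kind of targeted lookup might look with `DOMXPath`, assuming (hypothetically) that the ancestor div has `id="scores"` and that each score sits in a `<p>` formatted as `username: score`. Those markup details and the `findScore` helper are illustrative assumptions, not the page from that example.

```php
<?php
// Hypothetical helper: find "<username>: <score>" inside any <p> below the div
// whose id is $ancestorId, and return the score part (or null if not found).
function findScore(string $html, string $ancestorId, string $username): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every <p> anywhere below the div with the given id.
    // Assumes $ancestorId is a trusted plain string (no quotes).
    $paragraphs = $xpath->query(sprintf('//div[@id="%s"]//p', $ancestorId));

    $prefix = $username . ':';
    foreach ($paragraphs as $p) {
        $text = trim($p->textContent);
        if (strpos($text, $prefix) === 0) {
            return trim(substr($text, strlen($prefix)));
        }
    }
    return null;
}

// Invented sample markup following the assumed structure.
$html = '<html><body><div id="scores"><div><p>bob: 80</p><p>alice: 120</p></div></div></body></html>';
echo findScore($html, 'scores', 'alice'), PHP_EOL;
```

Because the XPath expression is built with `sprintf`, the id should come from your own code rather than user input; an id containing a double quote would break the query.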

Well, what I am searching for will be broad. The goal is to have a search engine for a site. It will consist of things like blog posts, profiles, etc., so it would be nice to just point the search function at a directory, or a few directories, where all this stuff is and then search all the text in the documents for whatever is searched. The documents will be HTML files, and how they are formatted will be decided by the poster, so I may not know the id or anything; I just want the text to be searched. I know I can also do this with MySQL and may use that as well. But for now I simply want to understand how to search the text of HTML documents with PHP.
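One way to sketch "point the search at a few directories", assuming the posts and profiles are saved as `.html`/`.htm` files on disk. It walks each directory with `RecursiveDirectoryIterator`, drops markup with `strip_tags()`, and does a case-insensitive `stripos()` match. The helper names are hypothetical, and `strip_tags()` is a crude filter: it removes tags but keeps the text inside `<script>` and `<style>` blocks.

```php
<?php
// Hypothetical helper: collect every .html/.htm file below the given directories.
function htmlFilesIn(array $directories): array
{
    $files = [];
    foreach ($directories as $dir) {
        $iterator = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
        );
        foreach ($iterator as $file) {
            // Each item is an SplFileInfo for a file somewhere under $dir.
            $ext = strtolower($file->getExtension());
            if ($file->isFile() && ($ext === 'html' || $ext === 'htm')) {
                $files[] = $file->getPathname();
            }
        }
    }
    sort($files);
    return $files;
}

// Hypothetical helper: return the paths of HTML files whose tag-free text contains $term.
function searchDirectories(array $directories, string $term): array
{
    $matches = [];
    foreach (htmlFilesIn($directories) as $path) {
        // Crude markup removal; see the caveat above about <script>/<style>.
        $text = strip_tags(file_get_contents($path));
        // Case-insensitive substring match.
        if (stripos($text, $term) !== false) {
            $matches[] = $path;
        }
    }
    return $matches;
}
```

Usage would be something like `searchDirectories(['/path/to/blog', '/path/to/profiles'], 'php')`, where both paths are placeholders; it returns the matching file paths in sorted order. Reading every file on every query is fine for a small site, but a larger one would usually index the text once (for example in MySQL, as mentioned above).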

Searching is as easy as opening the document, reading the contents, and checking for text with a function like strpos. That's not precise, but it's easy. It also picks up HTML tags: if someone searches for "body", it's going to match every <body> tag in every document. But that's the easiest way. If you want more features, then you need to design a more complex search. Using DOMDocument to parse the HTML documents and go through them looking for text nodes instead of tags and attributes is one way to do that. Using something along the lines of a natural language search or query expansion search is another way to get better results. You're asking about a topic that ranges from the very simple, which produces a lot of false positives, to the very complex. It's up to you to figure out how far you want to take it. There's no single simple answer to a question like how to search through HTML documents. Any novice programmer can quickly create a search engine that works terribly. If you want a search engine that works well, then there's a lot of thinking and research you need to do.
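The difference between the strpos approach and the text-node approach can be seen on a single string. This sketch (the helper names are hypothetical) compares a raw `stripos()` over the markup with a `DOMXPath` query that only visits text nodes outside `<script>` and `<style>`.

```php
<?php
// Naive approach: substring search over the raw markup, tags included.
function rawContains(string $html, string $term): bool
{
    return stripos($html, $term) !== false;
}

// DOM approach: search only text nodes, ignoring tag names and attribute values.
function textNodesContain(string $html, string $term): bool
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes only, skipping anything inside <script> or <style>.
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($nodes as $node) {
        if (stripos($node->nodeValue, $term) !== false) {
            return true;
        }
    }
    return false;
}

$html = '<html><body><p>My first blog post</p></body></html>';
var_dump(rawContains($html, 'body'));      // bool(true): matches the <body> tag
var_dump(textNodesContain($html, 'body')); // bool(false): no visible text says "body"
```

Checking each text node separately has its own false negatives: a phrase split across elements, such as `Second <b>post</b>`, will not match "second post". Searching the `textContent` of the `<body>` element as one string is one way around that, at the cost of gluing the text of adjacent blocks together.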
