DarkxPunk Posted January 10, 2013 I am looking for a PHP search engine that searches the contents of HTML documents. I am not expecting someone to simply hand me some code, but I also don't want something I need to buy or install. I found one post on the net with a great tutorial, but I lost it and can't find it. I will keep searching, but if anyone has anything, let me know. Thanks!
jeffman Posted January 10, 2013 (edited) This might be what you want. I'm not sure Edited January 10, 2013 by Deirdre's Dad
DarkxPunk Posted January 10, 2013 (edited) I bet it is, if I had a better understanding of PHP, but sadly I am a bit of a noob in this regard... I need a tutorial or something. Edited January 10, 2013 by DarkxPunk
birbal Posted January 10, 2013 The manual has a lot of examples. Start with the DOMDocument constructor and check the examples there. Those examples use other functions to do a particular job, so keep following them. By studying each example recursively you will cover the most common methods and uses of DOMDocument. Each function page also has a list at the bottom of similar methods, which is helpful. The PHP manual is the most thorough and easy-to-use manual I have ever seen. DOMDocument uses the common DOM API, which is very similar to the JS DOM. Also, the link DD provided has a list of methods and their usage on one page. I use those pages as a summary cheat sheet. Reading those pages alone gives an idea of the potential and usage of a class or function. If you know what you are doing, finding the appropriate function(s) on that page to solve the problem should not be hard. This applies not only to DOMDocument but to other classes too; the PHP manual is very consistent. If all else fails, or you still need other tutorials or examples, you can google them. There are plenty around the web.
jeffman Posted January 10, 2013 We could save time if you explain exactly what you want to search for and how much information you already have about its location. For example, I once helped someone who wanted to extract her game score from a 3rd-party source. She knew her username, the id of an ancestor div, and the fact that her score would be in a p element. With that information, it was very easy to find her score.
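For illustration only, a lookup like the one described above might be sketched with DOMDocument and DOMXPath as follows. The markup, the id `scores`, the username `alice`, and the function name `findScoreLine` are all made up for this example and are not from the original case. It also assumes PHP 7.1 or later (for the type hints) and an id that contains no double quotes.

```php
<?php
// Hypothetical markup: the id "scores" and the names below are invented for this sketch.
$html = '<html><body>'
      . '<div id="scores">'
      . '<p>bob: 1200</p>'
      . '<p>alice: 3400</p>'
      . '</div>'
      . '</body></html>';

// Return the text of the first <p> below the given div that mentions the username.
function findScoreLine(string $html, string $divId, string $username): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely perfect, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every p element anywhere below the div with the known id.
    foreach ($xpath->query('//div[@id="' . $divId . '"]//p') as $p) {
        if (strpos($p->textContent, $username) !== false) {
            return trim($p->textContent);
        }
    }
    return null; // nothing matched
}

echo findScoreLine($html, 'scores', 'alice'), "\n"; // prints "alice: 3400"
```

The more you know about where the data sits (an id, an element name), the narrower the XPath expression can be, which is why that kind of information saves time.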
DarkxPunk Posted January 10, 2013 Well, what I am searching for will be broad. The goal is to have a search engine for a site. It will consist of things like blog posts, profiles, etc., so it would be nice to just point the search function to a directory or a few directories where all this stuff is and then search all the text in the documents for whatever is searched. The documents will be HTML files, and how they are formatted will be decided by the poster, so I may not know the id or anything. I just want the text to be searched. I know I can also do this from MySQL and may use that as well. But for now I simply want to understand how to search the text of HTML documents with PHP.
justsomeguy Posted January 10, 2013 Searching is as easy as opening the document, reading the contents, and checking for text with a function like strpos. That's not precise, but it's easy. It also picks up HTML tags. If they search for "body" it's going to match every <body> tag in every document. But that's the easiest way. If you want more features, then you need to design a more complex search. Using DOMDocument to parse the HTML documents and go through them looking for text nodes instead of tags and attributes is one way to do that. Using something along the lines of a natural language search or query expansion search is another way to get better results. You're asking about a topic that goes from the very very simple, which includes a lot of false positives, to the very very complex. It's up to you to figure out how far you want to take it. There's not one simple answer to a question like how to search through HTML documents. Any novice programmer can create a search engine quickly, which works terribly. If you want a search engine that works well then there's a lot of thinking and researching that you need to do.
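As a rough sketch of the two simple approaches described above (not a finished search engine), the raw-string check and the text-node check could look something like this. The function names `rawContains`, `textContains`, and `searchDirectory` are invented for the example, and it assumes PHP 7.1 or later and files ending in `.html`. It uses stripos, the case-insensitive sibling of strpos.

```php
<?php
// Naive check: looks at the raw file contents, so it also matches tag and attribute names.
function rawContains(string $html, string $term): bool
{
    return stripos($html, $term) !== false;
}

// Text-only check: parse the HTML and look only at text nodes.
function textContains(string $html, string $term): bool
{
    $doc = new DOMDocument();
    // Collect parser warnings from imperfect HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // All text nodes, except those inside script or style elements.
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($nodes as $node) {
        if (stripos($node->nodeValue, $term) !== false) {
            return true;
        }
    }
    return false;
}

// Return the .html files directly inside $dir whose visible text contains $term.
function searchDirectory(string $dir, string $term): array
{
    $matches = [];
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        // Skip unreadable or empty files; loadHTML rejects an empty string.
        if ($html !== false && $html !== '' && textContains($html, $term)) {
            $matches[] = $file;
        }
    }
    return $matches;
}

$sample = '<html><body><p>Hello from my blog post.</p></body></html>';
var_dump(rawContains($sample, 'body'));  // bool(true): matches the <body> tag
var_dump(textContains($sample, 'body')); // bool(false): "body" is not in the text
var_dump(textContains($sample, 'blog')); // bool(true)
```

This still has obvious limits: a phrase split across inline tags will not match because each text node is checked separately, there is no ranking, and every file is re-parsed on every search.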