Search Engine


dcole.ath.cx

Okay, before the hack, justsomeguy was talking about some function. What was that function?

How should I do this last part, where a script looks up any term in the reverse DB and tries to get data back?

Example 1 term: I like (red OR green) birds
Example 2 term: (red OR green yellow) apples are my favorite

For the 1st example, it should look up the term and get data for I, like, and bird. Then get data for red and green and combine them into RG. Then it will have arrays I, like, bird, and RG (arrays red and green combined).

For the 2nd example, it will get the data for apples, are, my, favorite. But then it will have to get the green and yellow data and see if there are matches in the arrays. Then I will want it to output 3 arrays: GY1, GY2, and GY3.

I want the thing to output something like:
here are results that match: "(red OR green yellow) apples are my favorite"
here are results that match: "(red OR green yellow) apples are my" but not "favorite"
...

So will that be way too hard?

Yeah, I spent a while thinking about that before and I can't remember specifically what I came up with. It looks like the Wayback Machine doesn't have the forum archived either; I'm not sure how frequently that is getting updated these days. I'll think about what I was talking about earlier (I know it had something to do with tokenizing the string and keeping track of scope), and I'll post anything that I can remember.

In the meantime, let's all be happy they decided not to use Comic Sans: http://web.archive.org/web/20001004222234/....w3schools.com/

I'm making this way too hard... I need to take a step back. The way it is right now, I would need to make a script to write a custom script to solve each search.

I think I should go like this maybe:

Example 2 term: (red OR green yellow) apples are my favorite
Array that: (red, OR, green, yellow), apples, are, my, favorite

Then get each thing in the DB after removing - + ( ) and checking that it's not OR or AND. So now I have each array of results in a master array.

Then find out how many possible combos there are and make another master array, but leave it blank. Then make a master list of all URLs without repeats. Then check them somehow (HELP ME HERE) and plug them into the empty array.

How does that sound? Please post if you have ANY thoughts!
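The "array that" step could be sketched roughly as below. This is only an illustration: `tokenize_term` is a made-up name, and it assumes parentheses are never nested.

```php
<?php
// Hypothetical helper: split a search term into top-level tokens,
// keeping each parenthesized group together as a single token.
// Assumes groups are not nested.
function tokenize_term($term)
{
    // Match either a "( ... )" group or a run of non-space characters.
    preg_match_all('/\([^)]*\)|\S+/', $term, $matches);
    return $matches[0];
}

$tokens = tokenize_term('(red OR green yellow) apples are my favorite');
// $tokens: "(red OR green yellow)", "apples", "are", "my", "favorite"
```

Each group token can then be handled on its own (for example, by looking up the words inside it and combining their URL arrays) before it joins the master array.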

Well, you can use eregi_replace or str_ireplace to remove whatever you need to. Then use the split function to separate the string at every space and put it into an array. I'll look more into this later... too tired.
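A minimal sketch of that idea follows. Note that eregi_replace() and split() were removed in PHP 7.0, so this version swaps in str_replace() and preg_split(). The helper name `term_to_words` is hypothetical, and treating OR/AND case-insensitively is an assumption.

```php
<?php
// Hypothetical helper: turn a raw search term into a list of plain words.
function term_to_words($term)
{
    // Remove the operator characters - + ( ) mentioned in the thread.
    $clean = str_replace(array('-', '+', '(', ')'), '', $term);
    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // Drop the keywords OR and AND (case-insensitive here; an assumption).
    $words = array_filter($words, function ($w) {
        return !in_array(strtoupper($w), array('OR', 'AND'), true);
    });
    // Renumber the keys so the result is a plain 0-based list.
    return array_values($words);
}

$words = term_to_words('(red OR green yellow) apples are my favorite');
// $words: red, green, yellow, apples, are, my, favorite
```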

I guess I'm a little confused, you have 2 arrays, one of them contains URLs and one of them contains search terms? And you want to do what? Are you looking for the terms in the URL itself?

I have a reverse index! That means I have arrays of URLs that have the picked word. So for each word I have an array of the URLs whose web pages contain that word.

Now I have to find the URL that is in the most arrays and stuff.

$master_array = $1st_term, $2nd_term, $3rd_term...
Then each $#_term is an array of URLs.

I now want to form a new array that has the best answers. If a URL is in all arrays, then it's in group 1; if it's in all but the last array, then it's in group 2; if it's in all the arrays but the 2nd-to-last one, then it's in group 3...

If that is too hard to figure out and make dynamic, then I just want to order them by how many arrays they are found in, and then order them by points!

Well, the last thing to do is easy. I'm still a little confused by the other part, because I'm not sure how two items, each in every array but one, would appear in a different result array because one of them was not in the last and the other was not in the second-to-last. I'm not sure how the different arrays relate to the final grouping I guess.

But like I said, it's easy to just count how many times everything shows up.

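$scores = array();
for ($i = 0; $i < count($master); $i++)
{
  for ($j = 0; $j < count($master[$i]); $j++)
  {
    // Start a counter the first time a URL is seen
    if (!array_key_exists($master[$i][$j], $scores))
      $scores[$master[$i][$j]] = 0;
    // Count one more appearance of this URL
    $scores[$master[$i][$j]]++;
  }
}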

Now the $scores array contains a key for each URL, with the number of times it appeared as the value. You can use the asort function to sort by the score (lowest first), or arsort to sort in reverse order (highest first). Both keep each URL key attached to its score.

Well, each sub-array will have a list of websites that have the picked word, and a site may have more than one picked word. That's what I was saying, but then I wanted to give more value to the location of the URL, depending on what sub-array it was in.

Like, the first word in your search term would have the most value and the last would have the least amount of value. I don't know if I want to do this though, because I guess the average searcher doesn't search like this but just types in all the words in a sentence-like order.
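One way to sketch "first word counts most" is shown below. The weighting rule (the first of N words is worth N points, the last is worth 1) and the name `weighted_scores` are assumptions made for illustration; the thread does not settle on a formula.

```php
<?php
// Hypothetical helper: score URLs so that earlier search words count for more.
// $master is a list of URL arrays, one per search word, in query order.
function weighted_scores(array $master)
{
    $master = array_values($master); // make sure keys run 0..n-1
    $n = count($master);
    $scores = array();
    foreach ($master as $i => $urls) {
        // Assumed weighting: first word gets $n points, last word gets 1.
        $weight = $n - $i;
        // array_unique() so a URL listed twice for one word counts once.
        foreach (array_unique($urls) as $url) {
            if (!isset($scores[$url])) {
                $scores[$url] = 0;
            }
            $scores[$url] += $weight;
        }
    }
    arsort($scores); // highest score first, URL keys preserved
    return $scores;
}
```

With three words, a URL found under the first and third words scores 3 + 1 = 4, while one found under the first and second scores 3 + 2 = 5 and sorts ahead of it.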

I haven't had time to write this last part of the search engine. I will try to get to it Monday, or if not I WILL on Tuesday!!

So... it will work like this (writing it down so I don't forget). It's open source, so you can just download the source in a couple of weeks. Anyway (I still have to debug, test, and add the admin browser interface):

1. Get a set of words... the term.
2. Break up the term by 'space'.
3. Look up each word after removing -, +, and ()... but also make sure the word isn't OR or AND. Then put each array of URLs into one master array. But before that, join the arrays that have an OR.
4. Go through using the script justsomeguy gave above... giving points to the beginning of the array and the word of the sub-array to the end. (Needs a little work because the URL is in a lot of junk.)
5. Go through and order all the URLs... repeats of URLs are taken out in #4 (don't worry about that). Order by the beginning number, then by the following rank, then by the URL... (lol)
6. Go through a for statement printing each set in this results array!

Sound good, everyone? It does to me, except #4, because I can't use justsomeguy's way... unless I make a sister array that holds just the URL while the other array holds all the data. Bingo!

But that doesn't solve the -, +, (), OR, AND thing...

So... when I'm getting the URL arrays I will check for (, AND, and OR:
if it's (, I will continue until I find )
if it's OR, I will include the next array in the last array
if it's AND, I will keep the ones that are in both arrays.

But what about - and +? Where should I check for them... where would work best?
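The OR and AND rules above map onto PHP's built-in array functions fairly directly. The sketch below uses made-up helper names, and reading "-" as "exclude this word" is an assumption, since the thread never defines it.

```php
<?php
// OR: a URL qualifies if it is in either list (set union).
function urls_or(array $a, array $b)
{
    return array_values(array_unique(array_merge($a, $b)));
}

// AND: a URL qualifies only if it is in both lists (set intersection).
function urls_and(array $a, array $b)
{
    return array_values(array_intersect($a, $b));
}

// "-" read as exclusion (an assumption): drop URLs that appear in $exclude.
function urls_not(array $a, array $exclude)
{
    return array_values(array_diff($a, $exclude));
}

$red   = array('http://a.example', 'http://b.example');
$green = array('http://b.example', 'http://c.example');

$either = urls_or($red, $green);  // a, b, c
$both   = urls_and($red, $green); // b only
```

The array_values() calls renumber the keys, because array_unique(), array_intersect(), and array_diff() all preserve the original keys and can leave gaps.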

It's hard to answer questions like that without all of the code. Speaking about this in abstract terms with arrays and whatnot is sort of hard for me to follow without any real examples. What do - and + mean in your search engine?

Isn't + the same as AND? I'm not sure what your original question is, I guess if you are checking for the presence or absence of a specific word, you would probably want to do that near the beginning, because it will make your set of possible matches a lot smaller. When working with a large data set, you will want to use methods that make the set as small as possible as soon as possible. The same thing goes for SQL queries with joins in them, the ordering of your joins and your where clauses has a large impact on how long the query takes. But again, not knowing the specifics of the algorithm, I can't give you much better advice than that.

So what is the easiest way to compare whether something is in 2 arrays and remove it, IF IF IF you only want to match the middle?

I want to compare part of each set in one array to part of each set in another array...

Example:
(set 1 in array 1) "01 http://google.com GOOGLE"
(set 1 in array 2) "03 http://msn.com MSN"
(set 2 in array 2) "07 http://google.com THE GOOGLE"

So I only want to compare the URLs in each array, not the whole set in each array... just http://....com, not the number before it or the title after it. So there is still a match if 01 and 07 are different, or if there is an extra THE.
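One possible approach is to re-index each array by its URL field and then compare keys. This sketch assumes every record looks like "NN URL TITLE" with single spaces between the fields; `index_by_url` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: index records by their URL field.
// Each record is assumed to look like "NN URL TITLE", separated by spaces.
function index_by_url(array $records)
{
    $byUrl = array();
    foreach ($records as $record) {
        // A limit of 3 keeps a multi-word title such as "THE GOOGLE" in one piece.
        $parts = explode(' ', $record, 3);
        if (count($parts) >= 2) {
            $byUrl[$parts[1]] = $record;
        }
    }
    return $byUrl;
}

$array1 = array('01 http://google.com GOOGLE');
$array2 = array('03 http://msn.com MSN', '07 http://google.com THE GOOGLE');

// Records from $array1 whose URL also appears in $array2.
$matches = array_intersect_key(index_by_url($array1), index_by_url($array2));
// To remove the matches instead, array_diff_key() takes the same arguments.
```

Because only the keys are compared, the differing numbers and titles do not prevent the two google.com records from matching.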

justsomeguy, how long have you been doing PHP? Do you have a degree in computer science or something of that nature?

Yeah, it has a pattern... the one I showed you.

Cool, cool, I'll post back in a little bit... see what I get with sscanf.

Yeah, I do have a degree in computer science, from Arizona State. I've been writing PHP for around 2-3 years now, but most of the general programming concepts I learned in school. You can use the concepts of software design for any language, or I guess alternately you can find a way to implement your design in any language. Except VB.

sscanf and that type of thing came from C, where printf is used to print formatted strings. You specify a format string to parse from. Your string would probably look something like this: "%d %s %d", meaning that you have a number (%d reads a decimal integer), then a space, then a string, then another space, then another number. I think it will return the variables it parses in an array, and you can discard the two you don't need.
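As a rough illustration with the record layout from the earlier post (number, URL, title), the format below reads only the first two fields, since %s stops at whitespace and would cut a multi-word title short. The variable names are made up for the example.

```php
<?php
// Record layout assumed from the thread: "NN URL TITLE".
$record = '07 http://google.com THE GOOGLE';

// With only two arguments, sscanf() returns the parsed values as an array.
// "%d %s" reads the leading number and the next whitespace-delimited word;
// the rest of the string (the title) is simply not parsed.
list($rank, $url) = sscanf($record, '%d %s');
// $rank is the integer 7, $url is "http://google.com"
```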

So I have an empty array and I want to add arrays of data to it, but still have it be one array:

$empty = array();

Then I want to add:
$data1 = 1, 2, 3, 4; (each # being a set in the array)
$data2 = 5, 6, 7, 8; (each # being a set in the array)

Then when I do $empty .= $data1; and $empty .= $data2;
$empty will be $empty = 1, 2, 3, 4, 5, 6, 7, 8;

Is this all correct? Will that give me what I want?

---- ---
This is the 2nd part. I want another answer, kind of different from the 1st part above the ---- ---

Then I also want another way that puts each one in its own set, so I have:
$empty[0] = 1, 2, 3, 4;
$empty[1] = 5, 6, 7, 8;

For this, can I go:
$empty = array(); $empty[] = $data1; $empty[] = $data2;
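For reference, .= is PHP's string concatenation operator, so it will not combine arrays. A small sketch of both parts follows, using array_merge() for the flat version and [] for the nested one; the variable names $flat and $nested are just for illustration.

```php
<?php
$data1 = array(1, 2, 3, 4);
$data2 = array(5, 6, 7, 8);

// Part 1: one flat array. array_merge() appends the values and renumbers the keys.
$flat = array();
$flat = array_merge($flat, $data1);
$flat = array_merge($flat, $data2);
// $flat is now array(1, 2, 3, 4, 5, 6, 7, 8)

// Part 2: each data array kept as its own element.
$nested = array();
$nested[] = $data1;
$nested[] = $data2;
// $nested[0] is array(1, 2, 3, 4) and $nested[1] is array(5, 6, 7, 8)
```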

I do believe that would just create new sections of the array. I can't wait till you get this out :). Oh yeah, your thumbnail thing is plausible. Check out http://whois.sc and type in any website. It will show a little thumbnail off to the side. I wonder how they do it though!

True. Could they use something like the JavaScript print page function, store it in a variable or something, and then just keep it on their server for that time?

How exactly do they do it though? I would like to know.

I believe PHP can create images, but I don't know if that is how it is done.

My guess for the Safari test is that they have a COM+ component that sends the URL you provide to an app on a Mac server, which somehow loads it in Safari, saves a screenshot (somehow), then retrieves the generated screenshot and passes it back to the browser.

All that is just speculation, and I wouldn't know where to start to set it up. I am guessing it is complicated, since I have only ever found 1 other site like it.

Archived

This topic is now archived and is closed to further replies.
