Jump to content
iwato

MATCH( ) AGAINST( ) Prepared Statements

Recommended Posts

BACKGROUND:  i have created a search box for users to query the Grammar Captive database with Natural Language functionality. As the query string results in a single variable that is read into an AGAINST statement (see below), I am concerned about the overall safety of my database and web application.

QUESTION ONE: Is it possible to write a prepared statement with the value of an AGAINST clause as an unknown?  If not,  what must one do in order to prevent against SQL injection?

WHAT I HAVE CREATED

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date,
	   MATCH (letter_title, letter_abstract, letter_body) AGAINST ('$search_input') AS letter_score
	FROM sevengates_letter
	WHERE
		MATCH (letter_title, letter_abstract, letter_body)
		AGAINST ('$search_input')
	ORDER BY letter_score DESC

WHAT I WANT TO CREATE

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date,
	   MATCH (letter_title, letter_abstract, letter_body) AGAINST ('$search_input') AS letter_score
	FROM sevengates_letter
	WHERE
		MATCH (letter_title, letter_abstract, letter_body)
		AGAINST (?)
	ORDER BY letter_score DESC

QUESTION TWO: Will the "What I want to create" version fly?

Roddy

 

Edited by iwato

Share this post


Link to post
Share on other sites

Have you tested it?  I don't see why that wouldn't work, although you need to do one more replacement.

Share this post


Link to post
Share on other sites

Yes, I have tested it, and it works -- sort of.

According to the manual, however, "IN NATURAL LANGUAGE MODE" is the default option for  AGAINST arguments.  Further according to the manual, when search entries IN NATURAL LANGUAGE MODE are placed in double quotation marks the exact phrase is sought.  This does not appear to be the case.

  1. Please go to the Grammar Captive main page and find the heading Search Grammar Captive in the navigation bar.
  2. Click on the word Newsletter and perform a search for the following two words -- once enclosed in quotation marks and once not:  Seven Gates.

With quotation marks the search fails.  Without quotation marks the search succeeds.

Either the manual or I is in error.

Roddy

 

Share this post


Link to post
Share on other sites

Since I see no comments in the documentation from people saying that the documentation is wrong, I'm going to assume it's correct.  Did you read the entire page for the natural language search, including the comments at the bottom?  They're mostly talking about the same issue.

Share this post


Link to post
Share on other sites

At the bottom of every manual page there's a comments section. In that section you'll often find interesting additional information about the topic at hand. The same goes for the PHP manual too. From reading peoples' comments on the page you just linked to, it seems like a search term that occurs in more than 50% of the records is ignored. That may or may not be the reason you're having trouble.

  • Like 1

Share this post


Link to post
Share on other sites

Yes,  I read that as well, but this is clearly not the case with the phrase "Seven Gates". It would, of course, be true with the phrase "lorem ipsum", for there are only 8 documents in the entire data base, and all of them contain the words "lorem ipsum". Note that, in both cases --  "seven gates" and "lore ipsum" -- nothing is returned.

Roddy

 

Share this post


Link to post
Share on other sites

And the phrase "seven gates" is in fewer than 50% of the entries?  When there is a small number of records this becomes a problem, it is easy for anything to have 50% of a small number of records.

Share this post


Link to post
Share on other sites

Yes, of course, else I would have mentioned it.

The phrase "seven gates" only exits in two of eight (25%).

What is more you cannot add the IN NATURAL LANGUAGE MODE to the SQL statement when using the unknown ? token.  Not that it should make a difference, because according to the manual the IN NATURAL LANGUAGE MODE is the default mode.

Roddy

Share this post


Link to post
Share on other sites

What is more you cannot add the IN NATURAL LANGUAGE MODE to the SQL statement when using the unknown ? token.

I'm not aware of any restriction like that, I don't see why there would need to be one.  That statement might indicate there are other problems with your code.  These abstract discussions are often missing the actual code, when in reality what matters is the specific implementation instead of the abstract concepts.

Share this post


Link to post
Share on other sites

No, it does not work.  I tried it.  The option must be enclosed in parentheses with the AGAINST string.  When you replace the string with a question mark in combination with the option, the SELECT statement fails.  

Any other suggestions?

Roddy

Share this post


Link to post
Share on other sites

I don't have any other suggestions.  When we do this in our code we treat it like any other prepared statement.  I'm not aware of any special considerations that need to be made.

Now, if you want to assume that your code might have errors instead of assuming that any problems are not in your code, and you want me to look at your code, I'll be happy to.  But, in general terms, other than setting up the necessary index on the table I am aware of no special considerations that need to be made when using fulltext matching in a prepared statement.

In fact, here's that code, notice there is nothing specifying that this is a fulltext search or that the placeholder has any special considerations.  It's just a placeholder, and gets added to the predicate expression like any other placeholder value, there is nothing to indicate that it needs to be treated any differently than any other placeholder:

        if (isset($data->search)) {
            $search_sql = <<<TEXT
                (MATCH (c.title, c.code, c.description, c.keywords) AGAINST (?)
             OR MATCH (child.title, child.code, child.description, child.keywords) AGAINST (?))
TEXT;
            $select->where->addPredicate(new PredicateExpression($search_sql, [$data->search, $data->search]));
            $search_sql = str_replace(' OR ', ' + ', $search_sql);
            $select->columns(['*', 'relevance' => new Expression($search_sql, [$data->search, $data->search])]);
        }

So, again, if you want to continue down the abstract path then I probably don't have anything else to add.  If you want to post the actual code you're using and a description about what happens then I'm happy to look at your specific situation.  The vast majority of all problems I encounter relate to the specific code being used instead of general concepts.  Assuming that all of your code is correct and that this just must be some sort of special case that you're not aware of isn't going to help you fix the problem.  Every programmer should always assume their code probably has all kinds of issues, it makes debugging go a lot faster.

Share this post


Link to post
Share on other sites

Do you use the innoDB engine?

The code that I am using to produce the visible effects observable on the Grammar Captive mainpage by clicking on Newsletter under the Search Grammar Captive heading in the page's navigation bar is the second of the two code entries of my initial entry to this discussion -- namely, 

Roddy

Edited by iwato

Share this post


Link to post
Share on other sites

By the way, does the following mean that you are matching against two separate tables in the same database?

(
MATCH (c.title, c.code, c.description, c.keywords) AGAINST (?)
OR
MATCH (child.title, child.code, child.description, child.keywords) AGAINST (?)
)

Roddy

Share this post


Link to post
Share on other sites
Quote

Do you use the innoDB engine?

I do, I find it much more useful than MyISAM.

 

Quote

the second of the two code entries of my initial entry to this discussion

 

I pointed out an issue with that in my initial reply.  But that's not the only code, that's just the SQL code inside a bunch of other PHP code.  Should I assume that there are no problems in any of the PHP code?  If so, then I don't have any suggestions for you.  If I should not make that assumption, then I need to see all of the code you're actually using.

Let me try to put that another way.  If you run this query in phpMyAdmin:

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date,
	   MATCH (letter_title, letter_abstract, letter_body) AGAINST ('Seven Gates') AS letter_score
	FROM sevengates_letter
	WHERE
		MATCH (letter_title, letter_abstract, letter_body)
		AGAINST ('Seven Gates')
	ORDER BY letter_score DES

And the results you get there are different than what you see on your site, then the SQL code is not the problem.  It's the rest of the code that is the problem.  If you're focusing on the SQL code being the problem, and I think you are because that's the only code you're showing, then you're missing the actual problem.

When you're developing new queries you should test them outside of your site, just run them on the database and make sure they work first.  Once they work, then integrate them into your site.  If the results are now different then something you did with the integration caused the problem.  This is incremental developing or debugging, you work on each small part individually and then put them together.  You do one thing at a time, and then test.  If you put everything together first and only then start looking for problems the possibilities are much larger.

 

Quote

By the way, does the following mean that you are matching against two separate tables in the same database?

 

They're the same actual table, but using 2 aliases because there's a parent/child relationship that we're also checking for.  Namely, if a child contains the search terms then it should show the parent.  

Share this post


Link to post
Share on other sites

You are welcome to see the code, but it is difficult for me to find the kind of error that would produce the unwanted effect.  By the way, I ran the same statement without the placeholder (?) using "Seven Gates" (notice, they are double, not single quotes) with phpMyAdmin and was returned the same entries that I can only otherwise produce with the single word entries seven and gates.  Voilá la code!

<?php
	ini_set('display_errors', 0);
	ini_set('error_log', '/Users/kiusau/Sites/error.log');
	ini_set('log_errors', 1);
//	error_reporting(E_ALL);

	if(isset($_POST['search_input'])) {
		$prelim_result = [];
		$letter_input = $_POST['search_input'];
		require_once('./classes/class.lunarpages.php');
		$lunarpages = new Lunarpages();
		$mysqli_obj = $lunarpages->get_mysqli_obj();
		$sql ="SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date, MATCH (letter_title, letter_abstract, letter_body) AGAINST (?) AS letter_score FROM sevengates_letter WHERE MATCH (letter_title, letter_abstract, letter_body) AGAINST (?) ORDER BY letter_score DESC";
		$mysqli_stmt = $mysqli_obj->stmt_init();
		$mysqli_stmt->prepare($sql);
		$mysqli_stmt->bind_param("ss", $letter_input, $letter_input);
		$mysqli_stmt->execute();
		
		$meta = $mysqli_stmt->result_metadata();
		while ($field = $meta->fetch_field()) {
			$params[] = &$row[$field->name];
		}
		call_user_func_array(array($mysqli_stmt, 'bind_result'), $params);
		while ($mysqli_stmt->fetch()) {
			foreach($row as $key => $val) {
				$c[$key] = $val;
			}
			$prelim_result[] = $c;
		}
		echo json_encode($prelim_result);
//		print_r($prelim_result);
	}
?>

Roddy

Share this post


Link to post
Share on other sites

Just so we're on the same page, I'm pretty sure that these will return the same thing:

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date
FROM sevengates_letter
WHERE
  MATCH (letter_title, letter_abstract, letter_body)
  AGAINST ('Seven Gates')

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date
FROM sevengates_letter
WHERE
  MATCH (letter_title, letter_abstract, letter_body)
  AGAINST ("Seven Gates")  

If you wanted to find the entire phrase then it would be this:

SELECT letter_no, letter_title, letter_abstract, submission_date, revision_date
FROM sevengates_letter
WHERE
  MATCH (letter_title, letter_abstract, letter_body)
  AGAINST ('"Seven Gates"')  

It looks like that code is using an undefined variable called $row, does that get defined somewhere else?  Actually you use $row twice, and it doesn't look like it's defined either time.  You should also initialize $c as an empty array, but that foreach loop inside the while looks pointless.  It looks like you're looping through all the values in an array and building another array with all the values.  Why not just use the original array?  Why copy everything to a separate array, then use that?  You aren't doing anything with the bound results either, you bind the results but don't use those variables.  It seems like something like this is all you need:

$mysqli_stmt->execute();
$result = $mysqli_stmt->get_result();

while ($row = $result->fetch_assoc()) {
  $prelim_result[] = $row;
}

echo json_encode($prelim_result);

Or, even more concise:

$mysqli_stmt->execute();
$result = $mysqli_stmt->get_result();

echo json_encode($result->fetch_all(MYSQLI_ASSOC));

 

Share this post


Link to post
Share on other sites

What i have learned is that the mysqli result object behaves differently with mysqli object and myslqi statement objects.  The technique that I am using resolves these differences.  I learned it in one of the footnotes to prepared statements along while ago and no longer recall the source.

In any case, '"..."' does not appear to work.  What I have learned is that single quotation marks are automatically placed around unquoted words that are read into placeholders.  Thus,

Where along the following chain it occurs I do not know, but a single-word entry with no quotation marks is translated into one with single quotation marks.

<input type='text' name='podcast_input' >

$podcast_input = $_POST['podcast_input']; 

mysqli_stmt->bind_param(s, $podcast_input);

Thus, lorem ipsum entered into an input form control becomes 'lorem ipsum' by the time something is matched against it.  This makes no difference, however, as words contained within single quotation marks are treated singly.

Roddy

Share this post


Link to post
Share on other sites

SQL code and quotes are only there for human consumption.  When you create a prepared statement and send that and the data to the database server, it adds that data to the statement and runs it.  If you want to think about that as it inserting single quotes I guess you can, but that's not what the database is actually doing.  The quotes in a SQL statement are only there to tell the database server what the data is.  With a prepared statement you don't need that, because you send the data separately from the query.  The server adds the data to the query after it creates the execution plan for the query, so on the server the data is completely separate.  That is why you don't need to escape any data yourself to avoid a SQL injection attack, because the server already knows exactly what the data is.  But prepared statements are not just a way to add quotes around data in a query, it's more complex than that.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×