comp.lang.php (mini)FAQ

(this is a work in progress - it's not even close to finished yet)

External variables (forms, sessions and cookies)

Databases


External variables (forms, sessions and cookies)

I've just upgraded to PHP 4.2.0 or later, and I'm getting 'undefined variable' warnings on all my pages. Why?

You are relying on the register_globals configuration setting being on. It is off by default in PHP 4.2.0 and later.

Code of the following form will no longer work in the default configuration:

<form method="GET">
 <input type="text" name="input1" />
 <input type="submit" />
</form>

<?php
if (isset($input1)) {
    echo $input1;
}
?>

The variable $input1 will be undefined - global variables for each form input element are no longer created by default for security reasons. This is documented in the following pages:

PHP 4.1.0 Release Announcement
PHP 4.2.0 Release Announcement
PHP Manual - External Variables
Secure Programming In PHP article on zend.com

The new mechanism using the superglobal variables $_GET, $_POST, $_SERVER, $_SESSION, $_ENV and $_REQUEST is much preferred over the old global variable method.

<form method="GET">
 <input type="text" name="input1" />
 <input type="submit" />
</form>

<?php
if (isset($_GET['input1'])) {
    echo $_GET['input1'];
}
?>

If you absolutely must use the old method, and you are aware of the security risks this involves, you can restore the old behaviour with the register_globals configuration option in php.ini.

Databases

Why I am getting 'supplied argument is not a valid MySQL-Link resource' warnings?

You've ignored the return value of a MySQL function earlier in the script, and so you've passed 'false' to another function, which causes the error.

Never ignore the return value of mysql_query (or any other MySQL function), as queries can fail, returning 'false'. The function mysql_error() will tell you why.

For debugging, you can use something like:

$result = mysql_query($query)
    or die("Query failed: $query<br />Error: ".mysql_error()."<br />");

For a production system, you should never display raw error messages to the end user, and should use some sort of graceful error handling to inform the user that the function is currently unavailable.

I have a table with usernames or email addresses. How do I check whether a username or email address is already taken before I add a new user?

Add a primary key or unique constraint to the table, and just go ahead and insert your new row. If there's a duplicate username or email address, the database will flag it up for you.

The database will be able to do the checking quicker than if you did a separate select beforehand - and you don't have to worry about locking in between the two calls.

For example, in MySQL:

CREATE TABLE users (
    user_id         INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    user_name       VARCHAR(255) NOT NULL,
    user_email      VARCHAR(255) NOT NULL
);

Given the table above, if you want to ensure that no two users have the same username, or the same email address, you add constraints to the table to get the database to enforce that.

ALTER TABLE users ADD UNIQUE (user_name);
ALTER TABLE users ADD UNIQUE (user_email);

If you then attempt to insert duplicate data into the table, you will get error 1036, which you can catch and identify using mysql_errno().

How do I get the first/last/top/bottom record by a particular criteria in SQL?

The first thing to remember is that in relational databases there is no concept of 'first' or 'last' without an ORDER BY clause. Relational databases work on sets of data, which are by definition unordered.

Once you have a field you can order by, then the approach depends on what fields you need.

If you only need the field you are ordering on, you use MAX or MIN:

SELECT MAX(column_name) FROM table;
SELECT MIN(column_name) FROM table;

However, if you need the whole row having that maximum or minimum value, the approach depends on your database. For databases that support subqueries:

SELECT t1.col1, t1.col2, t1.col3
FROM   tablename t1
WHERE  t1.col1 = (SELECT MAX(t2.col1)
                  FROM   tablename t2);

Some databases, for example MySQL, don't support subselects, so you have to do it in two queries, saving the result of the first query in a variable.

SELECT MAX(col1) FROM t1;

(store the result in $max_col1)

SELECT col1, col2, col3
FROM   tablename
WHERE  col1 = $max_col1;

Don't be tempted to use the LIMIT clause in MySQL to limit the result to a single row, after sorting by the column - it is generally far quicker to do two queries, as the database will not need to sort the entire table as it would with a LIMIT.

If you do it in two queries, you get the following execution plans:

mysql> explain select max(id) from test;
+------------------------------+
| Comment                      |
+------------------------------+
| Select tables optimized away |
+------------------------------+
1 row in set (0.01 sec)

It doesn't even need to access the table, it just picks the top off the index, which is fast.

mysql> explain select * from test where id = 1;
+-------+-------+---------------+---------+---------+-------+------+-------+
| table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+-------+-------+---------------+---------+---------+-------+------+-------+
| test  | const | PRIMARY       | PRIMARY |       4 | const |    1 |       |
+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.01 sec)

Then it's just a simple index lookup of a single row.

If you use LIMIT, you get:

mysql> explain select * from test order by id desc limit 1;
+-------+-------+---------------+---------+---------+------+------+-------------+
| table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------+-------+---------------+---------+---------+------+------+-------------+
| test  | index | NULL          | PRIMARY |       4 | NULL |   13 | Using index |
+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

(There are 13 rows in the table in this example)

It reads the entire table to build a result set, and then you only fetch one row of it. That's a lot more work for the database to do.