Tuesday, 28 July 2015

Day Numbers

For reference, this is relatively canonical code for calculating day numbers.

# turn a March 1600 based date into a day number
def daynum(y, m, d):
	# years consist of repeating 5-month periods of 153 days each;
	# "//" keeps the arithmetic integral under both Python 2 and Python 3
	m = divmod(m, 5)
	return 365 * y + y // 4 - y // 100 + y // 400 + \
		153 * m[0] + \
		31 * m[1] - m[1] // 2 + \
		d

# turn a number into a March 1600 based date
def numdate(dnum):
	# over-estimate year, then step back until the remainder is non-negative
	year = dnum // 365
	day = dnum - daynum(year, 0, 0)
	while day < 0:
		year = year - 1
		day = dnum - daynum(year, 0, 0)

	# over-estimate month, then step back in the same way
	month = day // 30
	day = dnum - daynum(year, month, 0)
	while day < 0:
		month = month - 1
		day = dnum - daynum(year, month, 0)

	return year, month, day

# turn a date into a March 1600 based date
def to1600(year, month, day):
	month = divmod(month - 3, 12)
	return year + month[0] - 1600, month[1], day - 1

# turn a March 1600 based date into a real date
def from1600(y, m, d):
	m = divmod(2 + m, 12)
	return 1600 + y + m[0], 1 + m[1], 1 + d

def from_date_str(date):
	return daynum(*to1600(*map(int, date.split("-"))))

def to_date_str(d):
	return "%04d-%02d-%02d" % from1600(*numdate(d))
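
The comment about »5-month periods of 153 days« is the only non-obvious step, so here is a small check of it. This is a sketch of my own rather than part of the code above: the names MONTH_LENGTHS, month_offset and cumulative_offsets are made up for illustration. It compares the closed formula used in daynum with a running total of month lengths for a year that starts in March.

```python
# Month lengths of a March-based year: March .. February.
# February comes last, so its length never enters any month offset.
MONTH_LENGTHS = [31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31, 28]

def month_offset(m):
    """Days before month m (0 = March), using the formula from daynum."""
    period, rest = divmod(m, 5)
    return 153 * period + 31 * rest - rest // 2

def cumulative_offsets():
    """The same offsets, obtained by summing the month lengths."""
    offsets, total = [], 0
    for length in MONTH_LENGTHS:
        offsets.append(total)
        total += length
    return offsets

print([month_offset(m) for m in range(12)] == cumulative_offsets())
```

Starting the year in March is what makes this work: the leap day lands at the very end of the year, where it only affects the year term.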

Friday, 24 July 2015

How PHP is broken

Some objective reasons.

  • PHP's handling of variables violates a fundamental programming principle, which I call the »substitution principle«. It stipulates that if op(a) is a valid expression for an operation op and an expression a, then op(x) should also be a valid expression for any x = a.

    PHP breaks this rule in a great many places. For example, a property of an object may be accessed indirectly using the notation $this->$prop when the variable $prop is a string specifying a property's name. However, the substitution principle is violated because $this->'propertyname' is not a valid expression. Instead, a bare word must be used to access the property directly.

    This also happens when instantiating a class: an unknown class can be instantiated using the expression new $class when the variable $class contains the class name. But substituting its value yields the invalid expression new 'ClassName'.

    Another violation of the substitution principle is that the following won't work:

    $func = 'array';
    $func();

    Even more bizarrely, sometimes it's impossible to use a value in any other way than through a local variable. For example, if $this->className contains a class name, its value has to be copied to a local variable in order to access static properties or methods of the referenced class.

    $className = $this->className;
    $className::s_property;
    $className::makeInstance();
    The syntax provides no way to use $this->className directly in these contexts, violating the substitution principle.

  • Another violation of the substitution principle arises with function literals: while it's possible to pass a function literal as a callback, for example in
    array_map(function ($a) { return 1 + $a; }, $ary)
    the same is not true in other places where expressions are allowed:
    class C {
      static private $l = function () { return 'hello'; };
    }
    is a parse error.
  • One of PHP's important use-cases is presenting data retrieved from a database. That considered, I was dismayed how difficult it is to use prepared statements or database transactions with this language. I know, both MySQLi and PDO now provide them, but what in the world has taken them so bloody long?!

    I mean, if the developers of a database connection toolkit have spent decades coming up with quoting function after quoting function (addslashes, addcslashes, quotemeta, magic_quotes_runtime, mysql_escape_string, mysql_real_escape_string, mysqli_real_escape_string, PDO::quote) without realising that in the presence of NO_BACKSLASH_ESCAPES and ANSI_QUOTES the client cannot know how to correctly quote arbitrary data, how can I trust their software not to randomly delete my hard drive on a wrong keypress? Or indeed, why doesn't this happen all the time!?

  • Given that PHP is used for presenting database contents on the web, you'd think it's a given that various character encodings and their conversion are well supported. I was disappointed when I found out that the function mb_internal_encoding() returns the string 'UTF-8' if that's selected as the internal encoding, but the function tidy_parse_string() requires the string 'utf8' in order to correctly parse a UTF-8 string! This virtually requires hard-coding a specific character encoding, making it nearly impossible to write code that deals correctly with various character encodings in different situations. In comparison, Java and Perl provide full conversion toolsets that seamlessly support any encoding situations imaginable.
  • There's such a thing as »catchable fatal errors« in PHP. When I first heard this I didn't think much about it, assuming that if the language or interpreter encountered some error, an exception would be thrown and my code could deal with it appropriately by displaying some message in a suitable place (log file or user interface).

    But that's not what a catchable fatal error is. What actually happens is that PHP prints a message into your web page (thereby usually causing Quirks Mode) and the program continues.

    In order to make these errors actually »catchable«, you must first install a custom error handler:

    set_error_handler(function ($severity, $message, $file, $line) {
        if (!(error_reporting() & $severity)) {
            return;
        }
        throw new \ErrorException($message, 0, $severity, $file, $line);
    });
    PHP even helpfully provides an exception subclass and sample code in the manual — then why not report all errors like this in the first place?

    On the upside, this has also finally allowed me to get rid of that annoying message from file_get_contents() if a file could not be opened. I had given up on being able to prevent this from breaking my user interface until I installed the above error handler, which now lets me catch and deal with the exception as expected.

  • PHP's object-oriented support has been bolted on in a big rush. This becomes apparent, for example, when using static methods. Incredibly, PHP treats such methods, which are also known as »class methods«, as being inherited by subclasses rather than merely visible to them. This is an important distinction: since every call of a static method has to be qualified with a class name, it can be statically determined at compile time which method will be called. Hence the name, d'uh. There is no run-time polymorphism involved due to inheritance.

    And yet, when compiling code like the following,

    class A {
        static public function f() {}
    }
    
    class B extends A {
        static public function f($a) {}
    }

    PHP's compiler generates this error:

    Declaration of B::f() should be compatible with A::f()
  • It's really difficult to find good advice on how to use PHP well. Even a published book (Peter MacIntyre's »PHP — The Good Parts« at O'Reilly) advocates code like the following without any warning:

    $sql = "INSERT INTO guests (fname, lname, comments)
    	VALUES ('$_POST[fname]', '$_POST[lname]', '$_POST[comments]')";
    $query = mysql_query($sql);
    At the end of this quite horrifying chapter is a suggestion to also read the chapter on security, which makes a passing mention of filtering user data for semicolons and encrypting passwords using SHA0 or SHA1, without giving any reasons or code examples… That something like this could be published in 2010 makes me speechless.

  • PHP's ternary operator is very limited. While every other language that implements this operator (C, C++, Java, …) groups it right-to-left, PHP groups it left-to-right. Why is this wrong? Let's assume we have an expression
    a ? b : c ? d : e
    When this is grouped right-to-left, there are three possible outcomes b, d or e, as becomes obvious when adding implied parentheses:
    a ? b : (c ? d : e)
    The way PHP groups from left to right, the only outcomes are d or e:
    (a ? b : c) ? d : e
    This grouping only makes it possible to provide elaborate conditions, but breaks multiple outcomes. This matters for range tests such as the following, which otherwise cannot be written as an expression and force a retreat to if-statements:
      high < value    ? something
    : middle < value  ? other
    : low < value     ? allgood
                      : dflt
  • Functions and even classes are not objects in their own right. For example, in order to call a function with a list of arguments from an array, its name has to be known and passed to the function call_user_func_array:

    call_user_func_array('array_merge', $arrays)

    This can be done much more elegantly in JavaScript using Function.prototype.apply:

    Array.prototype.concat.apply([], arrays)

  • While on the subject of array_merge(), it doesn't return an empty array when called without arguments like it obviously should do. So you end up with this idiom:

    $arrays ? call_user_func_array('array_merge', $arrays) : []
  • It's seriously scary how many situations call for the name of a function or class in PHP. For example, in order to access static methods or properties of an unknown class:

    $model = '\\Model\\Employee';
    $model::load(123);

    It really feels a lot like programming Tcl.

  • Getting the last element of an array is something that many languages have special tools for, be they vector.back() or array[-1]. PHP lacks such tools and the idiomatic solution is to use the array's internal iterator <shudder>. There's no way to avoid the side-effect of moving that iterator, since PHP provides no way to save and restore the iterator's position.

    $last = end($array);
    reset($array);
    This situation is described very succinctly by Keeth's comment on Stack Overflow: »Only in PHP would the idiomatic inspection of an array element cause a side effect.«

    One frequently proposed way to avoid the side-effect is to say

    $last = end(array_values($array))
    which conceptually creates a copy of the array. Even if that doesn't actually happen because it's implemented lazily, that's simply too much of an intellectual overload to justify calling »idiomatic«.

  • The sign of PHP's »modulus« operator % is that of its first operand (the dividend in the integral division) rather than the second operand (the divisor). The former choice of sign corresponds with rounding towards zero in the associated integral division (commonly known as »truncation«), while the latter corresponds to rounding towards negative infinity (commonly known as »floor«).

    To see this, recall that the relation between an integral division div and its modulus mod is given by the equation

    p = (p div q) * q + p mod q
    for two integers p and q.

    Sign of »p mod q« when rounding towards negative infinity, i.e. »p div q = floor(p / q)«. The sign of the result of »mod« corresponds to that of the divisor q.

    0 < q    floor(p / q) ≤ p / q    floor(p / q) * q ≤ p    0 ≤ p mod q
    q < 0    floor(p / q) ≤ p / q    p ≤ floor(p / q) * q    p mod q ≤ 0

    In comparison, rounding towards zero (or »truncation«) produces a slightly more complex case table.

    Sign of »p mod q« when truncating, i.e. »p div q = (int)(p / q)«. The sign of the result of »mod« corresponds to that of the dividend p.

    0 ≤ p    0 < q    (int)(p / q) ≤ p / q    (int)(p / q) * q ≤ p    0 ≤ p mod q
    0 ≤ p    q < 0    p / q ≤ (int)(p / q)    (int)(p / q) * q ≤ p    0 ≤ p mod q
    p < 0    0 < q    p / q ≤ (int)(p / q)    p ≤ (int)(p / q) * q    p mod q ≤ 0
    p < 0    q < 0    (int)(p / q) ≤ p / q    p ≤ (int)(p / q) * q    p mod q ≤ 0

    In addition to simpler formulae, floor division has the attractive property that for a given positive q, each number n is the result of dividing numbers i ∈ [qn, qn + q) from an interval of length q. This is important when working with time periods such as hours, days and weeks, for example. In contrast, division with truncation means that 0 is the result of dividing numbers i ∈ (−q, q) from a much larger interval of length 2q − 1.

    PHP's preference for »truncation« makes writing algorithms that work correctly with negative integers more difficult than it needs to be.

    /*
     * Polyfill for Python's divmod: return two numbers [ x, y ] where
     * x = floor(p / q) and p = x * q + y. As a result, abs(y) < abs(q).
     */
    function divmod($p, $q) {
    	$x = floor($p / $q);
    	return [ $x, $p - $x * $q ];
    }
  • The standard library function to create a timestamp expects its arguments in the order $month, $day, $year.

    int mktime(int $hour, int $minute, int $second,
    	   int $month, int $day, int $year, int $is_dst)

    The standard library function to calculate the Julian day number for a Gregorian date has the signature

    int gregoriantojd(int $month, int $day, int $year)

    Wow, this is messed up!
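
To make the earlier point about floor versus truncating division concrete, here is a small comparison in Python, whose built-in divmod already floors. This is only a sketch: floor_divmod and trunc_divmod are names I made up, and trunc_divmod merely imitates the rounding towards zero described above; it is not PHP's actual implementation.

```python
def floor_divmod(p, q):
    """Quotient rounded towards negative infinity; remainder has the sign of q."""
    return divmod(p, q)

def trunc_divmod(p, q):
    """Quotient rounded towards zero; remainder has the sign of p."""
    quotient = abs(p) // abs(q)
    if (p < 0) != (q < 0):
        quotient = -quotient
    return quotient, p - quotient * q

q = 7
# All dividends in a window around zero whose quotient is 0.
floor_zero = [i for i in range(-20, 21) if floor_divmod(i, q)[0] == 0]
trunc_zero = [i for i in range(-20, 21) if trunc_divmod(i, q)[0] == 0]

print(floor_divmod(-1, q), trunc_divmod(-1, q))  # (-1, 6) (0, -1)
print(len(floor_zero), len(trunc_zero))          # 7 13
```

With floor division exactly q dividends map to 0, so every »week« has seven days. With truncation it's 2q − 1 of them, and the week around zero suddenly has thirteen.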

The cause of the low quality of this language and its implementation is its community: it quickly becomes obvious that the developers are not professionals, as illustrated by the hilarious tale that started with

int size;
…
if (size > INT_MAX) …

As I come to know PHP more, my feelings are summed up precisely by Eevee in this beautiful analogy:

I can’t even say what’s wrong with PHP, because— okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it’s one of those weird tri-headed things. Okay, well, that’s not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don’t have those serrated surfaces; it’s flat and smooth. That’s less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there’s no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what’s the problem with these tools? They’re all I’ve ever used and they work fine!” And the carpenters show you the houses they’ve built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That’s what’s wrong with PHP.

Recommended reading: I got much more useful advice from this post than from the above-mentioned book »The Good Parts«.

My analogy for knocking at the door is trying to find a way to use prepared statements or database transactions with this language.
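
For contrast with the interpolated INSERT statement quoted from the book, this is roughly what bound parameters inside a transaction look like elsewhere. It's a minimal sketch using Python's sqlite3 module and an in-memory database. The helper add_guest is made up for illustration, and the guests table merely mirrors the column names from the book's example.

```python
import sqlite3

def add_guest(conn, fname, lname, comments):
    # Using the connection as a context manager wraps the statement in a
    # transaction: commit on success, rollback if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO guests (fname, lname, comments) VALUES (?, ?, ?)",
            (fname, lname, comments),  # values are bound, never spliced into the SQL
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guests (fname TEXT, lname TEXT, comments TEXT)")

# A hostile comment is stored verbatim as data instead of being executed.
add_guest(conn, "Robert", "O'Brien", "'); DROP TABLE guests; --")
rows = conn.execute("SELECT fname, lname, comments FROM guests").fetchall()
print(rows[0][2])
```

No quoting function appears anywhere, because the data never becomes part of the statement text.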