PHPCoord is 10! (ish)

Documenting the project history

Posted by Doug Wright on October 05, 2022 · 28 mins read

Ten years ago, in October 2012, I made the first commit to the PHPCoord open source project on GitHub. This is a reconstructed record of the project's origins and subsequent development.

Origins of the project

I used to work for an educational non-profit that was focused on the area of nurseries and pre-schools. The organisation ran about 100 of these directly, and provided advice and support to approx 13,000 others. As part of my role in looking after the technical side of the organisation's website, in early 2008 I built a “find my nearest childcare” tool (now offline) that could be used by parents of young children. Although “find my nearest” tools were relatively common even back then, I did learn that my particular use case had some complexities that didn't necessarily apply to the implementations that were described in the blog posts I was then reading explaining how to build one:

  • The number of venues I was working with was ≈13,000. For comparison, the national retailer “B&Q” has about 300. This meant an order of magnitude more calculations were necessary on every search than a typical implementation.
  • With a retailer, a website team has the possibility of having precise location coordinate data be stored centrally as mandated data before a new store can be added to the website. With a primarily membership organisation like I was working for, I could hardly demand that each new member signing up be forced to supply their building's latitude and longitude. I therefore had to develop a solution for determining this data from their postal address.
  • My budget for the project was precisely zero. The entire website ran on a single £59.99/mo dedicated server, spending money purchasing location data for a non-core feature was absolutely out of the question.

The second and third of those problems was solved by scraping Multimap (now Bing Maps) for each new postcode that the search service hadn't seen before (a daily cron for new venues, realtime for search postcodes). These were then stored in a database, and an optimised conversion of the haversine formula into SQL1 2 3 was used to determine the relevant results.

In April 2010, Ordnance Survey started to make available Code-Point Open, which is a free dataset that maps the centre of every postcode in Great Britain to its easting and northing. Making use of this meant there would be no need to abuse Multimap to get the coordinate data anymore, however it did mean that I had to find a method of converting eastings and northings into latitudes and longitudes so that the distance calculations would continue to work 4.

PHPcoord

After some research, I found PHPcoord (lowercase c) by Jonathan Stott. It wasn't ideal in terms of code quality (Jonathan seems to have been an academic rather than a professional developer), for instance it had a single large file rather than the conventional 1 class per file that PHP developers use, and was also written for PHP4 and the PHP4 object model rather than the newer, different PHP5 object model.

However those issues weren't showstoppers (PHP4 code did generally run successfully under PHP5) and testing showed the code seemed to do the job quite well and so I imported it into lib/6, hooked it up to my Code-Point Open import script and easily generated the latitude and longitude for every postcode in Great Britain.

For the next couple of years, everything went well. Periodically I updated Code-Point Open to populate the database with any newly created postcodes, and PHPCoord continued to calculate the appropriate latitude and longitudes for them.

However, it was becoming increasingly clear that PHPCoord was an abandoned library (the author's website had been hacked for years just displaying spam) and therefore the code was never going to be updated for modern PHP.

The initial fork

I therefore decided to fork it, modernise it, and to make the new version publicly available for others to use in much the same way that I had been able to benefit myself from the original work.

It's often said that one of the hardest problems in computing is naming things, and the first problem I had was what to call the fork. I decided that since this wasn't an original work, it would be morally dishonest to give the codebase a completely different name, however I did choose to change the capitalisation (to PHPCoord) since the original was always weird to me. For similar reasons, I chose to use php-coord/php-coord as the package name for Packagist because I felt that using my own name for the vendor portion would not really be honest.

The choice of Kebab case rather than using the much simpler phpcoord/phpcoord is a decision that continues to haunt me to this day though, because I consistently forget the hyphens when I type it out. Lesson learnt.

For a version number, I opted against continuing the numbering sequence of PHPCoord, just in case the original author ever did come back and release something new. I decided that with the new name, this was a new project and thus the first release was v1.0.

This v1.0 release of the fork was mostly intended to be cosmetic in nature (compliance with PSR standards), and was explicitly designed to be a drop-in replacement for any code using the original project. As it was being published with my name attached however, it was also personally important to me to ensure a fairly high level of quality, and as such I also added the first unit tests, and made usage of a CI system (Travis CI) to ensure that these would be automatically run on every commit and against a variety of PHP versions so that I could make claims about the supported versions of PHP with good confidence.

Pleasingly, one the tests I wrote found what appeared to be a rounding issue from the original code which I was able to fix. As part of cross-checking what the expected behaviour should actually be though before changing it, I came across some worked conversion examples from Ordnance Survey. Those revealed that one of the conversion parameters in the code was also slightly wrong (appeared to be a typo), which I was also able to fix. Furthermore, the Ordnance Survey documentation explained that the conversion in question was only approximate and thus revealed the PHPCoord code was making the classic mistake of confusing the number of decimal places in the output of a calculation with its accuracy. That was corrected too, by rounding off to a suitable number of decimal places.

I did add one new feature to that initial release, which was an expanded set of geographical knowledge. The original PHPCoord only knew about (and could only convert between), the global WGS84 (GPS) coordinate system and the OSGB36 (British) system. Forked PHPCoord also knew about ED50 and NAD27, the appropriate conversion values to/from WGS84 for them being taken from a page on Wikipedia.

That initial v1.0 was released in February 2013, with an additional minor release each year for the following three years (not an intentional release pattern, that's just how it worked out).

These did additional cleanup, added additional guard checks to the code to avoid potential errors that could be made from consuming applications and fixed a different inaccurate value I found in a formula. These also included a fix for the first reported bug, which had some mixed feelings for me, namely “I had a bug. It's public :(” but also “Yay, people are using my code!”. Luckily for my self-confidence the first few reported bugs all turned out to be issues in the original code, rather than from my changes. Of course other bugs, reported later did turn out to be mine.

Version 2

With version 2, I did a much larger refactoring of the code, taking advantage of the need to not need to keep backwards compatibility. This was the first step towards having something resembling a domain model, although with hindsight it had some flaws, in particular confusing the concepts of a datum and ellipsoid.

I also received a feature request for, and added support for, the Irish grid systems to complement the existing support for the British grid and also added support for 3D coordinates (those including height) because the conversion formula was inherently 3D but just assuming 0 for height so adding the (optional) ability to supply an actual height was both easy and made sense to do.

v2.0 was released in January 2016, with v2.1 following in September that year to add a new type of distance calculation.

Version 3

I released v3.0 in April 2019, mostly just to bump the minimum version to PHP7 and add the enhanced type information that had by that time become expected practice. There were no compelling new features in the release although I did take advantage of the BC break to convert to using immutable objects (a feature request).

By this time (and well before if I'm honest), PHPCoord had become a mostly unmaintained project. Although I did respond to any issues filed and check compatibility against each new version of PHP, maintenance was definitely something that was on the reactive end of the spectrum. The release of version 3 was basically driven by embarrassment7 with both the PHP7 change and the immutability change both actually having been committed back in 2017. They just hadn't been released because there weren't any new features to justify the cost/benefit of inflicting the BC break on existing users.

Contemplation

After the release of version 3, I began to consider the future of PHPCoord. I was very conscious that despite the grandiose name suggesting it was some kind of definitive library, a more accurate name for it would have been PHPBritishAndIrishCoordinatesWithAFeebleAttemptAtTheRestOfEuropeAndNorthAmerica. I had obviously known right from the start with v1.0 that it didn't have any support for e.g. Australia, Africa or Asia, but back then I had been under the erroneous impression Europe and the US were catered for. Because I'm British and our coordinate system dates from 1936, when I copied the values from Wikipedia for NAD27 and ED50 I thought nothing of it.

It was only later (I'm not entirely sure when) it came to my attention that NAD27 had long since been replaced by NAD83, and ED50 had long since been replaced by ETRS89. (Much later) subsequent research told me that for most practical purposes those 2 newer systems are actually interchangeable with WGS84 and so they were in-fact kinda covered if the end-user was also aware of that fact, but that particular anecdote shows my historically poor level of subject knowledge for a library that I was responsible for.

So as I was considering the future of PHPCoord, there were 3 factors in play:

  1. As unsatisfied as I was with the state of affairs, was I truly ready to “abandon my baby”?
  2. Was there an alternative library I could point people at? I hadn't found one back in 2010 when I first started using PHPCoord, but that was many, many years ago. Maybe one existed now?
  3. If the answers to the first two questions were “no”, what would it take to build something I was actually proud of?

As it turned out, the answers to the first two questions were indeed “no”. When I had interviewed for jobs previously, PHPCoord was not uncommonly a subject of discussion with my interviewers. Once, one of them revealed during discussion that they'd even previously used it! So with a constant 200+ downloads per day, and an eye on my future career prospects I was inclined to keep it going.

Secondly, when I searched on Packagist to see what else was out there, there was almost nothing. There were lots of geocoding libraries, several geometry libraries and quite a few libraries that did distance calculations (some well, some badly). However, for conversions between coordinate systems there was almost nothing. A couple of other projects did just British conversions (apparently a popular need!), there was one project that did only Swiss conversions...and there was Proj4php.

Proj4php on the surface seemed like an ideal candidate to point people at as a replacement library. The Proj C++ library is well-regarded as a backbone of many GIS applications. However, the more I looked into Proj4php, the more I despaired. Firstly, it's not maintained by the same people who maintain actual Proj. That's a bad sign, because it means that the maintainers are mostly just transliterating someone else's code, rather than actually understanding what's going on. Concepts and code structures that make sense for C++, a low level language are probably not the same as idiomatic PHP. Secondly, it turns out that it's not a port of Proj the C++ library. It's a port of Proj4JS, which is the one that claims to be a port of the original. Being twice-removed from the source and its design decisions is even worse than being once-removed

Thirdly, it clearly isn't even trying to keep up with the core project with the commit history positively anaemic. Proj4php receives releases approximately once a year, with the changelog for each one typically having about 6 entries. Whilst my own project PHPCoord wasn't doing any better, if I was going to point people using it at something that claimed to be Proj instead, I would have wanted it to at least be attempting to maintain parity with the parent project. Some quick research shows that Proj4JS is actually a fork of Proj v4 from the time that Proj itself had been having a fairly extensive duration "minimum maintenance" period as a mature project. Modern Proj, that powers geospatial engines all over the place actually started to have a fundamental rewrite with Proj v5 and Proj v6. Proj4php does not have any of these changes, its fundamental architecture is based on designs that the parent project has thrown away as not fit for purpose.

Fourthly, a quick scan through the release history and the outstanding issues, revealed that the conversion from Proj4JS has been so badly done, that many of the issues being fixed in each release were not things like confusing the return values of e.g. array methods between the two languages, but were fixes to put the missing $ in front of variable names. Even a basic lint check would have picked up those up (or, um, testing the code), but issue, after issue, after issue for years have simply been fixing fatal syntax errors. The whole project to me, although obviously well-meaning seems to me to be a complete horrowshow, and I sincerely hope that everyone involved are just hobbyists rather than people employed for money as software-writing professionals.

The rewrite

With all that in mind, in August 2020 after my PHP code coverage work was released I decided that I would invest back into PHPCoord to make it something I was proud of. And more than that, that I would make an actual Proj equivalent for PHP because I'm stubborn like that. Not a port, or any kind of copy, but a PHP-native reimagining of that particular problem space. “🎵Everything Proj can do, I can do better” would be the new mantra.

I began doing research, discovering in the process just how large a job I had gotten myself into. In late August8, I made my first commit of the “new” PHPCoord deciding for a couple of reasons to utilise a brand new, empty branch for the purpose.

Firstly, I didn't want to be mentally attached to/constrained by the existing code and its structure. I wanted to come at things completely afresh (plus, seeing how similar/different the result would be to the current code would be intellectually interesting). Secondly v3.0, although now bearing little resemblance to the original PHPCoord that I'd forked, was still legally (and morally!) derived from the work of Jonathan Stott and thus by necessity was published under the same GPL software license that he had used. Personally I dislike the GPL finding it to be very preachy and akin to getting a sermon. If (and only if) I started from scratch and therefore avoided all Ship of Theseus type issues, would I be safely able to release under a different license, like the MIT.

I released v4.0 in March 2021, after releasing a beta in January. It definitely took a lot longer than I'd originally anticipated having started way back in August, but the results were definitely worth it!

v3.x v4.x v5.x
Coordinate systems supported 8 (for conversions)
10 (for distance)
6200+ 6500+
Conversion methods9 supported 2 70 100
Units supported 2 48 48
Documentation Basic readme Full manual Full manual

April saw the addition of another major feature, that of the concept that where a coordinate system spans a wide area, there are often multiple conversions to/from it that pertain to part of it only. v4.2 added the necessary polygon awareness to the codebase, with v4.3 introducing the concept of “datapacks”, 8 optional/supplementary Composer packages that increased the accuracy of conversions by supplying the necessary high-quality polygons.

v4.4 in June went one step further and added the first support for gridfiles, enabling the absolute highest accuracy conversions possible, commensurate with output of government tooling like NCAT, GDAy, or Circe. The large size of the grid files means that they sadly cannot be distributed with the base package, but are only available to datapack users.

v4.5 and v4.6 from September and October were mainly focused on optimisations.

Annoyingly, rather than have v4.7 as the subsequent release, I was forced into releasing it as v5.0 due to a PHP8.1 deprecation that required changing constructor signature to address. I'd previously had no plans to bump the major version again anytime soon, having done it less than a year earlier as it's obviously annoying for consuming applications.

Obviously, the research needed to accurate determine the details of over 6000 coordinate systems and how to convert between them is a mammoth undertaking, especially compared to the mere 10 that PHPCoord had previously supported. Thankfully, I didn't have to do most of this research thanks to EPSG whose work I very gratefully acknowledge. The EPSG dataset includes the definitions, the formulas, the parameters and also in many cases worked examples that I was able to use to test the code with. The presence of this valuable data inside modern versions of PHPCoord is why the license is the slightly clunky10 (MIT and proprietary) since although the EPSG data is freely redistributable, it has its own license rather than anything in the OSI list.

What's next

I'll certainly be doing regular minor releases to pick up new/improved coordinate system data from EPSG. Other than that, I have plans to add import/export functionality for commonly used interchange formats like GeoJSON and WKT, and some vague ideas about adding to/exposing the internal polygon functionality so that operations other than conversions and distance calculations can be done. Maybe I'll write another blog post at the 15 year mark to see how those predictions hold up 😁

Footnotes

1 The Vincenty formula is more accurate but is slower and over the relatively small distances involved here that difference was considered to be negligible. Especially since any formula used would be only be calculating straight-line distance rather than actual road distance.

2 MySQL at the time did not have built-in support for spatial calculations, so I had to do the calculations the hard way. I did consider installing PostgreSQL+PostGIS, but given my single server was already running both Apache/PHP and MySQL decided it wouldn't be a good idea.

3 The SQL I constructed first culled the points of interest by evaluating a simple bounding box of roughly a few miles in each direction (these were simple < and > conditions that the DB could use indexing for), and then ordered the remaining points by haversine distance.

4 Approximately 15 months later I would have a shower thought and realise that I could just do distance calculations on eastings and northings directly using simple Pythagoras rather than the complex trigonometry involved with latitudes, longitudes and the haversine formula. Oops5.

5 By this point though the search tool also had browser geolocation functionality which only returns the user location in terms of latitude and longitude. This meant that I did still have a need for coordinate conversions, although now it was in the opposite direction! If I'd realised about the Pythagoras trick at the same time I originally started using Code-Point Open, I'm not sure that I would have ever sourced conversion routines just for geolocation purposes...

6 Composer hadn't been invented yet.

7 In 2019 I was considering quitting my then-current job, and decided I would prefer to have PHP7 code and a recent release highlighted on my GitHub profile rather than PHP5 code from several years prior...

8 The published Git history shows October for this first commit, but I was developing on private branch for a long time before merging to master and did many rebases along the way.

9 Refers to methods/formulas. Different mapping systems often use the same underlying formulas but with different parameters.

10 Apparently I'm the first person to ever release a package that was part OSI, part not because doing it broke Composer and Packagist.

Header image by Kyle Glenn on Unsplash