Geocoding CiviCRM UK postcodes

CiviCRM can use OpenStreetMap or Google as a geoencoder (i.e. looking up latitude, longitude from address). With this correctly set up each contact you add will have a longitude and latitude value and can be placed on a map.

I had a large set of addresses (about 10,000) that had been entered without this and wanted to retro fit the data. There is a job you can use for this, but it's way too slow. Here's how I did it.

Tools used:

  • the open source postcodes.io service.
  • jq - for json manipulation
  • bash, curl, sed, wc.

 

First, get the UK postcodes out of the database where we're missing the geocoding:

mysql -e 'select CONCAT("\"", postal_code, "\"") 
   from civicrm_address 
   where geo_code_1 is null 
     and country_id = 1226 
     and postal_code is not null and postal_code NOT LIKE "%\"%" 
  group by postal_code;' -B mycivicrmdb >postcodes

This created a file with one post code per line, surrounded by "quotes".

I found that postcodes.io doesn't like to give out more than 100 at a time, so I then batched them in 100s. Not all the postcodes would be valid so I had to root out ones without a result.

while [ -s postcodes ]
do 
  head -n100  postcodes | jq -s '{postcodes:.}' >postcode-body
  curl -s -XPOST -H "Content-Type: application/json" -d@postcode-body https://api.postcodes.io/postcodes | \
       jq -c '.result[] | select(.result)| [ .query, .result.postcode, .result.longitude, .result.latitude ] ' >>geo.json
  sed -i '1,100d' postcodes;
  echo -n "postcodes: " ; wc -l postcodes ; echo -n "coded: "; wc -l geo.json ;  
done

This ran through and gave me a file geo.json with lines like:

["rh175da","RH17 5DA",-0.145082,51.014794]

The first item is the value from CiviCRM; the next is the corrected postcode; the next is the longitude and the last is the latitude.

I was able to transform these into SQL updates with jq again:

jq -r '"UPDATE civicrm_address SET postal_code=\"" + .[1] + "\", geo_code_2=" + (.[2]|tostring) + ", geo_code_1=" + (.[3]|tostring) + " WHERE country_id = 1226 AND geo_code_1 IS NULL AND postal_code = \"" + .[0] + "\";"' geo.json >address-updates.sql

Finally I visually checked these updates looked OK (in theory someone could have planted some malicious SQL injection in a postcode field), top-and-tailed them with BEGIN; and COMMIT; and applied them to the database.

I found that it was taking ages because there is no index on postal_code, so I added one and then it only took a few seconds to apply the updates.

Now my data is cleaner (postcodes are corrected) and also has geolocation data. Yay.

Tags: 

Add new comment