Spatialite Speed Test

This post is part of a series [1], [2], [3], [4], [5]

Based on my earlier tests I felt confident that I could expand the size of the dataset to my intended extent. So I grabbed 12-digit HUCS for the entire United States. I was confident this would crush spatialite and finally make the response time for my question extend to close to 1 second. But just in case it didn’t, I would make my question harder at the same time.

While I’m playing with these queries I need a reference guide. Of course the starting point is the spatialite SQL guide found here http://www.gaia-gis.it/gaia-sins/spatialite-sql-3.0.0.html. But, while this is a nice list of functions, I need more help than it provides. Thankfully, spatialite shares many technical concepts with PostGIS (follows OGC, etc. SQL and data standards, and they both use the GEOS spatial library). So, I’ve been successfully poaching help from the PostGIS documentation (http://www.postgis.org/docs/ST_GeomFromText.html) … which is quite good and gives plenty of use examples.

So, getting back to the data, here is my new queryable data set.

12-digit hydrologic units for the entire US
12-digit hydrologic units for the entire US

The pink blob on the right my new, harder question.. searching with not one pair of coordinates, but a bunch of pairs in the form of a pink polygon. Pink scares computers, so this should hurt it a little. It might also be scary that my database file representing the HUCs is now 1.9GB in size (lots of coordinates and the indexes to describes them).

Pink test polygon
Pink test polygon

Because I’m likely going to be using coordinate pairs passed in from some kind of Web application, I converted the polygon to well-known text using

 

SELECT ST_AsText(geom) from test_polys;

 

 

which of course gives us

select a geometry (polygon) as well-known text (wkt)

With the handy text string to describe my polygon given to me, I’m able to just copy and paste it into my text SQL. So let’s do that and the first query should really hurt because I’m not going to use an index. Note that I switched from Contains.. to Intersects since I want to detect anything that touches my pink poly.

 

select HU_12_NAME FROM huc12
WHERE ST_Intersects(Geometry, ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))')) = 1

 

How did it do? Surprisingly well. 40 seconds give or take. Of course, that won’t work for my application, so I’m keeping my fingers crossed that the index rescues me.

Polygon query with no index
Polygon query with no index

Now how about with the index? Here’s the query.

select HU_12_NAME FROM huc12
WHERE ST_Intersects(Geometry, ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))')) = 1
AND ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'huc12'
AND search_frame = ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))'));

 

And survey says! 0.186 seconds! Oh yeah.

Pink polygon query with spatial index
Pink polygon query with spatial index

So this all well and good, but the real reason why these queries are so fast is because the test geometry (the pink polygon) is so small. So lets push that a little.

Multipart polygon with lotsa geometry
Multipart polygon with lotsa geometry

So, here I’ve made a single multipart polygon with lots of vertices to keep my query simple. I’ll spare you the geometry and the query, but the pink polygon above, querying a whole country of HUC12s with the spatial index, took 1.4 seconds. So, we finally broke our time limit with enough testing geometry.

Published by

John C. Zastrow

Grew up in Youngstown, NY with a deep respect and fascination for the Laurentian Great Lakes and all things water. After a dabbling with interests in sports medicine, and being a professional potter, undergraduate work at CU - Boulder led me back to my passion for water and science. Undergrad degree in tow, I worked as an aquatic toxicologist and GIS/RS analyst in the Front Range. Graduate work at Wisconsin – Milwaukee fed my curiosity about the Great Lakes extended my interests in data analysis and management. My projects involved climate change modeling, phytoplankton physiology, riding research vessels in the frigid Lake Michigan, managing an environmental database development effort, and of course GIS/RS. My next job found me in Fairfax, Virginia where I was able to continue with most of my professional interests. The chaos of Greater D.C and a need to be closer to family brought us to Portland, Maine. I am still passionate about all things water, environmental, and spatial and am a strong advocate for Open Source software (for a variety of reasons). I lead software development and data management projects for Tetra Tech and focus on things geospatial.

Leave a Reply

Your email address will not be published. Required fields are marked *