The extension is part of the PostgreSQL contrib package and should therefore be present on the vast majority of systems:Īs you can see enabling the extension is easy. The pg_trgm extension implements “trigrams” which is a way to help with fuzzy search. So what can we do to solve this problem? pg_trgm: Advanced indexingįortunately PostgreSQL offers a module which can do a lot of trickery in the area of pattern matching. Reading 3.2 GB to fetch just a single is now to efficient at all. Schema | Name | Type | Owner | Size | Description The reason for bad performance is that the table is actually quite large, and the database has to read it from the beginning to the end to process the request: But: It also means that we are readily sacrificing two CPU cores to answer this query returning just a single row. That is basically a good thing because the execution time is cut in half. > Parallel Seq Scan on t_hash (cost=83.88 rows=2083 width=37)īecause of the size of the table the PostgreSQL query optimizer will go for a parallel query. Test=# explain SELECT * FROM t_hash WHERE md5 LIKE '%e2345679a%' To see what is going on under the hood I decided to include the execution plan of the SQL statement: The user experience is already going to suffer and there is a good chance that a long running query like that will already increase the load on your server quite substantially. In 90+% of all applications out there this is already way too long. On my iMac, the query takes 4.7 seconds to complete. Test=# SELECT * FROM t_hash WHERE md5 LIKE '%e2345679a%' Mind that the percent symbol is not just at the end but also at the beginning of the pattern: Let us turn our attention to LIKE: The Following query selects a substring which exists in the data only once. Running simple LIKE queries in PostgreSQL The following listing shows what the data looks like in general: What we got here are 50 million ids and their hashes. A simple md5 hash is more than sufficient to prove my point here. To avoid searching the web for sample data I decided to generate some data. These index types are not equally efficient, so it makes sense to dig into the subject matter and figure out what is best when.īefore we get started I have created some sample data. Both index type can handle LIKE as well as ILIKE. In this blog post you will learn mostly about Gist and GIN indexing. What can PostgreSQL do to speed up those operations and what can be done in general to first understand the problem and secondly to achieve better PostgreSQL database performance. People use those things all over the place in their application and therefore it makes sense to approach the topic from a performance point of view. LIKE and ILIKE are two fundamental SQL features.
0 Comments
Leave a Reply. |