
Boost query performance using bitwise operators
In this blog, we explain bitwise operator techniques to boost the storage or compute performance of database queries. It reduces the storage and CPU requirements so that one can cut down on costs. Additionally, it reduces the query complexity by simplifying it. So, it can prevent the database slowdown caused by some complex logic. We will understand these benefits using an example and discuss more benefits later. It can benefit the database or data warehouse designer to design optimized schema solutions.
What are bitwise operators? How do they work?
Bitwise operators perform operations on the data at the bit level. So, it converts integer value into binary and then operates bit-by-bit on the value. The working of fundamental bitwise operators is shown in the table below.
How can you use bitwise operators for optimization?
Let us consider a movie information table for example, where movies are classified based on the genres.
Suppose we have such a scenario of the multi-value column. In that case, people generally move to non-relational databases, with some compromise on performance (not compatible with SQL, do not support ACID, not consistent). The non-relational approach is shown below:
Also, some prefer a normalized schema like the one below,
Movie (MovieID, MovieName)
Genre (GenreID, GenreName)
MovieGenre (MovieID, GenreID)
Scenario 1: Normal schema without bitwise operators
However, performance-wise this schema is not good, as it requires an extra junction table (MovieGenre). As the movie table grows, the size of the junction table will consume more storage. Also, query execution becomes more complex and costlier in this schema, which is demonstrated visually in the later part.
Using bitwise operators, we can deal with multi-value data scenarios efficiently in a relational database without moving to a non-relational schema and junction table requirement.
Scenario 2: Schema using bitwise operator
Here, in the MovieInfo table, we stored multiple values in one column using the bitwise OR (|) operator. Titanic genre is Romance (GenreID 64) & Drama (GenreID 4). So, we stored Genre 68 = 64 | 4 = 1000000 | 0000100 = 1000100 = 68. Similarly, Avatar Genre 41 = 1(Action) | 8(Fantasy) | 32(Sci-Fi), and so on.
This scenario optimizes storage performance as it drops the requirement of the junction table. In the below Figure 1 of the example Movie entity-relation diagram, we can remove many mediator junction tables like production_country, movie_languages, movie_genre, movie_company, and movie_crew. It will simplify the database, reduce development and maintenance costs, and lower the risk of errors. Because of fewer tables, the number of join operations required while querying (data from more than one decomposed table) will also decrease.
Source: Databasestar – Sample movie database
Next, we will see compute optimization by comparing query execution plans for both bitwise and without bitwise operator scenarios.
We can run all queries (e.g., Movie x has which genres?, Do we have any movies with x & y both genres?, …) based on this multi-categorical field using bitwise operators (&, |, !, ^).
Example queries
Query1 – which movies are based on the genre Drama
(Note: In the subsequent visuals, S1 will show the query written/executed for schema tables demonstrated above in Scenario 1, and S2 stands for the query written/executed for Scenario 2 schema tables.)
Query syntax for Scenario 1 and Scenario 2
Query execution plan for Scenario 1 and Scenario 2
From the query execution plan for S1 and S2, we can see that S1 requires two table scans, one for movies and the other for the junction table, while S2 requires only a single table scan.
Query2 – Get the horror comedy movie
Query syntax for Scenario 1 and Scenario 2
Here in S2, we can observe how the bitwise operators can simplify the complex query logic.
Query execution plan for Scenario 1 and Scenario 2
From the query execution plan for S1 and S2, we can see that S1 requires two table scans, join and aggregate and scalar compute operations, while S2 requires only a single table scan.
Comparison of query execution statistics for both scenarios:
Query execution statistics for Scenario 1 queries and Scenario 2 queries
The benefit of the bitwise technique (S2) is the reduced reads/writes as it drops the large junction table. In this case, you can notice the increase in performance is significant. Also, the query execution is much faster for the same queries in the bitwise scenario.
Bitwise operators common use cases:
In general, bitwise operator techniques can be applied in any database that has categorical data fields. Here, we discuss few common use cases of bitwise operators.
Job application: Employers need to store candidate skillset data.
Resource administrator: The administrator needs to store information about access/permission of the resources given to the user.
Role-based access on a database: This technique can help store and retrieve user/role permissions that are used dynamically in an application, especially when permissions for multiple users/roles must be aggregated on the fly to decide effective permissions.
Railway reservation: Many flag fields are possible here or in many databases, like the ticket isConfirmed, isInWaitlist, isCancelled, etc. These kinds of flag fields we can combine into a single field.
Also, E-commerce and customer service platforms are required to store many categorical data in it.
(Note: No. of categories should be less than 63 – as we are storing key values in the power of 2, the max we can store up to SQL BIGINT limit 2^63.)
Benefits of using the bitwise operator
- Storage optimization as it reduces the storage resources required to run the query
- Compute optimization as it reduces the elementary operations required during query execution
- Simplifies the database design as we can drop many junction tables.
- Simplifies query logics, as seen in the above movie examples.
- We can store multi-value data in a single column, and the multi-value data more closely relates to a real-world scenario.
- If you have multiple flag fields, they can be combined into a single field.
Limitations of bitwise operator method
The limitations of using the bitwise technique to remove a junction table include the following:
Violation of relational integrity: The parent IDs in the lookup table (e.g., GenreID in Genre table) must have a specific order. If you make a mistake when setting up the key values, you can get incorrect results.
Limited bitwise values can be stored in a bigint: In SQL Server, a BIGINT is 8 bytes, which means that there are 64 bits. When using a single bithash column, you can only have one value per bit. (Note that you can work around this by using multiple columns, but that gets complicated).
Conclusion
We have seen how bitwise operators can significantly optimize the database design and boost query performance.
Neal Analytics has deep expertise in cloud and data modernization solutions that can help design better data warehouse solutions using bitwise operators.
Reach out to us for optimizing your database/data warehouse solutions or to learn more about other applications of bitwise operators.