Preamble
Creating a large table
The initial query is: “How do we make a table with numerous columns?” The simplest method is to just use “generate series” to generate the CREATE TABLE statement:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
test=# SELECT'CREATE TABLE t_broad (' || string_agg('t_'|| x || ' varchar(10) DEFAULT ''a'' ', ', ') || ' )'FROMgenerate_series(1, 4) ASx; ?column? ----------------------------------------------------------CREATETABLEt_broad (t_1 varchar(10) DEFAULT'a',t_2 varchar(10) DEFAULT'a',t_3 varchar(10) DEFAULT'a', t_4 varchar(10) DEFAULT'a' )(1 row)test=# \gexecCREATETABLE |
I’ve only used 4 columns here for simplicity’s sake. The string we just compiled can be executed using gexec once the command has been produced. A highly useful feature of gexec is that it treats the prior result as SQL input, which is exactly what we need in this case. We are left with a table that has four columns.
But let’s abandon the table and build a massive one instead.
|
1
2
|
test=# DROPTABLEt_broad ;DROPTABLE |
Create an extremely wide table
The statement that comes after produces a table with 1500 columns. Remember that there are a maximum of 1600 columns:
|
1
2
3
4
|
test=# SELECT'CREATE TABLE t_broad (' || string_agg('t_'|| x || ' varchar(10) DEFAULT ''a'' ', ', ') || ' )' FROMgenerate_series(1, 1500) ASx; |
Such a table is rarely effective in real life and shouldn’t typically be utilized to store data. Simply put, it will add too much overhead, and the majority of the time, the modeling is poor to begin with.
Let’s fill the table with 1 million more rows.
|
1
2
3
4
5
6
7
8
9
10
11
|
test=# \timingTiming ison.test=# INSERTINTOt_broad SELECT'a'FROMgenerate_series(1, 1000000);INSERT0 1000000Time: 67457,107 ms (01:07,457)test=# VACUUM ANALYZE ;VACUUMTime: 155935,761 ms (02:35,936) |
The default values in the table let us know that the columns do, in fact, contain something. In order to ensure that all hint bits and similar items are set, I have finally run VACUUM.
The following line can be used to quickly calculate the size of the newly formed table, which is around 4 GB in size:
|
1
2
3
4
5
|
test=# SELECTpg_size_pretty(pg_total_relation_size('t_broad')); pg_size_pretty---------------- 3907 MB(1 row) |
Accessing various columns
Rows are how PostgreSQL stores data. As you may be aware, data can be stored either row- or column-oriented. The benefits of either strategy may depend on your use case. A row-based method is typically far more effective in the case of OLTP.
We’ll count (*) the seconds and see how long it takes:
|
1
2
3
4
5
6
|
test=# SELECTcount(*) FROMt_broad; count--------- 1000000(1 row)Time: 416,732 ms |
We can complete the query in about 400 milliseconds, which is good. The optimizer will perform a parallel sequential scan as anticipated.
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
test=# explain SELECTcount(*) FROMt_broad; QUERY PLAN-------------------------------------------------------------------- Finalize Aggregate (cost=506208.55..506208.56 rows=1 width=8) -> Gather (cost=506208.33..506208.54 rows=2 width=8) Workers Planned: 2 -> PartialAggregate (cost=505208.33..505208.34 rows=1 width=8) -> Parallel Seq Scan ont_broad (cost=0.00..504166.67 rows=416667 width=0) JIT: Functions: 4 Options: Inlining true, Optimization true, Expressions true, Deforming true(8 rows) |
Comparing this to a count in the first column will help. Performance will differ just a little bit. The rationale is that count(*) must verify the row’s existence, whereas count(column) must determine whether or not a NULL value was supplied to the aggregate. If the value is NULL, it must be ignored:
|
1
2
3
4
5
6
|
test=# SELECTcount(t_1) FROMt_broad; count--------- 1000000(1 row)Time: 432,803 ms |
Let’s check out what happens if we access column 100, though. There will be a big difference in how long that takes.
|
1
2
3
4
5
6
7
|
test=# SELECTcount(t_100) FROMt_broad; count--------- 1000000(1 row)Time: 857,897 ms |
Execution times have nearly doubled. If we run a count on column 1000, the performance is considerably worse:
|
1
2
3
4
5
6
|
test=# SELECTcount(t_1000) FROMt_broad; count--------- 1000000(1 row)Time: 8570,238 ms (00:08,570) |
Wow, we’ve already slowed down 20 times compared to before. This is not a small difference, so it is important to understand what is going on.
Debunking PostgreSQL performance issues: column order
Consider the way PostgreSQL stores data in order to see why the issue arises in the first place: Following the tuple header, which appears in each row, there are a few varchar columns. Here, we’ve only utilized varchar to demonstrate the idea. The same problems will arise with other data types as well; however, because varchar has greater internal complexity than, say, integer, the issue is more obvious.
How can I access a column in PostgreSQL? To determine the location of the requested column within the row, it will first fetch the row and then analyze this tuple. Therefore, if we wish to access column #1000, we must determine how lengthy the first 999 columns are in actuality. This might be really complicated. For an integer, we just need to add 4, but for a varchar, the procedure becomes very expensive. Let’s look at PostgreSQL’s varchar storage method to understand why it is so expensive:
- A single bit designates a string as short (127 bytes) or lengthy (> 127 bits).
- length of 7 bits or 31 bits (depending on the first bit)
- “data” + 0 (to terminate the string ) (to terminate the string)
- alignment (to ensure the following column starts at a multiple of CPU-word length) (to make sure the next column starts at a multiple of CPU-word length)
Think about the implications if we need to loop over 1000 columns. It does result in some significant overhead.
Finally …
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
How to Enable Intelligent Wealth Growth with Enteros Database Analytics, RevOps Automation, and Gen AI
- 24 June 2026
- Software Engineering
Introduction Wealth management and investment organizations are entering a new era defined by data-driven decision-making, AI-powered advisory systems, and highly automated operational environments. As client expectations grow and financial markets become more dynamic, firms must continuously improve performance, efficiency, and personalization to remain competitive. Modern wealth organizations now operate complex ecosystems that include: Portfolio management … Continue reading “How to Enable Intelligent Wealth Growth with Enteros Database Analytics, RevOps Automation, and Gen AI”
How to Improve Financial Cost Visibility with Enteros Database Management Platform and Cost Attribution Analytics
Introduction The financial services industry is rapidly evolving as banks, insurance companies, fintech platforms, and investment firms modernize their digital infrastructure to support real-time transactions, data-driven decision-making, and highly personalized customer experiences. Modern financial organizations operate complex ecosystems that include: Core banking systems Digital payment platforms Investment and trading systems Risk management applications Fraud detection … Continue reading “How to Improve Financial Cost Visibility with Enteros Database Management Platform and Cost Attribution Analytics”
How AI-Driven Database Monitoring Enhances Business Continuity and Resilience
In today’s always-on digital economy, business continuity and operational resilience have become essential for enterprise success. Organizations depend heavily on digital systems to support customer interactions, financial transactions, supply chain operations, analytics, internal workflows, and real-time decision-making. Any disruption to these systems can lead to significant financial loss, operational inefficiencies, and reputational damage. At the … Continue reading “How AI-Driven Database Monitoring Enhances Business Continuity and Resilience”
Reducing Application Latency with Intelligent Database Performance Management
In today’s digital economy, application speed is directly tied to business success. Whether users are shopping online, using banking applications, streaming content, accessing SaaS platforms, or interacting with enterprise systems, they expect fast and seamless experiences. Even minor delays can impact user satisfaction, engagement, and revenue. Application latency has become one of the most important … Continue reading “Reducing Application Latency with Intelligent Database Performance Management”