PostgreSQL Database Performance Tuning
Jean-Paul ARGUDO, e-mail: [email protected]
$Id: soft-tuning.sgml,v 1.12 2003/11/02 14:24:47 jpargudo Exp $
This document presents how to tune PostgreSQL databases for performance.
It is quite outdated now, but some of the principles exposed in the article are still valuable.
1. Document licence
2. About the author
3. The purpose of this document
4. Introduction
5. Use correct datatypes
6. The 20-80 rule
7. Working with indexes: principles
8. Working with indexes: small tables
9. Working with indexes: big tables
10. Using EXPLAIN
11. VACUUM your database
12. pg_autovacuum to avoid crontab managed vacuums
13. Should you disable fsync?
14. Archiving your data
15. Use stored procedures
16. Conclusion
17. Credits
1. Document licence
Copyright © 2002 Jean-Paul Argudo
This document is freely distributable under the terms of the GNU Free Documentation License
(http://www.gnu.org/copyleft/fdl.html).
All comments, suggestions or help are very welcome and encouraged. Please send any
comments you have to the author. Just mail to [email protected]
2. About the author
I have been a PostgreSQL fan for many years now. I used to be a consultant DBA in a French Open
Source company (look at my CV zone here), where I worked almost exclusively with PostgreSQL. I
started to write this document there.
I really thank Nat Makarevitch for everything I learnt in his company, IDEALX.
PostgreSQL is now at version 7.3.1 as of December 2002. I hope I'll have enough time to
keep working on this article, since it gets many hits (see the stats page). I now know that many
people learn from this article. All your messages go straight to my heart, and I thank all the people
who signed my guestbook on this site.
All your encouragement gives me strength to continue writing this article. Thanks a lot.
3. The purpose of this document
This document tries to give some waypoints for the PostgreSQL DBA in tuning PostgreSQL
databases. This is *not* about PostgreSQL server tuning. For tuning your PostgreSQL server
configuration, read Bruce Momjian's article on the subject. There are additional useful articles
at the PostgreSQL Technical Documentation site.
So what is it about then? To put it roughly, it's about software tuning. I'll give
some tips and hints about database tuning: your database model, the way you store the
data, the way you can help PostgreSQL find the best way to search for data, and so on.
My objective is for you, the reader, to help me help us all. I'd like this document to
become a kind of participative cookbook, so don't hesitate to send patches to this
document. In return for what you may learn reading this document, please send what you
think others should know too.
I hope you'll find here all you wanted to know about PostgreSQL database tuning!
4. Introduction
In your architecture, the database has been placed on a node with what you consider a
strong configuration: lots of memory, lots of disk space with RAID 5. Hopefully you have put in
a standby RAID 5 card, in case the main one burns. Since you had load problems, with or
without swapping, you increased the optimisation parameters in postgresql.conf. Now you
use much more of your 2 GB of RAM... But nothing seems to help; you still have a big load on
your server, and queries seem to get slower with every passing day...
It seems you need some software tuning. I mean, you have already tried server configuration
tuning. I hope you read Bruce Momjian's excellent articles, where you learnt that sometimes
big is too big.
The PostgreSQL RDBMS, like all others, is based on simple ideas. Sometimes, simple ideas
need complex code to be implemented. But that is not the problem for now, so I'll try to
explain some database concepts to help you understand the few tips I give here.
So, let's dive into tuning!
5. Use correct datatypes
This is just common advice: try to use the smallest data type to store your data. This supposes
you know the possible values your fields will have.
For character data it is generally recommended to use the VARCHAR type, since PostgreSQL
manages it very well. If you use the CHAR datatype instead, the size you specify for the string
is pre-allocated in the pages, so you allocate space which may never be used. Your database
then becomes bigger than it should be and performance is lowered, because you need more
I/O to read the data.
But if you are sure you'll use all the characters, it is better to allocate them directly. For example,
you store fingerprints of public keys, or your application uses a user profile coded in a
20-character string, with each character representing a user level or a special privilege for your
application.
Following that last example, I'll take a moment to advise you to code as much of your
data logic as you can into your database. But do consider whether decoding the data might cost
your application more time than PostgreSQL would need to find the data in a classic, pure
entity-relationship data model...
Let's use an example. Imagine you have a table "user" in which you store the user's last name,
telephone, etc. Your application manages user privileges and other permissions. Since you are
a solid DBA, you surely designed the database with some entity-relationship method. Then
you separated the table "user" from a table "rights", according to the proposition "a user has 0 to n
rights".
There, the interesting thing is the "n rights". Are you sure you'll have "n" rights to manage?
Surely not. You already know all the privileges you want to manage in your application. Let's say
you have 10 privileges to manage. So why not make a special field in table "user" to
hold these 10 rights? For example, add a column user.privileges CHAR(16), which is quite
nice because you know you have room for 6 more rights, just in case of a forgotten
specification, or whatever. You can now code all your privileges in one field: the 1st character is
privilege P1, the 2nd character is privilege P2, etc.
This technique is great for performance enhancement. You don't need the separate "rights" table
anymore, and when you have to look up the privileges granted to a user (in your client-side
meaning, not SQL's GRANT), you make a simple SELECT on one table which returns a single
tuple. It's then really simple to manage.
Note that you can also create 16 CHAR(1) fields in table "user" and get similar
advantages. Perhaps this is less usable; it's for you to decide.
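To make this concrete, here is a minimal sketch of the single-field approach. The table and
column names (app_user, privileges, and so on) are illustrative and not from the original text:

-- one-field privilege encoding: position 1 = privilege P1, position 2 = P2, ...
CREATE TABLE app_user (
    user_id    SERIAL PRIMARY KEY,
    last_name  VARCHAR(40),
    telephone  VARCHAR(20),
    privileges CHAR(16)
);

-- does user 42 hold privilege P3? just look at the 3rd character
-- (assuming the application stores 'Y' when a privilege is granted):
SELECT substr(privileges, 3, 1) = 'Y' AS has_p3
FROM app_user
WHERE user_id = 42;

A single SELECT on a single table answers the question, which is the whole point of the
technique described above.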
6. The 20-80 rule
During my studies, a well-advised economics teacher told me about the 20-80 rule. It's kind of
a magic rule which Vilfredo Pareto (1848-1923), an Italian economist, discovered. It simply
means that 20% of something often represents 80% of something else.
You can apply this rule to databases. I am always surprised to find I'm correct when I tell a
customer that 20% of their queries create 80% of the load. I mean that 20 out of 100 queries
are the ones which the client-side applications call most often.
Naturally, you'll have to gather this 20% of your queries in a short list and concentrate your
tuning efforts on making them run faster. That's simply a good way to reach our main goal
sooner: making the client-side application run faster.
You may ask yourself how to identify this 20% of queries. If so, you are probably
not working alongside the hackers of your business application :-)... Just joking. The first
way to know which queries are called most often is simple: ask your hackers, they surely know.
You are not sure they are right? Just put spies everywhere. It's really simple. Define a
debug mode in your sources, if you have not done so yet (gasp!). Number each query, that is,
give each query in the code a unique number or identifier. In your database, create a
table with one column: CREATE TABLE query_spy (queryid INT);. In debug mode, each time you
execute a query, also INSERT its number into the table query_spy.
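As a minimal sketch (the query number 17 below is purely illustrative), the instrumentation
boils down to this:

CREATE TABLE query_spy (queryid INT);

-- wherever the application runs query #17 in debug mode, it also runs:
INSERT INTO query_spy VALUES (17);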
Put your database into production and your application into debug mode. Wait for some
significant cycles of the application's life. When you think you have a sufficient sample, put your
application back into normal mode.
Now the table query_spy is full of records, with very good information for us: each
time a query has been called, the call has been logged in there. Run a query on it to
find out which queries are called most often. For example:
select queryid, (count(*)*100/(select count(*) from query_spy))||'%'
from query_spy
group by queryid
order by count(*) desc;
This will be very explicit and help you rank which queries are called most often. Now we
can go on to the next section.
7. Working with indexes: principles
Indexes surely are the best way to improve the performance of your queries. I have already seen
queries gain a 50:1 speedup with good indexes.
Keep in mind that indexes are *not* costless. The time PostgreSQL's backend needs to maintain
an index has to be considered. So? Yes, sometimes it is in your interest to drop all the indexes
on a table! And sometimes you'd better not create more indexes, because keeping them up to
date will require costly work.
8. Working with indexes: small tables
Many customers use "reference" tables. Some call them "label" tables, others "description"
tables, etc. Here is a classic example:
CREATE TABLE "label" (
    "label_id" int4 UNIQUE DEFAULT nextval('label_seq') NOT NULL,
    "label_labeltext" character varying(40)
);
Consider that this table holds 130 labels. We know these labels are never updated, except when a
new version of our client software is released... If you copy-paste this example into a sample
database, you'll see that PostgreSQL creates a default index on label_id because of the UNIQUE
constraint. What you have to do is drop that index, as sketched at the end of this section.
Why? You need to know a little about databases for that. When a query is sent to the server, it
is first parsed. Parsing is syntax checking only: if the syntax is not okay, the process stops
and the corresponding error messages are sent. If parsing succeeds, to simplify, the backend
figures out how to return the data corresponding to the query. During this process, the
backend analyses, in terms of cost, which access path is best, and it chooses the cheapest way
to return the data. So, with small tables, the table scan is almost always chosen.
A full table scan on a small table is more efficient than using an index. So, if you drop all
the indexes on a small table, you save the time spent choosing the best path: the optimiser
doesn't have to compute costs at all, it sees that there is no other way than a full
table scan, which we know is the best way to go in this case.
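Here is a sketch of what dropping that index looks like in practice. The index name below is the
usual default for a UNIQUE column constraint, but it is an assumption: check the real name with
\d label first, and on newer versions the index may belong to a constraint that has to be dropped
instead.

\d label                    -- shows the implicit index, typically label_label_id_key
DROP INDEX label_label_id_key;
-- or, if the index is tied to a UNIQUE constraint (newer versions):
-- ALTER TABLE label DROP CONSTRAINT label_label_id_key;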
9. Working with indexes: big tables
With big tables it is not so simple. You'll have to analyse each of the 20% of queries you
listed in the previous chapters. You will have to look at the WHERE clause of your queries and
consider whether each field in the WHERE clause is indexed or not. There are several
possibilities:
The WHERE clause is simple: you make joins (inner or not) between tables using their
primary keys. For example, you join table A and table B using A.PKEY = B.FKEY_A. On
table A it's okay, you should already have an index on the primary key (unless it is a small table,
see the previous chapter). On table B, you have to verify that you have an index (not unique!) on
B.FKEY_A. Index this field if it is not yet indexed;
The WHERE clause is the same as above, but a field of table A is also used in it.
Index this field too. You'll have to figure out whether you get better results creating an index on
this field alone or a composite index that also includes the primary key field, as sketched below.
Depending on the circumstances, one or the other will give better results.
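Here is a sketch of both cases. It reuses the generic A/B/PKEY/FKEY_A names from the text and
adds a hypothetical some_field column; which index (if any) wins is for EXPLAIN to tell on your
own data:

-- case 1: make sure the join field on table B is indexed
CREATE INDEX b_fkey_a_idx ON b (fkey_a);

-- case 2: compare a single-column index with a composite one,
-- and keep whichever gives the cheaper plan for your queries
CREATE INDEX a_some_field_idx ON a (some_field);
CREATE INDEX a_pkey_some_field_idx ON a (pkey, some_field);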
10. Using EXPLAIN
Once you think all your indexes are okay, you'll probably notice performance improvements.
If you have a doubt, you can trace which execution plan is chosen to return the data with
the EXPLAIN command.
Use EXPLAIN extensively, for example when creating indexes. Here is one example of a tuning
session I made on a production database.
First, I noticed a query which I considered suspicious:
mytestdb=# EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
mytestdb-# FROM t12
mytestdb-# WHERE t12_bskid >= 35
mytestdb-# ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;
Sort  (cost=1348.70..1348.70 rows=8565 width=16)
  ->  Seq Scan on t12  (cost=0.00..789.20 rows=8565 width=16)
A sequential scan followed by a sort is used to satisfy the ORDER BY clause. I decided to create an
index on the columns used to sort the results.
mytestdb=# create index t12_bskid_pnb_tck_lne on t12 (t12_bskid, t12_pnb,
t12_tck, t12_lne);
CREATE
Now, do you think that's enough? Look at the next EXPLAIN of the same query:
mytestdb=# EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
mytestdb-# FROM t12
mytestdb-# WHERE t12_bskid >= 35
mytestdb-# ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;
NOTICE:  QUERY PLAN:

Sort  (cost=1348.70..1348.70 rows=8565 width=16)
  ->  Seq Scan on t12  (cost=0.00..789.20 rows=8565 width=16)
Ouch! Exactly the same plan! That's all right... It's just because we haven't told the optimizer
about the existence of the new index. We have to compute statistics (Oracle folks will recognise
this ;-)):
mytestdb=# vacuum analyze t12;
VACUUM
And finally test the performance enhancement:
mytestdb=# EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
mytestdb-# FROM t12
mytestdb-# WHERE t12_bskid >= 35
mytestdb-# ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;
NOTICE:  QUERY PLAN:

Index Scan using t12_bskid_pnb_tck_lne on t12  (cost=0.00..2232.11 rows=25693 width=16)
We can see that the optimizer now uses the index, which in this case was a pure benefit. But don't
think indexes are the solution to everything. Sometimes you'll notice that PostgreSQL doesn't use
your index, even when you are sure there would be a performance gain in using it (the optimizer
keeps preferring sequential scans over your brand-new index, ouch). How can you be sure you are
in this kind of trap? Just toggle the ENABLE_SEQSCAN variable.
mytestdb=# set enable_seqscan='false';
SET VARIABLE
mytestdb=# set enable_seqscan='true';
SET VARIABLE
With ENABLE_SEQSCAN set to FALSE, PostgreSQL will use your index. In this mode, you can
measure the query's cost with EXPLAIN. Then, setting it back to TRUE, run EXPLAIN again and
see the difference...
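Using the same t12 query as above, the comparison could look like this (the costs will of course
differ on your data):

set enable_seqscan = 'false';
EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
FROM t12 WHERE t12_bskid >= 35
ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;   -- cost with the index forced

set enable_seqscan = 'true';
EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
FROM t12 WHERE t12_bskid >= 35
ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;   -- cost when the planner is free to choose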
11. VACUUM your database
Here you won't learn much more than what you can find in the official documentation... Just a
tip: remember that the more frequently you vacuum, the less time each vacuum takes to finish.
So? Personally, I've enhanced performance by making a simple VACUUM happen every 5
minutes, and a VACUUM ANALYZE every 15 minutes. It is quite easy to program this with
a cron command. But note that this is an extreme example.
Let me show you the importance of vacuumdb with the pgbench program. You can find it in
contrib/pgbench in the PostgreSQL sources; you may also already have pgbench installed if you
installed postgresql-contrib, for example.
Beware that this example only works from PostgreSQL 7.2 onwards, because it uses a new feature
of PostgreSQL: the ability to run concurrent vacuums. This means PostgreSQL can now vacuum
the database *without* locking resources.
First, we create a database for testing purpose:
$ createdb benchpg72
CREATE DATABASE
We initialize this test database with the -i option. Let's make a representative database with 1
million tuples using -s 10:
$ pgbench -i -s 10 benchpg72
[...]
930000 tuples done.
940000 tuples done.
950000 tuples done.
960000 tuples done.
970000 tuples done.
980000 tuples done.
990000 tuples done.
1000000 tuples done.
vacuum...done.
Be patient, this may take a while depending on your machine. This will create a 160 MB
database in your $PGDATA/base/ directory tree.
Let's create a tiny script:
#!/bin/sh
# mybench.sh
HOST=localhost
USER=jpargudo
DB=benchpg72
totxacts=10000

for c in 10
do
    t=`expr $totxacts / $c`
    psql -h $HOST -U $USER -c 'vacuum analyze' $DB
    psql -h $HOST -U $USER -c 'checkpoint' $DB
    echo "===== sync ======" 1>&2
    sync;sync;sync;sleep 10
    echo $c concurrent users... 1>&2
    ./pgbench -n -U $USER -t $t -h $HOST -c $c $DB
done
Adapt the three variables to match your database host, the owner's name and the database name.
Run the script:
./mybench.sh
Since this database is fairly big, you may have to wait a while. Watch the load of your server
grow...
You should now get output looking like this:
$ ./mybench.sh
VACUUM
CHECKPOINT
===== sync ======
10 concurrent users...
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 66.064132(including connections establishing)
tps = 66.104508(excluding connections establishing)
$
Okay, as you can see the server handles about 66 transactions per second here. That is not bad; I
ran the test on my server, a dual Pentium III 1 GHz with 512 MB of RAM. If you run this same
script again and again, you'll see pgbench is not very accurate; I personally see differences of up
to 15% either way... But pgbench is precise enough to give you an idea of what the performance
of a server is.
Running the script again, I got tps = 43.337163 (excluding connections establishing)! This is
expected... It's just that the first time you use a PostgreSQL database, all PostgreSQL resources
are free (shared memory, buffers, etc.).
Try running it again: you'll see the TPS stay roughly the same, but never as high as in the 1st test...
Now, create a script which may look like this:
#!/bin/bash
# cron-vacuum
HOST=localhost
USER=jpargudo
DB=benchpg72
WAITFOR=30

while [ 1 ]
do
    echo -n "`date +'%Y-%m-%d %H:%M:%S'` - "
    echo `vacuumdb -h $HOST -U $USER $DB`
    sleep $WAITFOR
done
Now, in another terminal, run this script:
$ ./cron-vacuum
And in the first terminal, run the bench script again *while* cron-vacuum is running.
Here are the results I got with WAITFOR=30 (seconds):
$ ./mybench.sh
VACUUM
CHECKPOINT
===== sync ======
10 concurrent users...
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 59.111491(including connections establishing)
tps = 59.144385(excluding connections establishing)
I re-ran this test with decreasing values of the WAITFOR variable. Here are the
results:
WAITFOR value | tps (excluding connections establishing)
______________|_________________________________________
           30 | 59.111491
           20 | 56.651573
           10 | 55.113913
            5 | 57.637565
            3 | 58.177821
            2 | 54.168578
            1 | 57.595908
            0 | 61.206833
---------------------------------------------------------
tps: Transactions Per Second
Those results are a bit disturbing, aren't they? There's no simple explanation. The performance
gained by concurrent vacuum is real. Just remember that it is not true that the more you vacuum,
the more efficient your database is.
Conclusion: vacuumdb is really useful for PostgreSQL databases, since it cleans the database
and then lets the PostgreSQL "kernel" manage it faster. Concurrent vacuum, which appeared with
version 7.2, is a great enhancement, since it saves a lot of time when running big processes,
typically nightly batches.
There is some discussion about making vacuum a background process of PostgreSQL, so that
DBAs won't have to care about it anymore. In any case, you probably won't need to vacuum your
entire database every 3 seconds.
Beware that vacuumdb is *not* costless. You'll have to think about when to run your vacuums and
remember that frequent vacuuming is useful on databases whose activity is really high; pgbench
makes the activity of the database huge. Depending on your application, you may need a vacuum
only once a month (data warehouses for example), or hourly (web applications for example)...
In PostgreSQL 7.4, Matthew T. O'Connor added a new contrib module: pg_autovacuum. This
program performs automatic VACUUMs. Let's talk about it in the next chapter.
12. pg_autovacuum to avoid crontab managed vacuums
A new contrib module now ships with PostgreSQL 7.4. It lets everyone forget about
vacuuming databases, since this useful new feature vacuums your database whenever it needs to
be vacuumed. My thanks go to Matthew T. O'Connor, since he gave everyone a simple and
efficient way to forget about vacuum management of PostgreSQL databases.
I sincerely encourage you to read more about this feature: have a look in the contrib directory,
under pg_autovacuum. If you read the entire README.pg_autovacuum, you don't need to
read what follows here, since I just copy, paste and comment on that README.
Regarding the benefits of using this contrib, note that:
The primary benefit of pg_autovacuum is that the FSM and table
statistic information are updated more nearly as frequently as needed.
When a table is actively changing, pg_autovacuum will perform the
VACUUMs and ANALYZEs that such a table needs, whereas if a table
remains static, no cycles will be wasted performing this
unnecessarily.
A secondary benefit of pg_autovacuum is that it ensures that a
database wide vacuum is performed prior to XID wraparound. This is an
important, if rare, problem, as failing to do so can result in major
data loss. (See the section in the _Administrator's Guide_ entitled
"Preventing transaction ID wraparound failures" for more details.)
About how it works, remember that:
pg_autovacuum performs either a VACUUM ANALYZE or just ANALYZE depending on
the mixture of table activity (insert, update, or delete):
- If the number of (inserts + updates + deletes) > AnalyzeThreshold, then
only an analyze is performed.
- If the number of (deletes + updates) > VacuumThreshold, then a
vacuum analyze is performed.
The number of such events is determined by the "Query/Index Statistics Collector" in the
PostgreSQL backend. So you'll have to set the following parameters in postgresql.conf:
stats_start_collector = true
stats_row_level = true
It's a libpq client program coded in C. This means that you'll have to compile it and make it
run in daemon mode just after you start the PostgreSQL server. As the README suggests:
If you have a script that automatically starts up the PostgreSQL
instance, you might add in, after that, something similar to the
following:
sleep 10
# To give the database some time to start up
$PGBINS/pg_autovacuum -D -s $SBASE -S $SSCALE ... [other arguments]
The complete list of parameters is given in the README. I have not yet tested all of them.
Since Matthew chose some default values, I assume those defaults are efficient enough :-).
Now you see how useful this new contrib can be... If you don't, read the previous chapter: it
describes how vacuuming had to be set up in earlier PostgreSQL versions:
crontab... |-(
13. Should you disable fsync?
The fsync option in postgresql.conf is very interesting, since disabling it can roughly double your
overall PostgreSQL server performance. But it is risky. Here is the explanation given in the
O'Reilly book "Practical PostgreSQL":
The fsync-disabling option. Using this increases performance at the risk
of data corruption in the event that the operating system or physical hardware
crashes unexpectedly. Be sure you know what you are doing before you use
this parameter!
Wow! What you surely think is: "this man is mad. I'll never use an option which may result in
crashing my whole database". You'll probably be right, depending on what you use your server
for.
What is fsync? If fsync is enabled (the default), it is the PostgreSQL backend which decides
when to flush data to disk, for example when a transaction is committed and the changes
have to be effectively written to disk. If fsync is disabled, then you trust your system: it is
the system which decides when to flush data to disk, not PostgreSQL alone.
There is then always data from committed transactions sitting in your system's memory. If a
power outage occurs, this data is lost!
I participated in a project based on Jabber, as the DBA of the team. I faced a big problem:
we had performance issues. I tuned PostgreSQL as far as I could, but still had problems. Let me
explain the project. Jabber is an instant messaging system, just like ICQ, AIM or MSN, and it is
free software. We built the backend with PostgreSQL and developed a new load balancer
with proxy functions for Jabber.
The database is used to store all the data about users: friend lists (called "buddies" here),
profiles, behaviours, etc., plus offline messages. Since the whole system is based on RAID 5
devices, with arrays of redundant servers and no possible power outage, I finally had to
disable fsync for performance to reach the customer's requirements. But in fact, I just split
the database in two parts, with two PostgreSQL instances running side by side. One runs with
fsync true (the database with persistent and important data), the other one with fsync false
(the one with offline messages only).
Then, in the extreme case where a crash or power outage occurs (none since January 2001),
our client only loses the offline messages sent by users.
As a DBA, I can't stand losing any data, but given the business requirements I had to make a deal
with fsync. Of course, our customer knows everything about it and accepted this risk to meet his
needs.
That's exactly what I want you to understand: you have to balance all the arguments and make
your own choice, depending on your particular situation.
Here's a comparison of fsync=true and fsync=false (with the previous "mybench" script, under
the same conditions, scale factor = 10):
TEST number | fsync=false | fsync=true
            |       (tps) |      (tps)
____________|_____________|____________
          1 |   88.694718 |  66.326287
          2 |  104.763422 |  59.251771
          3 |  107.409698 |  56.862442
----------------------------------------
tps: Transactions Per Second
No need to say more. Disabling fsync really improves performance (set fsync = false in
postgresql.conf). As I told you before, we are already looking at roughly a 2x speed
improvement. But beware of the load too... The server load increases dramatically with
fsync=false; on my test server, the load simply doubles.
Now it's up to you to decide whether to disable fsync or not.
14. Archiving your data
This chapter is probably the one which really requires all your knowledge of your database.
You have to bear in mind that a database is faster when its tables are small.
So you will have to look for ways to make your tables smaller. One way I know
that works very well is to separate the data into two groups. In the first one, you place all your
"live" data, and in the other one, you put your "dead" data.
For example, you may have a database with invoices or other business documents stored in it.
Think about whether there is a way to separate the business documents which are old, say those
finalised more than a month ago. You'll create a table "arch_invoices" with the same structure
as the table "invoices", and a stored procedure whose job will be, each night for example, to
search for such invoices in table "invoices" and move them into the table "arch_invoices".
Beware of integrity constraints, if you use them.
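A minimal sketch of such a nightly move follows. It assumes a hypothetical finalised_date
column on the invoices table (not in the original text) and that arch_invoices has the same
structure:

BEGIN;
INSERT INTO arch_invoices
    SELECT * FROM invoices
    WHERE finalised_date < now() - interval '1 month';
DELETE FROM invoices
    WHERE finalised_date < now() - interval '1 month';
COMMIT;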
Then you'll have to handle this new scheme in your application, letting users switch
from the active tables to the archive tables. This is an old tip that you have surely used
before. In my country, as surely in yours, there's a proverb that says « Divide and conquer ».
You can apply this principle everywhere. I know of a supermarket application where all the cash
registers send the article and ticket data, and every day the application creates a new set
of tables! This way, the transactions are really fast. There are stored procedures to aggregate the
data during the night when the store is closed, etc.
You can also physically place these archive tables on another disk, using symbolic
links. Read Bruce Momjian's article about hardware tuning of PostgreSQL to learn more
about it. In this manner, you also lighten the I/O on your database.
15. Use stored procedures
Your application probably always uses the same SQL queries, because it implements
well-identified processes. For example, the creation of a new invoice may require multiple
INSERT and/or UPDATE statements. If you are working in a client/server environment, this
invoice creation process will generate a lot of network traffic.
What I would do is use a stored procedure to create the invoice. The client-side application
then simply calls the function to create the invoice: you just pass the arguments to
the function, and it returns the reference/id you need to continue in the client-side program.
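Here is a minimal sketch of what such a function could look like in PL/pgSQL. The table,
sequence and column names are hypothetical, and plpgsql must be installed in the database
(createlang plpgsql on old versions):

CREATE FUNCTION create_invoice(integer, numeric) RETURNS integer AS '
DECLARE
    p_customer ALIAS FOR $1;
    p_amount   ALIAS FOR $2;
    new_id     integer;
BEGIN
    new_id := nextval(''invoice_seq'');
    INSERT INTO invoices (invoice_id, customer_id, amount)
        VALUES (new_id, p_customer, p_amount);
    RETURN new_id;
END;
' LANGUAGE 'plpgsql';

-- the client-side application then makes a single call:
SELECT create_invoice(42, 199.90);

One round trip then replaces the several INSERT and/or UPDATE statements the client would
otherwise send over the network.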
Other data processing can benefit from stored procedures too, starting with batch jobs.
For example, each day at 6pm, when the day's activity report needs to be produced, you (or
cron?) start the report job. It's typically a batch job: many calculations, many queries, etc.
Here again you'll generate a lot of network traffic, and the client-side application will probably
compute more slowly than the server! So here again, try to use stored procedures to save time in
your client application.
Consider also that using stored procedures is a good way to centralise your code, making
widespread application installation easier and letting changes apply automagically to client-side
applications. This is generally not a problem for internet applications, which centralise code by
other means, but it is interesting for client/server applications.
Finally, stored procedures can radically improve the security of your applications, since they
help you avoid SQL injection. I'll write more on this topic later.
16. Conclusion
I hope you'll find here at least some ideas to help you tune your database. As you may now
know, tuning a database is a matter of understanding how an RDBMS works.
You'll probably also notice that database tuning is not so different from cooking: it takes time
to research and test different methods, just as a chef tests different flavours.
Finally, if you have any ideas, tips or hints to tell us, please email me your contributions, as
I'll maintain this document with your comments.
Thanks a lot for using PostgreSQL, the most Advanced Open Source Database.
17. Credits
I'd like to thank all the PostgreSQL hackers for providing such a good RDBMS. Thanks to all the
contributors to this project. There are so many...
Thanks to everyone on the PostgreSQL mailing lists for the good support they give to us all,
for their open minds and their sound choices.
Special thanks to Justin Clift, who maintains http://techdocs.postgresql.org, which gives us so
much good technical documentation related to PostgreSQL.
Thanks to all of you who have read this article and helped me to improve it.