OT: MySQL versus PostgreSQL and database design

Hi folks,
Please excuse the off-topic post. I thought I'd ask this question in rec.crafts.metalworking because there are some smart people here.
I need a database for a major website project. Eventually it'll grow to be a large database, probably gigabytes in size. It needs to be straightforward to build a web-based interface for the database. Probably using PHP, as I'm familiar with PHP and don't fancy learning Perl right now.
Data integrity is more important to me than speed, although speed is not unimportant. I don't want to have to spend time restoring a corrupted database from back-ups. The database will be modified frequently, i.e., there will be substantial write activity as well as read activity.
I can't afford to pay for a commercial database system like Oracle. It needs to be free software. It also needs to run on a Unix or Linux server of some kind. I don't want to use a Windows server for several different reasons.
I've come to the conclusion that there are two serious contenders: MySQL and PostgreSQL. I get the impression that MySQL interfaces with the web slightly better than PostgreSQL, but that PostgreSQL has a better reputation for data integrity and a more precise philosophy behind it. PostgreSQL also appears to have a wider range of advanced functions, such as approximate string matching, which may be useful to me. As far as I can see, the initial set-up costs for the two systems are similar. Right now, I'm tending towards PostgreSQL, but I'd really like to hear from some knowledgeable people. Can anyone make a recommendation based on experience?
I could also do with having a book which explains best practice in database design. Particularly how to structure a database for best performance. Intuitively I feel that if I have many different types of data, it will be best to have one table for each type of data, rather than having several types of data in a single table. That way, when I go to retrieve a particular kind of data, the server won't have to do any sorting. It can just get everything from a single table. But I'm not sure if there are disadvantages to having a great many tables. This is where a good book would be useful. Can anyone recommend such a book?
I have a small amount of experience of using SQL Server, and much more substantial experience of PHP, Apache and various versions of the Solaris SPARC operating system.
Any thoughts on the subject would be much appreciated.
Many thanks,
Chris Tidy
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
I am using mysql for algebra.com and at work and I am satisfied. 30+ queries per second, no problem. Your data needs to be normalized, for which you can look up "principles of database normalization".
A good approach for optimization is to set up a master database, and as many slaves as necessary that would replicate the master in real time.
Then you can direct only updates to the master and all queries to slaves.
i

Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Ignoramus7501 wrote:

How long have you been running Algebra.com for? I did various searches online, trying to find accounts explaining why people were dissatisfied with either MySQL or PostgreSQL. Many of the accounts of problems with MySQL date back 7 to 10 years. I couldn't find any substantial accounts of dissatisfaction with PostgreSQL. That could be because it's good, or because few people use it, I don't know. But I get the impression that there is less of a difference between the two systems than there was when they were first released. It seems they've both matured.

That would only work well if you have a much greater volume of queries than updates, is that right? In my case, I'm not sure that will be true. I suspect the volume of queries will be higher than the volume of updates, but perhaps not by that much. Updates are going to be a big part of the picture.
From reading a few documents online, I get the impression that PostgreSQL gives you (the programmer) less room to get things wrong. In contrast, I get the impression that MySQL can be configured in a greater variety of ways, using different storage engines, with some of the possibilities being a bit questionable where data integrity is concerned. But I'm guessing you got things right with Algebra.com, so you've got the best out of the system.
Thanks for the advice.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

Since 2004 based on my sql. I did a major overhaul of algebra.com in 2004, and made it into a social website where students and tutors exchange math questions for free.

I think that you will be fine either way.

It is not the volume, but expense of queries.

I have no opinion on postgres.
i
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Ignoramus14359 wrote:

Thanks.
<snip>
Interesting. So queries are more expensive, in terms of CPU time, than updates? If so, I'd have thought that anything I can do to reduce searching and sorting that the storage engine has to do, such as breaking varied data into separate tables, would be a good thing.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

Some queries are more expensive. You just have to know which are expensive.
i
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
On MySQL vs. PostgreSQL -- I've used both, and the differences really don't seem significant. If you've found features you like in PostgreSQL, go for it. The work I did building my own database with a web interface was PostgreSQL, and it interfaced just fine.
Your questions about "table per type of data" are too vague for me to be able to address. The key to any relational database is that what you want in the tables is the relationships between the data; a row should in some sense be a bunch of related information. If you'd like to send me an email we can try to be a little more specific.... I don't know what the good database books are, but I'll be that you'll find one from O'Reilly.
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Joe Pfeiffer wrote:

That's good to hear.

I'll try to give a brief explanation. To give you a very simplified idea of what's going on, I have two kinds of data: data which concerns people and data which concerns products. I actually have more than two kinds of data, but these two serve as a representative example, I think. There will be substantial similarities between the two kinds of data. Loosely speaking, both kinds of data form a record of events. So I was considering putting both the people data and the products data into the same table. However, there are certain pieces of information which can only apply to a person (e.g., sex) and certain pieces of information which can only apply to a product (e.g., manufacturer). So if I put both kinds of data in a single table, certain regions of the table would be empty, for want of a better word. Also, as I'm not going to want both kinds of data at the same time, the data will need to be sorted every time I want to retrieve it. This makes me think that splitting the data into two tables, each representing one kind of data, would be more efficient.
However, events could be broken down further by month and by country, etc. Taking this process to its ultimate conclusion could, at a quick calculation, give me a database with 100,000 tables. This is why I'm wondering if having a large number of tables can be a bad thing.
I'm also interested to know if there's any advantage to having separate databases for separate kinds of data, except for security reasons.
I also need figures for the mean and standard deviation of a series of numbers. Ideally it would be good if I could constrain a record in the database to be the mean or standard deviation of a range of other records, so that I don't have to keep re-calculating the number using PHP. But I've yet to figure out if this is possible.
I hope this makes some sense. I may not have always got the terminology quite right.
I'll have a look for an O'Reilly book on the subject. I need to gain a better understanding of the concepts involved. At that point, I may take up your generous offer of advice, but for now I fear I might end up asking a bunch of really obvious questions!
Thanks for the advice.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

As a general rule, if there are empty entries in some of the fields of a row, that's evidence of a problem in the design (I'm being a little broad, and avoiding an absolute "the design is broken" pronouncement. But it's a sign).
What if you had an 'events' table, that had the parts of events that were common to people and to products, and an ID? Then a person-events table could have fields specific to a person, and the ID of an event, and a product-events table could have fields specific to a product, and the ID of an event.

I'm a little vague on just what an 'event' is, but if every event takes place in a particular place and with a particular time, then you'd want to combine them into a single table.
Another general note -- just as blank fields is evidence for a poor design, any time you ever have two fields that are *always* together, it's evidence they belong in the same table.

Not a direction I've ever explored.

Also something I haven't explored -- the approach is called using a 'calculated field'.

Happy to (try to) help -- just remember it's worth what you paid for it!
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Hi Joe,
Thanks for the thoughts. Sorry for my very slow response. I thought the thread had come to an end, and I was busy and didn't check rec.crafts.metalworking for a few days.
Joe Pfeiffer wrote:
<snip>

Thanks. That's what I was guessing.

I had wondered about this. But the thing is, I'm never going to want to retrieve all the events at once. I'm only going to want to retrieve all the events relating to a single person or product at once. So I thought it might make sense to split the data into many tables rather than few, so that each table contains exclusively the data I'm going to need at one time, to avoid searching/sorting?

Even if you didn't want to retrieve all the events at once?
I might take up your offer of discussing this further by e-mail once the ideas have developed better in my mind. But to explain what an event is in public rather gives my game away.

Thanks. That's a useful tip.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

If I understand what you're suggesting, putting individual events (as opposed to event types?) in different tables would be an odd way to do things. You'd have to do something like establish DB connections based on event ID (or similar) to retrieve the table for the event -- in effect, you'd be doing the searching in your application instead of taking advantage of the database. If you're able to define a key in a way that you can avoid any real searching in your application, you can make that a primary key in the database.
Figuring out how to organize these things is what databases are good at. By simply figuring out what your most-common search keys are and making them primary, and then not worrying about the problem any further, you're leveraging a *huge* amount of work done by other people. Unless what you've got in mind is really weird, you are extremely unlikely to come up with anything more efficient than just using the DB as intended. If what you're doing is that weird, you may be better off using flat files and index tables than a DB in the first place -- but that's really, really unlikely.

Yes.
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Joe Pfeiffer wrote:
Hi Joe,
Thanks for the thoughts. Once again, sorry I've been slow to reply. I'm working on several projects at once.

No, I don't think I've conveyed my meaning correctly. It would not be one event per table, but one type of event per table. I was thinking of sorting the events into different tables so that each table contains one type of event - specifically a subset of events that I'm likely to want to retrieve all at once, without any sorting.
Does establishing a connection to a particular table take a significant period of time, such that I would need to minimise the number of connections to tables that I'm going to open and close?

I'm sure you're right here. Ultimately, this may not be the simplest beginner's database project I could choose, but it certainly isn't the most complicated either.
I was guessing that using the database for searching and sorting as much as possible would be more satisfactory than using a PHP script for the same purpose.

I'm afraid I don't quite understand why this is the case. Possibly my explanation has been too abstract. But if it adds anything, each event will have the same fields in a row, with no empty fields. But I will never want to retrieve all the events at once. I will always want to retrieve a subset of the events, and I was thinking of sorting them into tables according to subset, so that the database didn't have any sorting to do when it came to retrieve the events. Instead it would just need to sort the events when inserting them into the different tables.
You also mentioned O'Reilly books a while back. I already have Jonathan Gennick's "SQL Pocket Guide", a great little book which explains the differences between the major SQL database systems.
I had a look on Amazon and the only suitable O'Reilly PostgreSQL book I could find was this one: (Amazon.com product link shortened)
However, it seems to have a lot of bad reviews, saying it is lacking important information, especially in the index. Do you (or anyone else) have this book? If anyone does, it would be great to have another opinion on it.
Many thanks,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

Ah, sorry, I misunderstood (whew!). I guess this would depend on how different the event types are -- if they've all got exactly the same fields, then just having an event type field may well be the most logical approach. But if they're really *different* types of events (in spite of having the same fields), then breaking them up may sense.

No, that wouldn't be a worry (establishing a connection to the database takes a while, but not grabbing a table). On the other hand, I wouldn't expect filtering on an 'event type' field would be terribly time-consuming.
Sorry I'm being so wishy-washy on this one...

Besides, if you're anything like me, you'd never get around to learning it until there was a real project to learn on!

Vastly more satisfactory. That's what databases are good at.

Hopefully my wishy-washy answer above does a better job of explaining what I was trying to get at on this one.

(Amazon.com product link shortened)
Surprising to see an O'Reilly get bad reviews (though I haven't looked at their DB books).
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
On Wed, 08 Apr 2009 00:51:38 +0000, the infamous Christopher Tidy

I've been working with MySQL for years now, but all has been in limited online work, always with small databases; and I like the way DuBois writes, so take a look at his books on MySQL.
Everything I've read says that either program will work well for you. I recommend reading the reviews by people who are pumping gigs through them if that's your end desire for the DB. There are differences between their handling of transactions, but I believe those are getting slimmer as time goes by. There's a real competition between MySQL and PostgreSQL.
G'luck, Chris!
-- You can't do anything about the length of your life, but you _can_ do something about its width and depth. -- Evan Esar
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Larry Jaques wrote:

I'll see if I can take a peek on Google Books.

From reading around I get the same impression too. My inclination is towards PostgreSQL, partly because of the advanced features, partly because I like the philosophy behind the project, and partly because some guys I respect here in England use it for their own projects.
But I'm asking here just in case there's a hidden "dog" side to PostgreSQL. If anyone has bad things to say about it, I want to hear :-).
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
On Wed, 08 Apr 2009 20:04:28 +0000, the infamous Christopher Tidy

And Amazon.uk?

Here ya go, sport:
http://www.google.com/search?q=mysql+vs.+postgresql
Spend a week with the meager (13 million) results, and let us know what you chose.
I think My has a larger base of potential helping hands, but like Linux, the PG base is getting larger all the time.
-- You can't do anything about the length of your life, but you _can_ do something about its width and depth. -- Evan Esar
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Larry Jaques wrote:

Thanks for the links, Larry. I'll check them out over Easter and let you know what I choose.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
On Sat, 11 Apr 2009 01:13:31 +0000, the infamous Christopher Tidy

May the Bunny be with you. Lettuce pray.
-- I define comfort as self-acceptance. When we finally learn that self-care begins and ends with ourselves, we no longer demand sustenance and happiness from others. -- Jennifer Louden
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
Be aware the unless you want to release your application as Open Source you may need a license to use MySQL commercially.
http://www.mysql.com/about/legal/licensing/oem /
                            ---john.
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload
John Haskey wrote:

My understanding was that the commercial licence is only required if you distribute software. To me, running a website is not distributing software, it's providing a service. But my interpretation of the licence terms could be wrong. If it is, please let me know.
Best wishes,
Chris
Add pictures here
<% if( /^image/.test(type) ){ %>
<% } %>
<%-name%>
Add image file
Upload

Polytechforum.com is a website by engineers for engineers. It is not affiliated with any of manufacturers or vendors discussed here. All logos and trade names are the property of their respective owners.