OT: MySQL versus PostgreSQL and database design

I am using mysql for algebra.com and at work and I am satisfied. 30+ queries per second, no problem. Your data needs to be normalized, for which you can look up "principles of database normalization".

A good approach for optimization is to set up a master database, and as many slaves as necessary that would replicate the master in real time.

Then you can direct only updates to the master and all queries to slaves.

i
Reply to
Ignoramus7501
Loading thread data ...

Hi folks,

Please excuse the off-topic post. I thought I'd ask this question in rec.crafts.metalworking because there are some smart people here.

I need a database for a major website project. Eventually it'll grow to be a large database, probably gigabytes in size. It needs to be straightforward to build a web-based interface for the database. Probably using PHP, as I'm familiar with PHP and don't fancy learning Perl right now.

Data integrity is more important to me than speed, although speed is not unimportant. I don't want to have to spend time restoring a corrupted database from back-ups. The database will be modified frequently, i.e., there will be substantial write activity as well as read activity.

I can't afford to pay for a commercial database system like Oracle. It needs to be free software. It also needs to run on a Unix or Linux server of some kind. I don't want to use a Windows server for several different reasons.

I've come to the conclusion that there are two serious contenders: MySQL and PostgreSQL. I get the impression that MySQL interfaces with the web slightly better than PostgreSQL, but that PostgreSQL has a better reputation for data integrity and a more precise philosophy behind it. PostgreSQL also appears to have a wider range of advanced functions, such as approximate string matching, which may be useful to me. As far as I can see, the initial set-up costs for the two systems are similar. Right now, I'm tending towards PostgreSQL, but I'd really like to hear from some knowledgeable people. Can anyone make a recommendation based on experience?

I could also do with having a book which explains best practice in database design. Particularly how to structure a database for best performance. Intuitively I feel that if I have many different types of data, it will be best to have one table for each type of data, rather than having several types of data in a single table. That way, when I go to retrieve a particular kind of data, the server won't have to do any sorting. It can just get everything from a single table. But I'm not sure if there are disadvantages to having a great many tables. This is where a good book would be useful. Can anyone recommend such a book?

I have a small amount of experience of using SQL Server, and much more substantial experience of PHP, Apache and various versions of the Solaris SPARC operating system.

Any thoughts on the subject would be much appreciated.

Many thanks,

Chris Tidy

Reply to
Christopher Tidy

On MySQL vs. PostgreSQL -- I've used both, and the differences really don't seem significant. If you've found features you like in PostgreSQL, go for it. The work I did building my own database with a web interface was PostgreSQL, and it interfaced just fine.

Your questions about "table per type of data" are too vague for me to be able to address. The key to any relational database is that what you want in the tables is the relationships between the data; a row should in some sense be a bunch of related information. If you'd like to send me an email we can try to be a little more specific.... I don't know what the good database books are, but I'll be that you'll find one from O'Reilly.

Reply to
Joe Pfeiffer

On Wed, 08 Apr 2009 00:51:38 +0000, the infamous Christopher Tidy scrawled the following:

I've been working with MySQL for years now, but all has been in limited online work, always with small databases; and I like the way DuBois writes, so take a look at his books on MySQL.

Everything I've read says that either program will work well for you. I recommend reading the reviews by people who are pumping gigs through them if that's your end desire for the DB. There are differences between their handling of transactions, but I believe those are getting slimmer as time goes by. There's a real competition between MySQL and PostgreSQL.

G'luck, Chris!

-- You can't do anything about the length of your life, but you _can_ do something about its width and depth. -- Evan Esar

Reply to
Larry Jaques

Since 2004 based on my sql. I did a major overhaul of algebra.com in

2004, and made it into a social website where students and tutors exchange math questions for free.

I think that you will be fine either way.

It is not the volume, but expense of queries.

I have no opinion on postgres.

i
Reply to
Ignoramus14359

Some queries are more expensive. You just have to know which are expensive.

i
Reply to
Ignoramus14359

How long have you been running Algebra.com for? I did various searches online, trying to find accounts explaining why people were dissatisfied with either MySQL or PostgreSQL. Many of the accounts of problems with MySQL date back 7 to 10 years. I couldn't find any substantial accounts of dissatisfaction with PostgreSQL. That could be because it's good, or because few people use it, I don't know. But I get the impression that there is less of a difference between the two systems than there was when they were first released. It seems they've both matured.

That would only work well if you have a much greater volume of queries than updates, is that right? In my case, I'm not sure that will be true. I suspect the volume of queries will be higher than the volume of updates, but perhaps not by that much. Updates are going to be a big part of the picture.

From reading a few documents online, I get the impression that PostgreSQL gives you (the programmer) less room to get things wrong. In contrast, I get the impression that MySQL can be configured in a greater variety of ways, using different storage engines, with some of the possibilities being a bit questionable where data integrity is concerned. But I'm guessing you got things right with Algebra.com, so you've got the best out of the system.

Thanks for the advice.

Best wishes,

Chris

Reply to
Christopher Tidy

That's good to hear.

I'll try to give a brief explanation. To give you a very simplified idea of what's going on, I have two kinds of data: data which concerns people and data which concerns products. I actually have more than two kinds of data, but these two serve as a representative example, I think. There will be substantial similarities between the two kinds of data. Loosely speaking, both kinds of data form a record of events. So I was considering putting both the people data and the products data into the same table. However, there are certain pieces of information which can only apply to a person (e.g., sex) and certain pieces of information which can only apply to a product (e.g., manufacturer). So if I put both kinds of data in a single table, certain regions of the table would be empty, for want of a better word. Also, as I'm not going to want both kinds of data at the same time, the data will need to be sorted every time I want to retrieve it. This makes me think that splitting the data into two tables, each representing one kind of data, would be more efficient.

However, events could be broken down further by month and by country, etc. Taking this process to its ultimate conclusion could, at a quick calculation, give me a database with 100,000 tables. This is why I'm wondering if having a large number of tables can be a bad thing.

I'm also interested to know if there's any advantage to having separate databases for separate kinds of data, except for security reasons.

I also need figures for the mean and standard deviation of a series of numbers. Ideally it would be good if I could constrain a record in the database to be the mean or standard deviation of a range of other records, so that I don't have to keep re-calculating the number using PHP. But I've yet to figure out if this is possible.

I hope this makes some sense. I may not have always got the terminology quite right.

I'll have a look for an O'Reilly book on the subject. I need to gain a better understanding of the concepts involved. At that point, I may take up your generous offer of advice, but for now I fear I might end up asking a bunch of really obvious questions!

Thanks for the advice.

Best wishes,

Chris

Reply to
Christopher Tidy

I'll see if I can take a peek on Google Books.

From reading around I get the same impression too. My inclination is towards PostgreSQL, partly because of the advanced features, partly because I like the philosophy behind the project, and partly because some guys I respect here in England use it for their own projects.

But I'm asking here just in case there's a hidden "dog" side to PostgreSQL. If anyone has bad things to say about it, I want to hear :-).

Best wishes,

Chris

Reply to
Christopher Tidy

Thanks.

Interesting. So queries are more expensive, in terms of CPU time, than updates? If so, I'd have thought that anything I can do to reduce searching and sorting that the storage engine has to do, such as breaking varied data into separate tables, would be a good thing.

Best wishes,

Chris

Reply to
Christopher Tidy

Be aware the unless you want to release your application as Open Source you may need a license to use MySQL commercially.

formatting link
---john.

Reply to
John Haskey

On Wed, 08 Apr 2009 20:04:28 +0000, the infamous Christopher Tidy scrawled the following:

And Amazon.uk?

Here ya go, sport:

formatting link
Spend a week with the meager (13 million) results, and let us know what you chose.

I think My has a larger base of potential helping hands, but like Linux, the PG base is getting larger all the time.

-- You can't do anything about the length of your life, but you _can_ do something about its width and depth. -- Evan Esar

Reply to
Larry Jaques

dont worry about license if you run a website.

i

Reply to
Ignoramus13610

My understanding was that the commercial licence is only required if you distribute software. To me, running a website is not distributing software, it's providing a service. But my interpretation of the licence terms could be wrong. If it is, please let me know.

Best wishes,

Chris

Reply to
Christopher Tidy

Thanks for the links, Larry. I'll check them out over Easter and let you know what I choose.

Best wishes,

Chris

Reply to
Christopher Tidy

On Sat, 11 Apr 2009 01:13:31 +0000, the infamous Christopher Tidy scrawled the following:

May the Bunny be with you. Lettuce pray.

-- I define comfort as self-acceptance. When we finally learn that self-care begins and ends with ourselves, we no longer demand sustenance and happiness from others. -- Jennifer Louden

Reply to
Larry Jaques

As a general rule, if there are empty entries in some of the fields of a row, that's evidence of a problem in the design (I'm being a little broad, and avoiding an absolute "the design is broken" pronouncement. But it's a sign).

What if you had an 'events' table, that had the parts of events that were common to people and to products, and an ID? Then a person-events table could have fields specific to a person, and the ID of an event, and a product-events table could have fields specific to a product, and the ID of an event.

I'm a little vague on just what an 'event' is, but if every event takes place in a particular place and with a particular time, then you'd want to combine them into a single table.

Another general note -- just as blank fields is evidence for a poor design, any time you ever have two fields that are *always* together, it's evidence they belong in the same table.

Not a direction I've ever explored.

Also something I haven't explored -- the approach is called using a 'calculated field'.

Happy to (try to) help -- just remember it's worth what you paid for it!

Reply to
Joe Pfeiffer

If I understand what you're suggesting, putting individual events (as opposed to event types?) in different tables would be an odd way to do things. You'd have to do something like establish DB connections based on event ID (or similar) to retrieve the table for the event -- in effect, you'd be doing the searching in your application instead of taking advantage of the database. If you're able to define a key in a way that you can avoid any real searching in your application, you can make that a primary key in the database.

Figuring out how to organize these things is what databases are good at. By simply figuring out what your most-common search keys are and making them primary, and then not worrying about the problem any further, you're leveraging a *huge* amount of work done by other people. Unless what you've got in mind is really weird, you are extremely unlikely to come up with anything more efficient than just using the DB as intended. If what you're doing is that weird, you may be better off using flat files and index tables than a DB in the first place -- but that's really, really unlikely.

Yes.

Reply to
Joe Pfeiffer

Hi Joe,

Thanks for the thoughts. Sorry for my very slow resp> As a general rule, if there are empty entries in some of the fields of

Thanks. That's what I was guessing.

I had wondered about this. But the thing is, I'm never going to want to retrieve all the events at once. I'm only going to want to retrieve all the events relating to a single person or product at once. So I thought it might make sense to split the data into many tables rather than few, so that each table contains exclusively the data I'm going to need at one time, to avoid searching/sorting?

Even if you didn't want to retrieve all the events at once?

I might take up your offer of discussing this further by e-mail once the ideas have developed better in my mind. But to explain what an event is in public rather gives my game away.

Thanks. That's a useful tip.

Best wishes,

Chris

Reply to
Christopher Tidy

Thanks.

Best wishes,

Chris

Reply to
Christopher Tidy

PolyTech Forum website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.