The Database Administrator is dead
I think someone's spent too long in tech-startup-land and has forgotten that there's a whole bunch of businesses outside of that, some of which don't have access to reliable internet connectivity and need to keep going even if the Internet's down.
I guess I had better tell my colleagues that our jobs are all dead! Wait, it's an absolute statement for a headline, so it's actually "absolute crap".
> These days, the technology decision maker is the dude with Sublime Text open and a cloud control panel up in Chrome.
And when he is successful and gets clients, and a few thousand rows in his database, he realizes that he needs someone to keep that database alive. He needs someone to figure out how to turn the Cartesian-product queries he's written into efficient queries.
At first, he hires a consultant for a few one-off gigs. However, then he's paying someone $200[1] an hour, typically for 8-16 hour engagements. After getting sick of that cost, and still lacking any kind of long-term care for his product, he comes to our team and hires us to be his DBA, albeit remotely.
Business as a DBA is booming. Nobody thinks they need a DBA, but the reality is that you really can't afford not to have a DBA. We have customers coming on board with no backups, no high-availability plans, no disaster recovery plans, queries that perform Cartesian products (and thus take minutes against very small datasets), and no monitoring. (And yes, a good portion of users come to us while using the "solution" proposed by the OP (like AWS RDS), with many of the same problems.)
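The Cartesian-product failure mode is easy to reproduce; here's a minimal sketch using SQLite for illustration (table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(i, f"u{i}") for i in range(100)])
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, i % 100, 9.99) for i in range(1000)])

# Missing join condition: every user row pairs with every order row (100 * 1000).
cartesian = con.execute("SELECT COUNT(*) FROM users, orders").fetchone()[0]

# Proper join: one row per order.
joined = con.execute(
    "SELECT COUNT(*) FROM users JOIN orders ON orders.user_id = users.id"
).fetchone()[0]

print(cartesian, joined)
```

With 100 users and 1,000 orders the broken query already produces 100,000 rows; the same mistake against real-world table sizes is how "minutes against very small datasets" happens.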
We set them up with comprehensive backups, automated failover solutions, and 24x7 monitoring. Suddenly, their DB is no longer the primary source of downtime. They're no longer losing customer engagement because their frontend takes seconds to render. They're no longer in the position of losing their entire company because some junior developer accidentally dropped their users table in production.
In short, if you're using a database, a DBA is a required part of your business. You just haven't been burned badly enough by a poor database setup to realize it.
[1] Actual hourly rates for a planned engagement. Emergency rates are closer to $450 an hour. Why so much? You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job. There are single-digit thousands of us world wide, and we're in high demand.
Apparently you didn't make it to the conclusion.
The whole post was making the exact same point as you in the end; the author actually runs a datastore-as-a-service business.

> So perhaps the role of the DBA isn't necessarily dead, it's just moved to its new home at the datastore-as-a-service provider. The successful DBA will understand that this new world means handling petabytes of data and billions of operations on thousands of logical databases. They will cope with less mature database technologies in increasingly difficult workload environments. They will automate or die. Long live the DBA.

I did, except that:
> The successful DBA will understand that this new world means handling petabytes of data and billions of operations on thousands of logical databases.
is untrue. Most of our customers have DBs that are in the GB size. A few have TB size DBs, and none are on that scale.
datastore-as-a-service doesn't replace DBAs - we make good money being remote DBAs for people who are using datastore-as-a-service providers, because they still run into the same problems as everyone else.
> He needs someone to figure out how to make the cartesian product queries he's written into efficient queries.
I wish our DBAs were like that. I am a developer with Sublime Text. I have to make the queries fast. I have to design good indexes.
They can complain if a query is slow, but they never actually help to fix it. They only have to make and restore backups when disks die.
Sounds like you have a sysadmin with mad interview skills. ;)
Sorry to hear it, either way.
Exactly. You could extrapolate this into any "The [whatever] is dead" statement that we see so often. And it's not getting any easier - e.g. are we capturing and processing less data these days? Is the data we store less valuable?
> You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job.
I can't say how much truth is in this statement. This stuff is learned organically by doing it on real world projects. It's scary at times to think that you just can't teach this stuff.
> This stuff is learned organically by doing it on real world projects. It's scary at times to think that you just can't teach this stuff.
Yup. And with the supply being so low, it's hard to get a DBA (you'll probably have to steal one from another business), so people are flocking more and more to DBaaS, and DBaaS providers are more than happy to propagate the fiction that "you don't need a DBA, you have us!".
It honestly doesn't bother me much that they make these statements; it's marketing.
On the other hand, believing those statements harms our customers; they spend time and money to migrate to these providers and find out the hard way that they still need someone who can handle their DBs for them. That does bother me.
I disagree. Dedicated DBAs are on their way out.
Databases are becoming pretty good at managing themselves and the marginal performance gains from tuning usually are easily offset by throwing bigger kit at the problem or throwing more cash at the plan you are on.
Often the act of throwing "bigger kit" at the problem requires specialized tuning of the DB to be able to take advantage of it.
More memory? Increase the buffer pool size.
Faster HDD? Tweak the settings that determine how many disk operations are attempted every second.
Bigger CPU? Figure out the point of diminishing returns on the number of CPU cores for your DB, and start sharding onto multiple DBs to make sure you can use all of the cores.
SAN? But I thought you wanted performance. ;)
Plus, what gives you the best DB kit for the buck? I could probably tell you that (I am a DBA, and get paid to answer those questions), but do you know? Do you know where to find out?
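For MySQL specifically, the first two examples above map onto real server variables. A hypothetical my.cnf fragment (the numbers shown are placeholders; the right values depend entirely on the hardware and workload):

```ini
[mysqld]
# More memory? Give most of it to InnoDB on a dedicated DB host
# (a common rule of thumb is 70-80% of RAM).
innodb_buffer_pool_size = 24G

# Faster disks? Raise the number of background I/O operations per second
# that InnoDB is allowed to schedule.
innodb_io_capacity = 4000
```

Setting either one wrong for the box it runs on is precisely how "bigger kit" fails to deliver.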
Databases definitely require knowledgeable tuning. But so do most other complicated moving parts in a modern infrastructure - web servers, app servers, kernels, cache servers, etc. Sysadmins manage those quite successfully. In my experience, they also manage databases quite successfully. The idea that a database is a special beast that requires special keepers is a holdover from a dark age.
> The idea that a database is a special beast that requires special keepers is a holdover from a dark age.
As are most databases (which are 10-20 years old).
Compared to a webserver, a DB is significantly more complicated, and significantly more important to your average business. Nginx takes a nosedive or performs poorly, and nobody really cares. Your DB takes seconds to respond to basic queries, and your entire business suffers.
I'm guessing you've yet to be bitten by DB performance issues that couldn't be resolved by adding indexes or basic query profiling. I'm glad to hear it, because it's not fun to have to set aside your day job and dive beneath that.
This is all trivial knowledge under the "application performance" banner. All of our development staff are capable of diagnosing these issues and providing suggestions.
> All of our development staff are capable of diagnosing these issues and providing suggestions.
I hope you're paying them well, because if they're all crack DBAs on top of being crack developers, they're the proverbial unicorn, and are probably worth $2-3 hundred thousand apiece.
No they just know how not to fuck up their queries to start with so we don't need a DBA to optimise 300 join statements...
Knowing how to write good queries and indexes is admittedly 80% of having a good performing database. However, as with any other 80/20 split, the remaining 20% is the hard part, and you'll end up paying a specialist to take care of it for you.
Or your dataset and query volume will not grow to the point where you actually need a performant database, but that's a business problem, not a DB solution.
I wonder how much of your business is represented by startups and really capable development teams ... and how much is making Hibernate perform better (or should I say "less horrible").
Development teams that go after speed will soon realize that re-serializing the data they need for a web page push and sending it over the network results in a 5-50x drop in performance over having that data in a local data structure. And God forbid you have even minor packet loss (like 0.0000001%) in the network between your webserver and database server. Your 99% tail latency will sprint for the 10s mark.
And as the DBAs here said: most databases are in the GB range, with a few in the tens or hundreds. SQLite will beat the crap out of any other solution at those sizes.
People don't realize how much tail latency affects maximum QPS. Once you calculate how one affects the other you see that tail latency is the enemy of performance in webservers. A webserver that can generate 100% of responses in 1ms can serve 1 million qps (ie. can saturate a 10Gbps link). At 99% at 1ms and 1% at 10ms (the very minimum to execute a single query against a database that isn't local to your machine), you're left with 900k. More typical database figures would be 20ms average and 600ms for the 1%, which will leave you with 30k QPS. In this example using a database cost you 97% of your original performance.
That 97% figure is perfectly normal. So if you want decent performance, using a database server (in the serving path) is just not in the cards at all.
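The arithmetic behind those figures can be checked with Little's law (throughput = concurrency / mean latency). The in-flight request count of 1000 below is an assumption, chosen so the baseline matches the 1M QPS in the parent:

```python
def effective_qps(concurrency, latency_mix):
    """Little's law: throughput = concurrency / mean latency.

    latency_mix is a list of (latency_seconds, fraction_of_requests) pairs.
    """
    mean = sum(latency * share for latency, share in latency_mix)
    return concurrency / mean

# Baseline: 1000 in-flight requests, all served in 1 ms.
base = effective_qps(1000, [(0.001, 1.0)])

# 99% at 1 ms, 1% at 10 ms (one remote DB round trip for the unlucky 1%).
mixed = effective_qps(1000, [(0.001, 0.99), (0.010, 0.01)])

print(round(base), round(mixed))
```

A 1% slow tail drags the mean from 1 ms to 1.09 ms, which alone shaves roughly 8% off peak throughput; push the tail out to hundreds of milliseconds and the collapse in the parent's 97% example follows the same formula.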
MS SQL Server has always been good at managing itself. I can only guess it is getting better.
But Oracle needs the fine-tuning, and I can't see that changing anytime soon; with Oracle, some queries basically require that you use an IOT (index-organized table), others are better with partitioned storage, and so on.
So your premise is extremely dependent on what DB you use.
> I disagree. dedicated DBAs are on their way out.
Is ETL considered a DBA task?
Not for us. Our developers built an API and tool chain to self service.
I normally associate DBA with efficiency and performance of data storage. However, your comment describes a blend between that and a security administrator, whose job includes the creation and maintenance of high availability plans, disaster recovery plans (like backups), and monitoring.
Do you think this overlap points to a decline in either specialization?
No - DB software is highly dependent on the system it's on. So much of a DB's underlying performance is going to depend on matching the settings you create for your DB to the hardware you're on (and optimizing the system to run a DB).
In other words, efficiency and performance both depend heavily on the machine your DB is running on.
As such, a good DBA needs to be able to do sysadmin tasks. The business won't care that it was the sysadmin's fault for not realizing that a battery had gone dead on the RAID controller, and a DBA shouldn't care either. Their purview is the database, and everything that it entails.
What happens if you also have a “Web Server Administrator” with a similar holistic view? Won’t they step on each others’ toes? Is this not why you have separate roles, like the Dev/Ops division?
Well, your web server is not your database server. If it is, just know that your first scaling task will be to separate the two onto separate systems.
I was in a hurry when I wrote my earlier reply - system monitoring is not the only monitoring we perform. We watch the database itself for health and performance. We watch for abnormally long queries, abnormal locking, excessive deadlock resolution, replication health, and exploits.
Backups themselves require a bit of specialization - there are at least four different methods I'm aware of off the top of my head, and the proper choice of which to use is going to depend on the business (we typically set up three of the four: logical dumps, rolling binlogs, and binary dumps).
HA plans, particularly live failover, requires a deep knowledge of the database, its replication solution, and your business needs. Safely failing over a live MySQL database in a way that results in no data loss is hard to do.
Quite a bit of specialized knowledge is required to work with databases, even for normal sysadmin tasks.
That's too bad since this is one operation that incurs a very significant local drop in performance. The best optimizations I've ever done in website servers is to introduce local caching where reasonable, ignoring updates. User records, item records, comments, sessions, all would be cached locally. You wouldn't believe the speedup.
Having your data on a different server from your website effectively introduces a much higher minimum latency: it adds 5-6 ms to every single page load, which is extra time your webserver has to keep state for a request. If I were writing the site, that would be at least a factor-of-5 drop.
The DBA said then that this would undermine consistency. And, of course, he's right. However for most things websites don't actually need consistency (plus the web request response model makes it impossible, as the data client-side is not included in transactions). The prices at checkout, sure, there you want consistency. Everything else ... do you really care if it takes 1-2 minutes for your webfarm to be entirely in sync ? (and if it's important enough you can implement "clear this cache item now" requests too, which I did for user records after a complaint)
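A per-process cache of that sort is only a few lines. A minimal sketch (the `TTLCache` name and the fetch-callback shape are illustrative; a real version also needs size-bounded eviction):

```python
import time

class TTLCache:
    """Local read cache that deliberately ignores remote updates for `ttl` seconds."""

    def __init__(self, ttl, fetch):
        self.ttl = ttl
        self.fetch = fetch        # the slow path: a callable that hits the database
        self._store = {}          # key -> (expires_at, value)

    def get(self, key):
        hit = self._store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]         # fresh enough: serve locally, no network hop
        value = self.fetch(key)
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value

    def invalidate(self, key):
        """The "clear this cache item now" request mentioned above."""
        self._store.pop(key, None)
```

Within the TTL window every `get` is a dictionary lookup, which is exactly the "1-2 minutes for your webfarm to be entirely in sync" trade the parent describes.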
Databases are so ridiculously slow it's not funny. If your data fits in a 64-bit address space, an mmap'ed serialized data structure will blow its socks off, and for everything that doesn't absolutely require synced data it's fast. If your critical data is small enough, keeping it in memory with a background thread getting signaled to dump it to disk after every update may also work.
I wrote a network-as-a-service infrastructure for an ISP once (something to do with data on cell phones), where each machine would happily do 1Gbit of web traffic, directed by its 5GB database (meaning it would need to look up a user's account balance, and potentially update it, for every request), consistently and quickly (99.9% of requests were < 1 ms). Mmapped protos FTW!
Databases are a language + a B+tree data structure for those who can't, or won't, implement it themselves. Generalized databases have zero hope of competing with specialized ones.
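A toy version of that mmap'ed approach, in Python rather than whatever the production system used, with fixed-width (key, value) records and a binary search standing in for a real proto format:

```python
import mmap
import struct
import tempfile

REC = struct.Struct("<qq")  # fixed-width record: (key, value) as two int64s

# Build a file of key-sorted records; in production this happens offline.
records = sorted((k * 7 % 1000, k) for k in range(1000))
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    for key, value in records:
        f.write(REC.pack(key, value))

# Map it read-only: lookups touch the page cache, never a network socket.
f = open(path, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
n = len(mm) // REC.size

def lookup(key):
    """Binary search over the mapped records."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = REC.unpack_from(mm, mid * REC.size)
        if k < key:
            lo = mid + 1
        elif k > key:
            hi = mid
        else:
            return v
    return None
```

Each lookup is O(log n) memory reads with zero serialization and zero round trips, which is where the sub-millisecond 99.9% figure comes from.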
> Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job.
I agree with you on the above statement. Though my conclusion is different from yours: I think in the future the line among DBAs/sysadmins/developers will become even more blurred; developers will be trained/required to take over more and more work from DBAs and sysadmins (DevOps, anyone?); consequently, the demand for dedicated roles such as DBA and sysadmin will diminish. Hope I am wrong though.
My role within our DBA group is that of devops exactly (after spending a year as a line DBA).
Given that background, I still think that the topic of databases is just too deep for a generalist. I know a lot about the MySQL database (and a little about PostgreSQL) - enough to write failover software, automate deployments, write guardian crons which pre-emptively slap down problematic queries, automate backups, do VIP failover and haproxy configuration... and I still have to go to my boss for most of the hard questions.
His knowledge encapsulates 12 years of working with and around MySQL, and it's proven invaluable to our customers. Knowing when to force certain optimizations, how to make subqueries run O(1) vs O(n), how to rebuild a complete database from binary logs, how to configure MySQL to work with SSD caches... these problems don't come up often, but when they do, not having a DBA available to you means contracting out to one at exorbitant rates.
It's the difference between a few minutes of downtime when the proverbial dung hits the fan, versus a few hours or days.
Could you provide any tips/advice/reading recommendations to a programmer who is interested in learning more dba stuff? It would be appreciated!
Sure - here's a list of what's good to know for MySQL. Other DBs are going to have different needs, though the indexing data is good to know regardless.
Start with the book High Performance MySQL. [1]
Follow up with the whitepaper "Causes of Downtime". [2]
Then find a copy of the IMDB dataset, put that in a database, and write an app against it. Make that app perform well, then simulate load against the app (pretend it hit the top of Reddit and Hacker News simultaneously), and keep it performing well.
After that, it's a matter of practice.
[1] http://www.amazon.com/High-Performance-MySQL-Optimization-Re...
[2] http://www.percona.com/redir/files/white-papers/causes-of-do...
If you are working on confidential data, it is very likely that your security policy does not allow you to store it "in the cloud", or to hand it off to a third party without the kinds of contractual obligations that will perplex and confuse anyone trying to sell a best-effort service.
Thus your databases will be in-house, thus you will hire one or more DBAs and depend upon their expertise.
>> If you are working on confidential data, it is very likely that your security policy does not allow you to store it "in the cloud"
The regulations will change with the times, hopefully.
We stored little more than names/emails and non-identifiable/non-sensitive data at my last job. The 'security auditor' for one client wouldn't let them sign with us because our servers were hosted at Rackspace, so the servers were not under our control and a Rackspace employee could access our data, since they managed our servers.
It's not a matter of regulation. It's a matter of liability. As long as the company running the database-as-a-service won't accept full liability against breach of privacy for the information stored there by clients, clients can't store confidential information there.
And who wants to do that?
The DBA is essential for big companies, not only Facebook or Apple. The author of the post works for a database-as-a-service provider, it's just marketing.
Exactly. Why bother optimizing your database? Just spin up more servers (while we happily continue to squeeze your opex budget dry)!
It seems cloud/SaaS is the new answer to throwing hardware at the problem. Something about the inefficiency grinds at me. I do get that, at certain scales, it makes sense. But it's not always obvious where that line is.
The cloud is killing all the low-level admins. Mail administrators? Backup operators?
Moving to BYOD will continue to reduce the numbers of administrators as well. Eventually you'll have two groups of IT staff, a very small group of high-level engineers who build and implement everything, and then a very large group of low-skill helpdesk type people who reset your accounts and fill in your login information on your device.
Amazon's Mayday support service shows this already. Soon, a form of this will be on every product you can think of. Office, Windows, every tablet and computer in your office.
I couldn't agree more. The most apt analogy to explain this to non-tech folks I can come up with is how the automobile went from a hobby toy for the mechanically inclined to a tool people use everyday.
There will always be a place in the world for engineers who understand how to get the most of out an internal combustion engine and ways to improve it, just like for database engineers. But we don't all need our car in the shop twice a month having parts tuned. Nor do the vast majority of databases hitting performance limits that would require full-time DBAs.
Consulting is a great business decision when you reach hundreds of GB in data. In-house when you're tossing around tera- or petabytes. By the way, I can wholeheartedly recommend http://www.pgexperts.com if you're having Postgres issues. After speaking with a few of their core engineers at DjangoCon 2013, I would consult with them anytime I had PG performance issues.
TL;DR: Cloud-based DB co-founder says that most of his customers don't employ a DB admin.
Um, it would only be surprising if they did, that would surely indicate that his company wasn't doing a good job.
True.
But most of these folks didn't have DBAs or even ops engineers before they became customers.
Mostly they are teams of developers, they have developed the application and have (I think correctly) worried about the customer experience, code base, IP, performance, etc.
They do want the database driver API (in this case MongoDB) to just work beyond the interface. Backups, scaling, filesystems, etc they want to be part of the service, and just handled.
This is true for new business and it's true for businesses that are developing new apps or new projects or even projects that can't scale out on other infrastructure.
Full disclosure: I wrote the article in question ;-).
The DBA may be dead, but he has a lot more life in him than "MongoDB database-as-a-service" is ever likely to have....
I am a database administrator.
I somewhat agree with the conclusion, which essentially says the on-premise DBA is on the decline.
Hiring DBAs has always been tricky for employers anyway. It's a position of responsibility (to protect the business' data) that is hard to know you've hired the right person for, difficult to replace, and difficult to allow to take any vacation/leave since there will seldom be more than one.
What I don't agree with in the conclusion is that by moving off premise it's all moving to datastore-as-a-service. Large amounts will also move to remote DBA as a service (consulting firms).
If they define DBA as a guy who sits in a room 9-to-5 for a $100k+ salary, then, yes, there is a recession, you know. But usually (or rather unusually) a DBA is an engineer who can do much more than look at EXPLAIN output and tune some variables once a week.
I have been an Informix DBA for years, and I could tell that we were the strongest guys on a team, because in order to do our job we had to understand (abstract out of a running system) the data flows, access patterns (and especially locking issues), the actual server's topology (disk controllers, channels, hard drives), data partitioning (where this or that table-space lives, what else is on this volume, this channel, this controller), what the access patterns are for each table, how indexes are utilized, etc.
We also patched, compiled and installed all the required software (have you ever tried to compile Informix support into PHP4? you definitely should.) and taught coders how to use it, and then dealt with the access patterns of silly scripts, etc.
The claims that some crap like MongoDB (of all things!) service could replace skilled, productive, (but, yes, quite expensive) professionals is, of course, utter nonsense (what else we could expect from MongoDB?).
DBAs and Sysadmins (real ones, not these clowns who use nothing but chef or puppet and don't know how ./configure && make works) are becoming extinct purely for economic reasons, and all these cloud services, ironically, require even more knowledge to deal with, because all that virtualization crap messes everything up even more (google for redis on EC2 for a change).
Sadly, idiots are taking over the world slowly but steadily,)
__DBAs and Sysadmins (real ones, not these clowns who use nothing but chef or puppet and don't know how ./configure && make works) are becoming extinct purely for economic reasons__
As long as we're insulting people ... it's going to be 2014 in three weeks. Who compiles anything for PHP4? Boasting about an ability to handle other people's messy scripts sounds like an anti-pattern to me, and your recommendation to your org should be to develop a plan to refactor anything dependent on a library that has been deprecated for 6 years and actively out of development for 5.
I also feel like the vast majority of folks claiming merit badges as DBAs in 2013 are the product of failed technology roadmaps and technical debt with no plans to pay it off. I'm sure there are still places where true, full-time DBAs are worth their paycheck, but in my experience these are few and far between given the current status of OSS database options and hardware performance.
> As long as we're insulting people
People may assume whatever they wish.)
> Who compiles anything for PHP4?
It was long ago, but you probably wouldn't believe how many people are scared to touch anything, let alone perform even a necessary security update - "what if it stops working?!"
Unfortunately, the frustration which sometimes influences the wording of my comments comes from quite a long time in the industry, and I never asked for it.
> The claims that some crap like MongoDB (of all things!) service could replace skilled, productive, (but, yes, quite expensive) professionals is, of course, utter nonsense (what else we could expect from MongoDB?).
The claim isn't that the service replaces the skilled, productive and expensive folks entirely, those professionals are still needed. They just work for the service provider.
The business entity that creates the actual product can focus on just that, and not complicated DBA tasks. It's a developers world. ;-).
Also, it's not a MongoDB centric concept. Rackspace and Amazon have multiple data products now, with more coming, and they all fall under this concept in my mind.
DBAs have specialized knowledge related to the maintenance and optimization of a particular SQL engine. Developers see the database as a black box where data gets stored, but there is a lot going on under-the-hood to keep the data reliable, consistent, secure, redundant, and available.
The last thing you want is a full transaction log during peak hours.
> These days, the technology decision maker is the dude with Sublime Text open and a cloud control panel up in Chrome.
My Sublime Text is open and I have a cloud control panel open in Chrome; am I the technology decision maker? Nobody told me that here at the company.
Wait... maybe nobody told me because... I'm the decision maker D=
Who organizes how the data is stored? In some companies, the database is largely ignored until performance and scaling are an issue. At that point a DBA is brought in and the next months or even years are spent fixing the design so the entire product can perform properly and grow.
The persistence ignorance introduced by mainstream ORMs about 5-7 years ago has turned the new generation of programmers blind to the fact that the database IS part of your application. Records you save don't get sucked into a magic black box that just responds to your queries as your app's needs grow.
DBAs are not going to go anywhere. Sure, you can scale the DB in the cloud quite a bit and it will perform well, but it's not free :) In the cloud, you would literally be paying for your bad design decisions in hard dollars rather than performance issues.
Most start-ups or in-house apps are fine while salary(DBA) >= cost of cloud. However, that magic condition starts returning false pretty quickly if you are doing any non-trivial data management.
The world is definitely changing for the DBA, and he makes some very good points. I feel he glossed over a big facet in a single line of the article, which had to do with Apple/Facebook and large companies.
The banking world doesn't give its data to others, and the organization of ATM data, market data, customer account data, etc. is only getting more complex, requiring excellent organization and management for terabyte-scale data analysis to protect customers from fraud and worse.
So, a thought experiment.
Pick any data storage system you like - MongoDb, Redis, Riak, whatever.
Now, you get to place a bet. In the future (let's say 25 years) what will still exist: your choice of NoSQL, or the standard database with schema, SQL, access, and so on? If you choose wrong, you die.
Now you get to see the future and see if you die. Which are you betting on?
Sometimes you just bet on red AND on black to stay safe while keeping the wheel rolling and the drinks coming.
The DBA is not dead! They just look and smell that way!
But in all seriousness, if your app uses a database, you are incompetent not to employ an expert to help with the database. Whether they're advising on the query plans of non-performant queries or on the best setup for the current stage of the business and app, they are very useful.
Most people don't need dedicated DBAs, and worse, the DBA often never has sufficient domain knowledge of the problem to make an intelligent suggestion anyway.
DB performance is complicated, yes, but the vast majority of it is extremely simple. It's just that none of this simple stuff gets learnt until it's too late and the cost of fixing it has dramatically increased.
There's a certain level where you need a DBA and that bar has been getting higher for years.
What we really need is to demystify DB performance, which for the most part is fairly simple.
What you need to do is teach your developers how to read those query plans. Show them how to find the expensive queries. To give them tools to easily see what queries their ORM is spitting out. To show them how to use a SQL profiler.
Tell your developers how query plan caching actually works. How a clustered index works and what you should and shouldn't put it on. Explain how indexes work. Explain how DB pages actually work and then it's obvious why certain indexes are a bad idea. Explain how relational keys are very important for the DB engine and leaving them off is not an 'oops', it's a serious mistake with long term consequences.
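That lesson fits in a dozen lines. A sketch using SQLite's EXPLAIN QUERY PLAN for portability (MySQL's EXPLAIN and SQL Server's profiler serve the same purpose; the table here is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL)")
con.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                [(i % 500, 1.0) for i in range(5000)])

def plan(sql):
    # The fourth column of each plan row is the human-readable step.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE user_id = 42"
before = plan(query)   # without an index: a full table SCAN
con.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
after = plan(query)    # with it: a SEARCH using the index

print(before)
print(after)
```

A developer who has watched that one-line plan flip from a scan to an index search once will check it again on their next slow query, which is exactly the habit being argued for.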
And that's at most a few days work. So why do you need that DBA?
Aside from that you need someone who knows how to maintain a DB, but again that's not particularly complicated and once it's done you can forget about it apart from the occasional sanity check that it's all working properly.
"Most people don't need dedicated DBAs"
That's a pretty easy statement to defend. But, I'll respond by saying that most Companies running Oracle 11g with more than a couple terabytes of databases, require a competent DBA, particularly if Disaster Recovery/Transaction Rollback is important.
" the DBA often never has sufficient domain knowledge of the problem "
Of the half dozen or so truly high level DBAs I've worked with (and managed on occasion), I can say they had incredible domain knowledge of Oracle Database Server, and worked extraordinarily hard to have next to zero knowledge of the application running on it. Their focus was to keep the database running, defend it from engineers and users, and recover it when things went really awry.
"DBs performance is complicated, yes, but the vast majority of it is extremely simple."
Any time you see the phrase, "Extremely Simple" when discussing a domain in which the expert practitioners routinely make $250K/year or more without any form of market manipulation, you need to reconsider why, exactly, these technicians are being paid so much to do something, "Extremely Simple."
"What you need to do is teach your developers how to read those query plans. "
Completely agree here, but, there are two perspectives on this topic. There is the "Engineers are ultimately responsible for the efficiency of their query plans, and should be educated/trained to take that responsibility," and then there is, "We can't train our engineers to be query plan experts; just keep them from shooting themselves in the foot and let the Query Optimizer handle the rest - it's up to the DBA to manage stats gathering and keep DBMS_STATS healthy."
I think we tend to see the second approach more frequently in the enterprise, where your engineers are likely making less money and the company is keen to leverage its many tens of millions of dollars of Oracle technology.
Finally, when a company is paying tens of millions of dollars a year in Oracle licenses, they consider it a worthwhile investment to have a few high-level DBAs to fully leverage that investment.
Why would someone run Oracle in this day and age? "We have 100k lines of established PL/SQL running our business" is a reason, of course, but you could make a similar argument that COBOL development is alive. PL/SQL pretty much always leads to an unmaintainable mess, and if you need BIG DATA, you can go much bigger without Oracle than you can with it, and cheaper to boot.
Guh - every few months this comes up, and it's fairly hilarious to me how much anti-oracle dogma there is here.
Oracle is the most feature-rich database in the industry, and the most expensive, and the most complicated. This comes with benefits and curses.
It is freakishly powerful. If you want, you can roll back just your view of the database to three hours ago. You can show different versions of the same schemas to different clients. You can dynamically scale out your database to hundreds of servers without having to manually shard. You can do multi-master replication, master-to-slave, master-to-slave-to-master, any combination you can think of.
Your database clients can automatically detect when a primary has gone down and fail over to a standby, which will automatically be brought online, no admin intervention necessary.
If you give Oracle raw disks, you can tell it to automatically use the faster parts of the spindles for more commonly used data. Or give it some SSDs, and it'll use them as cache. Or, if you buy Oracle storage servers, it'll actually offload basic query execution to the storage.
With that power comes great cost and complexity, which is why many web companies don't bother with it - when you get to google or facebook scale you build these types of capabilities into your application tier.
But I know of a ton of multi-petabyte Oracle implementations at big traditional companies, and they love it. Because they don't want to have to build all of that functionality at the application tier, and they trust Oracle's reliability.
Well, I'll admit that I haven't used Oracle in 4 years, but back then, the automatic-sharding thing (goldengate IIRC?) just plain didn't work.
I guess I'm just biased towards solving scalability problems at the application level. It seems like an uphill battle to take a declarative/descriptive language like SQL and tune it to execute a query the way you want it executed -- it seems a lot more straightforward to just write code that does what you tell it to.
GoldenGate is for multi-master replication and shares a lot of the challenges you'll find with any multi-master replication solution.
No, I was referring to Oracle RAC, which does away with the need for sharding. All of your nodes see a complete and comprehensive picture of the data on a shared set of disks.
> I guess I'm just biased towards solving scalability problems at the application level.
That's a totally valid strategy, and indeed, an option that many companies go for. But you're just shifting complexity from one place to another. Either you're going to be building in complexity to make your application aware of data distribution, movement, sharding, and so on - or you're going to use a more complex data storage platform like Oracle.
And SQL tuning is a skillset, much like writing good code. If you are good at SQL tuning, it's not that hard.
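For instance (a toy sketch with invented table names, using SQLite through Python's sqlite3), the classic beginner mistake mentioned elsewhere in this thread is the accidental cartesian product, and spotting it is exactly this kind of learnable skill:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

# A forgotten join predicate silently pairs every order with every customer:
bad = conn.execute("SELECT * FROM orders, customers").fetchall()

# The intended query joins on the key, returning one row per order:
good = conn.execute(
    "SELECT * FROM orders JOIN customers ON customers.id = orders.customer_id"
).fetchall()

print(len(bad))   # 100000 rows: 1000 orders x 100 customers
print(len(good))  # 1000 rows
```

At these toy sizes both finish instantly; at production sizes the first one is the "minutes against very small datasets" problem the DBAs upthread get hired to fix.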
My point is just that it depends on your business requirements. I personally think Oracle's price point is so high that I would never use it. But if money were no object, and I was designing an application, why would I want to have my application have to think about where the data lives?
Wouldn't it be a lot simpler to just say, "go here for your data", and let the dedicated application deal with that?
Or put another way, in the same way it seems insane to shove 100k lines of business logic into the database layer, why doesn't it seem insane to shove 100k lines of data management logic into the application tier?
Because at a certain size of business, paying developers to make patches against your favorite OSS database doesn't scale. It's cheaper to pay for a, let's be honest, damned fine piece of software.
Oracle, despite being the very definition of evil, has one hell of a software product. It is performant, has built-in features that make Postgres look like BerkeleyDB, offers HA solutions, and perhaps most importantly, has the backing and support of a multi-billion dollar company.
Eh. Different strokes for different folks, I guess. Most of those built-in features are anti-features IMO, and none of them make up for having to maintain long-ass PL/SQL functions as opposed to a reasonable programming language. And as far as performance goes, SSDs are cheaper than Oracle licenses, by a lot.
Oracle doesn't have any magic pixie dust that changes your underlying hardware. If you want raw performance, BerkeleyDB will wipe the floor with Oracle, because it's that much simpler. You've just got a much smaller feature-set as a result.
You probably see things differently than me, though. Related question, since you're a DBA: what are the best practices for testing PL/SQL? Do you have a link or anything? Every place I've seen it done was a nightmare and involved a lot of crossed fingers during releases.
I converted one of these ugly, unmaintainable PL/SQL messes from Oracle to the MySQL query language.
Because the boss told us MySQL was cheaper.
Because of MySQL's limitations, I think it is even more unmaintainable now. In real-world terms, it is ugly, and about ten people out of 500 can fix stuff there.
Well, IMO, you're saying that you basically trapped yourself in a box there. If you're completely determined to embed your business logic into a relational query language, then PL/SQL is probably the best there is.
I'm saying that your business logic shouldn't be there. You've got a whole universe of programming languages, potential disk formats and all that available to you. If you're writing huge ugly queries, and I have, that's when I generally step away from the computer for a bit and think if that complexity is better managed someplace else.
The stored procedures are not actually big ugly queries. But they surely have business logic.
So far, besides the ugly MySQL workarounds for some missing features, it has been a good decision.
And well, it was a company decision, they are the ones trapped. I'm free from maintaining it, while I had to maintain the first version with all business logic outside the DB.
> What we really need is to demystify DB performance, which for the most part is fairly simple.
Without getting into the rest of your argument, I'd like to quickly address this.
No, it really isn't.
In your typical MySQL database, your performance for a simple "select * from x where y" is going to go through a lot of complicated machinery (most of which can be tuned for performance), a few points of which I will enumerate below.
Any and all of these can (and often should) be tuned:

1) Acquire a query cache lock & see if this query is there
2) Run the query through the optimizer
2a) Perform multiple shallow dives into a table to look at the cardinality of the filtered columns
2b) Identify the best indexes based on the shallow dives
2c) Create a query plan
3) Push the query plan down into InnoDB
4) Load the index into memory, if it's not already there
5) Load the potential rows into memory, if they are not already there
5a) If there's not enough memory, load a few into memory, and be ready to push those rows out of memory in favor of more rows when needed
5b) Load rows that are still in the insert tree but not yet part of the regular buffer pool or pages on disk
6) Loop through the candidate rows for matches to the filter
7) Return the data to MySQL
8) Acquire the query cache lock & update it
9) Return the data to the client

There are 600+ page books and very old (and oft-updated) blogs dedicated to this topic... it's not something you can teach a developer in a couple of days. As an example, I attended an introductory course on being a MySQL DBA; it lasted 5 days of 8-to-5 teaching and running examples, and it only scratched the surface of what I do on a daily basis.
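None of this machinery is hypothetical; you can watch it from the outside. As a minimal sketch (using SQLite through Python's sqlite3 rather than MySQL, since it ships with Python), even a trivial filtered select compiles down to a whole program of virtual-machine opcodes before a single row is read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (y INTEGER)")

# EXPLAIN (as opposed to EXPLAIN QUERY PLAN) dumps the bytecode program
# the engine will actually run for this statement.
opcodes = conn.execute("EXPLAIN SELECT * FROM x WHERE y = 1").fetchall()
for row in opcodes:
    print(row[1])  # opcode names such as Init, OpenRead, Rewind, Column, Ne, ...

print(len(opcodes))  # well over a handful of instructions for one tiny query
```

And that's the simplest possible engine; MySQL layers the query cache, optimizer dives, and the InnoDB buffer pool on top.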
This sort of nonsense is exactly the sort of mysticism I'm talking about.
Why, for example, do you think every query needs to be tweaked for this one:
Push the query plan down into InnoDB
So why did you include it on the list?
5a + 5b: is MySQL so bad at memory management that you tweak it for every query?
So why did you include it?
Virtually everything on that list are implementation details that the vast majority of applications and developers don't need to even think about.
> Virtually everything on that list are implementation details that the vast majority of applications and developers don't need to even think about.
Particularly if you hire a DBA.
But that's not what you meant... and you're right. If your business doesn't generate enough query traffic and data to stress your database, then no, you don't need more than the indexing and smart query development which your average developer can learn.
The moment your business hits that wall, however, you'll be scrambling to get someone who can dig into those "implementation details" and wring every ounce of performance out of your database.
> [...] do you think every query needs to be tweaked for this one: Push the query plan down into InnoDB
Nope. However, for that one table that is better served by TokuDB, or MyISAM, or the Archive engine, it does matter (yes, there are actually use cases for using MyISAM tables instead of InnoDB).
> 5a + 5b, Is MySQL so bad at memory management that you tweak it for every query?
Again, for every query? No. Also, 5a and 5b have nothing to do with memory management, and everything to do with the size of the dataset you have to pull into memory to identify and return the results. Knowing how this affects the buffer pool LRUs, disk usage patterns, and how to optimize the interaction between the two can be vital.
Look, you're doing the equivalent of looking at a car and saying that every single one needs to be hand tuned.
Just because you're a mechanic.
I said there is a bar when you will need a DBA. It is much, much, much higher than you're making out.
And the things you're talking about: most of the people mucking around with them probably shouldn't be. They've probably made it worse. And the tiny few who actually need to? They're a tiny few.
"DBA often never has sufficient domain knowledge"
"often never"?
I've met some DBAs who were quite versed in the business domain. And some that weren't. This is no different from the majority of the developers I run into, so I'm not sure why you're drawing a line there, except that the article was about database stuff.
So basically train them to be DBAs?
No, train them to use the database properly. A DBA is the person who keeps the database running, much like a sysadmin is the person who keeps the server running.
That's a naive view of both DBAs and sysadmins these days. They both do much more than just keep things running. Sysadmin has mostly turned into opsdev. Same with DBA.
> Sysadmin has mostly turned into opsdev.
Not in places large enough to value loose coupling (e.g., if keeping your servers running comes out of a different budget than improving your applications).
Opsdev != dev
If you are willing to agree that's what constitutes a DBA, then I don't see much value to the idea of hiring a "dedicated" DBA.
(I don't personally accept the first clause, but I also agree that hiring a dedicated specialist is now something only a rarified few need to do, really.)
Database administration is not a fire-and-forget task. It dismays me greatly that so many developers do not see beyond the code in their IDE and think they can just bring in a DBA for a few days and all will be well. It may also surprise some developers just how much return on investment a good DBA can bring: they can and do learn about the business, the data, and the processes, and can then help get the best from the database as a result of that knowledge. But they can also serve as an SME on the database engine technology, perhaps pointing out where it is not being used in the right or optimal way. The biggest gains in performance generally come when they help a team of developers who were treating the database as a dumb data store and not making any use of the features offered.
> if your app uses a database, you are incompetent not to employ an expert to help with the database, whether advising on the query plan of those non-performant queries, or what is the best setup for the current stage of the business and app, they are very useful
Given modern databases and modern hardware, I think the vast majority of applications never reach the point where query performance is an issue. For many small software companies hiring an expert in databases makes no more sense than hiring an expert in operating systems or networking - unless your needs are very specialized, these things work well enough out of the box.