Google App Engine's Datastore Admin is Terribly Inefficient

marram.posterous.com

48 points by marram 14 years ago · 17 comments

peterknego 14 years ago

A GAE datastore delete takes multiple operations because it also updates indexes:

1 entity delete = 2 Writes + 2 Writes per indexed property value + 1 Write per composite index value

All from this page: http://code.google.com/appengine/docs/billing.html#Billable_... And more about why it is so: http://code.google.com/appengine/articles/life_of_write.html

Well, the OP is just another coder who can't read docs, but can write a blog.
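The billing formula quoted above can be written as a small function. This is a sketch, not Google's own code. It assumes each value of a multi-valued (list) property counts as a separate "indexed property value". The entity shape in the example is hypothetical.

```python
def delete_write_ops(indexed_values: int, composite_values: int) -> int:
    """Write ops for deleting one entity under the quoted billing formula.

    indexed_values: indexed property values on the entity (assumption: each
        value of a list property counts separately).
    composite_values: entries the entity has across composite indexes.
    """
    return 2 + 2 * indexed_values + composite_values


# Hypothetical entity: 8 indexed property values, 1 composite index entry.
per_entity = delete_write_ops(indexed_values=8, composite_values=1)
print(per_entity)         # 19
print(per_entity * 1000)  # 19000 for 1,000 such entities
```

Under this formula, the number of indexed values per entity drives the cost far more than the fixed 2 writes for the entity itself.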

  • theli0nheart 14 years ago

    Regardless of how decent Google's AppEngine documentation is, this is indeed a bug.

    The correct behavior would be to recalculate the indices just once, instead of reindexing after every single delete operation.

    It then becomes

        2*entities + 2*indexed property values + composite index values
    
    operations to delete all entities in the datastore, instead of

        2*entities + 2*entities*indexed property values + entities*composite index values
    
    operations.
    • tantalor 14 years ago

      To delete all entities should be free. Who cares about indexes? `rm -rf`, done.

  • easy_rider 14 years ago

    In any case, it's good that someone pointed out something this easy to overlook :)

    • marramOP 14 years ago

      I deleted all indices referencing those entities before starting the entity deletions.
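The two cost models in theli0nheart's comment can be compared with hypothetical numbers. The symbols are read the way the comment uses them. The "proposed" model assumes index rows could be dropped in a single pass for the whole purge, and this sketch does not test whether the datastore could actually do that.

```python
def proposed_ops(entities: int, indexed_values: int, composite_values: int) -> int:
    # theli0nheart's proposed model: index maintenance paid once for the purge.
    return 2 * entities + 2 * indexed_values + composite_values


def per_entity_ops(entities: int, indexed_values: int, composite_values: int) -> int:
    # Model for one-at-a-time deletes: index maintenance paid for every entity.
    return 2 * entities + 2 * entities * indexed_values + entities * composite_values


# Hypothetical purge: 1,000 entities, 8 indexed values and 1 composite entry each.
print(proposed_ops(1000, 8, 1))    # 2017
print(per_entity_ops(1000, 8, 1))  # 19000
```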

stickfigure 14 years ago

This is something that a lot of GAE developers misunderstand: put()ing a datastore entity is not a single write operation. There are indexes to update (in your case, lots of them), and updating these indexes can require several write operations. A simple delete is one write per index, but changing a value can be two operations: one to delete the old index value and one to write the new one. And since each property has two indexes (ascending and descending), these numbers are doubled.

If you create your own bulk delete method, you will find that it takes exactly as many write ops as the admin console tool.

You probably have defined more indexes on your entities than you need to - you will likely be able to make your app cheaper by removing unnecessary indexes. Managing indexes carefully is a critical part of making apps affordable on GAE.
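The index arithmetic in this comment can be sketched as follows. This models only the single-property index rows described above: two per value, ascending and descending. It leaves out the write for the entity row itself, so these are not full billing totals.

```python
# Per the comment: every indexed property value is stored in two
# single-property index rows, one ascending and one descending.
INDEX_ROWS_PER_VALUE = 2


def index_writes_for_removal(values: int) -> int:
    # Deleting an entity: one write to drop each index row.
    return values * INDEX_ROWS_PER_VALUE


def index_writes_for_change(modified_values: int) -> int:
    # Changing a value: drop the old row and write the new one, in each direction.
    return modified_values * INDEX_ROWS_PER_VALUE * 2


print(index_writes_for_removal(5))  # 10
print(index_writes_for_change(5))   # 20
```

The removal case gives 2 writes per indexed value, which matches the "2 Writes per indexed property value" term in the billing formula quoted at the top of the thread.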

  • marramOP 14 years ago

    I had "vacuumed" all indices referencing those entities before issuing a delete. That said, there was only one index per purged entity type, so this would not explain the 20x write operations.

    Also, note that the deletions were through the "Datastore Admin" app, which was recently added. It is different from the classic Datastore Viewer.

    • stickfigure 14 years ago

      You misunderstand how GAE indexes work.

      There are two kinds of indexes:

      * multi-property indexes which you configure via datastore-indexes.xml (or yaml). You can remove these by removing them from the xml/yaml and vacuuming.

      * single-property indexes, which you decide when you define your data model. You can't vacuum these, and they are defined on a per-entity basis. The only way to make them go away is to re-save the relevant entities without the index defined. Note: multiproperty indexes require single-property indexes on all the properties covered.

      These single-property indexes are almost certainly causing your high write op counts. You really should examine your data model with this new understanding; by removing unnecessary single-property indexes, you may be able to dramatically reduce your bill.
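To see how much single-property indexes can matter, here is a sketch that applies the quoted billing formula to a hypothetical model before and after removing unneeded indexes. The property names, the list size, and the choice of which properties to keep indexed are all invented for illustration. In the legacy Python `db` API, a property's single-property index is turned off with the `indexed=False` option. This sketch models only the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prop:
    name: str
    indexed: bool
    values: int = 1  # a list property gets one index entry per value


def delete_cost(props: List[Prop], composite_values: int = 0) -> int:
    # Billing formula quoted earlier in the thread:
    # 2 + 2 per indexed property value + 1 per composite index value.
    indexed_values = sum(p.values for p in props if p.indexed)
    return 2 + 2 * indexed_values + composite_values


# Hypothetical model where every property is indexed.
before = [Prop("title", True), Prop("body", True), Prop("author", True),
          Prop("tags", True, values=5), Prop("created", True)]

# Same model, keeping indexes only on properties that queries filter or sort on
# (hypothetically "author" and "created").
after = [Prop("title", False), Prop("body", False), Prop("author", True),
         Prop("tags", False, values=5), Prop("created", True)]

print(delete_cost(before))  # 20
print(delete_cost(after))   # 6
```

As stickfigure notes, existing entities keep their old index rows until they are re-saved under the new definition.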

latchkey 14 years ago

Should Google refund developers when they make an uninformed decision that costs them money?

One could argue it is a bug in GAE that allows developers to make an expensive mistake when they don't fully understand how something (fairly complicated) works.

Someone else could argue that we are all developers and we should know the costs associated with the systems we are building. There is a real cost associated with PaaS systems like GAE.

What do you think?

cr4zy 14 years ago

I'm pretty sure this uses the map reduce API which has a lot of overhead in the datastore. In principle map reduce is nice because it could make very large jobs fast. But since Google engineers don't pay for anything, they optimized for time, not cost.

And with regards to your script, you can't just delete 3k keys in one request. If you want I'll send you the script I've adapted for jobs that make large changes to the datastore.

  • sirn 14 years ago

    In my experience, purging data via the MapReduce API uses a lot less write quota than the admin interface (at the cost of a bit of instance-hour overhead, which doesn't seem to be a problem).

    I can't remember the exact number, but it was about 10 times less than deleting via the admin interface, and it finished in 5 minutes rather than 3 hours.

  • marramOP 14 years ago

    I meant that I needed 3k requests to finish the job, deleting 1k entities in each request :).
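Here is a minimal sketch of the batching marram describes, with 1,000 entities per request. `batches` and the string keys are hypothetical stand-ins. In the legacy Python SDK, each chunk would typically be passed to a batch call such as `db.delete(chunk)` inside its own request or task. Per stickfigure's comment, batching like this spreads the work across requests but does not lower the write-op count per entity.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(keys: Sequence[T], size: int = 1000) -> Iterator[List[T]]:
    # Yield consecutive chunks of at most `size` keys, one chunk per request.
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


# Hypothetical key list of 2,500 entities.
keys = [f"key-{i}" for i in range(2500)]
print([len(chunk) for chunk in batches(keys)])  # [1000, 1000, 500]
```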

ch0wn 14 years ago

I ran into the same issue. If you want to purge all data from an app, it's much cheaper (and sometimes even faster) to start over and create a completely new app with an empty data store than to use the data store admin and delete the data from there.

ecksor 14 years ago

The number of writes also depends on the number of indexes you have on the data.

Maven911 14 years ago

Can somebody explain the article in layman's terms? For those not too familiar with GAE...

tnuc 14 years ago

There are plenty of things that are wrong with Google App Engine. And there are plenty of bugs that exist that have cost me money.

Why don't you try filing a bug report/suggesting a warning and send an email requesting something of a refund. They tend to be a friendly bunch who give refunds to obvious problems.

Moving to AWS will of course save you lots of money in the longer term, depending on what your hosting requirements are.