Webinar - Approaching 1 billion documents with MongoDB


  • 1.

    Approaching 1 Billion Documents in MongoDB. David Mytton, david@boxedice.com / www.mytton.net

  • 2.
  • 3.

    db.stats()
    Documents:   981,289,332
    Collections: 47,962
    Indexes:     39,684
    Data size:   369GB
    Index size:  241GB
    As of 25th Apr 2010.

  • 4.
  • 5.

    Initial Setup: master/slave replication. Master in DC1 (8GB RAM) replicating to a slave in DC2 (8GB RAM).

  • 6.

    Vertical Scaling: the master in DC1 upgraded to 72GB RAM, still replicating to the slave in DC2 (8GB RAM).

  • 7.

    Tip #1: Keep your indexes in memory at all times. Check sizes with db.stats().
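Tip #1 can be sanity-checked against the deck's own numbers: slide 3 reports 241GB of indexes, and slide 6's vertically scaled master has 72GB RAM. A minimal sketch of that check, with both figures hard-coded from the slides:

```shell
#!/bin/sh
# Figures taken from slide 3 (db.stats) and slide 6 (vertical scaling).
index_size_gb=241   # total index size across all databases
ram_gb=72           # the largest single machine in the deck

if [ "$index_size_gb" -le "$ram_gb" ]; then
  echo "indexes fit in RAM on one node"
else
  echo "indexes do NOT fit in RAM on one node"
fi
```

No single machine here can hold the working set of indexes, which is what motivates the manual partitioning on the next slides.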

  • 8.

    Manual Partitioning: two independent replicated pairs. Pair A: Master A in DC1, Slave A in DC2. Pair B: Master B in DC1, Slave B in DC2. 16GB RAM on every node.

  • 9.

    Databases vs collections
    • Many databases = many data files (start small, but quickly get large).
    • Many collections = watch the namespace limit.
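The "quickly get large" behaviour comes from how MongoDB of this era preallocated data files: each successive file doubles in size until a 2048MB cap. A quick sketch of that sequence:

```shell
#!/bin/sh
# Data-file preallocation pattern in mmap-era MongoDB:
# each new file doubles in size until the 2048MB cap is reached.
size_mb=64
while [ "$size_mb" -le 2048 ]; do
  echo "${size_mb}MB"
  size_mb=$((size_mb * 2))
done
```

This prints 64MB, 128MB, 256MB, 512MB, 1024MB, 2048MB: a mostly empty database still reserves the small files, but a growing one reaches 2GB-per-file allocations quickly.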

  • 10.
  • 11.

    Tip #2: Monitor the 24,000 namespace limit.

  • 12.
  • 13.

    Console: db.system.namespaces.count()
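Each collection and each index consumes one namespace, so the totals from slide 3 show why this deployment could never live in a single database. A quick tally, using the slide 3 figures:

```shell
#!/bin/sh
# Namespaces consumed = collections + indexes (figures from slide 3).
collections=47962
indexes=39684
echo $((collections + indexes))   # 87646 namespaces in total
```

That is well over three times the ~24,000 limit from Tip #2, hence the split across many databases.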

  • 14.

    Replica Pairs = failover. Replica pair A: Master A in DC1, Slave A in DC2. Replica pair B: Master B in DC1, Slave B in DC2. 16GB RAM on every node.

  • 15.

    Tip #3: Pre-provision your oplog files.

  • 16.

    A shell script to generate 75GB of oplog files:

    for i in {0..40}
    do
      echo $i
      head -c 2146435072 /dev/zero > local.$i
    done
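Each file the loop writes uses a size constant just under the 2GB per-file cap; a quick check of what that constant works out to:

```shell
#!/bin/sh
# 2146435072 bytes per oplog file, expressed in MiB.
echo $((2146435072 / 1024 / 1024))   # prints 2047, just under the 2048MiB cap
```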

  • 17.

    Tip #4: Expect slower performance during initial replica sync.

  • 18.

    Tip #5: You can rotate your log files from the console (the logRotate command).

  • 19.
  • 20.

    Tip #6: Index creation blocks by default. Use background indexing if necessary. MongoDB Manual: http://bit.ly/mongobgindex

  • 21.

    Tip #7: Increase your OS file descriptor limit + use persistent connections.

  • 22.

    Too many open files!

    /etc/security/limits.conf (format: user type limit):
      mongo hard nofile 10000
      mongo soft nofile 10000

    /etc/ssh/sshd_config:
      UsePAM yes
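Before editing limits.conf, it is worth checking what the current limits actually are; a low soft limit is what produces the "too many open files" error under load:

```shell
#!/bin/sh
# Inspect the current file-descriptor limits for this shell/session.
ulimit -Sn   # soft limit: what processes get by default
ulimit -Hn   # hard limit: the ceiling the soft limit can be raised to
```

The limits.conf entries above raise both values for the mongo user; UsePAM ensures the limits are applied to sessions started over SSH.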

  • 23.
  • 24.

    Tip #8: 10gen commercial support is worth paying for.

  • 25.

    Summary
    1. Keep indexes in memory.
    2. Monitor the 24k namespace limit.
    3. Pre-provision oplog files.
    4. Expect slower performance on replica sync.
    5. Rotate logs from the console.
    6. Index creation blocks by default.
    7. OS file descriptor limit + persistent connections.
    8. Commercial support is worth it.