DynamoDB and Cassandra



Oleksandr Polieno
Amazon DynamoDB                   Apache Cassandra
Distributed                       Distributed
Key/value (+columns) storage      Key/value (+columns) storage
DaaS                              Open source
Managed by AWS                    Requires DevOps
Scalability, high availability    Scalability, high availability
<300 ms response (99.9%)          Fastest writes
boto, pynamodb                    Python Cassandra driver

Roadmap

  1. DynamoDB and Cassandra under the hood
  2. DynamoDB and Cassandra in action
  3. Conclusions

1. DynamoDB and Cassandra under the hood

1.1. Primary keys

  • Hash and Range keys
  • Unique (incremental, natural, UUID)
  • Query vs Scan
  • Iteration through table
  • Compound aka composite keys
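
To make the Query vs Scan difference concrete, here is a minimal in-memory sketch. `ToyTable` is a hypothetical stand-in, not a DynamoDB or Cassandra API: it only assumes that items with the same hash key live together and are kept sorted by range key.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table with a hash key and a range key."""

    def __init__(self):
        # hash key -> list of (range_key, item) pairs, kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        pos = bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)       # same primary key: overwrite
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, low, high):
        """Read one partition only and slice it by range key."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start, stop = bisect_left(keys, low), bisect_right(keys, high)
        return [item for _, item in rows[start:stop]]

    def scan(self, predicate):
        """Visit every item in every partition, then filter."""
        return [item
                for rows in self._partitions.values()
                for _, item in rows
                if predicate(item)]


table = ToyTable()
table.put("user-1", 3, {"text": "c"})
table.put("user-1", 1, {"text": "a"})
table.put("user-2", 2, {"text": "b"})
print(table.query("user-1", 1, 2))                   # one partition touched
print(table.scan(lambda item: item["text"] != "a"))  # whole table touched
```

The cost of `query` grows with the size of one partition, while the cost of `scan` grows with the size of the whole table, which is why iterating through a table is the expensive path.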

1.2. Items count

  • Query count
  • Implement it on the application side
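
One way to keep an item count on the application side is to update a counter on every write. The sketch below uses a plain dict as a hypothetical store; with a real database the "is this key new?" check would need a conditional write or an extra read, which is an assumption not covered here.

```python
class CountedStore:
    """Hypothetical dict-backed store that tracks its own item count."""

    def __init__(self):
        self._items = {}
        self.count = 0      # in a real system this would be a counter item/row

    def put(self, key, value):
        if key not in self._items:      # only new keys change the count
            self.count += 1
        self._items[key] = value

    def delete(self, key):
        if key in self._items:          # deleting a missing key is a no-op
            del self._items[key]
            self.count -= 1


store = CountedStore()
store.put("a", 1)
store.put("a", 2)       # overwrite: count stays the same
store.put("b", None)
store.delete("b")
store.delete("zzz")     # unknown key: count stays the same
print(store.count)
```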

1.3. Secondary indexes

  • Local / Global
  • One more table / more space required / heavier writes
  • Projections
  • Sparse, Low cardinality
  • Limitations
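
The "one more table / heavier writes" point can be sketched with a toy model. `IndexedTable` below is hypothetical and keeps the index as a second mapping that every write must also update; items without the indexed attribute are simply left out (a sparse index), and only projected attributes are copied.

```python
class IndexedTable:
    """Hypothetical table with one secondary index stored as a second mapping."""

    def __init__(self, index_attr, projected=()):
        self.index_attr = index_attr
        self.projected = tuple(projected)
        self.items = {}     # primary key -> full item
        self.index = {}     # indexed value -> {primary key: projected attributes}

    def put(self, key, item):
        self._unindex(key)              # extra work on every write
        self.items[key] = item
        if self.index_attr in item:     # sparse: skip items lacking the attribute
            projection = {a: item[a] for a in self.projected if a in item}
            self.index.setdefault(item[self.index_attr], {})[key] = projection

    def _unindex(self, key):
        old = self.items.get(key)
        if old is not None and self.index_attr in old:
            bucket = self.index[old[self.index_attr]]
            del bucket[key]
            if not bucket:
                del self.index[old[self.index_attr]]

    def query_index(self, value):
        return dict(self.index.get(value, {}))


users = IndexedTable("email", projected=("name",))
users.put(1, {"name": "ryuko", "email": "r@example.com", "bio": "long text"})
users.put(2, {"name": "satsuki"})       # no email: absent from the index
print(users.query_index("r@example.com"))
```

Because only `name` is projected, a lookup through the index cannot return `bio` without a second read from the main table.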

1.4. Schema

  • Primary keys
  • Indexes
  • Can't be changed after table creation

1.5. Columns

  • Item size 400 KB for DynamoDB vs 2 GB for Cassandra
  • Keep item size small
  • Column types
  • Cassandra: Counters, list/map/set, TimeUUID, ascii vs varchar
  • TTL (Cassandra only)

1.6. Eventual consistency

  • ACID vs CAP (Consistency, Availability, Partition-tolerance)
  • Eventual consistency != inconsistency
  • Strong consistency is available
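
A common rule of thumb for when reads are strongly consistent in a replicated store: a read is guaranteed to overlap the latest acknowledged write when the number of replicas acknowledging the write plus the number of replicas consulted by the read exceeds the replication factor. The function names below are illustrative only.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must intersect every write set."""
    return write_acks + read_acks > replication_factor


rf = 3
print(quorum(rf))
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))   # quorum write + quorum read
print(read_sees_latest_write(rf, 1, 1))                     # single-replica write + read
```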

1.7. Write, write, write or why Cassandra is fastest on writes

  • Write on update and delete
  • Blind write
  • Conflicts (last write wins)
  • Compaction
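
The bullets above can be sketched as a toy append-only log. This is a simplified model, not Cassandra's storage engine: updates and deletes are blind appends, conflicts are resolved by the highest timestamp, and compaction merges segments. One loud simplification: tombstones are dropped immediately during compaction, whereas a real cluster keeps them for a grace period.

```python
TOMBSTONE = object()    # marker written instead of removing data in place


class ToyLog:
    """Hypothetical append-only store with last-write-wins reads."""

    def __init__(self):
        # each segment is an append-only list of (key, timestamp, value)
        self.segments = [[]]

    def write(self, key, value, timestamp):
        self.segments[-1].append((key, timestamp, value))   # no read before write

    def delete(self, key, timestamp):
        self.write(key, TOMBSTONE, timestamp)               # a delete is a write too

    def flush(self):
        self.segments.append([])                            # start a new segment

    def _latest(self):
        latest = {}
        for segment in self.segments:
            for key, ts, value in segment:
                if key not in latest or ts >= latest[key][0]:
                    latest[key] = (ts, value)
        return latest

    def read(self, key):
        entry = self._latest().get(key)
        if entry is None or entry[1] is TOMBSTONE:
            return None
        return entry[1]

    def compact(self):
        # keep only the newest live version of each key, in a single segment
        self.segments = [[(key, ts, value)
                          for key, (ts, value) in self._latest().items()
                          if value is not TOMBSTONE]]


log = ToyLog()
log.write("a", 1, timestamp=1)
log.flush()
log.write("a", 2, timestamp=3)
log.write("a", 99, timestamp=2)     # arrives late with an older timestamp: loses
log.write("b", 5, timestamp=1)
log.delete("b", timestamp=4)
print(log.read("a"), log.read("b"))
log.compact()
print(log.segments)
```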

1.8. Normalization vs denormalization

Normalization:

  • Relations, joins, complex queries
  • Constraints
  • All data in one place

Denormalization:

  • ~endless capacity
  • Smart indexes
  • Static field as join replacement
  • Fast write
  • Value == null does not use disk space

1.9. Availability

  • Masterless architecture
  • Data duplication
  • Virtual nodes
  • Replication between datacenters
  • Highly available for writes
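
A masterless ring with virtual nodes and data duplication can be sketched as follows. This is a toy model under stated assumptions: `Ring` and `_token` are hypothetical names, MD5 is used only as a convenient stand-in hash (Cassandra's default partitioner is Murmur3), and replicas are simply the next distinct nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_right


def _token(value):
    """Map a string to a position on the ring (stand-in hash, see lead-in)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=8, replication_factor=2):
        self.replication_factor = replication_factor
        self._node_count = len(set(nodes))
        # every physical node owns several tokens (virtual nodes) on the ring
        self._ring = sorted((_token(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        wanted = min(self.replication_factor, self._node_count)
        owners = []
        i = bisect_right(self._tokens, _token(key))
        while len(owners) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = Ring(["n1", "n2", "n3"])
print(ring.replicas("user:123"))    # two distinct nodes; any of them can accept the write
```

No node is special: any node can compute the same replica list from the key, which is what makes the design masterless.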

1.10. Databases

  • DynamoDB: None, IAM service
  • Cassandra: Keyspaces (namespaces)

1.11. Users, authorization

  • DynamoDB: IAM service (per table, get/modification)
  • Cassandra: Users and roles (select/modify)

1.12. Interface

  • DynamoDB: REST API + Client library
    PutItem:
    {
        "TableName": "users",
        "Item": {
            "user_id": {
                "N": "123"
            },
            "username": {
                "S": "ryuko"
            }
        }
    }
  • Cassandra: Cassandra driver + CQL
    USE application_namespace;
    INSERT INTO users(user_id, username) VALUES (123, 'ryuko');
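
The typed attribute format in the PutItem body above can be produced from a plain dict. The helper names below are hypothetical and only cover numbers, strings and booleans; client libraries ship their own serializers. Note that numbers travel as strings under the "N" type.

```python
import json


def to_attribute_value(value):
    """Wrap a Python value in DynamoDB's typed attribute format."""
    if isinstance(value, bool):             # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}            # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_request(table_name, item):
    """Build the JSON body of a PutItem request (hypothetical helper)."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(value) for name, value in item.items()},
    }


body = put_item_request("users", {"user_id": 123, "username": "ryuko"})
print(json.dumps(body, indent=4))
```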

2. DynamoDB and Cassandra in action

2.1. DynamoDB prices

Provisioned throughput       Price per month    Price per hour
100 Write Capacity Units     $150               $0.0128
100 Read Capacity Units      $30                $0.0025

1 Read Capacity Unit == one read request per second with a response of up to 4 KB.
1 Write Capacity Unit == one write request per second with a body of up to 1 KB.

* Strongly consistent reads cost twice as much.
* Prices depend on region.
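
A rough capacity estimate can be worked out from the unit sizes above. The sketch assumes the AWS definition in which one read unit covers one strongly consistent read of up to 4 KB, so an eventually consistent read consumes half a unit; function names and the sample workload are illustrative.

```python
import math

READ_UNIT_BYTES = 4 * 1024
WRITE_UNIT_BYTES = 1024


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=False):
    """Units to provision for a steady read rate (items are rounded up to 4 KB)."""
    per_read = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    units = per_read * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(item_bytes, writes_per_second):
    """Units to provision for a steady write rate (items are rounded up to 1 KB)."""
    per_write = max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))
    return per_write * writes_per_second


# 6 KB items read 100 times per second, 1.5 KB items written 50 times per second
print(read_capacity_units(6 * 1024, 100))                            # eventually consistent
print(read_capacity_units(6 * 1024, 100, strongly_consistent=True))
print(write_capacity_units(1536, 50))
```

Rounding happens per item, so keeping items just under a unit boundary has a direct effect on the bill.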

Data storage    Price per month per GB
First 25 GB     $0.000
> 25 GB         $0.25

Data transfer per month    Price per GB
First 1 GB                 $0.000
Up to 10 TB                $0.090
Next 40 TB                 $0.085
...

2.2. Integration with other Amazon web services

  • S3
  • Lambda
  • CloudSearch
  • ...

2.3. Python libs

  • boto
  • botocore
  • pynamodb

2.4. Best practices

  • Queue for writes
  • Read retries and automatic throughput update
  • Gunicorn / Celery gevent/eventlet workers
  • Hot spots / throttled reads/writes / caching
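
Retrying throttled requests is usually done with exponential backoff and jitter. The sketch below is library-agnostic: `Throttled` is a hypothetical stand-in for whatever throttling error the client library raises, and `with_retries` is an illustrative helper, not part of boto or pynamodb.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical stand-in for a provider's throttling error."""


def with_retries(operation, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), backing off exponentially (with jitter) when throttled."""
    for attempt in range(attempts):
        try:
            return operation()
        except Throttled:
            if attempt == attempts - 1:
                raise                       # out of attempts: surface the error
            # wait a random time in [0, base_delay * 2^attempt] before retrying
            sleep(random.uniform(0, base_delay * 2 ** attempt))


calls = []


def flaky_read():
    calls.append(1)
    if len(calls) < 3:                      # throttled on the first two calls
        raise Throttled()
    return "ok"


print(with_retries(flaky_read, sleep=lambda seconds: None))
```

The jitter spreads retries from many workers over time, so they do not hit the same hot partition again in lockstep.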

2.5. Testing DynamoDB

DynamoDBLocal:

java -Djava.library.path=./bin/DynamoDBLocal_lib \
	-jar ./bin/DynamoDBLocal.jar -port 8010 -inMemory
# To persist data on disk, replace -inMemory with: -dbPath ./bin/db.bin

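import unittest
import uuid
from unittest import mock  # on Python 2: import mock

from app import DDBUserWallet

DDB_LOCAL_URL = 'http://localhost:8010'  # DynamoDBLocal started with -port 8010


class DDBUserWalletTestCase(unittest.TestCase):

    # Point the client at DynamoDBLocal instead of the real AWS endpoint.
    @mock.patch('app.DDBUserWallet._get_endpoint_url')
    def test_update_and_get(self, _get_endpoint_url_mock):
        _get_endpoint_url_mock.return_value = DDB_LOCAL_URL
        user_wallet = DDBUserWallet()
        user_wallet.create_table()
        user_id = uuid.uuid4()
        # Each update should overwrite the previous balance.
        for balance in [100, 123]:
            user_wallet.update(user_id=user_id, balance=balance)
            self.assertEqual(
                user_wallet.get(user_id=user_id)['balance'], balance)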

2.6. Testing Cassandra

brew install cassandra       # macOS
apt-get install cassandra    # Debian/Ubuntu

django-cassandra-engine:

import uuid

from django_cassandra_engine.test import TestCase

from ..models import CassandraFeed


class ModelsTestCase(TestCase):

    def test_cassandra_feed(self):
        actor_id = uuid.uuid4()
        activity_id = uuid.uuid1()  # time-based UUID, matches a TimeUUID column
        activity = CassandraFeed(
            actor_id=actor_id,
            space='my',
            activity_id=activity_id,
        )
        activity.save()
        # On Python 2 use unicode() instead of str().
        self.assertIn(str(activity_id), str(activity))
        ...

3. Conclusions

3.1. Use or not? Decision tree

  • Amount of data?
  • Database life cycle?
  • Who will manage the cluster?
  • Data flow (constant or not)?
  • AWS "addiction"?
  • Features?

3.2. Best suited for

  • Shopping cart
  • Feeds: Activities
  • Analytics
  • Toy store (predictable load)
  • Voting engine
  • Suggesters

Thank You!