Source: Amazon Web Services

AWS Data Ingesting Tools: DMS & Kinesis

Sandeep Bansal
3 min read · Jan 3, 2022

A gentle introduction to DMS & AWS Kinesis for Data Ingestion

The first step in building big data analytics solutions is to ingest data from a variety of sources into AWS. Here I introduce a few tools that are commonly used for data ingestion.

Overview of AWS Database Migration Service:

AWS DMS is a versatile tool that can migrate an existing database system to a new database engine, for example moving an existing Oracle database to Amazon Aurora with PostgreSQL compatibility. From an analytics perspective, AWS DMS can also run continuous replication from a number of common database engines into an S3 data lake.

When to use: AWS DMS simplifies migrating from one database engine to a different database engine, or syncing data from an existing database to Amazon S3 on an ongoing basis.

When to not use: If you’re looking to sync an on-premises database to the same engine in AWS, it is often better to use the native tools of that database engine. DMS is primarily designed for heterogeneous migrations, meaning from one database engine to a different one.
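
To make the continuous replication case more concrete, here is a minimal sketch in Python of how a DMS task that performs a full load followed by ongoing change capture might be requested through boto3. The task name, the ARNs, and the "sales" schema are hypothetical placeholders, and the sketch assumes the replication instance and the source and S3 target endpoints already exist.

```python
import json


def build_table_mappings(schema: str = "%", table: str = "%") -> str:
    """Return a DMS table-mapping document (JSON string) with one selection rule."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def build_task_request(source_arn: str, target_arn: str, instance_arn: str,
                       schema: str = "%") -> dict:
    """Assemble keyword arguments for the DMS create_replication_task call."""
    return {
        "ReplicationTaskIdentifier": "example-db-to-s3",  # hypothetical name
        "SourceEndpointArn": source_arn,                  # existing source endpoint
        "TargetEndpointArn": target_arn,                  # existing S3 target endpoint
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",             # initial copy, then ongoing changes
        "TableMappings": build_table_mappings(schema),
    }


def create_task(request: dict) -> dict:
    """Submit the request. Needs boto3, AWS credentials, and real ARNs."""
    import boto3  # imported lazily so the builders above run without it

    return boto3.client("dms").create_replication_task(**request)


if __name__ == "__main__":
    # Placeholder ARNs, for illustration only.
    req = build_task_request("arn:source", "arn:target", "arn:instance", schema="sales")
    print(json.dumps(req, indent=2))
```

Keeping the request builder separate from the API call makes it easy to inspect the table mappings before anything is sent to AWS.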

Overview of Amazon Kinesis for Streaming data ingestion:

Amazon Kinesis is a managed service that simplifies ingesting and processing streaming data in real time or near real time. Kinesis covers a number of use cases, including ingestion of streaming data (such as log files, website clickstreams, or IoT data) as well as video and audio streams. Before we go into further detail, let's summarize the Kinesis services:

  • Kinesis Data Firehose: Ingests streaming data, buffers it for a configurable period, then writes it out to one of a limited set of targets (S3, Redshift, Elasticsearch, Splunk, and others).
  • Kinesis Data Streams: Ingests real-time data streams so that a custom application can process the incoming data with low latency.
  • Kinesis Data Analytics: Reads data from a streaming source and uses SQL statements or Apache Flink code to perform analytics on the stream.
  • Kinesis Video Streams: Processes streaming video or audio, as well as other time-serialized data such as thermal imagery and RADAR data.

When to use: Amazon Kinesis Data Firehose is the ideal choice when you want to receive streaming data, buffer that data for a period, and then write it to one of the targets supported by Kinesis Data Firehose (such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or a supported third-party service).
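
The buffer-then-write behaviour is easier to picture with a toy model. The sketch below is plain Python, not Firehose code: the class name, thresholds, and sink are all made up for illustration. It mimics the idea of flushing a batch once either a size limit or a time limit is reached, which is how Firehose buffering is configured (a buffer size and a buffer interval).

```python
import time


class ToyBuffer:
    """Collects records and flushes when a size or age threshold is reached."""

    def __init__(self, max_bytes, max_seconds, sink, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.sink = sink        # callable that receives a list of records
        self.clock = clock      # injectable so the timing can be simulated
        self.records = []
        self.size = 0
        self.started = None     # arrival time of the first buffered record

    def put(self, record: bytes) -> None:
        if self.started is None:
            self.started = self.clock()
        self.records.append(record)
        self.size += len(record)
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        # Simplification: thresholds are only checked when a record arrives;
        # a real service would also flush on a timer.
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.started >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self.records:
            self.sink(self.records)
        self.records, self.size, self.started = [], 0, None
```

A larger buffer gives fewer, bigger objects in the target at the cost of higher delivery latency, which is the trade-off to keep in mind when choosing Firehose over Data Streams.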

When to use: Amazon Kinesis Data Streams is ideal for use cases where you want to process incoming data as it is received, using a custom application.
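
As a rough illustration of the producer side, the following Python sketch builds a request for the Kinesis Data Streams put_record call in boto3 and sends it. The stream name "clickstream" and the event fields are hypothetical. The API requires a partition key for every record, so the sketch takes it from one field of the event.

```python
import json


def build_put_record(stream_name: str, event: dict, key_field: str) -> dict:
    """Assemble keyword arguments for the Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # payload is sent as bytes
        "PartitionKey": str(event[key_field]),      # required by the API
    }


def send(request: dict) -> dict:
    """Submit the record. Needs boto3, AWS credentials, and an existing stream."""
    import boto3  # imported lazily so the builder above runs without it

    return boto3.client("kinesis").put_record(**request)


if __name__ == "__main__":
    # Hypothetical stream name and event, for illustration only.
    event = {"user_id": 42, "page": "/home"}
    print(build_put_record("clickstream", event, key_field="user_id"))
```

A custom consumer application would then read these records from the stream and process them as they arrive.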

When to use: If you want to use SQL expressions to analyze data or extract key metrics over a rolling time period, Amazon Kinesis Data Analytics significantly simplifies this task. If you have an existing Apache Flink application that you want to migrate to the cloud, consider running it on Kinesis Data Analytics.
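
To show what a metric over a rolling time period means, here is a small plain-Python sketch that counts events seen in the last N seconds. It is only a conceptual illustration with made-up names, not Kinesis Data Analytics code; in the service the same idea would be expressed as a windowed SQL query or a Flink job.

```python
from collections import deque


class RollingCount:
    """Counts events whose timestamps fall within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.timestamps = deque()

    def add(self, ts: float) -> int:
        """Record an event at time `ts` (assumed non-decreasing); return the count."""
        self.timestamps.append(ts)
        cutoff = ts - self.window
        # Drop events that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


if __name__ == "__main__":
    clicks_per_minute = RollingCount(window_seconds=60)
    for ts in (0, 30, 59, 61, 120):
        print(ts, clicks_per_minute.add(ts))
```

Writing and operating this kind of windowing logic by hand for a high-volume stream is the work that a managed analytics service takes over.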

When to use: When creating applications that use a supported source, Amazon Kinesis Video Streams significantly simplifies the process of ingesting streaming media data and enabling live or on-demand playback.

Sandeep Bansal

A clumsy, hard-working goof and a contributing author to Analytics Vidhya, a leading community of analytics, data science, and AI professionals.