
I wrote this answer because I wasn't satisfied with the leading answer's treatment of Athena outperforming Redshift Spectrum. I appreciate this information might only be useful for the exam, but I didn't find his argument convincing. The rest of that answer is good and I do not mean to directly copy any of that here (without references it hadn't registered with me when I wrote this). I (again, based solely on my hands-off research) would choose Spectrum when the majority of my data is in S3, which would typically be for the larger data sets. The recent RA3 instances seem to overlap this niche, though.
#AWS Athena vs Redshift pro#
I had learned (from Adrian Cantrill's/LA's 2019 SA Pro course) that Redshift Spectrum would use one's own Redshift cluster to provide more consistent performance than is available by leveraging the shared capacity which AWS makes available to Athena queries.

This question has been up for quite a time, but still, I think I can contribute something to the discussion.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Then comes the question of what Redshift Spectrum is, and why the Amazon folks made it when Athena was already pretty much a solution for external table queries. The AWS folks wanted to create an extension to Redshift (which is pretty popular as a managed columnar datastore at this time) and give it the capability to talk to external tables (typically S3). They wanted to make life easier for Redshift users, mostly analytics people. Many analytics tools don't support Athena but support Redshift at this time. But creating your Redshift cluster and storing data was a bottleneck. Again, Redshift isn't that horizontally scalable, and it takes some downtime when adding new machines. If you are a Redshift user, making your storage cheaper basically makes your life so much easier.

I suggest you use Redshift Spectrum in the following cases:

- You are an existing Redshift user and you want to store more data in Redshift.
- You want to move colder data to an external table but still want to join it with Redshift tables in some cases.
- You unload your data with Spark, or you just want to import data into Pandas or any other tool for analysis.

Athena makes more sense if you are a new user and don't have a Redshift cluster. Access to Spectrum requires an active, running Redshift instance, so Redshift Spectrum is not an option without Redshift. Spectrum is also still a developing tool, and they are adding features like transactions to make it more efficient. BTW, Athena comes with a nice REST API, so go for it if you want that.

All to say, Redshift + Redshift Spectrum is indeed powerful, with lots of promise. But it still has a long way to go to be mature. If you are using a Redshift database, then it will be wise to use Spectrum along with Redshift to get the required performance. However, if you are just beginning to explore options, then Athena is a reasonable tool to start with.
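To make the "colder data in an external table, still joined with Redshift tables" case a bit more concrete, here is a rough sketch of the SQL involved. It only builds the statements as strings, so nothing here talks to a cluster. The schema, database, table, column and role names are all invented for illustration, and the DDL options you actually need should be checked against the Redshift documentation.

```python
def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement that points Redshift
    at a database in the AWS Glue / Athena data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def hot_cold_join_sql(local_table: str, external_table: str, key: str) -> str:
    """Build a query joining a regular Redshift table ("hot" data)
    with a Spectrum external table on S3 ("cold" data)."""
    return (
        f"SELECT hot.{key}, COUNT(*) AS cold_rows\n"
        f"FROM {local_table} AS hot\n"
        f"JOIN {external_table} AS cold ON cold.{key} = hot.{key}\n"
        f"GROUP BY hot.{key};"
    )


# Hypothetical names, for illustration only.
ddl = external_schema_ddl(
    schema="spectrum_archive",
    catalog_db="archive_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
query = hot_cold_join_sql(
    local_table="public.customers",
    external_table="spectrum_archive.old_orders",
    key="customer_id",
)
print(ddl)
print(query)
```

The names are interpolated directly into the SQL, which is fine for a sketch but not for untrusted input.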

I have used both across a few different use cases and conclude:

Redshift Spectrum is able to join Redshift tables with Redshift Spectrum tables. If you do not need that, then you should consider Athena as well.

Athena differences from Redshift Spectrum:

- Cost is the major difference, and depending on your use case you may find one much cheaper than the other.
- Athena is derived from Presto and is a bit different to Redshift, which has its roots in Postgres.
- It's easy enough to connect to Athena using the API, JDBC or ODBC, but many more products offer "standard out of the box" support for Redshift.

Also, for either solution, make sure you use the AWS Glue metadata rather than Athena's, as there are fewer limitations.
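Since cost depends so much on the use case, a back-of-the-envelope comparison can help. The sketch below assumes Athena bills by data scanned, and that a Spectrum setup pays a similar per-scan charge plus the hourly cost of the Redshift cluster it needs. Every price and volume is an input, and the numbers in the example are placeholders, not quoted AWS prices.

```python
def athena_monthly_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Pay-per-query model: cost scales only with data scanned."""
    return tb_scanned * price_per_tb


def spectrum_monthly_cost(
    tb_scanned: float,
    price_per_tb: float,
    nodes: int,
    node_price_per_hour: float,
    hours_per_month: float = 730.0,
) -> float:
    """Scan charges plus an always-on Redshift cluster."""
    cluster = nodes * node_price_per_hour * hours_per_month
    return cluster + tb_scanned * price_per_tb


# Placeholder inputs: 10 TB scanned per month, 5.0 per TB,
# a 2-node cluster at 0.25 per node-hour.
athena = athena_monthly_cost(10, 5.0)
spectrum = spectrum_monthly_cost(10, 5.0, nodes=2, node_price_per_hour=0.25)
print(athena, spectrum)  # 50.0 415.0
```

If you already run (and pay for) the cluster for other reasons, only the scan term is an extra cost, which is one reason the comparison can flip depending on whether you are an existing Redshift user.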

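For the API route to Athena, a minimal sketch using the `boto3` Athena client might look like this. The client is passed in as a parameter so the function itself has no AWS dependency. The database name, SQL and S3 output location in the usage note are hypothetical, and error handling is kept to a bare minimum.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        info = client.get_query_execution(QueryExecutionId=query_id)
        state = info["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With real credentials configured, usage would be roughly `run_athena_query(boto3.client("athena"), "SELECT 1", "my_database", "s3://my-bucket/athena-results/")`, where the database and bucket are placeholders for your own.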