Glue

PySpark extensions

Relationalize

  • Converts a DynamicFrame into a form that fits within a relational database.
  • Relationalizing a DynamicFrame is especially useful when you want to move data from a NoSQL environment like DynamoDB into a relational database like MySQL.
  • Example
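
In a Glue job, the transform is invoked as `DynamicFrame.relationalize(root_table_name, staging_path)`, which returns a collection of flat frames. That call needs the Glue runtime, so the sketch below only imitates the idea in plain Python: nested fields become dotted columns, and arrays are moved to a child table linked by a generated key. The function names, the sample record, and the child-table column names (`id`, `index`, `val`) are assumptions made for illustration and may not match Glue's exact output.

```python
def flatten(record, prefix=""):
    """Turn nested dicts into dotted column names; lists are left as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def relationalize_sketch(records, root_name="root"):
    """Split nested records into a root table plus one child table per array column."""
    tables = {root_name: []}
    next_id = 1  # surrogate key that links a root row to its child rows
    for record in records:
        row = flatten(record)
        for column, value in list(row.items()):
            if isinstance(value, list):
                child_rows = tables.setdefault(f"{root_name}_{column}", [])
                for index, item in enumerate(value):
                    child_row = {"id": next_id, "index": index}
                    if isinstance(item, dict):
                        child_row.update(flatten(item, prefix="val."))
                    else:
                        child_row["val"] = item
                    child_rows.append(child_row)
                row[column] = next_id  # the root row keeps only the link
                next_id += 1
        tables[root_name].append(row)
    return tables


# Hypothetical DynamoDB-style item with a nested map and a list.
orders = [{"order_id": 1, "customer": {"name": "Ana", "city": "Lima"}, "items": ["pen", "ink"]}]
tables = relationalize_sketch(orders, root_name="orders")
for table_name, rows in tables.items():
    print(table_name, rows)
```

Each resulting table has only scalar columns, so it can be written to a relational target such as MySQL as an ordinary table, with the `id` column acting as the join key.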

Data Access & Security

  • AWS Glue resource policies can be used to control access to Data Catalog resources.
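
As a rough illustration, a resource policy is a JSON document attached to the Data Catalog. The sketch below builds one that grants read-only access to a single database. The account IDs, role name, Region, and database name are placeholders, and the chosen actions are only one plausible read-only set.

```python
import json


def catalog_read_policy(region, account_id, principal_arn, database):
    """Build a Data Catalog resource policy that allows read access to one database."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables"],
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/{database}",
                    f"{base}:table/{database}/*",
                ],
            }
        ],
    }


# All identifiers below are hypothetical.
policy = catalog_read_policy(
    region="us-east-1",
    account_id="111122223333",
    principal_arn="arn:aws:iam::444455556666:role/analyst",
    database="sales",
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)

# With boto3 and valid credentials, the policy could then be attached with:
#   boto3.client("glue").put_resource_policy(PolicyInJson=policy_json)
```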

Glue Crawler

  • AWS Glue can crawl data in different AWS Regions.
  • When you define an Amazon S3 data store to crawl, you can choose whether to crawl a path in your own account, in another account, or in another Region.
  • Use one or more of the following methods to reduce crawler run times.
    • Use an exclude pattern
    • Run multiple crawlers
    • Combine smaller files to create larger ones
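
The first two methods can be sketched as request payloads for the boto3 `create_crawler` call. The code only builds the dictionaries and does not call AWS. The bucket, role ARN, database, prefixes, and exclude patterns are hypothetical, and the right patterns depend on how the data is laid out.

```python
def crawler_request(name, role_arn, database, s3_path, exclusions):
    """Build keyword arguments for glue.create_crawler with S3 exclude patterns."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                # Exclusions are glob patterns evaluated relative to Path.
                {"Path": s3_path, "Exclusions": list(exclusions)}
            ]
        },
    }


ROLE_ARN = "arn:aws:iam::111122223333:role/glue-crawler"  # hypothetical role
SKIP = ["**/_temporary/**", "**.tmp"]  # assumed patterns for scratch files

# One crawler per top-level prefix, so the prefixes can be crawled in parallel.
prefixes = ["2023", "2024"]
requests = [
    crawler_request(
        name=f"sales-{prefix}",
        role_arn=ROLE_ARN,
        database="sales",
        s3_path=f"s3://example-bucket/sales/{prefix}/",
        exclusions=SKIP,
    )
    for prefix in prefixes
]

for request in requests:
    print(request["Name"], request["Targets"]["S3Targets"][0]["Path"])

# With boto3 and valid credentials, each request could be submitted with:
#   boto3.client("glue").create_crawler(**request)
```

Splitting by prefix keeps each crawler's listing work small, while the exclude patterns stop the crawler from reading files that would never become tables.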

Glue Data Catalog


Glue support

AWS Glue jobs typically run Apache Spark, Spark Streaming, or Python shell scripts.

AWS Glue doesn't directly support Apache Hive.
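
The three job types listed above correspond to the `Command` names that the boto3 `create_job` call accepts: `glueetl`, `gluestreaming`, and `pythonshell`. The helper below is a small sketch of that mapping. The helper itself, the friendly type names, and the script path are assumptions for illustration.

```python
# Friendly job type -> Command "Name" value used by glue.create_job.
JOB_COMMANDS = {
    "spark": "glueetl",
    "spark_streaming": "gluestreaming",
    "python_shell": "pythonshell",
}


def job_command(job_type, script_location):
    """Build the Command argument for glue.create_job, rejecting unknown job types."""
    if job_type not in JOB_COMMANDS:
        raise ValueError(f"no Glue job type for {job_type!r}")
    return {
        "Name": JOB_COMMANDS[job_type],
        "ScriptLocation": script_location,
        "PythonVersion": "3",  # assumed; Python shell jobs may need a more specific version
    }


command = job_command("spark", "s3://example-bucket/scripts/etl.py")  # hypothetical path
print(command)

try:
    job_command("hive", "s3://example-bucket/scripts/query.hql")
except ValueError as error:
    print(error)  # Hive scripts have no matching job type in this mapping
```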