Data engineering ecosystem
𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 is a role that involves building systems to process data efficiently and to model that data to power analytics. These solutions can range from batch data pipelines to real-time processing services that integrate with core product features.
𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 involves a rich understanding of the large distributed systems on which data solutions rely.
It makes it possible to take vast amounts of data, translate it into insights, and focus on the production readiness of that data: formats, resilience, scaling, and security.
𝗕𝗮𝘁𝗰𝗵 - processing a large volume of data all at once.
𝗦𝘁𝗿𝗲𝗮𝗺 - processing a continuous stream of data as soon as it is generated.
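A minimal sketch of the difference, computing the same total in both styles (the event shape and names here are made up for illustration):

```python
def batch_total(events):
    """Batch: process the complete dataset in one pass, after all of it has arrived."""
    return sum(e["amount"] for e in events)

class StreamTotal:
    """Stream: update the result incrementally as each event arrives."""
    def __init__(self):
        self.total = 0

    def on_event(self, event):
        self.total += event["amount"]
        return self.total  # a running answer is available after every event

events = [{"amount": 10}, {"amount": 25}, {"amount": 5}]
print(batch_total(events))  # 40, one answer once all data is read

stream = StreamTotal()
for e in events:
    stream.on_event(e)
print(stream.total)  # 40, built up one event at a time
```

The end result is the same; what differs is latency (streaming answers are available continuously) versus simplicity (batch jobs are easier to reason about and re-run).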
𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀: SQL and NoSQL differ in whether their schemas are predefined or dynamic, how they scale, the types of data they hold, and whether they are better suited to multi-row transactions or to unstructured data.
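A small sketch of the SQL side using Python's built-in sqlite3 module: a predefined schema and an atomic multi-row transaction (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Predefined schema: columns and types are declared up front.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])

# Multi-row transaction: both updates commit together, or neither does.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(70,), (80,)]
```

A NoSQL store would instead accept documents with varying fields and typically trade this kind of cross-row atomicity for easier horizontal scaling.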
𝗗𝗮𝘁𝗮 𝗪𝗮𝗿𝗲𝗵𝗼𝘂𝘀𝗲: it centralizes and consolidates large amounts of data from multiple sources.
𝗠𝗲𝘀𝘀𝗮𝗴𝗲 𝗤𝘂𝗲𝘂𝗲 - a publish-subscribe (pub-sub) system that decouples producers from consumers and buffers events between services.
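A toy in-memory sketch of the pub-sub pattern behind message queues: producers publish to a topic without knowing who listens, and every subscriber to that topic receives the message (the `Broker` class and topic name are hypothetical):

```python
from collections import defaultdict

class Broker:
    """In-memory pub-sub broker: topics map to lists of subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The producer addresses a topic, never a specific consumer.
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append(m.upper()))
broker.publish("orders", "order-42")
print(received)  # ['order-42', 'ORDER-42']
```

Real brokers such as Kafka or RabbitMQ add what this sketch omits: durable storage, delivery guarantees, and consumers running in separate processes.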
𝗙𝗶𝗹𝗲 𝗙𝗼𝗿𝗺𝗮𝘁: designed for efficient data storage and retrieval
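A rough sketch of why columnar formats (e.g., Parquet, ORC) suit analytics: a query that touches one field can read just that column instead of scanning every full record (the records here are made up for illustration):

```python
rows = [
    {"user": "a", "country": "US", "amount": 10},
    {"user": "b", "country": "DE", "amount": 25},
    {"user": "c", "country": "US", "amount": 5},
]

# Row layout: every whole record must be visited to sum one field.
row_sum = sum(r["amount"] for r in rows)

# Columnar layout: each field is stored contiguously, so the query
# reads only the one list it needs.
columns = {key: [r[key] for r in rows] for key in rows[0]}
col_sum = sum(columns["amount"])

print(row_sum, col_sum)  # 40 40
```

Columnar layouts also compress better, since values of one type sit next to each other; row-oriented formats remain the better fit for fetching or updating whole records.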
𝗦𝗲𝗮𝗿𝗰𝗵 𝗲𝗻𝗴𝗶𝗻𝗲 - mostly based on the Lucene library; provides a distributed, multitenant-capable full-text search engine with an HTTP web interface.
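At the heart of Lucene-based engines is an inverted index: each term maps to the set of documents containing it, so search becomes a dictionary lookup rather than a scan of every document. A minimal sketch, with invented documents:

```python
from collections import defaultdict

docs = {
    1: "stream processing with kafka",
    2: "batch processing with spark",
    3: "kafka topics and partitions",
}

# Build the inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A multi-term query is a set intersection over the posting lists.
result = index["kafka"] & index["processing"]
print(sorted(result))  # [1]
```

Production engines layer analysis (tokenization, stemming), relevance scoring, and distribution across shards on top of this core structure.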
𝗜𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝘃𝗲 𝗾𝘂𝗲𝗿𝘆: fast and reliable SQL engines for interactive data analytics.
𝗗𝗲𝗹𝘁𝗮 𝗟𝗮𝗸𝗲: an open-source storage layer that brings ACID transactions to data lakes.
𝗖𝗹𝗼𝘂𝗱 𝗣𝗿𝗼𝘃𝗶𝗱𝗲𝗿: managed platforms that supply the storage and compute underlying the rest of the stack.
𝗦𝘁𝗼𝗿𝗮𝗴𝗲: where massive amounts of information get stored.
𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁: easy to write, schedule, and monitor workflows.
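Workflow managers model a pipeline as a DAG of tasks, running each task only after its upstream dependencies have finished. A minimal sketch using Python's standard-library topological sorter (the task names are invented; real schedulers like Airflow add retries, scheduling, and monitoring on top):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform", "validate"},
}

# static_order() yields a valid execution order respecting all dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'validate', 'load']
```

Declaring dependencies rather than a fixed script lets the scheduler parallelize independent branches and resume a failed run from the task that broke.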
𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲 𝗠𝗮𝗻𝗮𝗴𝗲𝗿 - distributed resource management that allocates resources among all the applications in the system.
𝗙𝗮𝘀𝘁 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻: ingests massive quantities of event data and serves low-latency queries over it.
𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴: Cluster and system monitoring
𝗖𝗼𝗻𝘁𝗮𝗶𝗻𝗲𝗿𝘀 𝗮𝗻𝗱 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻: packaging services with their dependencies and scheduling them across a cluster.