BigData_DW_Real Document Overview
The document BigData_DW_Real.docx is a guide to big data processing architectures, covering both offline and real-time processing. It also outlines the requirements and the architectural design of a big data warehouse project.
Big Data Processing Architectures
Big data processing architectures are primarily classified into two types:
- Offline Processing Architecture
  - Use cases: After-the-fact data analysis and data mining.
  - Technologies: Hive, MapReduce, Spark SQL, etc.
  - Advantages: Capable of handling large volumes of data.
  - Disadvantages: Slower processing; not suited to real-time requirements.
- Real-Time Processing Architecture
  - Use cases: Real-time monitoring and interactive applications.
  - Technologies: Spark Streaming, Flink.
  - Advantages: Fast response for time-sensitive data.
  - Disadvantages: Despite faster processing, it is limited to relatively simple business logic.
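To make the offline/real-time distinction concrete, the following plain-Java sketch (Java 8, matching the project's Java 1.8) computes the same hourly event counts in two ways: a batch pass over the complete dataset, as an offline job would do, and an incremental counter updated once per incoming event, as a streaming job would do. The `Event`, `BatchCounter`, and `StreamingCounter` names are hypothetical; this is an analogy only and uses no Spark or Flink API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event type: one user action stamped with the hour (0-23) it occurred in.
final class Event {
    final String userId;
    final int hour;

    Event(String userId, int hour) {
        this.userId = userId;
        this.hour = hour;
    }
}

// Offline style: the full dataset is available, so the result is recomputed from scratch.
final class BatchCounter {
    static Map<Integer, Integer> countByHour(List<Event> allEvents) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Event e : allEvents) {
            counts.merge(e.hour, 1, Integer::sum);
        }
        return counts;
    }
}

// Real-time style: events arrive one at a time, so running state is kept and updated per event.
final class StreamingCounter {
    private final Map<Integer, Integer> counts = new TreeMap<>();

    void onEvent(Event e) {
        counts.merge(e.hour, 1, Integer::sum);
    }

    // Copy of the current state, readable at any moment without reprocessing history.
    Map<Integer, Integer> snapshot() {
        return new TreeMap<>(counts);
    }
}
```

Both paths give the same answer on the same input; what differs is when the answer becomes available and how much state must be held between events, which is also why streaming jobs tend to keep their per-event logic simple.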
Big Data Warehouse Project Requirements
The big data warehouse project encompasses six key requirements:
- Daily Active Users: Analysis with hourly trends and daily comparisons.
- Daily New Users: Analysis with hourly trends and daily comparisons.
- Daily Transaction Volume: Analysis with hourly trends and daily comparisons.
- Daily Order Count: Analysis with hourly trends and daily comparisons.
- Shopping Coupon Risk Warning: Detection of, and alerts on, potentially risky coupon usage.
- Flexible User Purchase Analysis: Customizable, ad hoc analysis of user purchase behavior.
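A minimal sketch of the first requirement (daily active users with an hourly trend and a comparison against the previous day), written in plain Java 8. It assumes a hypothetical `ActivityLog` record holding a user ID, a date string, and an hour, and it assumes each user is counted once per day, attributed to the earliest hour they were seen, so the hourly counts sum to the daily total. In the project itself this logic would run on Spark; no Spark API is shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical log record: a user seen on a given day (e.g. "2019-02-10") at a given hour (0-23).
final class ActivityLog {
    final String userId;
    final String date;
    final int hour;

    ActivityLog(String userId, String date, int hour) {
        this.userId = userId;
        this.date = date;
        this.hour = hour;
    }
}

final class DauReport {
    // Assumption: a user counts once per day, attributed to the earliest hour they were seen.
    static Map<Integer, Integer> hourlyDau(List<ActivityLog> logs, String date) {
        Map<String, Integer> firstHour = new HashMap<>();
        for (ActivityLog log : logs) {
            if (log.date.equals(date)) {
                firstHour.merge(log.userId, log.hour, Integer::min);
            }
        }
        Map<Integer, Integer> perHour = new TreeMap<>();
        for (int h : firstHour.values()) {
            perHour.merge(h, 1, Integer::sum);
        }
        return perHour;
    }

    // Daily total is the sum of the hourly buckets (each user appears in exactly one bucket).
    static int dailyTotal(Map<Integer, Integer> hourly) {
        int total = 0;
        for (int n : hourly.values()) {
            total += n;
        }
        return total;
    }

    // Day-over-day change per hour: today's count minus yesterday's (missing hours count as 0).
    static Map<Integer, Integer> hourlyDelta(Map<Integer, Integer> today, Map<Integer, Integer> yesterday) {
        Map<Integer, Integer> delta = new TreeMap<>();
        for (int h = 0; h < 24; h++) {
            int t = today.getOrDefault(h, 0);
            int y = yesterday.getOrDefault(h, 0);
            if (t != 0 || y != 0) {
                delta.put(h, t - y);
            }
        }
        return delta;
    }
}
```

The same shape (filter to one day, bucket by hour, subtract the previous day's buckets) would plausibly carry over to daily new users, transaction volume, and order count, summing order amounts or counting orders instead of counting distinct users.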
Architectural Design for Big Data Warehouse Project
- Main Project (gmall): Based on Spring Boot.
- Dependencies: Spark, Scala, Log4j, Slf4j, Fastjson, Httpclient, and Httpmime.
- Project Structure: Includes parent project, submodules, and dependencies.
Technology Versions:
- Spark: 2.1.1
- Scala: 2.11.8
- Log4j: 1.2.17
- Slf4j: 1.7.22
- Fastjson: 1.2.47
- Httpclient: 4.5.5
- Httpmime: 4.3.6
- Java: 1.8