
Hadoop

9.10.2013 12:25

What is Hadoop and which tools does it include?

What is Hadoop?

  • an environment built on a distributed filesystem
  • scalability (the function is sent to the data), robustness (through redundancy)
  • a collection of open-source tools
  • typically provided as a service (because it is complicated to manage)
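
The redundancy point can be illustrated with a small sketch. It assumes a simple round-robin placement over hypothetical node names; real HDFS placement is rack-aware and more involved, so the function names and the policy here are illustrative only.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and assign each block
    to `replication` distinct nodes (simplified round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so replicas spread out.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """A file stays readable if every block has a replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical cluster of four nodes; sizes are in arbitrary units.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(file_size=300, block_size=128, nodes=nodes)
```

With three replicas per block, losing any single node still leaves at least two copies of every block, which is what the robustness bullet refers to.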

Two main cornerstones of Hadoop

  • HDFS (Hadoop Distributed File System)
  • MapReduce paradigm - sending the function to the data and collecting the results, inspiration by pioneering
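
The MapReduce idea can be sketched in plain Python as a word count. This is a single-process illustration, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample partitions are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(partitions):
    # Each partition stands for the data held by one node; the map
    # function is applied where the data lives, then results are collected.
    mapped = [
        pair
        for partition in partitions
        for line in partition
        for pair in map_phase(line)
    ]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


partitions = [
    ["hadoop stores data", "data is replicated"],
    ["hadoop runs jobs"],
]
counts = run_job(partitions)
```

In a real cluster the map calls run in parallel on the nodes that hold the partitions, and only the much smaller intermediate pairs travel over the network.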

Other important tools

Development tools

  • MapReduce - see above
  • Hive - provides HQL (Hive Query Language), an SQL-like language whose queries are automatically translated to map-reduce tasks. Read-only queries, high latency. Created at Facebook.
  • Pig - a language (Pig Latin) and runtime; scripts are translated to map-reduce. Created at Yahoo.
  • Jaql
  • Mahout
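
To give a feeling for what "translated to map-reduce" means for tools like Hive and Pig, here is a rough sketch of how an aggregation such as `SELECT country, SUM(amount) FROM orders GROUP BY country` could be expressed as a map step and a reduce step. The `orders` table, its columns, and the function names are hypothetical, and the actual query planners produce far more elaborate job plans.

```python
from collections import defaultdict

# Hypothetical table orders(country, amount), represented as rows.
orders = [
    {"country": "CZ", "amount": 10.0},
    {"country": "DE", "amount": 5.0},
    {"country": "CZ", "amount": 2.5},
]


def map_row(row):
    """Map: the GROUP BY column becomes the key, the aggregated column the value."""
    return row["country"], row["amount"]


def reduce_group(country, amounts):
    """Reduce: apply the aggregate function (here SUM) to one group."""
    return country, sum(amounts)


def group_by_sum(rows):
    groups = defaultdict(list)
    for key, value in map(map_row, rows):
        groups[key].append(value)  # shuffle: collect values per key
    return dict(reduce_group(k, v) for k, v in groups.items())


totals = group_by_sum(orders)
```

Because every such query starts at least one batch job, even simple queries carry noticeable startup cost, which matches the high latency noted above.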

Data storage and management tools

  • HDFS - see above
  • Cassandra - NoSQL (key-value) DB, an alternative to HDFS for storage, fast
  • HBase - non-relational DB on top of HDFS, good for sparse data
  • HCatalog - Hadoop tables and storage management
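
Why a store like HBase suits sparse data can be shown with a toy model: only cells that actually hold a value are stored, so rows with different columns cost nothing for the columns they lack. The `SparseTable` class below is an illustrative assumption, not the HBase client API; only the `family:qualifier` column naming follows HBase conventions.

```python
class SparseTable:
    """Toy row-key -> {column -> value} store that keeps only existing cells."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # Create the row on first write; absent columns are never materialized.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Number of stored cells, independent of how many columns exist overall."""
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("user1", "info:email", "a@example.com")
table.put("user2", "info:phone", "123")
```

A relational table with the same two columns would store four cells here, two of them NULL; the sparse layout stores only the two that exist.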

Control tools

  • ZooKeeper - coordination service for configuration management, synchronization...
  • Oozie - job management (workflow scheduling)

Data aggregation and mining

  • Sqoop, Chukwa, Flume

Article about Hadoop (in Czech)

 

Created on 9 October 2013 at 13:35:02 by mira. Edited 4 times, last on 7 December 2015 at 16:27:23 by mira.

