Sunday, November 13, 2016
Hive installation
- Download the Apache Hive release and unpack it (the paths below assume apache-hive-2.1.0-bin under /home/yao/mysoft)
- Add Hive to the system path by opening /etc/profile or ~/.bashrc and adding the following two lines
- export HIVE_HOME=/home/yao/mysoft/apache-hive-2.1.0-bin
- export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf
- Enable the settings by executing this command
- source /etc/profile  (or source ~/.bashrc if that is the file you edited)
- Create the configuration files
- cd conf
- cp hive-default.xml.template hive-site.xml
- cp hive-env.sh.template hive-env.sh
- cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
- cp hive-log4j2.properties.template hive-log4j2.properties
- Modify the configuration file hive-site.xml
- replace <value>${system:java.io.tmpdir}/${system:user.name}</value> with <value>$HIVE_HOME/iotmp</value>
- replace <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value> with <value>$HIVE_HOME/iotmp</value>
- replace <value>${system:java.io.tmpdir}/${system:user.name}</value> with <value>$HIVE_HOME/iotmp</value>
- you may need to create the iotmp directory first; note that hive-site.xml does not expand shell variables such as $HIVE_HOME, so it is safer to write the full path (e.g. /home/yao/mysoft/apache-hive-2.1.0-bin/iotmp); see the sketch after this list
- Modify hive-env.sh
- add these two lines
- export HADOOP_HOME=/home/yao/mysoft/hadoop-2.7.3
- export HIVE_CONF_DIR=/home/yao/mysoft/apache-hive-2.1.0-bin/conf
- Make sure Hadoop is running (a quick check is sketched after this list)
- Run Hive
- $HIVE_HOME/bin/hiveserver2
- Run beeline
- $HIVE_HOME/bin/beeline -u jdbc:hive2://
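A minimal sketch of the hive-site.xml step above, assuming GNU sed and the HIVE_HOME used in this post (adjust the absolute path to your install); it creates the iotmp directory and points the tmpdir-based values at it:

mkdir -p /home/yao/mysoft/apache-hive-2.1.0-bin/iotmp
cd /home/yao/mysoft/apache-hive-2.1.0-bin/conf
# hive-site.xml does not expand shell variables such as $HIVE_HOME, so the
# replacement value is written as an absolute path
sed -i 's|${system:java.io.tmpdir}/${system:user.name}|/home/yao/mysoft/apache-hive-2.1.0-bin/iotmp|g' hive-site.xml
sed -i 's|${system:java.io.tmpdir}/${hive.session.id}_resources|/home/yao/mysoft/apache-hive-2.1.0-bin/iotmp|g' hive-site.xml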
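As a quick end-to-end check once Hadoop is up, something along these lines should work; the schematool line is only needed if HiveServer2 or beeline complains that the metastore schema has not been initialized (embedded Derby), and SHOW DATABASES is just a trivial query to confirm the connection:

jps                                                   # the HDFS daemons (NameNode, DataNode, ...) should be listed
$HIVE_HOME/bin/schematool -dbType derby -initSchema   # one-time, only if the metastore schema is missing
$HIVE_HOME/bin/beeline -u jdbc:hive2:// -e "SHOW DATABASES;"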
Thursday, October 17, 2013
Hive, Pig and HBase
Hive is best suited for data warehouse applications, where real-time responsiveness to queries and record-level inserts, updates, and deletes are not required.
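For illustration, a typical warehouse-style query of the kind Hive targets, run in batch through beeline; the web_logs table and its columns are invented for the example:

$HIVE_HOME/bin/beeline -u jdbc:hive2:// -e "SELECT page, COUNT(*) AS views FROM web_logs GROUP BY page ORDER BY views DESC LIMIT 10;"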
Pig is described as a data flow language, rather than a query language. In Pig, you write a series of declarative statements that define relations from other relations, where each new relation performs some new data transformation. Pig looks at these declarations and then builds up a sequence of MapReduce jobs to perform the transformations until the final results are computed the way that you want. This step-by-step “flow” of data can be more intuitive than a complex set of queries. For this reason, Pig is often used as part of ETL (Extract, Transform, and Load) processes used to ingest external data into a Hadoop cluster and transform it into a more desirable form.
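The step-by-step flow is easier to see in a small sketch; the input file, field names, and output path below are invented, and the script is run with the pig client in local mode:

cat > page_counts.pig <<'EOF'
-- each statement defines a new relation from an earlier one
raw     = LOAD 'web_logs.tsv' AS (user:chararray, page:chararray, bytes:long);
by_page = GROUP raw BY page;
counts  = FOREACH by_page GENERATE group AS page, COUNT(raw) AS views;
STORE counts INTO 'page_counts';
EOF
pig -x local page_counts.pig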
HBase is a distributed and scalable data store that supports row-level updates, rapid queries, and row-level transactions (but not multirow transactions).
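To make the row-level access pattern concrete, a small HBase shell sketch; the table, column family, and row key are invented for the example:

hbase shell <<'EOF'
create 'users', 'info'                        # table with one column family
put 'users', 'row1', 'info:name', 'alice'     # row-level insert/update
get 'users', 'row1'                           # single-row read by key
EOF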