This project has retired. For details please refer to its Attic page.
JSonSerde (Lens 2.1.0-beta-incubating API)

org.apache.lens.lib.query
Class JSonSerde

java.lang.Object
  extended by org.apache.lens.lib.query.JSonSerde
All Implemented Interfaces:
org.apache.hadoop.hive.serde2.Deserializer, org.apache.hadoop.hive.serde2.SerDe, org.apache.hadoop.hive.serde2.Serializer

public class JSonSerde
extends Object
implements org.apache.hadoop.hive.serde2.SerDe

This SerDe can be used for processing JSON data in Hive. It supports arbitrary JSON data, and can handle all Hive types except for UNION. However, the JSON data is expected to be a series of discrete records, rather than a JSON array of objects.

The Hive table is expected to contain columns whose names correspond to fields in the JSON data, but not every JSON field needs a corresponding Hive column. JSON fields without a matching column are ignored during queries.
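The column matching described above can be sketched in plain Java. This is a minimal illustration, not the SerDe's actual code. It assumes the JSON record has already been parsed into a Map (as a parser such as Jackson would produce), and ColumnProjection and project are hypothetical names. It also assumes that a column with no matching JSON field yields null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: project one parsed JSON record onto the table's columns.
class ColumnProjection {

    // Builds one row in table-column order. JSON fields with no matching column
    // are never read, so they are ignored. A column with no matching JSON field
    // becomes null (an assumption of this sketch).
    static List<Object> project(Map<String, Object> record, List<String> columns) {
        List<Object> row = new ArrayList<>(columns.size());
        for (String column : columns) {
            row.add(record.get(column)); // Map.get returns null for absent keys
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("a", 1);
        record.put("b", Arrays.asList("str1", "str2"));
        record.put("extra", "not a column");

        List<Object> row = project(record, Arrays.asList("a", "b", "c"));
        System.out.println(row); // [1, [str1, str2], null]
    }
}
```

The field "extra" has no column and never appears in the row, while column "c" has no field and comes back as null.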

Example:

{ "a": 1, "b": [ "str1", "str2" ], "c": { "field1": "val1" } }

This record could correspond to the table:

CREATE TABLE foo (a INT, b ARRAY<STRING>, c STRUCT<field1:STRING>);

JSON objects can also be interpreted as a Hive MAP type, as long as the keys and values in the JSON object are all of the appropriate types. For example, for the JSON above, another valid table declaration would be:

CREATE TABLE foo (a INT, b ARRAY<STRING>, c MAP<STRING,STRING>);

Only STRING keys are supported for Hive MAPs.
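A rough sketch of the MAP rule, assuming the JSON object has already been parsed into a Map<String, Object>. JSON object keys are always strings, which fits the STRING-key restriction. JsonObjectAsMap and toHiveMap are hypothetical names, and rejecting a value of the wrong type with an exception is an assumption made for illustration; the real SerDe may handle mismatches differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of reading a parsed JSON object as a Hive MAP<STRING, V>.
class JsonObjectAsMap {

    // Copies every entry, checking that each non-null value is of the declared
    // value type. Keys need no conversion: JSON object keys are already strings.
    static <V> Map<String, V> toHiveMap(Map<String, Object> jsonObject, Class<V> valueType) {
        Map<String, V> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : jsonObject.entrySet()) {
            Object value = e.getValue();
            if (value != null && !valueType.isInstance(value)) {
                throw new IllegalArgumentException(
                        "value of key '" + e.getKey() + "' is not a " + valueType.getSimpleName());
            }
            result.put(e.getKey(), valueType.cast(value));
        }
        return result;
    }

    public static void main(String[] args) {
        // Field "c" from the example record, read as MAP<STRING,STRING>.
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("field1", "val1");
        System.out.println(toHiveMap(c, String.class)); // {field1=val1}
    }
}
```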


Constructor Summary
JSonSerde()

Method Summary
 Object deserialize(org.apache.hadoop.io.Writable blob)
          Deserializes a record into Java objects that Hive can work with via the ObjectInspector interface.
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector getObjectInspector()
          Returns an ObjectInspector for a row of data.
 org.apache.hadoop.hive.serde2.SerDeStats getSerDeStats()
          Unimplemented.
 Class<? extends org.apache.hadoop.io.Writable> getSerializedClass()
          JSON is a textual representation, so the serialized class is Text.
 void initialize(org.apache.hadoop.conf.Configuration conf, Properties tbl)
          An initialization function used to gather information about the table.
 org.apache.hadoop.io.Writable serialize(Object obj, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector oi)

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JSonSerde

public JSonSerde()

Method Detail

initialize

public void initialize(org.apache.hadoop.conf.Configuration conf,
                       Properties tbl)
                throws org.apache.hadoop.hive.serde2.SerDeException
An initialization function used to gather information about the table. Typically, a SerDe implementation reads the list of column names and their types, and then uses that information to serialize and deserialize data.

Specified by:
initialize in interface org.apache.hadoop.hive.serde2.Deserializer
Specified by:
initialize in interface org.apache.hadoop.hive.serde2.Serializer
Parameters:
conf - the Hadoop configuration
tbl - the table properties, including the column names and types
Throws:
org.apache.hadoop.hive.serde2.SerDeException - if the SerDe cannot be initialized
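In Hive, the column list reaches a SerDe through the table properties "columns" (comma-separated names) and "columns.types" (colon-separated type names). The sketch below reads both with the JDK alone. TableSchemaReader is a hypothetical class, and its splitter is a simplified stand-in for Hive's own type-string parser (TypeInfoUtils.getTypeInfosFromTypeString): it only splits on top-level colons and does not validate type names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of reading the table schema from the "tbl" properties.
class TableSchemaReader {

    // "columns" holds the column names separated by commas, e.g. "a,b,c".
    static List<String> columnNames(Properties tbl) {
        return Arrays.asList(tbl.getProperty("columns").split(","));
    }

    // Splits a string such as "int:array<string>:struct<field1:string>" on
    // top-level colons only, so colons nested inside <...> stay in their type.
    static List<String> columnTypes(Properties tbl) {
        String types = tbl.getProperty("columns.types");
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < types.length(); i++) {
            char ch = types.charAt(i);
            if (ch == '<') {
                depth++;
            } else if (ch == '>') {
                depth--;
            } else if (ch == ':' && depth == 0) {
                result.add(types.substring(start, i));
                start = i + 1;
            }
        }
        result.add(types.substring(start));
        return result;
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("columns", "a,b,c");
        tbl.setProperty("columns.types", "int:array<string>:struct<field1:string>");
        System.out.println(columnNames(tbl)); // [a, b, c]
        System.out.println(columnTypes(tbl)); // [int, array<string>, struct<field1:string>]
    }
}
```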

deserialize

public Object deserialize(org.apache.hadoop.io.Writable blob)
                   throws org.apache.hadoop.hive.serde2.SerDeException
This method does the work of deserializing a record into Java objects that Hive can work with via the ObjectInspector interface. For this SerDe, the blob that is passed in is a JSON string, and the Jackson JSON parser translates the string into Java objects.

Deserialization takes each column name in the Hive table and looks up the field of the same name in the parsed JSON object. If a field's value is not a primitive (for example, an array or a nested object), it is parsed further.

Specified by:
deserialize in interface org.apache.hadoop.hive.serde2.Deserializer
Parameters:
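blob - the Writable containing one JSON record
Returns:
the deserialized row, to be read through the ObjectInspector returned by getObjectInspector()
Throws:
org.apache.hadoop.hive.serde2.SerDeException - if the record cannot be deserialized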
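The recursive lookup described above can be sketched as a type-driven walk over already-parsed JSON values. HiveType and FieldParser are hypothetical stand-ins for Hive's TypeInfo classes and the SerDe's internals. Representing a STRUCT as a list of field values in declared order, and returning primitives without any numeric conversion, are simplifying assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of parsing a JSON value according to a simplified Hive type.
class FieldParser {

    enum Kind { PRIMITIVE, ARRAY, MAP, STRUCT }

    static final class HiveType {
        final Kind kind;
        final HiveType element;          // ARRAY element type or MAP value type
        final List<String> fieldNames;   // STRUCT field names
        final List<HiveType> fieldTypes; // STRUCT field types

        private HiveType(Kind kind, HiveType element, List<String> names, List<HiveType> types) {
            this.kind = kind;
            this.element = element;
            this.fieldNames = names;
            this.fieldTypes = types;
        }

        static HiveType primitive() { return new HiveType(Kind.PRIMITIVE, null, null, null); }
        static HiveType array(HiveType e) { return new HiveType(Kind.ARRAY, e, null, null); }
        static HiveType map(HiveType v) { return new HiveType(Kind.MAP, v, null, null); }
        static HiveType struct(List<String> n, List<HiveType> t) { return new HiveType(Kind.STRUCT, null, n, t); }
    }

    // Primitives are returned as-is; arrays, maps and structs are parsed further
    // by recursing with the element, value, or field type.
    @SuppressWarnings("unchecked")
    static Object parseField(Object value, HiveType type) {
        if (value == null) return null;
        switch (type.kind) {
            case ARRAY: {
                List<Object> out = new ArrayList<>();
                for (Object item : (List<Object>) value) out.add(parseField(item, type.element));
                return out;
            }
            case MAP: {
                Map<String, Object> out = new LinkedHashMap<>();
                for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet())
                    out.put(e.getKey(), parseField(e.getValue(), type.element));
                return out;
            }
            case STRUCT: {
                // Looks up each declared field by name, in declared order.
                Map<String, Object> obj = (Map<String, Object>) value;
                List<Object> out = new ArrayList<>();
                for (int i = 0; i < type.fieldNames.size(); i++)
                    out.add(parseField(obj.get(type.fieldNames.get(i)), type.fieldTypes.get(i)));
                return out;
            }
            default:
                return value;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("field1", "val1");
        HiveType structType = HiveType.struct(Arrays.asList("field1"), Arrays.asList(HiveType.primitive()));
        System.out.println(parseField(c, structType));                          // [val1]
        System.out.println(parseField(c, HiveType.map(HiveType.primitive())));  // {field1=val1}
    }
}
```

The same parsed object for "c" yields a positional STRUCT value or a MAP, depending only on the declared column type.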

getObjectInspector

public org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector getObjectInspector()
                                                                                 throws org.apache.hadoop.hive.serde2.SerDeException
Returns an ObjectInspector for a row of data.

Specified by:
getObjectInspector in interface org.apache.hadoop.hive.serde2.Deserializer
Throws:
org.apache.hadoop.hive.serde2.SerDeException

getSerDeStats

public org.apache.hadoop.hive.serde2.SerDeStats getSerDeStats()
Unimplemented

Specified by:
getSerDeStats in interface org.apache.hadoop.hive.serde2.Deserializer
Specified by:
getSerDeStats in interface org.apache.hadoop.hive.serde2.Serializer

getSerializedClass

public Class<? extends org.apache.hadoop.io.Writable> getSerializedClass()
JSON is a textual representation, so the serialized class is Text.

Specified by:
getSerializedClass in interface org.apache.hadoop.hive.serde2.Serializer
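Because the serialized form is text, serializing a row amounts to rendering it as a JSON string. The sketch below shows one hypothetical way to do that with the JDK alone. JsonText and toJson are illustrative names, not part of this API; a real implementation would more likely use a JSON library such as Jackson and wrap the resulting string in org.apache.hadoop.io.Text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: render a row (column name -> value) as one line of JSON text.
class JsonText {

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Number || value instanceof Boolean) {
            sb.append(value);
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb); // JSON keys are strings
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            writeString(value.toString(), sb); // everything else is written as a string
        }
    }

    // Quotes a string and escapes the characters JSON requires to be escaped.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
                    else sb.append(ch);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", Arrays.asList("str1", "str2"));
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("field1", "val1");
        row.put("c", c);
        System.out.println(toJson(row)); // {"a":1,"b":["str1","str2"],"c":{"field1":"val1"}}
    }
}
```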

serialize

public org.apache.hadoop.io.Writable serialize(Object obj,
                                               org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector oi)
                                        throws org.apache.hadoop.hive.serde2.SerDeException
Specified by:
serialize in interface org.apache.hadoop.hive.serde2.Serializer
Throws:
org.apache.hadoop.hive.serde2.SerDeException


Copyright © 2014–2015 Apache Software Foundation. All rights reserved.