jaql

{{Short description|Functional data processing and query language}}

{{Infobox programming language

| logo =

| logo caption =

| screenshot =

| screenshot caption =

| paradigm = Functional

| family =

| designer = Vuk Ercegovac (Google)

| developer =

| released = {{Start date and age|2008|10|09}}

| latest release version = 0.5.1

| latest release date = {{Start date and age|2010|07|12}}

| typing =

| scope =

| programming language = Java

| discontinued =

| platform =

| operating system = Cross-platform

| license = Apache License 2.0

| file ext =

| file format =

| website = {{url|code.google.com/p/jaql/}}

| implementations = IBM BigInsights

| dialects =

| influenced by =

| influenced =

}}

Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.

It started as an open-source project hosted on Google Code,<ref>[https://code.google.com/p/jaql/ Original Jaql project]</ref> where the latest release was published on 2010-07-12. IBM<ref>[http://www.vldb.org/pvldb/vol4/p1272-beyer.pdf Initial Publication]</ref> adopted it as the primary data processing language for its Hadoop software package [http://www.ibm.com/software/data/infosphere/biginsights/ BigInsights].

Although it was developed for JSON, Jaql supports a variety of other data formats, such as CSV, TSV and XML.

A comparison<ref>{{cite book | chapter-url=https://link.springer.com/chapter/10.1007/978-3-642-24151-2_5 | doi=10.1007/978-3-642-24151-2_5 | chapter=Comparing High Level MapReduce Query Languages | title=Advanced Parallel Processing Technologies | series=Lecture Notes in Computer Science | date=2011 | last1=Stewart | first1=Robert J. | last2=Trinder | first2=Phil W. | last3=Loidl | first3=Hans-Wolfgang | volume=6965 | pages=58–72 | isbn=978-3-642-24150-5 }}</ref> of Jaql with other big data query languages, such as Pig Latin and HiveQL, examines the performance and usability of these technologies.

Jaql supports lazy evaluation,<ref>[http://www.havlena.net/en/tag/jaql/ JAQL in Hadoop, a brief introduction]</ref> so expressions are only materialized when needed.
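Jaql's evaluator is not reproduced here. As a rough analogue only, the following Python sketch uses generators to show what demand-driven (lazy) evaluation means: a stage over an unbounded source does no work until a consumer requests results, and then only as many inputs are evaluated as are needed. The function name <code>squares</code> is hypothetical.

```python
from itertools import count, islice
from typing import Iterator, List


# Hypothetical stage used only to illustrate laziness; not part of Jaql.
def squares(xs: Iterator[int], log: List[int]) -> Iterator[int]:
    """Lazily square each input, recording which inputs were actually pulled."""
    for x in xs:
        log.append(x)  # runs only when a consumer asks for the next value
        yield x * x


evaluated: List[int] = []
# Building the pipeline over an unbounded source computes nothing yet.
pipeline = squares(count(), evaluated)
print(evaluated)    # []

# Demanding three results evaluates exactly three inputs.
first_three = list(islice(pipeline, 3))
print(first_three)  # [0, 1, 4]
print(evaluated)    # [0, 1, 2]
```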

=Syntax=

The basic structure of a Jaql program is a data flow of the form

<syntaxhighlight lang="text">
source -> operator(parameter) -> sink ;
</syntaxhighlight>

where a sink can in turn serve as the source for a downstream operator. A typical Jaql program therefore has the following structure, expressing a data processing graph:

<syntaxhighlight lang="text">
source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;
</syntaxhighlight>

For readability, Jaql programs are usually broken across lines with a line break before each arrow, an idiom also common in Twitter's [https://github.com/twitter/scalding Scalding]:

<syntaxhighlight lang="text">
source -> operator1(parameter)
       -> operator2(parameter)
       -> operator2(parameter)
       -> operator3(parameter)
       -> operator4(parameter)
       -> sink ;
</syntaxhighlight>
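The source-operator-sink chain can be modelled, as an illustration only, by function composition. In the Python sketch below, <code>pipe</code>, <code>keep_even</code> and <code>times_ten</code> are hypothetical names, each stage takes and returns an iterable, and materializing the final stream into a list plays the role of the sink. This is not how Jaql itself is implemented.

```python
from functools import reduce
from typing import Callable, Iterable, List

Stage = Callable[[Iterable[int]], Iterable[int]]


# Hypothetical helper: thread `source` through each stage, like source -> s1 -> s2 -> ...
def pipe(source: Iterable[int], *stages: Stage) -> Iterable[int]:
    return reduce(lambda data, stage: stage(data), stages, source)


# Hypothetical stages standing in for Jaql operators such as filter and transform.
def keep_even(xs: Iterable[int]) -> Iterable[int]:
    return (x for x in xs if x % 2 == 0)


def times_ten(xs: Iterable[int]) -> Iterable[int]:
    return (x * 10 for x in xs)


# The "sink" here just materializes the stream into a list.
result: List[int] = list(pipe(range(7), keep_even, times_ten))
print(result)  # [0, 20, 40, 60]
```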

=Core operators=

The operator descriptions below are based on the IBM BigInsights documentation.<ref>[http://pic.dhe.ibm.com/infocenter/bigins/v2r0/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.jaql.doc%2Fdoc%2Fc0057482.html IBM BigInsights Documentation]</ref>

==Expand==

Use the EXPAND expression to flatten nested arrays. It takes as input an array of nested arrays {{brackets|T}} and produces an output array {{bracket|T}} by promoting the elements of each nested array to the top level of the output array.
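The following Python sketch, an analogue rather than Jaql code, models the flattening semantics described above with the standard <code>itertools.chain.from_iterable</code>. The function name <code>expand</code> is hypothetical. Only one level of nesting is removed, so deeper arrays stay nested.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


# Hypothetical helper mirroring EXPAND's [[T]] -> [T] signature.
def expand(nested: Iterable[Iterable[T]]) -> List[T]:
    """Flatten exactly one level, preserving element order."""
    return list(chain.from_iterable(nested))


print(expand([[1, 2], [], [3], [4, 5]]))  # [1, 2, 3, 4, 5]
print(expand([[[1]], [2]]))               # [[1], 2]  (deeper nesting is kept)
```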

==Filter==

Use the FILTER operator to remove elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause.

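Example, using the following input array:

<syntaxhighlight lang="text">
data = [
  {name: "Jon Doe", income: 20000, manager: false},
  {name: "Vince Wayne", income: 32500, manager: false},
  {name: "Jane Dean", income: 72000, manager: true},
  {name: "Alex Smith", income: 25000, manager: false}
];
</syntaxhighlight>

Here <code>$</code> refers to the current element of the input array. Filtering on the boolean <code>manager</code> field keeps only the managers:

<syntaxhighlight lang="text">
data -> filter $.manager;

[
  {
    "income": 72000,
    "manager": true,
    "name": "Jane Dean"
  }
]
</syntaxhighlight>

Filtering on a comparison keeps every record with an income below 30000:

<syntaxhighlight lang="text">
data -> filter $.income < 30000;

[
  {
    "income": 20000,
    "manager": false,
    "name": "Jon Doe"
  },
  {
    "income": 25000,
    "manager": false,
    "name": "Alex Smith"
  }
]
</syntaxhighlight>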

==Group==

Use the GROUP expression to group one or more input arrays on a grouping key and to apply an aggregate function to each group.
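A minimal Python sketch of group-then-aggregate semantics for a single input array, reusing the records of the filter example. The helper <code>group</code> and its parameters are hypothetical, and grouping several input arrays at once, which Jaql's GROUP also supports, is not modelled.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


# Hypothetical helper: partition records by key(record), then aggregate each group.
def group(records: List[dict], key: Callable[[dict], Any],
          aggregate: Callable[[List[dict]], Any]) -> Dict[Any, Any]:
    groups: Dict[Any, List[dict]] = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return {k: aggregate(members) for k, members in groups.items()}


data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Total income per value of the manager flag.
totals = group(data, key=lambda r: r["manager"],
               aggregate=lambda rs: sum(r["income"] for r in rs))
print(totals)  # {False: 77500, True: 72000}
```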

==Join==

Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.
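As an illustration of the inner and left-outer cases only, the Python sketch below performs a hash-based equi-join of two record lists on a shared field. The function <code>join</code>, its <code>how</code> parameter and the sample records are hypothetical; right-outer and full outer joins, joins over more than two inputs and Jaql's own syntax are not modelled.

```python
from typing import Any, Dict, List


# Hypothetical helper: equi-join on `key`; `how` is "inner" or "left".
def join(left: List[dict], right: List[dict], key: str, how: str = "inner") -> List[dict]:
    if how not in ("inner", "left"):
        raise ValueError("this sketch only supports 'inner' and 'left'")
    # Index the right input by join key.
    index: Dict[Any, List[dict]] = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    out: List[dict] = []
    for row in left:
        matches = index.get(row[key], [])
        for rec in matches:
            out.append({**row, **rec})  # merge the fields of each matching pair
        if not matches and how == "left":
            out.append(dict(row))  # left-outer keeps unmatched left records
    return out


employees = [
    {"dept": 1, "name": "Jane Dean"},
    {"dept": 2, "name": "Jon Doe"},
    {"dept": 3, "name": "Alex Smith"},
]
departments = [{"dept": 1, "title": "Sales"}, {"dept": 2, "title": "Research"}]

# Alex Smith (dept 3) has no matching department, so appears only in the left-outer result.
print(join(employees, departments, "dept"))
print(join(employees, departments, "dept", how="left"))
```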

==Sort==

Use the SORT operator to sort an input by one or more fields.
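Sorting by several fields can be sketched in Python with a tuple-valued sort key, here ordering the records from the filter example by the manager flag and then by descending income. This is an analogue of the semantics, not Jaql syntax; negating the numeric field is one common Python idiom for a descending component.

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Sort by two fields: manager flag ascending (False before True),
# then income descending (negated key) within each flag value.
ordered = sorted(data, key=lambda r: (r["manager"], -r["income"]))
print([r["name"] for r in ordered])
# ['Vince Wayne', 'Alex Smith', 'Jon Doe', 'Jane Dean']
```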

==Top==

The TOP expression selects the first {{var|k}} elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input and then selecting the first {{var|k}} elements.
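A Python sketch of the stated equivalence, with a key function standing in for the comparator (an assumption; Jaql's comparator syntax is not shown). The standard <code>heapq.nsmallest(k, iterable, key=...)</code> is documented to give the same result as <code>sorted(iterable, key=key)[:k]</code>, while avoiding a full sort when {{var|k}} is small. The helper <code>top</code> is hypothetical.

```python
import heapq
from typing import Any, Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


# Hypothetical helper: first k elements; with a key, as if sorted by it first.
def top(items: Iterable[T], k: int, key: Optional[Callable[[T], Any]] = None) -> List[T]:
    xs = list(items)
    if key is None:
        return xs[:k]  # no comparator: keep input order
    # Same result as sorted(xs, key=key)[:k].
    return heapq.nsmallest(k, xs, key=key)


incomes = [20000, 32500, 72000, 25000]
print(top(incomes, 2))                    # [20000, 32500]
print(top(incomes, 2, key=lambda x: -x))  # [72000, 32500]
```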

==Transform==

Use the TRANSFORM operator to perform a projection or to apply a function to every item of its input array.
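The Python sketch below, an analogue rather than Jaql code, treats TRANSFORM as a per-element map producing one output item for each input item; a projection is then just a function that keeps selected fields. The helper <code>transform</code> and the derived <code>monthly</code> field are hypothetical.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


# Hypothetical helper: apply fn to every element, one output per input.
def transform(items: List[A], fn: Callable[[A], B]) -> List[B]:
    return [fn(x) for x in items]


data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
]

# Projection: keep only the name field of each record.
print(transform(data, lambda r: {"name": r["name"]}))
# [{'name': 'Jon Doe'}, {'name': 'Jane Dean'}]

# Function application: derive a new field from an existing one.
print(transform(data, lambda r: {"name": r["name"], "monthly": r["income"] // 12}))
# [{'name': 'Jon Doe', 'monthly': 1666}, {'name': 'Jane Dean', 'monthly': 6000}]
```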

=See also=

=References=

{{Reflist|2}}