Getting Started with LR Data Services

Firefox Recommended
Some of the "Try It" features are supported only in Firefox because they require E4X, a feature present only in Mozilla's JavaScript engine, which CouchDB uses. Please use Firefox if you wish to use "Try It" to its fullest.

What Was Data Services?

What Is LR Data Services Now?

  1. A Way to Get the Data You Want

    Only interested in one specific kind of data? Data Services aims to provide a way to extract the data that's relevant to you through simple customization.

  2. It's A Design Pattern, With Some "Batteries Included".

    • Follow simple conventions to identify discriminators within the LR documents
    • Reuse or modify community-sourced libraries to aid in extracting data for your use case
    • Install your data service code into an LR Node's CouchDB
    • Access your custom solution via the extract HTTP service API installed on the node

  3. A Path Towards Making LR Use Fewer Resources

    We know LR can potentially hold mountains of data, and some of its data extraction services consume large amounts of storage space. Data Services is a way to extract data in a very focused manner, which will result in substantial storage savings.

What Is a Discriminator?

Discriminators in Data Services are the characteristics of the data that you have identified and want to capture.

What Can Data Services Do?

Following the conventions outlined, you will be able to extract data using:

The Alignment to Standards Prototype Example

The prototype implementation provides a Data Service that allows the extract service to be used to request resource data and aggregations where ASNs are the Discriminators, `resource_locator` values are the Resource Locators, and `node_timestamp` values are the Timestamps, which will enable us to get:

Conventions

To follow the K.I.S.S. principle, we're adopting a set of conventions to make adding new data services simple. These conventions are part of a CouchDB design document.

{
    "_id": "_design/standards-alignment",
    "dataservice": {
        "name": "Standards Alignment Data Service",
        "description": "This is where I would document how this data service works."
    },
    "views": {
        "discriminator-by-resource": {
            "map": "function (doc) { emit([doc.resource_locator, getDescriminator(doc), getEpochTimestamp(doc)], null); }"
        },
        "discriminator-by-resource-ts": {
            "map": "function (doc) { emit([doc.resource_locator, getEpochTimestamp(doc), getDescriminator(doc)], null); }"
        },
        "resource-by-discriminator": {
            "map": "function (doc) { emit([getDescriminator(doc), doc.resource_locator, getEpochTimestamp(doc)], null); }"
        },
        "resource-by-discriminator-ts": {
            "map": "function (doc) { emit([getDescriminator(doc), getEpochTimestamp(doc), doc.resource_locator], null); }"
        },
        "resource-by-ts": {
            "map": "function (doc) { emit([getEpochTimestamp(doc), doc.resource_locator], null); }"
        },
        "discriminator-by-ts": {
            "map": "function (doc) { emit([getEpochTimestamp(doc), getDescriminator(doc)], null); }"
        }
    },
    "lists": { "to-json": "function(head, req) { ... }" }
}
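
The map functions above call two helpers, `getDescriminator` and `getEpochTimestamp`, whose bodies are not shown in the design document. The sketch below is one hypothetical way they could be written. It assumes that `node_timestamp` is an ISO 8601 string the JavaScript engine's `Date.parse` accepts, and that the discriminator is the first entry of the document's `keys` array that begins with a chosen prefix. The `keys` lookup and the `ASN_PREFIX` value are illustrative assumptions, not the prototype's actual logic.

```javascript
// Hypothetical prefix used to recognise a standards identifier (an assumption).
var ASN_PREFIX = "http://example.org/asn/";

// Convert the document's ISO 8601 node_timestamp to whole seconds since the epoch.
// Returns null when the timestamp is missing or cannot be parsed.
function getEpochTimestamp(doc) {
    var millis = Date.parse(doc.node_timestamp);
    return isNaN(millis) ? null : Math.floor(millis / 1000);
}

// Return the first key that starts with ASN_PREFIX, or null if none does.
// Reading discriminators from doc.keys is an assumption made for illustration.
function getDescriminator(doc) {
    var keys = doc.keys || [];
    for (var i = 0; i < keys.length; i++) {
        if (typeof keys[i] === "string" && keys[i].indexOf(ASN_PREFIX) === 0) {
            return keys[i];
        }
    }
    return null;
}
```

Returning a number for the timestamp keeps the keys sortable, which is what lets the views with a timestamp component be queried by time range.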

How Map Functions Work

In LR, each record is a JavaScript object, much like the one displayed below. The map function will process each individual object once for each view.

A Resource Data Document

How Map Functions Work: Try It

A map function takes one argument, the Resource Data Document. The function should perform the following tasks against the document:

  • Evaluate the document to determine if the document should be included in the index. Does it meet some set of criteria specific to your use case?
  • If the document should be included, define the structure of the keys and values in the index. Remember to follow the key conventions defined earlier.

Below is a sample implementation of a data services map function. Try editing it and click Run to learn what gets created in the index.
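
A minimal sketch of such a map function, runnable outside CouchDB, might look like the following. It assumes that a resource data document carries `doc_type`, `resource_locator`, `node_timestamp`, and a `keys` array, and it treats every key as a discriminator, which is a simplification for illustration. The `emit` stub and the `rows` array stand in for CouchDB's built-in `emit`, and the sample document is made up.

```javascript
// Stand-in for CouchDB's built-in emit(); collects index rows for inspection.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Sketch of a map function using the discriminator-by-resource key convention.
function map(doc) {
    // Task 1: decide whether the document belongs in the index.
    if (doc.doc_type !== "resource_data" || !doc.resource_locator || !doc.keys) {
        return;
    }
    var millis = Date.parse(doc.node_timestamp);
    if (isNaN(millis)) {
        return;
    }
    var epoch = Math.floor(millis / 1000);

    // Task 2: emit one row per discriminator as [resource, discriminator, timestamp].
    for (var i = 0; i < doc.keys.length; i++) {
        emit([doc.resource_locator, doc.keys[i], epoch], null);
    }
}

// Hypothetical sample document; the field values are invented for illustration.
map({
    doc_type: "resource_data",
    resource_locator: "http://example.org/lesson/1",
    node_timestamp: "2012-01-01T00:00:00Z",
    keys: ["algebra", "fractions"]
});
map({ doc_type: "other" }); // skipped: fails the inclusion test
```

After both calls, `rows` holds two entries, one per key of the first document; the second document contributes nothing to the index.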


The Extract Service Output Specification

The basic JSON format for the extract service is specified using the JSON Schema Internet Draft.

Data Service Response Wrapper
Data Service Result

The Extract Service Example Output

This is an example of the data service output when doc_IDs are returned.

List Functions in Detail

List functions are responsible for formatting the response for the service output.

As the previous slides defined and displayed, there are two basic parts to the response: the response wrapper and the results.

Data Service Response Wrapper

The list function will need to group each result in the "documents" property of the response wrapper.

Data Service Result

Review of Prototype List Implementation

List functions group results and embed them, in the Extract service's record format, into the documents list. Because the amount of data to process could be large, list functions should be designed to buffer as little as possible and stream most of their output.
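
The "buffer little, mostly stream" idea can be sketched as follows. This is not the prototype's list function: it is a hypothetical `toJson` that groups consecutive rows by the first element of their key and sends each group as soon as the next group begins, so only one group is held in memory at a time. The `start`, `send`, and `getRow` stubs stand in for CouchDB's list-function built-ins so the sketch runs outside CouchDB. The `group` and `ids` property names and the sample rows are assumptions; only the `documents` property comes from the response wrapper described above.

```javascript
// Stand-ins for CouchDB's list-function built-ins, plus made-up view rows.
var input = [
    { id: "doc-a", key: ["http://example.org/1", "algebra"], value: null },
    { id: "doc-b", key: ["http://example.org/1", "fractions"], value: null },
    { id: "doc-c", key: ["http://example.org/2", "algebra"], value: null }
];
var output = [];
function start(options) { /* response headers are ignored by this stub */ }
function send(chunk) { output.push(chunk); }
function getRow() { return input.length ? input.shift() : null; }

// Sketch of a streaming list function: buffers only the current group.
function toJson(head, req) {
    start({ headers: { "Content-Type": "application/json" } });
    send('{"documents":[');
    var current = null; // the single group held in memory
    var first = true;
    var row;

    // Send the finished group, preceded by a comma if it is not the first.
    function flush() {
        if (current === null) { return; }
        send((first ? "" : ",") + JSON.stringify(current));
        first = false;
    }

    while ((row = getRow())) {
        var groupKey = row.key[0];
        if (current === null || current.group !== groupKey) {
            flush(); // group boundary reached: stream the finished group
            current = { group: groupKey, ids: [] };
        }
        current.ids.push(row.id);
    }
    flush();
    send("]}");
}

toJson({}, {});
var response = JSON.parse(output.join(""));
```

This approach relies on the view returning rows in key order, so that all rows for one group arrive consecutively.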

Customize or Make your Own List Function

The prototype implementation should suffice as a skeleton, whether you enhance it or build your own list function from scratch.

The Extract Service

Provides a common HTTP interface to access data services with simplified parameters.

The Extract Service HTTP Request Format

The Basics

GET /extract/<data service name>/<view name>[?<DS Query Params>]

Custom List functions (AKA Roll Your Own Output Format)

GET /extract/<data service name>/<view name>/format/<your function name>[?<DS Query Params>]

The DS Query Params

Parameter                  Description
from                       ISO 8601 formatted timestamp for the start of the range.
until                      ISO 8601 formatted timestamp for the end of the range.
resource                   The resource locator for which you wish to harvest data.
discriminator              The discriminator for which you wish to harvest data.
resource-starts-with       A partial resource locator; data is harvested for resources that use the specified value as a prefix (e.g. resource-starts-with=http://shodor.org will return all resources from http://shodor.org).
discriminator-starts-with  A partial discriminator; data is harvested for the range of discriminators that use the specified value as a prefix.
ids_only                   If present, the resource_data values are a list of doc_IDs instead of the full resource_data documents returned by default.
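
As an illustration of the request formats and parameters above, here is a small sketch of a helper that assembles an extract request path. The function name `buildExtractPath` is hypothetical, as is the use of `standards-alignment` as the data service name in the usage lines. Passing `ids_only` as `ids_only=true` is also an assumption, since the table says only that the parameter must be present.

```javascript
// Parameter names taken from the DS Query Params table.
var DS_PARAMS = ["from", "until", "resource", "discriminator",
                 "resource-starts-with", "discriminator-starts-with", "ids_only"];

// Build an extract request path. `format` is optional and names a custom list function.
function buildExtractPath(dataService, view, params, format) {
    var path = "/extract/" + encodeURIComponent(dataService) +
               "/" + encodeURIComponent(view);
    if (format) {
        path += "/format/" + encodeURIComponent(format);
    }
    var pairs = [];
    DS_PARAMS.forEach(function (name) {
        // Only known parameters are included; anything else is silently dropped.
        if (params && Object.prototype.hasOwnProperty.call(params, name)) {
            pairs.push(name + "=" + encodeURIComponent(String(params[name])));
        }
    });
    return pairs.length ? path + "?" + pairs.join("&") : path;
}

// Usage with hypothetical service and view names.
var basic = buildExtractPath("standards-alignment", "discriminator-by-resource",
                             { resource: "http://shodor.org" });
var custom = buildExtractPath("standards-alignment", "resource-by-ts",
                              { from: "2012-01-01T00:00:00Z", until: "2012-02-01T00:00:00Z" },
                              "to-json");
```

Percent-encoding the values matters here because resource locators are themselves URLs and ISO 8601 timestamps contain colons.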

Extract Service Parameter Matrix

Data Services aims to let you narrow in on your data, so not all parameters work with all data service views. Here is a breakdown of which parameters work together with each view. You need not be concerned about the characteristics of the data returned, because the map functions have already taken care of that for you. If the data service map functions emit keys only for documents that contain the word "exciting", then anything the service returns will be, at the very least, "exciting". The parameters just control how much is returned.

View                          Parameter Set               Description
discriminator-by-resource     resource                    Get a list of discriminators for a specific resource locator.
discriminator-by-resource     resource-starts-with        Get a list of discriminators where the resource locator starts with a specified prefix.
discriminator-by-resource-ts  resource, from, until       Get a list of discriminators for a specific resource locator for a specified period of time.
discriminator-by-resource-ts  resource-starts-with        Get a list of discriminators where the resource locator starts with a specified prefix; the timestamp is included in the result.
discriminator-by-ts           from, until                 Get a list of discriminators for a specified period of time.
resource-by-discriminator     discriminator               Get a list of resource locators for a specified discriminator.
resource-by-discriminator     discriminator-starts-with   Get a list of resource locators where the discriminator starts with the specified prefix.
resource-by-discriminator-ts  discriminator, from, until  Get a list of resource locators for a specified discriminator for a specified period of time.
resource-by-discriminator-ts  discriminator-starts-with   Get a list of resource locators where the discriminator starts with the specified prefix. Timestamps are included in the output.
resource-by-ts                from, until                 Get a list of resource locators for a specified period of time.
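
The matrix above can also be expressed as data, which makes it easy to check a request before sending it. The sketch below transcribes the table into a lookup and adds a hypothetical `isSupported` helper. Requiring an exact match against one listed parameter set, and ignoring `ids_only` during the check, are assumptions made for illustration; the service itself may be more lenient.

```javascript
// View -> accepted parameter sets, transcribed from the parameter matrix.
var PARAMETER_MATRIX = {
    "discriminator-by-resource": [["resource"], ["resource-starts-with"]],
    "discriminator-by-resource-ts": [["resource", "from", "until"], ["resource-starts-with"]],
    "discriminator-by-ts": [["from", "until"]],
    "resource-by-discriminator": [["discriminator"], ["discriminator-starts-with"]],
    "resource-by-discriminator-ts": [["discriminator", "from", "until"], ["discriminator-starts-with"]],
    "resource-by-ts": [["from", "until"]]
};

// True when the supplied parameter names (ignoring ids_only) exactly match
// one of the parameter sets listed for the view.
function isSupported(view, paramNames) {
    var sets = PARAMETER_MATRIX[view];
    if (!sets) { return false; }
    var given = paramNames.filter(function (n) { return n !== "ids_only"; }).sort();
    return sets.some(function (set) {
        var expected = set.slice().sort();
        return expected.length === given.length &&
            expected.every(function (name, i) { return name === given[i]; });
    });
}
```

For example, `isSupported("resource-by-ts", ["from", "until"])` passes the check, while `isSupported("resource-by-ts", ["resource"])` does not, because that view's keys begin with the timestamp rather than the resource locator.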

Extract Service API - Try It

Here are some example requests that you can try against a real data service installation. Click on a line to populate the example into the input box, or edit it by hand. Click Run to execute the request. This can take some time and may cause your browser to complain. If your browser prompts you, it's okay to click "Wait" or "Continue" until the request completes.


Next Steps
