Facebook Thrift Full Seminar Report, abstract and Presentation download


Thrift is a software library and a set of code generation tools developed at Facebook's office in Palo Alto, California, to expedite the development and implementation of scalable and efficient backend services. The primary goal of Thrift is to enable efficient and reliable communication across programming languages by abstracting the portions of each language that tend to require the most customization into a common library that is implemented in each language. Users define data types and service interfaces in a single language-neutral Interface Definition Language (IDL) file, from which Thrift generates all the code needed to build remote procedure call (RPC) clients and servers. This report explains the design choices and implementation-level details, and demonstrates a sample Thrift service.

 

Thrift grew out of the need for a new approach to the resource demands of many of Facebook's on-site applications, which couldn't be addressed by staying within the LAMP framework. LAMP is the acronym for Linux, Apache, MySQL and PHP. Facebook was originally built from the ground up on this framework. By 2006 Facebook had gained wide acceptance as a social networking site, and its network traffic grew accordingly, creating the need to scale many of its on-site applications, such as search, ad selection and delivery, and event logging.

 

1. Introduction

As Facebook's traffic and network structure have scaled, the resource demands of many operations on the site (e.g. search, ad selection and delivery, event logging) have presented technical requirements drastically outside the scope of the LAMP framework. In our implementation of these services, various programming languages have been selected to optimize for the right combination of performance, ease and speed of development, availability of existing libraries, etc. By and large, Facebook's engineering culture has tended towards choosing the best tools and implementations available over standardizing on any one programming language and begrudgingly accepting its inherent limitations.

Given this design choice, we were presented with the challenge of building a transparent, high-performance bridge across many programming languages. We found that most available solutions were either too limited, did not offer sufficient datatype freedom, or suffered from subpar performance.

The solution that we have implemented combines a language-neutral software stack implemented across numerous programming languages and an associated code generation engine that transforms a simple interface and data definition language into client and server remote procedure call libraries. Choosing static code generation over a dynamic system allows us to create validated code that can be run without the need for any advanced introspective run-time type checking. It is also designed to be as simple as possible for the developer, who can typically define all the necessary data structures and interfaces for a complex service in a single short file.

Surprised that a robust open solution to these relatively common problems did not yet exist, we committed early on to making the Thrift implementation open source.

In evaluating the challenges of cross-language interaction in a networked environment, some key components were identified:

 

 

Types. A common type system must exist across programming languages without requiring that the application developer use custom Thrift datatypes or write their own serialization code. That is, a C++ programmer should be able to transparently exchange a strongly typed STL map for a dynamic Python dictionary. Neither programmer should be forced to write any code below the application layer to achieve this. Section 2 details the Thrift type system.

 

Transport. Each language must have a common interface to bidirectional raw data transport. The specifics of how a given transport is implemented should not matter to the service developer. The same application code should be able to run against TCP stream sockets, raw data in memory, or files on disk. Section 3 details the Thrift Transport layer.

Protocol. Datatypes must have some way of using the Transport layer to encode and decode themselves. Again, the application developer need not be concerned by this layer. Whether the service uses an XML or binary protocol is immaterial to the application code. All that matters is that the data can be read and written in a consistent, deterministic manner. Section 4 details the Thrift Protocol layer.

Versioning. For robust services, the involved datatypes must provide a mechanism for versioning themselves. Specifically, it should be possible to add or remove fields in an object or alter the argument list of a function without any interruption in service (or, worse yet, nasty segmentation faults). Section 5 details Thrift's versioning system.

Processors. Finally, we generate code capable of processing datastreams to accomplish remote procedure calls. Section 6 details the generated code and TProcessor paradigm.
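The versioning requirement can be illustrated with a minimal sketch in Python. It assumes that fields travel as (field id, value) pairs, which is a simplification for illustration and not Thrift's actual wire format. The names decode, V1_FIELDS and V2_FIELDS are hypothetical.

```python
# Hypothetical illustration: fields are identified by integer ids, so a
# reader can skip ids it does not know and fall back to defaults for
# ids that are missing.

# Schema known to an old reader: field id -> (name, default)
V1_FIELDS = {1: ("number", 10), 2: ("name", "")}
# Schema known to a newer reader, which added field 3
V2_FIELDS = {1: ("number", 10), 2: ("name", ""), 3: ("score", 0.0)}


def decode(pairs, schema):
    """Build a dict from (field_id, value) pairs using the reader's schema."""
    # Start from the defaults so that missing fields are still populated.
    result = {name: default for name, default in schema.values()}
    for field_id, value in pairs:
        if field_id not in schema:
            continue  # unknown field from a newer writer: skip it
        name, _ = schema[field_id]
        result[name] = value
    return result


# A new writer sends three fields; an old reader ignores field 3.
new_message = [(1, 42), (2, "thrifty"), (3, 9.5)]
old_view = decode(new_message, V1_FIELDS)

# An old writer sends two fields; a new reader uses the default for field 3.
old_message = [(1, 7), (2, "legacy")]
new_view = decode(old_message, V2_FIELDS)
```

In this sketch neither side fails when the other side's definition is newer or older, which is the property the Versioning item asks for.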

 

 

 

 

2. Types

The goal of the Thrift type system is to enable programmers to develop using completely natively defined types, no matter what programming language they use. By design, the Thrift type system does not introduce any special dynamic types or wrapper objects. It also does not require that the developer write any code for object serialization or transport. The Thrift IDL (Interface Definition Language) file is logically a way for developers to annotate their data structures with the minimal amount of extra information necessary to tell a code generator how to safely transport the objects across languages.

2.1 Base Types

The type system rests upon a few base types. In considering which types to support, we aimed for clarity and simplicity over abundance, focusing on the key types available in all programming languages and omitting any niche types available only in specific languages.

The base types supported by Thrift are:
bool A boolean value, true or false
byte A signed byte
i16 A 16-bit signed integer
i32 A 32-bit signed integer
i64 A 64-bit signed integer
double A 64-bit floating point number
string An encoding-agnostic text or binary string

Of particular note is the absence of unsigned integer types. Because these types have no direct translation to native types in many languages, the advantages they afford are lost. Further, there is no way to prevent the application developer in a language like Python from assigning a negative value to an integer variable, leading to unpredictable behavior. From a design standpoint, we observed that unsigned integers were very rarely, if ever, used for arithmetic purposes, but in practice were much more often used as keys or identifiers. In this case the sign is irrelevant. Signed integers serve this same purpose and can be safely cast to their unsigned counterparts (most commonly in C++) when absolutely necessary.
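As a small illustration of that last point, the sketch below reinterprets a signed 64-bit identifier as unsigned and back. It is written in Python with hypothetical helper names. Only the view of the bit pattern changes, so no information is lost.

```python
import struct


def to_unsigned64(value):
    """Reinterpret a signed 64-bit integer as its unsigned counterpart."""
    return value & 0xFFFFFFFFFFFFFFFF


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed i64."""
    # Pack as unsigned, unpack as signed: same 8 bytes, different view.
    return struct.unpack(">q", struct.pack(">Q", value))[0]


signed_id = -1                      # as it might arrive in an i64 field
unsigned_id = to_unsigned64(signed_id)
round_trip = to_signed64(unsigned_id)
```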

 

2.2 Structs

A Thrift struct defines a common object to be used across languages. A struct is essentially equivalent to a class in object oriented programming languages. A struct has a set of strongly typed fields, each with a unique name identifier. The basic syntax for defining a Thrift struct looks very similar to a C struct definition. Fields may be annotated with an integer field identifier (unique to the scope of that struct) and optional default values. Field identifiers will be automatically assigned if omitted, though they are strongly encouraged for versioning reasons discussed later.

2.3 Containers

Thrift containers are strongly typed containers that map to the most commonly used containers in common programming languages. They are annotated using the C++ template (or Java Generics) style. There are three types available:

list<type> An ordered list of elements. Translates directly into an STL vector, Java ArrayList, or native array in scripting languages. May contain duplicates.

set<type> An unordered set of unique elements. Translates into an STL set, Java HashSet, set in Python, or native dictionary in PHP/Ruby.

map<type1,type2> A map of strictly unique keys to values. Translates into an STL map, Java HashMap, PHP associative array, or Python/Ruby dictionary.

While defaults are provided, the type mappings are not explicitly fixed. Custom code generator directives have been added to substitute custom types in destination languages (e.g. a hash map or Google's sparse hash map can be used in C++). The only requirement is that the custom types support all the necessary iteration primitives. Container elements may be of any valid Thrift type, including other containers or structs.

struct Example {
  1:i32 number=10,
  2:i64 bigNumber,
  3:double decimals,
  4:string name="thrifty"
}

In the target language, each definition generates a type with two methods, read and write, which perform serialization and transport of the objects using a Thrift TProtocol object.
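A rough sketch of what such a generated type could look like for the Example struct is given below in Python. ToyProtocol is a hypothetical stand-in for a TProtocol object, and its encoding is deliberately simplified: it is not Thrift's binary protocol, and it writes fields in a fixed order without the field identifiers that a real protocol would carry.

```python
import io
import struct
from dataclasses import dataclass


class ToyProtocol:
    """Hypothetical stand-in for a TProtocol; not Thrift's real encoding."""

    def __init__(self, stream):
        self.stream = stream

    def write_i32(self, v):
        self.stream.write(struct.pack(">i", v))

    def write_i64(self, v):
        self.stream.write(struct.pack(">q", v))

    def write_double(self, v):
        self.stream.write(struct.pack(">d", v))

    def write_string(self, v):
        data = v.encode("utf-8")
        self.write_i32(len(data))      # length prefix, then the bytes
        self.stream.write(data)

    def read_i32(self):
        return struct.unpack(">i", self.stream.read(4))[0]

    def read_i64(self):
        return struct.unpack(">q", self.stream.read(8))[0]

    def read_double(self):
        return struct.unpack(">d", self.stream.read(8))[0]

    def read_string(self):
        length = self.read_i32()
        return self.stream.read(length).decode("utf-8")


@dataclass
class Example:
    number: int = 10            # 1: i32, default taken from the IDL
    bigNumber: int = 0          # 2: i64
    decimals: float = 0.0       # 3: double
    name: str = "thrifty"       # 4: string, default taken from the IDL

    def write(self, proto):
        proto.write_i32(self.number)
        proto.write_i64(self.bigNumber)
        proto.write_double(self.decimals)
        proto.write_string(self.name)

    def read(self, proto):
        self.number = proto.read_i32()
        self.bigNumber = proto.read_i64()
        self.decimals = proto.read_double()
        self.name = proto.read_string()


# Round trip through an in-memory byte stream.
buffer = io.BytesIO()
Example(number=7, bigNumber=2**40, decimals=1.5, name="abc").write(ToyProtocol(buffer))
buffer.seek(0)
decoded = Example()
decoded.read(ToyProtocol(buffer))
```

The application only ever touches plain native fields; all encoding knowledge lives in the protocol object passed to read and write.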

2.4 Exceptions

Exceptions are syntactically and functionally equivalent to structs except that they are declared using the exception keyword instead of the struct keyword.

The generated objects inherit from an exception base class as appropriate in each target programming language, in order to seamlessly integrate with native exception handling in any given language. Again, the design emphasis is on making the code familiar to the application developer.

2.5 Services

Services are defined using Thrift types. Definition of a service is semantically equivalent to defining an interface (or a pure virtual abstract class) in object oriented programming. The Thrift compiler generates fully functional client and server stubs that implement the interface. Services are defined as follows:

service <name> {
  <returntype> <name>(<arguments>) [throws (<exceptions>)]
  ...
}

An example:

service StringCache {
  void set(1:i32 key, 2:string value),
  string get(1:i32 key) throws (1:KeyNotFound knf),
  void delete(1:i32 key)
}

Note that void is a valid type for a function return, in addition to all other defined Thrift types.

Additionally, an async modifier keyword may be added to a void function, which will generate code that does not wait for a response from the server. Note that a pure void function will return a response to the client which guarantees that the operation has completed on the server side. With async method calls the client will only be guaranteed that the request succeeded at the transport layer. (In many transport scenarios this is inherently unreliable due to the Byzantine Generals' Problem. Therefore, application developers should take care only to use the async optimization in cases where dropped method calls are acceptable or the transport is known to be reliable.)

Also of note is the fact that argument lists and exception lists for functions are implemented as Thrift structs. All three constructs are identical in both notation and behavior.
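To make the "interface plus implementation" idea concrete, the following Python sketch shows roughly what a developer would work with for the StringCache service. The class names StringCacheIface and StringCacheHandler are hypothetical, and the client and server stubs that the Thrift compiler would generate are omitted.

```python
from abc import ABC, abstractmethod


class KeyNotFound(Exception):
    """Corresponds to the KeyNotFound exception named in the IDL."""


class StringCacheIface(ABC):
    """Interface that a code generator might emit for StringCache."""

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class StringCacheHandler(StringCacheIface):
    """Application code: the part the service developer writes."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        if key not in self._store:
            raise KeyNotFound(key)
        return self._store[key]

    def delete(self, key):
        self._store.pop(key, None)


handler = StringCacheHandler()
handler.set(1, "hello")
value = handler.get(1)
handler.delete(1)
try:
    handler.get(1)
    missing_raised = False
except KeyNotFound:
    missing_raised = True
```

Because KeyNotFound derives from the language's native exception base class, callers handle it with an ordinary try/except, as Section 2.4 describes.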

3. Transport

The transport layer is used by the generated code to facilitate data transfer.

3.1 Interface

A key design choice in the implementation of Thrift was to decouple the transport layer from the code generation layer. Though Thrift is typically used on top of the TCP/IP stack with streaming sockets as the base layer of communication, there was no compelling reason to build that constraint into the system. The performance tradeoff incurred by an abstracted I/O layer (roughly one virtual method lookup / function call per operation) was immaterial compared to the cost of actual I/O operations (typically invoking system calls).

Fundamentally, generated Thrift code only needs to know how to read and write data. The origin and destination of the data are irrelevant; it may be a socket, a segment of shared memory, or a file on the local disk. The Thrift transport interface supports the following methods:

open Opens the transport
close Closes the transport
isOpen Indicates whether the transport is open
read Reads from the transport
write Writes to the transport
flush Forces any pending writes

There are a few additional methods not documented here which are used to aid in batching reads and optionally signaling the completion of a read or write operation from the generated code.

In addition to the above TTransport interface, there is a TServerTransport interface used to accept or create primitive transport objects. Its interface is as follows:

open Opens the transport
listen Begins listening for connections
accept Returns a new client transport
close Closes the transport
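The six client transport methods listed above can be sketched as an abstract interface in Python. This is an illustration under stated assumptions rather than Thrift's own library code: the class names Transport and MemoryTransport are hypothetical, and method names follow Python naming conventions.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Sketch of the six transport methods described in the text."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def is_open(self): ...

    @abstractmethod
    def read(self, size): ...

    @abstractmethod
    def write(self, data): ...

    @abstractmethod
    def flush(self): ...


class MemoryTransport(Transport):
    """Hypothetical in-memory transport: writes become readable after flush."""

    def __init__(self):
        self._opened = False
        self._pending = bytearray()   # written but not yet flushed
        self._readable = bytearray()  # flushed data available to read

    def open(self):
        self._opened = True

    def close(self):
        self._opened = False

    def is_open(self):
        return self._opened

    def read(self, size):
        chunk = bytes(self._readable[:size])
        del self._readable[:size]
        return chunk

    def write(self, data):
        self._pending.extend(data)

    def flush(self):
        self._readable.extend(self._pending)
        self._pending.clear()


def send_greeting(transport):
    """Code like this depends only on the interface, not on the medium."""
    transport.write(b"hello")
    transport.flush()


t = MemoryTransport()
t.open()
send_greeting(t)
received = t.read(5)
t.close()
```

The function send_greeting would work unchanged against a socket-backed or file-backed implementation of the same interface, which is the decoupling the section argues for.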

3.2 Implementation

The transport interface is designed for simple implementation in any programming language. New transport mechanisms can be easily defined as needed by application developers.

3.2.1 TSocket

The TSocket class is implemented across all target languages. It provides a common, simple interface to a TCP/IP stream socket.

3.2.2 TFileTransport

The TFileTransport is an abstraction of an on-disk file to a data stream. It can be used to write out a set of incoming Thrift requests to a file on disk. The on-disk data can then be replayed from the log, either for post-processing or for reproduction and/or simulation of past events.

3.2.3 Utilities

The Transport interface is designed to support easy extension using common OOP techniques, such as composition. Some simple utilities include the TBufferedTransport, which buffers the writes and reads on an underlying transport, the TFramedTransport, which transmits data with frame size headers for chunking optimization or nonblocking operation, and the TMemoryBuffer, which allows reading and writing directly from the heap or stack memory owned by the process.
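Composition of this kind can be sketched in a few lines of Python. The classes below wrap any byte stream and add a frame size header in front of each flushed message. The names FramedWriter and FramedReader are hypothetical, and the 4-byte big-endian header is an assumption made for the sketch, not a statement about the library's exact format.

```python
import io
import struct


class FramedWriter:
    """Buffers writes and emits them as one length-prefixed frame on flush."""

    def __init__(self, inner):
        self._inner = inner           # any object with write(bytes)
        self._buffer = bytearray()

    def write(self, data):
        self._buffer.extend(data)

    def flush(self):
        # 4-byte big-endian frame size, followed by the frame body.
        self._inner.write(struct.pack(">i", len(self._buffer)))
        self._inner.write(bytes(self._buffer))
        self._buffer.clear()


class FramedReader:
    """Reads one complete frame at a time from the wrapped stream."""

    def __init__(self, inner):
        self._inner = inner           # any object with read(size)

    def read_frame(self):
        (size,) = struct.unpack(">i", self._inner.read(4))
        return self._inner.read(size)


wire = io.BytesIO()                   # stands in for a socket or file
writer = FramedWriter(wire)
writer.write(b"first ")
writer.write(b"message")
writer.flush()
writer.write(b"second")
writer.flush()

wire.seek(0)
reader = FramedReader(wire)
frames = [reader.read_frame(), reader.read_frame()]
```

Knowing the frame size up front lets a receiver wait until a whole message is available before decoding it, which is what makes framing useful for nonblocking servers.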

Facebook Thrift Services

Thrift has been employed in a large number of applications at Facebook, including search, logging, mobile, ads and the developer platform. Two specific usages are discussed below.

1. Search

Thrift is used as the underlying protocol and transport layer for the Facebook Search service. The multi-language code generation is well suited for search because it allows for application development in an efficient server side language (C++) and allows the Facebook PHP-based web application to make calls to the search service using Thrift PHP libraries. There is also a large variety of search stats, deployment and testing functionality that is built on top of generated Python code. Additionally, the Thrift log file format is used as a redo log for providing real-time search index updates. Thrift has allowed the search team to leverage each language for its strengths and to develop code at a rapid pace.

2. Logging

The Thrift TFileTransport functionality is used for structured logging. Each service function definition along with its parameters can be considered to be a structured log entry identified by the function name. This log can then be used for a variety of purposes, including inline and offline processing, stats aggregation and as a redo log.
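The idea of treating each call as a structured log entry that can later be replayed can be sketched as follows in Python. JSON lines are used here only to keep the example readable; TFileTransport stores Thrift-encoded data, and the helper names log_call and replay are hypothetical.

```python
import io
import json


def log_call(log, function_name, **params):
    """Append one call as a structured, newline-delimited log entry."""
    entry = {"fn": function_name, "args": params}
    log.write(json.dumps(entry) + "\n")


def replay(log, handlers):
    """Re-run each logged call against the matching handler function."""
    results = []
    for line in log:
        entry = json.loads(line)
        results.append(handlers[entry["fn"]](**entry["args"]))
    return results


cache = {}


def set_value(key, value):
    cache[key] = value


def get_value(key):
    return cache.get(key)


log = io.StringIO()                   # stands in for a file on disk
log_call(log, "set", key=1, value="hello")
log_call(log, "get", key=1)

# Later: rebuild state by replaying the log from the beginning.
log.seek(0)
results = replay(log, {"set": set_value, "get": get_value})
```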


  

Conclusions

Thrift has enabled Facebook to build scalable backend services efficiently by enabling engineers to divide and conquer. Application developers can focus on application code without worrying about the sockets layer. We avoid duplicated work by writing buffering and I/O logic in one place, rather than interspersing it in each application.

Thrift has been employed in a wide variety of applications at Facebook, including search, logging, mobile, ads, and the developer platform. We have found that the marginal performance cost incurred by an extra layer of software abstraction is far eclipsed by the gains in developer efficiency and systems reliability.

A. Similar Systems

SOAP. XML-based. Designed for web services via HTTP, excessive XML parsing overhead.

CORBA. Relatively comprehensive, debatably overdesigned and heavyweight. Comparably cumbersome software installation.

COM. Embraced mainly in Windows client software. Not an entirely open solution.

Pillar. Lightweight and high-performance, but missing versioning and abstraction.

Protocol Buffers. Closed-source at the time of writing, owned by Google. Described in the Sawzall paper.


© 2013 123seminarsonly.com All Rights Reserved.