Where

INRIA (Paris)
23, avenue d'Italie
75013 Paris

The meeting will take place in Room Bleu 1 (6th floor)

INRIA (Place d'Italie)

Day One

Sessions

9:30am 10:00am Arrival / "Tour de table"
10:00am 11:00am Replication / Algorithms
11:00am 11:30am break
11:30am 12:30pm Replication / Algorithms 2
12:30pm 1:30pm lunch
1:30pm 3:00pm Replication / Theory
3:00pm 3:30pm break
3:30pm 5:00pm Replication / Theory 2

Participants

SCORE: Gérald Oster, Luc André, Pascal Urso, Mehdi Ahmed-Nacer, Hyun-Gul Roh
REGAL: Marc Shapiro, Marek Zawirski, Masoud Saeida Ardekani, Pierpaolo Cincilla, Lokesh Gidra, Mesaac Makpangou
CASSIS: Abdessamad Imine, Hoang Bao Thien
ASAP: Stéphane Weiss
XWiki SAS: Fabio Mancinelli
GDD: Pascal Molli
UNL: Nuno Preguiça

Designing an advanced CRDT: a graph data structure for asynchronous processing of web data

Presenter: Nuno Preguiça Documents: PDF

Notes.

Graph CRDT
Web structure represented as directed graphs
The Web evolves, so the graph has to be updated
Web pages processed concurrently by multiple servers / incremental processing

State-based solution (based on observed-remove sets)
Two sets: nodes + arcs
Operations:

  • update (writes): addNode(n), removeNode(n), addArc(n1, n2), removeArc(n1, n2)
  • query (reads): lookupArc((n1, n2))
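
The two-set design and the operations listed above can be illustrated with a minimal Python sketch. It is not the presenter's implementation: the class names ORSet and GraphCRDT are hypothetical, unique tags are generated with uuid, and removals are kept as tombstones for simplicity (the talk instead describes garbage collection based on state vectors). The rule that lookupArc also requires both endpoints to be present is an assumption; the notes do not say how arcs to removed nodes are handled.

```python
import uuid


class ORSet:
    """Observed-remove set: every add gets a fresh tag, remove deletes only observed tags."""

    def __init__(self):
        self.added = set()    # (element, tag) pairs
        self.removed = set()  # tombstones: (element, tag) pairs observed when removing

    def add(self, e):
        self.added.add((e, uuid.uuid4().hex))

    def remove(self, e):
        # Only the tags visible at this replica right now are removed.
        self.removed |= {(x, t) for (x, t) in self.added if x == e}

    def contains(self, e):
        return any(x == e for (x, _) in self.added - self.removed)

    def merge(self, other):
        # State-based merge: union of both components (commutative, idempotent).
        self.added |= other.added
        self.removed |= other.removed


class GraphCRDT:
    """Directed graph built from two observed-remove sets: nodes and arcs."""

    def __init__(self):
        self.nodes = ORSet()
        self.arcs = ORSet()

    def add_node(self, n):
        self.nodes.add(n)

    def remove_node(self, n):
        self.nodes.remove(n)

    def add_arc(self, n1, n2):
        self.arcs.add((n1, n2))

    def remove_arc(self, n1, n2):
        self.arcs.remove((n1, n2))

    def lookup_arc(self, arc):
        # Assumption: an arc is visible only while both endpoints are present.
        n1, n2 = arc
        return (self.arcs.contains(arc)
                and self.nodes.contains(n1)
                and self.nodes.contains(n2))

    def merge(self, other):
        self.nodes.merge(other.nodes)
        self.arcs.merge(other.arcs)


# Two replicas: one removes a node while the other concurrently re-adds it.
a, b = GraphCRDT(), GraphCRDT()
a.add_node("p1"); a.add_node("p2"); a.add_arc("p1", "p2")
b.merge(a)
a.remove_node("p2")   # removes only the tag that a has observed
b.add_node("p2")      # concurrent re-add creates a new, unobserved tag
a.merge(b); b.merge(a)
```

After the final merges both replicas agree that the arc is visible, because the concurrent re-add introduced a tag that the remove had not observed.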


Garbage collection mechanism based on state vectors to avoid tombstones
Snapshot management mechanism, also computed using state vectors

  • Useful for: access to consistent data in transactions, and data evolution history

    Pascal U.: The GC looks like the management in Logoot? Why not use the URI/URL as the unique identifier?
    Answer: Because the referenced content can be added and removed multiple times; a remove operation removes all observed uids associated with a URL.
    Marc: You cannot remove immediately, right?
    Nuno: No, I can.
    Abdessamad: What about the limitations of version vectors?
    Nuno: In our context, cloud computing, the number of clusters is limited and known, as are their unique identifiers.
    Pascal M.: So, basically, the scope of applications of CRDTs has been reduced. At the beginning, WOOT/Logoot were proposed for peer-to-peer networks. You can use consensus in this context, no?
    Nuno: Well, consensus might not be suitable. For instance, Amazon does not use consensus (maybe because its data centers are spread over the world).
    Marc: If you want high performance, you can't put consensus in your loop.
    Nuno: The cloud computing literature states that optimistic replication is used.

C-Set: A Commutative Replicated Data Type for Semantic Stores

Presenter: Pascal Molli Documents: PDF

Notes.

Presented at the REsource Discovery 2011 workshop (http://ldc.usb.ve/~mvidal/RED2011/), co-located with the Extended Semantic Web Conference (ESWC 2011).

Context: Social web is adopting semantic technologies and is generating massive new semantic datasets.
Challenge: synchronizing semantic stores (very large data sets, autonomous participants, etc.)

CRDT for semantic store: semantic data can be represented as sets of triples.
Need a CRDT for sets.
A set with its usual semantics is not a CRDT.

Proposal: C-Set
S = {(e, count) : e \in elements, count \in Z}
local operations: ins(e: element), del(e:element)
remote operations: rins(e:element, k: Z), rdel(e:element, k: Z)
Current proposal does not preserve intentions (see counter-example)
Marc: Did you try OR-set?
Pascal M.: Yes, I think it will work, but we will lose on another point (vectors? tombstones?)
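
The C-Set operations listed above can be sketched in Python as follows. The notes do not say how k is chosen, so this sketch assumes one plausible rule: ins(e) takes effect only when count <= 0 and sends rins(e, |count| + 1), and del(e) takes effect only when count > 0 and sends rdel(e, count). The class name CSet, the method name delete (del is a Python keyword) and the tuple encoding of remote operations are hypothetical.

```python
from collections import defaultdict


class CSet:
    """Counter-based set: an element is present when its counter is > 0."""

    def __init__(self):
        self.count = defaultdict(int)  # element -> integer counter (may go negative)

    def contains(self, e):
        return self.count.get(e, 0) > 0

    def ins(self, e):
        """Local insert. Returns the remote operation to broadcast, or None."""
        c = self.count[e]
        if c > 0:
            return None               # already present locally: nothing to send
        k = abs(c) + 1                # assumed rule: lift the counter back to 1
        self.rins(e, k)
        return ("rins", e, k)

    def delete(self, e):
        """Local delete. Returns the remote operation to broadcast, or None."""
        c = self.count[e]
        if c <= 0:
            return None               # not present locally: nothing to send
        self.rdel(e, c)               # assumed rule: bring the counter down to 0
        return ("rdel", e, c)

    def rins(self, e, k):
        self.count[e] += k            # addition commutes with any other rins/rdel

    def rdel(self, e, k):
        self.count[e] -= k

    def apply(self, op):
        """Apply an operation received from another replica."""
        kind, e, k = op
        if kind == "rins":
            self.rins(e, k)
        else:
            self.rdel(e, k)


# Two replicas delete the same element concurrently, then exchange operations.
r1, r2 = CSet(), CSet()
r2.apply(r1.ins("x"))
d1, d2 = r1.delete("x"), r2.delete("x")
r1.apply(d2); r2.apply(d1)
```

Because remote operations only add to or subtract from a counter, replicas converge whatever the delivery order. In this run both counters end at -1, so a later ins("x") has to send k = 2 to make the element visible again.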

Telex Light: a platform for cooperative social applications

Presenter: Pierpaolo Cincilla Documents: PDF

Notes.

Telex: A communication infrastructure for collaborative nomadic applications
Demo

The cost of consistency in large-scale replication

Presenter: Masoud Saeida Ardekani Documents: PDF

Notes.

Full replication, partial replication (atomic broadcast), genuine replication (atomic multi-cast)
Snapshot isolation (read latest snapshot), generalized snapshot isolation (read any snapshot)
Consistent snapshots are determined using concurrent version vectors
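
As background for the line above, here is a minimal Python sketch of how version vectors are usually compared. It only shows the standard ordering and merge; the notes do not describe how the presented system actually selects snapshots. The function names and the replica ids "A" and "B" are hypothetical.

```python
def dominates(a, b):
    """True if vector a includes every update counted in b (a >= b component-wise)."""
    return all(a.get(replica, 0) >= n for replica, n in b.items())


def concurrent(a, b):
    """True if neither vector includes the other, i.e. the states are concurrent."""
    return not dominates(a, b) and not dominates(b, a)


def merge(a, b):
    """Component-wise maximum: the smallest vector that dominates both inputs."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


v1 = {"A": 2, "B": 1}   # has seen 2 updates from A and 1 from B
v2 = {"A": 1, "B": 3}   # has seen 1 update from A and 3 from B
joined = merge(v1, v2)  # {"A": 2, "B": 3}
```

Here v1 and v2 are concurrent, and joined dominates both of them.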

Generalized snapshot isolation (GSI) : snapshot monotonicity

Read-write dependence vector (RWDV)

More scalable than other systems based on the genuine approach (thanks to relaxed monotonicity) and than systems based on partial replication (whose latency is higher because they require atomic broadcast).

Asynchronous re-balancing of a replicated tree

Presenter: Marek Zawirski Documents: PDF

Notes.

Delta (previous work) : Novel catch-up mechanism based on symbolic positions

State of the art in using semantic information to ensure consistency

Presenter: Marek Zawirski Documents: PDF

Notes.

Bloom [Alvaro et al. 2011]: Uses a monotonic logic programming model to encourage writing program fragments that can run concurrently. What about writing/expressing CRDTs in Bloom? http://www.bloom-lang.net/

ESCOPADS: Earth-SCale, COnsistent, Privacy-preserving, Autonomic Data Service

Presenter: Marc Shapiro

Notes.

Distributed search engine/crawler/index/precomputed queries/... based on CRDTs

Day Two

Sessions

9:30am 10:30am Evaluation / Experimentations
10:30am 11:00am break
11:00am 12:30pm Requirements / Architecture
12:30pm 1:30pm lunch
1:30pm 3:00pm Security
3:00pm 3:30pm break
3:30pm 4:30pm Coordination / Discussions

Participants

SCORE: Gérald Oster, Luc André, Pascal Urso, Mehdi Ahmed-Nacer, Hyun-Gul Roh, Claudia-Lavinia Ignat, Hien Thi Thu Truong
REGAL: Marc Shapiro, Marek Zawirski, Pierpaolo Cincilla, Lokesh Gidra
CASSIS: Michaël Rusinowitch, Hoang Bao Thien
ASAP: Stéphane Weiss
XWiki SAS: Fabio Mancinelli
GDD: Pascal Molli
UNL: Nuno Preguiça

Evaluating CRDTs for Real-time Document Editing

Presenter: Mehdi Ahmed-Nacer Documents: PDF

Abstract. TBC

Notes.

Nuno: Are user operations captured, or computed with a diff algorithm?
Mehdi: They are captured as you type.
Marek: You mean your logs store the causal relations between operations?
Mehdi: Yes.
Pascal U.: We replay the history while taking causal relations into account. But it is worth noting that concurrent operations might be replayed in an order different from the real one.
Nuno: Remark about reporting average values without the deviation.
Nuno: Why does OT not perform that well in the presented scenario?
Marc: Any differences with the results (performance evaluation) observed with serialized traces from Wikipedia?
Marc: Any experimental results for memory consumption?
Marek: Did you look at how the computed final states differ between algorithms?
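
Pascal U.'s remark about replaying a history under causal relations can be illustrated with a short Python sketch. It is not the evaluation tool discussed in the talk: the trace, the operation ids and the function name replay_order are hypothetical, and the ordering relies on graphlib.TopologicalSorter from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def replay_order(deps):
    """Return one replay order in which every operation follows its causal dependencies.

    deps maps an operation id to the set of operation ids it causally depends on.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical trace: a2 and b1 are concurrent, both follow a1, and a3 follows both.
trace = {"a1": set(), "a2": {"a1"}, "b1": {"a1"}, "a3": {"a2", "b1"}}
order = replay_order(trace)
```

Any order returned starts with a1 and ends with a3, but a2 and b1 may appear in either order, which is why a replay can differ from the order in which the operations really happened.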

Discussions Related to XWiki Integration

Presenter: Fabio Mancinelli

Notes.

We need to define a format for collected traces.

Architecture / Requirements

Presenter: Stéphane Weiss

Notes.

CRDT, Telex, … have different requirements
collaboration model: users you know, users that share preferences, etc.
telex: you need to know number of users, id of users
for instance, Git and Wikipedia are not based on the same collaboration model. Wikipedia would never work with access rights, etc.
p2p rt facebook? p2p rt googledocs? p2p rt "search engine"?
Realtime Web: it breaks the paradigm crawl/cache/search
but it is not editing!
so caching techniques do not work anymore
(as an example, for a wiki, rendering has to be performed on the client)
the architecture would be completely different
editing -> propagation as fast as possible to all other users (readers, not only writers)
"TBA", Fabio Mancinelli
The discussion will continue in Nancy when Stéphane visits for several days. We might also organize a meeting with Fabio during those days.

On an API for Securing Distributed Social Networks

Presenter: Hoang Bao Thien Documents: PDF

Abstract. TBC

Notes.

Interesting reference: http://www.safebook.us/home.html

A Contact extended Push Pull Clone Model

Presenter: Hien Thi Thu Truong Documents: PDF

Abstract. TBC

Notes.

Discussions related to STREAMS WP4 and Deliverable L4.1 (T0+18) will take place in Nancy during a meeting in July.