#jython IRC Log (v0.9)

IRC Log for 2016-09-09

Timestamps are in GMT/BST.

[0:56] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) has joined #jython
[2:02] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) Quit (Remote host closed the connection)
[2:02] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) has joined #jython
[2:07] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) Quit (Ping timeout: 265 seconds)
[2:30] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) has joined #jython
[4:25] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) Quit (Remote host closed the connection)
[4:25] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) has joined #jython
[4:27] * jimklo_ (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) has joined #jython
[4:27] * jimklo (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) Quit (Read error: Connection reset by peer)
[4:46] <pjenvey> https://twitter.com/raymondh/status/773978885092323328
[4:51] * jimklo_ (~jimklo@71-84-19-18.dhcp.trlk.ca.charter.com) Quit (Remote host closed the connection)
[4:51] <pjenvey> jimbaker: how will this look on the JVM didn't you have some ideas?
[5:35] * stewori (~stefan@5.146.129.76) has joined #jython
[5:36] <stewori> we can use http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html to have ordered dicts, can't we?
[5:41] <stewori> we would have to make it synchronized and - where needed - weak manually
[5:41] <stewori> e.g. with making it synchronized by http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#synchronizedMap(java.util.Map)
[5:49] <pjenvey> that's the key, doing it in a thread safe manner, synchronizing does it, but vs a lock free algorithm which is currently used for dict
[10:22] <stewori> pjenvey: sorry, wasn't aware that ConcurrentHashMap actually works lock-free. However I then took a look at LinkedHashMap source. It works by extending HashMap and adding an overlay structure (a double-linked list) to maintain order. Maybe we could write something similar on top of ConcurrentHashMap. It might require some tricky tweaking effort to let this overlay structure work lock-free too (a double-linked list might not be suitable), but I am confident it
[10:24] <stewori> (but let's see what Jim thinks, maybe he has a better idea)
[11:26] <chrisseaton> @stewori both JRuby and JRuby+Truffle ended up using their own hash implementation for ordered maps - I can't remember if there was a good reason for that which might apply to your use case as well
[11:47] <stewori> IMO it's preferable to rely on the builtin concurrent hash mechanism as far as possible. Adding an ordering structure on top feels like a minor addition to me compared to what smart brainwork in terms of optimization etc a concurrent hash algorithm might have under the hood.
[11:47] <stewori> So I wouldn't want or recommend to reinvent that wheel.
[11:48] <stewori> Anyway, if JRuby already did this, it would be woth a look. Maybe we can adopt their solution.
[11:53] * stewori (~stefan@5.146.129.76) Quit (Read error: Connection reset by peer)
[11:56] * stewori (~stefan@5.146.129.76) has joined #jython
[11:56] <chrisseaton> How would you make the ordering structure on top atomic with the concurrent structure below though?
[11:56] <chrisseaton> Concurrent data structures are usually not composable
[12:37] <stewori> Assuming that ordered dict semantics "just" means that iterating over the dict yields elements in insertion order, I think it should be doable though. Maybe this is too quick-thought, but I would just add a _single_ linked list (reverse iteration isn't required, is it?) by using a specialized entry-class.
[12:38] <stewori> It should be fairly doable to impelemnt this lock-free, because insertion can only happen at the end of the list. List-end would be maintained as an AtomicReference.
[12:39] <chrisseaton> But I mean how do you add to the concurrent hash map and you to your linked list in a single atomic action?
[12:39] <stewori> For removal I would set the payload to null or some dummy, so also this would be easy to get atomic
[12:39] <chrisseaton> Or does it matter if something is in the concurrent hash map but not the linked list?
[12:40] <stewori> Actually ConcurrentHashMap already provides limited guarantees here. First note that this issue would only affect iterators, (wouldn't it?)
[12:40] <stewori> From ConcrurrenHashMap javadoc: Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
[12:42] <stewori> Haven't thought this through in detail, but I am confident it would be possible to preserve this behavior with the explained design.
[13:15] * stewori (~stefan@5.146.129.76) Quit (Remote host closed the connection)
[13:18] * stewori (~stefan@5.146.129.76) has joined #jython
[13:58] * xemdetia (xemdetia@nat/ibm/x-edtrfebgxueevmkv) has joined #jython
[14:24] <jimbaker> pjenvey, stewori - i would assume we would use https://github.com/ben-manes/caffeine, which requires java 8
[14:24] <jimbaker> this is a successor in ben's work on concurrent linked hash maps
[14:26] <jimbaker> but i haven't explored such concerns such as overhead of constructing the map (rather important!) - we already pay a bit in size and time for ConcurrentHashMap vs just a regular HashMap
[14:27] <jimbaker> chrisseaton, re concurrency semantics: we don't do anything more than what is implicitly provided by cpython's GIL in how CHM is exposed
[14:28] <chrisseaton> But if the whole thing isn't atomic, why use a concurrent hash map at all? Why not use an unsyched one?
[14:28] <chrisseaton> (we don't provide safety on most data structures either)
[14:28] <jimbaker> https://github.com/ben-manes/caffeine
[14:28] <jimbaker> oops, bad paste
[14:28] <jimbaker> http://www.jython.org/jythonbook/en/1.0/Concurrency.html#atomic-operations
[14:29] <jimbaker> chrisseaton, it's just a question of using python code expectation - they implicitly expect GIL semantics
[14:30] <jimbaker> we found this to be a problem as we started porting larger codebases onto jython
[14:30] <jimbaker> there's obviously buggy code that would fail at least eventually on cpython. but because of implementation aspects of cpython, it would be find on cpython, fail on jython. hence CHM
[14:31] <jimbaker> and that's muddled
[14:31] <jimbaker> 1) there's some set of buggy code out there. we can't do much about it
[14:32] <jimbaker> 2) there's another set of code that implicitly depends on atomic semantics introduced by the GIL. this code always run fine on cpython (and pypy)
[14:32] <jimbaker> CHM lets us support the code in set #2
[14:32] <chrisseaton> But this is what I'm saying - if adding an item to a dict needs to be atomic, and you're proposing implementing that using a concurrent hash map + a separate linked list, that's not going to be atomic is it?
[14:33] <jimbaker> chrisseaton, no, i'm proposing to use caffeine, which actually goes get it right
[14:33] <chrisseaton> ook
[14:33] <chrisseaton> I think was originally replying to stewori
[14:34] <jimbaker> yeah, clearly composition of two separate concurrent things doesn't mean they will be concurrent. so that's why STM in pypy. or more cleverness in the jvm ecosystem
[14:34] <jimbaker> as with caffeine
[14:34] <chrisseaton> the new STM PyPy paper has been accepted for DLS, btw
[14:35] <stewori> I think in my proposal this only affects iterators. And the current solution with ConcurrentHashMap already doesn't guarantee how an iterator behaves if the map was modified after the iterator was created.
[14:35] <chrisseaton> stewori: but what happens if someone gets in between you updating your concurrent hash map, and updating your linked list?
[14:35] <jimbaker> stewori, correct, it's only best effort
[14:39] <stewori> chrisseaton: If an iterator is created in that moment, depending on implementation detail this iterator might or might not work w.r.t. this update. I think this slightly undefined behavior does not violate the current low guarantees
[14:40] <chrisseaton> That doc Jim linked to me says adding an entry is atomic - what you are describing ain't atomic
[14:40] <chrisseaton> I don't really know Python or Jython though, so I should be humble
[14:43] <jimbaker> per the CHM docs: " Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time." (i overstated the best effort aspect) https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
[14:44] <stewori> Chrisseaton: the doc linked by Jim mentions "Python guarantees the atomicity of certain operations". The list of atomic actions following that statement does not cover iterator behavior.
[14:45] <jimbaker> chrisseaton, but i think you know CHM's semantics :) let's just assert that the informal implementation semantics of CPython is <= what CHM provides. this may or may not be true. they are informal implementation semantics after all
[14:45] <chrisseaton> "the following are atomic operations for Python code ... adding an item (to a dict)"
[14:45] <chrisseaton> If I can intercept you in the middle of doing that, it isn't atomic
[14:45] <jimbaker> chrisseaton, correct, it's not atomic
[14:45] <jimbaker> the only way would be if CHM supported some sort of persistent dictionary style semantics, similar to clojure
[14:46] <jimbaker> but that would be a different programming model. if very nice
[14:48] <jimbaker> nickmbailey and TomA mix clojure and jython together. i wonder if they have any comments here
[14:58] <stewori> chrisseaton: even if not atomic, the iterator behavior on concurrent modification wouldn't become worse than it already is. Currently you cannot know if the concurrently modified (e.g. removed) element was already iterated over.
[14:59] * TomA_ (~TomA@2601:402:500:8e98:b8ef:9782:5e02:d839) has joined #jython
[14:59] <stewori> E.g. by updating the ConcurrentHashMap backend always before the linked list this just adds another source of potential async between map and iterator. But this must be expected to happen already in current situation.
[15:01] * TomA (~TomA@2601:402:500:8e98:c954:c01d:1d6c:beee) Quit (Ping timeout: 250 seconds)
[15:32] <chrisseaton> I think it could be worse - if the thread dies after adding to the map but before adding to the iterative you'll be forever out of sync
[15:45] <stewori> There should be no potential thread-killing actions involved between these actions. If you assume that a thread can die at any time without apparent reason (e.g. someone calls thread.stop()), this can also put other concurrent data structures in insane state I suppose. (They still use atomics for sync here and there.)
[15:47] <chrisseaton> "this can also put other concurrent data structures in insane state I suppose" right, that's why they use atomics. 'atomic' means you can kill a thread in the middle of doing something and the operation doesn't apply anything
[15:57] <stewori> Also in existing concurrent data structures access actions consist of multiple steps. I find it hard to believe that it is possible to have an implentation of complex data structures completely fault tolerant against any-time thread failure. Doesn't 'atomic' here just mean that the outcome of access by multiple threads is well defined?
[16:00] <stewori> Looking at http://www.docjar.com/html/api/java/util/concurrent/ConcurrentHashMap.java.html they actually do lock the table here and there. It is still called lock-free, because they use efficient atomic locks rather than 'synchronized'. Still if a thread is killed when the table is locked, before unlock is called, this might e.g. lock forever
[16:01] <stewori> They usually have a 'finally' statement to assure unlock in most error-cases. But even finally wouldn't be executed if thread is killed: http://stackoverflow.com/questions/14576765/does-the-finally-block-execute-if-the-thread-running-the-function-is-interrupted
[16:04] <stewori> Though unnecessary IMO this sort of locking could also (trivially?) be applied to glue linked list and backing map access together. Using 'finally' for eventual clean-up should also be straight forward there.
[16:06] <chrisseaton> Yes you're right this does do locking on individual values for updates, I didn't actually know that
[16:12] * jimklo (~jimklo@192.12.16.110) has joined #jython
[16:31] <jimbaker> chrisseaton, stewori - code to demonstrate that a simple linked list implementation + CHM does not work, use the existing collections.OrderedDict in jython 2.7 - https://gist.github.com/jimbaker/ef6449e99cd092376d9acbc380301196
[16:32] <jimbaker> "does not work" here means "exhibits behavior where the data structure is corrupted"
[16:34] <jimbaker> much like HashMap vs ConcurrentHashMap, it requires a bit of effort to demonstrate corruption. enough threads. enough different ops. but not so much - and this combines what we discussed - iteration plus mutation
[16:37] <jimbaker> fwiw, the deadlock/publication bug we observed in python type <-> java class mappng was an example where the CHM's locking structure implementation leaked - the CHM lock was part of the deadlock
[16:37] <jimbaker> i had never seen that before
[16:38] <jimbaker> usually one can treat CHM as lock free and life is good
[16:39] <jimbaker> very interesting questions. i cannot wait to try out caffeine
[16:39] <jimbaker> i assume we will implement in jython 3, which i'm looking forward to working on once we release 2.7.1
[16:41] <stewori> Jim, this ghist is based on CPython's collections module, which was developed in a GIL'ed environment
[16:42] <stewori> I don't think the issue fundamentally applies for a Java-side implementation, if care is taken to make insert and remove thread safe
[16:42] <stewori> anyway, if caffeine does the trick it's also fine
[16:43] <stewori> If Java's CHM is already faulty, it's of course a different story
[16:47] <stewori> oh, okay I overlooked there actually is _collections in org.python.modules
[16:48] <stewori> still the OrderedDict stuff comes from CPython implementation
[17:03] <jimbaker> stewori, collections.OrderedDict is the standard pure python impl of this collection type - the _collections import is something that CPython does as well
[17:04] <jimbaker> i don't think python vs java would matter here. it would be more work to try it out - this was a quick demo of what could go wrong
[17:05] <stewori> so how does CPython make the linked list thread-safe?
[17:05] <jimbaker> it doesn't
[17:06] <stewori> exactly
[17:06] <jimbaker> it would need to do some of the same GIL stuff that presumably the new C impl in 3.6 does
[17:06] <stewori> In a Java implementation of this principle we could use atomicReference and such to secure this
[17:06] <jimbaker> then it get atomicity
[17:07] <jimbaker> stewori, yes - i think we all agree here. i just wanted to point out the challenge of a simple linked list impl
[17:07] <jimbaker> here's a classic blog post on the problematic usage of HashMap (which is perfectly fine!) - http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
[17:08] <stewori> Alright. Just was responding to "i don't think python vs java would matter here"
[17:08] <stewori> which I disagree on
[17:08] <jimbaker> yeah, we could do the same thing in python, using AtomicReference
[17:18] <stewori> Okay, that's surely true.
[18:47] <jimbaker> i tried the test on cpython 2.7 and 3.5. on cpython 2.7, OrderedDict breaks quickly with my test. on 3.5, i had to increase the number of test runs, but eventually i could break OrderedDict
[18:47] <jimbaker> it will be interesting to see what 3.6 looks like. i will try building it
[19:41] * xemdetia (xemdetia@nat/ibm/x-edtrfebgxueevmkv) Quit (Ping timeout: 255 seconds)
[20:04] * stewori (~stefan@5.146.129.76) Quit (Quit: Leaving.)
[20:23] * jimklo (~jimklo@192.12.16.110) Quit (Remote host closed the connection)
[20:24] * jimklo (~jimklo@192.12.16.110) has joined #jython
[20:28] * jimklo (~jimklo@192.12.16.110) Quit (Remote host closed the connection)
[20:28] * jimklo (~jimklo@192.12.16.110) has joined #jython
[21:01] * TomA (~TomA@2601:402:500:8e98:a19e:c50c:1f1b:fbe4) has joined #jython
[21:04] * TomA_ (~TomA@2601:402:500:8e98:b8ef:9782:5e02:d839) Quit (Ping timeout: 250 seconds)
[21:24] * jimklo (~jimklo@192.12.16.110) Quit ()

Index

These logs were automatically created by JythonLogBot_ on irc.freenode.net using a slightly modified version of the Java IRC LogBot (github).