Saturday, June 29, 2013

ARC is awesome: My comments on Allen Bauer's blog

Allen is a pretty talented guy, with a long history with Delphi and deep insight into how it got where it is. I can confirm (from outside the Embarcadero/CodeGear fold) that everything he says is exactly the way it is, even the motivation for .NET: Microsoft planned to "yank" the Win32 platform out from under us Win32/VCL programmers and force everybody onto new "managed" .NET runtime APIs, to the point that the existing Win32 APIs would become a "penalty box" environment, a sandboxed subworld inside the real Windows. That would have been the equivalent of NTVDM, the virtualization layer that kept 16-bit DOS executables running on Windows as recently as 32-bit Windows 7. You may remember that NTVDM only went away with 64-bit Windows, where Win32 itself runs inside a compatibility layer called WOW64 (Windows-on-Windows 64-bit). Because sitting at the root of the platform's APIs is key to Delphi being Delphi, the now mostly dead and buried "Delphi for .NET" product seemed necessary for Delphi's continued health and for its future in a .NET-only Windows world, a world that never materialized.


If Embarcadero could ship a WinRT-targeting version of Delphi, they would have already. But Microsoft holds all the keys, having meticulously crafted a sandboxed, signed, Apple-App-Store rip-off mode inside Windows 8. And WinRT is just a "penalty box" inside Windows 8: rather than enabling new features, it is primarily notable for what it takes away. In Windows 8 App Store/WinRT applications, your application does not get to decide when it runs, and it is more heavily "managed" by a power-consumption-optimizing user shell that has more in common with Windows Phone, iOS, and Android than with classic Windows programming. WinRT is also heavily asynchronous; if you haven't noticed, code samples for Windows 8 WinRT applications in C# make heavy use of the "magic" await keyword. I am not slavering for such magic to appear in Delphi, because such magic has a dark underside, just as .NET garbage collection does.

Does ARC have a difficult and dark underside? Not really. But there are a few things to be aware of in reference-counting-based memory management, things you should probably already know if you are a competent Windows COM/DCOM programmer, such as:

  • A reference-counting cycle happens when object A holds a reference to B, and object B either directly holds a reference to A or holds a reference to something else that holds a reference to A. Each object in the loop keeps the next one's count above zero, so none of them is ever freed, even after your code has dropped every outside reference to them. A "directed acyclic graph" means "a bunch of dots with lines between them, where each line has an arrow, sort of like a one-way street sign, and you can follow the arrows any way you like but never end up back at a node you previously visited." If your references look like that, they can in some sense be considered a "tree" instead of a "spiderweb", and your "tree" of objects has no cycles. If, however, your application has more of a spiderweb structure, then you will need to learn how to break these cycles.

  • Breaking reference-counting cycles is easy in a Clang/LLVM-based ARC scenario, such as Objective-C in Xcode, and the same is true of Delphi on iOS. Weak references are the answer. Weak references are really cool; they are like a C++ smart pointer on steroids. The compiler and its code-generation back ends provide some really strong guarantees that give you a very easy-to-use programming model: if the object pointed to by a weak reference still exists, you get that object; if it does not, you get a null (or nil, if you like) object reference. If you were curious how this works when you first heard about it, you might want to read up on the Apple Clang/LLVM documentation on weak references, because I expect the Delphi semantics to be very similar. Weak references are not free: they slow your application down. But remember that correctness (not crashing) is always more important than performance. It would be a real mistake to pathologically avoid weak references where they are the correct solution just because they can cause performance issues when overused. On the other hand, if you can avoid circular references completely in your design, you avoid the problem of refcount cycles for free. Use your head, choose on a case-by-case basis, and optimize after profiling, running, and testing your code, not before.
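
The sketch below illustrates both bullets in Python, chosen only because CPython also frees objects by reference counting; the weakref module plays the role of an ARC weak reference, and the Node class and function names are hypothetical. It assumes CPython: with the separate cycle collector switched off, a strong A/B cycle stays alive after every outside reference is gone, while making B's back-reference weak lets A die as soon as its last strong reference disappears, after which the weak reference yields None (Python's nil).

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a peer strongly or weakly."""

    def __init__(self, name):
        self.name = name
        self.peer = None       # strong reference: keeps the peer alive
        self.weak_peer = None  # weak reference: does not keep the peer alive


def build_strong_cycle():
    """A -> B and B -> A, both strong: a reference-counting cycle."""
    a, b = Node("A"), Node("B")
    a.peer = b
    b.peer = a
    return weakref.ref(a)  # a probe that observes A without owning it


def build_weak_backlink():
    """A -> B strong, B -> A weak: the strong references form no cycle."""
    a, b = Node("A"), Node("B")
    a.peer = b
    b.weak_peer = weakref.ref(a)
    return weakref.ref(a), b


gc.disable()  # switch off CPython's cycle collector, leaving pure refcounting

cycle_probe = build_strong_cycle()
cycle_leaked = cycle_probe() is not None   # neither count ever reached zero
print("strong cycle still alive:", cycle_leaked)

weak_probe, b = build_weak_backlink()
print("A freed on return:", weak_probe() is None)
print("B's weak reference now yields:", b.weak_peer())  # None, i.e. nil

gc.enable()
gc.collect()  # only a tracing collector can reclaim the leaked cycle
print("cycle reclaimed by gc:", cycle_probe() is None)
```

Delphi and Objective-C ARC have no backup tracing collector like CPython's gc module, which is why the cycle case has to be designed away there.
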
Having used Objective-C with manual memory management (retain and release calls), with garbage collection (which was crap, and is being removed from Objective-C), and now with ARC, I am very happy to see ARC coming to Delphi.

I predict that when ARC comes to Delphi on the VCL Windows desktop, a lot of old crap in your codebases is going to have to go away, including:

  • Length-limited string syntax like string[30], which is really the ancient ShortString type from Turbo Pascal
  • A lot of old compiler flags, such as those that turn on DOS-era Real48 compatibility ({$REALCOMPATIBILITY ON}) or alias string to ShortString ({$H-})
  • UPDATED: When I first looked at ARC, I thought variant records might be incompatible with it, but on reflection, I'm guessing that the existing restrictions on combining variant records with managed types will simply continue. You can't put an AnsiString in the variant part of a record today, for example, so you won't be able to put an ARC object reference there either.
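
Variant records do the same job as C unions, which is why several commenters below care about them for C/C++ interop. As a hedged illustration in Python, ctypes.Union lays out its fields the same way; the Number type and its field names are made up for this example. Writing one field and reading the other reinterprets the same bytes, and nothing records which field is currently live, so a compiler would have no way of knowing whether a reference count stored there needs to be decremented. That is presumably why managed types are kept out of the variant part.

```python
import ctypes


class Number(ctypes.Union):
    """C-style union: both fields occupy the same four bytes."""
    _fields_ = [
        ("as_int", ctypes.c_uint32),
        ("as_float", ctypes.c_float),
    ]


n = Number()
n.as_float = 1.0
# Reading the other field reinterprets the same bytes; no tag records
# which of the two fields was written last.
print(hex(n.as_int))           # 0x3f800000, the IEEE-754 bits of 1.0
print(ctypes.sizeof(Number))   # 4: the size of the largest field
```
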
I will be really surprised, though, if the trial balloon Embarcadero has floated about immutable, zero-based strings flies. I expect that is a deal breaker, as there are hundreds of millions of lines of otherwise clean code that still depend on strings starting at index 1. Parsing strings with Pos() and Copy(), and not having to juggle separate String and StringBuilder types, are classic differentiators between Delphi's managed heap of strings and a Java or .NET managed string environment. Changing these semantics is not a simple matter of adding or subtracting 1 from a string position; it reflects a fundamental difference between the classic mutable, copy-on-write behaviour of Delphi's AnsiString and UnicodeString and the behaviour of the type that seems to have been chosen to replace them, which is, in the end, something homeomorphic to the immutable Cocoa/Apple NSString type.
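
To make "copy-on-write" concrete, here is a small Python model of the semantics described above; it is not how the Delphi RTL is written, and DelphiLikeString, _Buffer, and assign_from are hypothetical names. Assignment only shares the character buffer and bumps a counter (Delphi keeps that count in the string's header), and a character write such as S[x] := aChar copies the buffer first only when someone else shares it. Indexing is 1-based, as in Delphi. An immutable NSString-style type would instead forbid the character write altogether.

```python
class _Buffer:
    """Shared character storage with a manual reference count."""

    def __init__(self, chars):
        self.chars = list(chars)  # always a private copy of the input
        self.refcount = 1


class DelphiLikeString:
    """Hypothetical model of a mutable, copy-on-write, 1-based string."""

    def __init__(self, text=""):
        self._buf = _Buffer(text)

    def assign_from(self, other):
        """T := S: share S's buffer and bump its count; no characters copied."""
        other._buf.refcount += 1
        self._buf.refcount -= 1  # release whatever T pointed at before
        self._buf = other._buf

    def __getitem__(self, index):
        return self._buf.chars[index - 1]  # Delphi strings start at index 1

    def __setitem__(self, index, ch):
        """S[x] := ch: copy first if the buffer is shared, then write."""
        if self._buf.refcount > 1:
            self._buf.refcount -= 1           # detach from the shared buffer
            self._buf = _Buffer(self._buf.chars)
        self._buf.chars[index - 1] = ch

    def __str__(self):
        return "".join(self._buf.chars)


s = DelphiLikeString("Hello")
t = DelphiLikeString()
t.assign_from(s)             # T := S
print(t._buf is s._buf)      # True: the assignment copied nothing
t[1] = "J"                   # shared, so T gets its own copy first
print(s, t)                  # Hello Jello
owned = s._buf
s[5] = "!"                   # S is now the sole owner: written in place
print(s, s._buf is owned)    # Hell! True
```
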

Homeomorphic means "same shape", or in this case, "if it walks like a duck, quacks like a duck, and plays poker as badly as a duck, then it is a duck". It looks like the easiest way to move Delphi onto an LLVM compiler back end with ARC is to adopt immutable NSString as the basis of Delphi strings. I think that's a mistake. I believe it is possible to preserve the appearance and semantic model that Delphi has always provided. Strings are (to a user of the language, not to the implementor) as simple to work with as integers and characters, which is unusual among compiled languages. Strings in Delphi are as simple to work with as strings in Python or C#. I am not aware of any other language that is strictly statically typed and runs without an interpreter, JIT, or VM, yet provides as simple a string model as Delphi. Don't even try to tell me that C++ has it: you could write a string class in C++ that only you would ever use, but real-world C/C++ codebases routinely mix 10 or 20 string-like types. Erasing that advantage would not only break how Pascal/Delphi string coding has worked since 1984, it would also prevent people from moving their existing codebases up, and would slow Delphi adoption.

I expect that the new compiler and the old compiler will be co-hosted in a single Delphi IDE for the next four to eight releases of Delphi, and that eventually the classic non-LLVM compiler will go away. During that time, it's critical that all Delphi codebases get transferred into a form that allows them to move up. That means dealing with ad-hoc hacks like "obj.Free" doing nothing, and, if this is going to fly, it also means that String semantics stay exactly where they are, with no attempt made to even offer users immutability or zero-based offsets. The last thing we need is a new {$H+}/{$H-} compiler switch to replace the old one.
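
On the ARC compilers, Free on an object reference no longer destroys the object; it amounts to assigning nil to that one variable, and the destructor runs only when the last strong reference goes away. The Python sketch below mimics that, again relying on CPython's reference counting; Form and registry are hypothetical names. A second, possibly forgotten reference is exactly what makes "obj.Free" appear to do nothing.

```python
import weakref


class Form:
    """Hypothetical stand-in for any ARC-managed object."""

    def __init__(self, caption):
        self.caption = caption


main_form = Form("Main")
registry = [main_form]          # a second, easily forgotten, strong reference
probe = weakref.ref(main_form)  # watches the object without owning it

# The ARC reading of "main_form.Free": drop this one reference, nothing more.
main_form = None
print("still alive:", probe() is not None)  # True: registry still owns it

registry.clear()                            # the last strong reference goes
print("destroyed:", probe() is None)        # True: freed immediately in CPython
```
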

A simplified, rational coding model that can be easily applied and used without any possibility of bad things happening silently in the background, and without requiring users to rewrite around immutability or zero-based strings, is required for this move towards ARC to be a success.



18 comments:

  1. Do you understand how much perfectly working code, especially code dealing with C/C++ data structures, will break if variant records are removed? That code won't be ported to ARC; it will be ported to C/C++ instead. And not with C++Builder, believe me.

  2. I think that some developers would be upset if variant records went away, and I hope that they will keep them, but as they are a bit of a crazy feature, I wouldn't be surprised if they got axed.

    1. It's difficult to understand why losing features (easy management of strings of different types, variant records, pointers becoming "deprecated", etc., etc.) is seen as an improvement. It looks to me as if, because Emb no longer has the people to write its own compiler, it has to rely on an open source project and is thereby forced to remove some great Delphi features simply because they don't fit the new compiler. Getting ARC at that price is not that great. Sure, you avoid some try...finally Free end; blocks, but the price is that Delphi can no longer be used in many applications where it has worked great until now. I'm not sure it's a gain; it looks like a big loss to me.

    2. I've changed my mind on the variant record thing. They are already limited to non-refcounted types in the current compiler, so they will probably stay the way they are. The fact that they are needed for C++ interoperability, plus backward compatibility, is probably enough to keep them around. Delphi has huge backward compatibility, and that has always been a huge driver in its engineering. You can still take most Delphi 1.0 code and compile it with very few changes.

    3. Let me clear a few things up here... First of all, our use of LLVM is not a case of not having people to write our own compiler. I assure you that we still have some very talented and experienced *compiler* people. Secondly, LLVM from our perspective is merely the *back end* of the compiler. The existing front end, which is what really defines the language, is identical except in the places where ARC was added for object instances. In fact, all the compiler front-end code that handled managed types for strings, interfaces, variants, etc. required only relatively few changes to also support instances.

      Moving to LLVM allows us to better focus on the language itself. LLVM as a project is moving forward at a blistering pace and being able to leverage that is going to be a huge positive gain for our market and customer base. Not only did that allow us to get to the ARM CPU quickly and with relative ease, it will also allow us to continue to gain in terms of optimizations and other tooling around the LLVM projects.

      Finally, LLVM does not dictate anything regarding ARC or its implementation. In fact, Delphi has done ARC in some form or another since Delphi 2... long before LLVM, iOS, or the addition of ARC to Objective-C. In that sense, I think we probably have a lot more experience on the ARC front than some other languages and products.

  3. Great article again! Thanks a lot!
    Just as a note: My eyes hurt when reading this white-on-black text. Am I the only one who feels this way?
    Regards, Klaus

  4. I'm not sure that the VCL is going to be ported to ARC. The VCL is quickly becoming a second-class citizen in the Delphi world, with most of Embarcadero's attention being paid to FireMonkey.

    1. Che shared a similar optimism. EMB is aware that the VCL is still strongly needed. What you are talking about is the long-term perspective of a long-term project.

  5. That would mean a permanent commitment to maintain TWO compilers. I doubt that is in the cards. Porting the VCL to the new ARC compiler would NOT be a big undertaking.

    VCL small. Compiler big.

  6. I think, first of all, Embarcadero should not change the way the String type works if they still want to keep some customers, and second, they must drop the old compiler for the new one in the NEXT release. Not in 8 years. And the transition must be completely easy for the customer. If they don't achieve this, they lose.

    Plain and clear.

    Programmers are just waiting to switch. They are not waiting to stay; major breaks in the new compiler will make them switch, that's all. Embarcadero is not trustworthy, nor committed to keeping customers happy or whatever. They don't give support, and they don't fix the past. Microsoft is far better, and all the other development tools are better than Microsoft's in some areas and worse in others, but all of them are far better than Embarcadero's.

    I did magic things with Delphi; I have worked with it since 2.0. But the companies that worked on it just damaged everything.

  7. I won't reopen the can of worms the string arguments have become, but variant records are essential to interfacing with C/C++ code that uses and exposes unions. Their removal would eliminate Delphi as a possible solution to another class of problem.

    This is something I have dealt with for many years, in third-party APIs for adapter cards. NOTE to EMBT: We do not all spend our lives in desktop and DB work. I could address the interface issue with a wrapper DLL, but there are two good objections to that: first, it introduces another layer of indirection into an already complex project, and second, it adds a maintenance burden with long-term costs, one that will not always be handled by the project developer.

    I can understand the concerns about maintaining two compilers, especially in a market that is dwindling. (OK, that's conjectural, but public evidence supports the conjecture, and no counter-evidence has been made public.)

    In the end, EMBT may simply elect to ship a less general language product. Ironically, I know that Delphi remains a popular choice in broadcast automation, where the hardware interface issue will always need a solution. Toes, meet shotgun.

  8. Okay, let's hope I'm wrong about variant records. :-)

    W

    1. Variant records are not slated for destruction anytime soon... Although you are correct in your assertion about whether or not you can have an instance reference in the variant portion... you cannot, for the same reasons as you cannot have strings, variants, or interfaces.

      For those that are concerned about interoperability with C/C++ code, that shouldn't really be a problem because it is likely that the C/C++ code in question would not really understand those types anyway, let alone be able to manage those types properly. For C++Builder which does have some understanding and knowledge of those types, the same restrictions are already in place for that side.

  9. What do you think you'd lose by going to immutable strings? I'd like to understand what, precisely, is the fear here. When you hear immutable strings, what does that mean to you?

    I ask this because I suspect there may be some misconceptions regarding what the world would be like with immutable strings.

    1. I think that deserves a long post.

      First, there is an assumption that the delphi String operation S := S + 'X' is low cost and that a StringBuilder class of the .Net or Java variety is not required. Because the string content S references can be grown easily and with low cost from a dedicated string heap or pool.

      Secondly, if an immutable string option is to be added, it should not be String; it should be ImmutableString, and users should opt into it. Like "const" declarations in C and C++, it should be explicit, not implicit.

      Thirdly, there is a tonne of code already out there that is written with a var S:String declaration that modifies characters inside the string at will. ( S[x] := aChar ).

      In short, Delphi XE4, with the classic Win32 and the Win64 compilers, has so much compatibility with Delphi 1.0 from 1995, and even with Turbo Pascal compiler practices from the DOS era, that it can be a major marketing strength of Delphi: "Look, we don't break your code, ever. Or rather, when we do, it's because we have to." So the question is: do the supposed benefits of immutable strings provide such a HUGE, overwhelming advance for Delphi that they are worth breaking backward compatibility? No, they do not. I guess I'm flipping the question back to you: "Can you point me to an advantage we would get from this that would outweigh the fact that you will kill adoption of your product by 90% of existing customers if you do it?"

    2. "First, there is an assumption that the delphi String operation S := S + 'X' is low cost and that a StringBuilder class of the .Net or Java variety is not required. Because the string content S references can be grown easily and with low cost from a dedicated string heap or pool."

      It's as I expected. That won't change with immutable strings. If the memory manager (it's not the string management that does this) decides to resize a memory block "in place", then that is an implementation detail. Either way, the LValue S and the RValue S are *different* strings even today. That expression merely takes the existing value of S and the literal 'X', creates a new string that can hold both, then moves the data from S and the literal 'X' into it. The fact that the memory manager *may* resize S because it knows it's going away is an implementation detail, not a consequence of "S" being mutable... that expression isn't actually mutating anything... it's creating a whole *new* string. Only the string data would be immutable... S is a variable, and as such it can be mutated as expected and reassigned to another string.

      "Thirdly, there is a tonne of code already out there that is written with a var S:String declaration that modifies characters inside the string at will. ( S[x] := aChar )."

      With immutable strings, this is the *only* case that won't be allowed. I understand there is a decent amount of code out there that does this; we have code in the RTL that does that too.

      As for adding an "ImmutableString" type... that kind of goes against the whole goal of reducing the number of disparate string types.

      Right now, moving to immutable strings isn't set in stone... even now, the compiler will optionally emit a warning (on by default on mobile platforms) for "S[x] := aChar;", but the code will continue to work as before. You can also disable the warning for now.

      90%, eh? Hyperbole much ;-). I can only answer that by asking: if we were to introduce a new internal data structure for strings that made things like concatenation and formatting significantly faster, would that be worth it? What if those data structures helped reduce fragmentation and increase string fragment sharing? Then, rather than "optimizing" your code by directly assigning each character, you could simply build up the string with concatenation and actually minimize the amount of memory movement and allocations. It may even be possible to continue to allow mutability and still introduce this change. So, as I've said, immutable strings aren't set in stone.

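
Allen doesn't name the data structure he has in mind. One well-known design with the properties he lists (cheap concatenation, fragment sharing, fewer copies) is a rope, sketched below in Python purely as an illustration; Concat, concat, rope_length, and flatten are hypothetical names, not anything Embarcadero has announced. Because fragments are never modified, concatenation can share them instead of copying, so S := S + '!' yields a new value while the old one stays valid, and the characters are copied only once, when the result is flattened.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Concat:
    """Immutable node representing left + right without copying either."""
    left: "Rope"
    right: "Rope"
    length: int


Rope = Union[str, Concat]


def rope_length(r: Rope) -> int:
    return len(r) if isinstance(r, str) else r.length


def concat(a: Rope, b: Rope) -> Rope:
    """O(1) concatenation: both inputs are shared, no characters are copied."""
    return Concat(a, b, rope_length(a) + rope_length(b))


def flatten(r: Rope) -> str:
    """Copy the characters exactly once, left to right."""
    parts = []
    stack = [r]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            parts.append(node)
        else:
            stack.append(node.right)  # pushed first, so the left side pops first
            stack.append(node.left)
    return "".join(parts)


greeting = concat("Hello, ", "world")
s = greeting
for _ in range(3):
    s = concat(s, "!")        # S := S + '!' builds a new value around the old one
print(flatten(s))             # Hello, world!!!
print(flatten(greeting))      # Hello, world (the earlier value is untouched)
print(rope_length(s))         # 15
```

A production rope would also rebalance the tree and merge short fragments; this sketch skips both.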