Thoughts on software distribution formats

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), March 22, 2021 1:45 pm
Room: Moderated Discussions
Source code is unlikely to be desirable for the primary distribution format. Much software is distributed with an assumption that the end-user is not allowed to modify the program outside of specifically implemented mechanisms and is not interested in detailed debugging or bug-seeking. Security (by obscurity) and protecting trade secrets (including interoperability aspects like file formats as well as algorithms and practices of questionable legality) can motivate prohibiting disassembly (let alone decompilation). Source code is also low-density and human-friendly rather than compiler-friendly.

WebAssembly, Java bytecode, Microsoft CLR, and similar intermediate representations are not primarily designed for install-time or even load-time compilation but place some importance on interpretation (simplicity and/or performance thereof). Just-in-time compilation is also assumed to be somewhat straightforward, at least for a first compilation. Density of representation is also considered important. These design choices are partially motivated by an assumption of low persistence relative to reuse (i.e., caching is generally not profitable). I speculate that WebAssembly targets low reuse due to many software sources and frequent modification; such seems common practice for the Web, where even frameworks (cf. standard libraries) seem to be fragmented not only by libraries with similar functionality but also by version and provider (URL vs. theoretical singular URI).

Distributing a directly executable format avoids translation overhead but typically constrains localized optimization and the diversity of platforms supported. This is attractive to established and popular ISA and OS vendors since it reduces the availability of software for potential and existing alternatives. With long-term ISA and OS compatibility, end-users benefit from simplified management of software (such a lower-level format encourages this compatibility). With limited diversity (also encouraged by a lower-level format), software developers can more easily adopt more responsibility for reliability and performance since much of such is tied to the platform (controlling the intermediate translation layers also helps). Long-term interfaces are also more thoroughly exercised and the maintainers have some motivation and resources to provide operational compatibility and marketable improvements. The thickness of a persistent interface layer not only extends the persistence over more functionality but contains more interactions with a single layer (which is significant with leaky layer abstractions). Such formats also provide good density.

An intermediate format between source code and virtual machine language is possible. This format would remove human-meaningful names, perform some optimizations, and provide metadata for further optimization. The nature and diversity of expected end-targets would influence the selection of metadata and the closeness of the format to machine language. The software distribution format proposed for the Mill can assume substantial commonality of functionality (operations are directly on queue/belt entries, static scheduling, select/predication almost always preferred for short branches [falling out of static scheduling and wider execution], no implicit threading, etc.), and so more optimizations/processing can be done in advance.
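
As a rough illustration of the "remove human-meaningful names and provide metadata" step, here is a minimal sketch in Python. Everything in it is hypothetical: the three-field instruction tuples, the `strip_names` and `package` helpers, and the `meta` fields are invented for this example and do not describe the Mill's format or any existing IR.

```python
def strip_names(instructions):
    """Rename every human-chosen name to an opaque slot id (v0, v1, ...).

    Each instruction is a hypothetical (dest, op, args) tuple; string
    arguments are treated as names, anything else as a literal.
    """
    mapping = {}

    def slot(name):
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    stripped = []
    for dest, op, args in instructions:
        # Rename inputs before the destination so slots follow first use.
        new_args = tuple(slot(a) if isinstance(a, str) else a for a in args)
        stripped.append((slot(dest), op, new_args))
    return stripped, len(mapping)


def package(instructions):
    """Bundle the stripped code with a little metadata for a later optimizer."""
    code, slot_count = strip_names(instructions)
    meta = {
        "slots": slot_count,                          # storage a translator must plan for
        "ops": sorted({op for _, op, _ in code}),     # operations the target must support
    }
    return {"code": code, "meta": meta}


program = [
    ("subtotal", "mul", ("price", "quantity")),
    ("total", "add", ("subtotal", "tax")),
]
distributed = package(program)
```

The names `price`, `subtotal`, and so on no longer appear in `distributed`, yet a later translation step still knows how many slots are live and which operations it must map onto the target. How much more metadata is worth shipping would depend on the expected diversity of targets.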

The distribution format also has implications for responsibility for platform-specific fixes (from a hardware bug, miscommunication of the hardware specification, or programming bug that is not universally exposed) and for reliability generally. Even software distributed in a directly executable format may include resource recommendations or unsupported configurations; a higher-level distribution format would seem to increase the incentives for caveats and replacing best effort or guaranteed effect with good faith or reasonable effort. This implies a significant cultural change, perhaps comparable to or greater in significance than that associated with lease vs. own.

Availability of the software in a given format is also a consideration. The persistence failure of source code is a well-known issue and license management issues are fairly well-known, but when a necessary software component is not stored locally, a failure of remote storage can cause an unexpected failure of the software. (Even with local storage, licensing fine print can unexpectedly remove availability.) While some software vendors might prefer requiring relicensing on any platform change, users often have an expectation of ownership (use not limited by time or execution platform); if a portable software format is managed remotely, some efficiencies of caching translation work are possible but ownership-prevention also becomes easier. Back-up and restore may also become more complicated (as shown by the cases of source code persistence failure).

Software distribution can also exploit various opportunities for sharing. As with processor memory cache sharing, even when sharing would reduce retrieval work, replication can be more efficient overall.

Software formats can be viewed as levels of caching (as well as interface stack levels — and interface stack levels have some relation to pipeline stages). C language development environments have long cached object files under the assumption that modifications are often localized. Theoretically, a programming language and development environment could be developed which facilitated broader use of caching, but any caching should consider the costs of cache hits, the cost of cache management (storage, consistency/coherence, retention choices), and the cost of cache misses (at various levels in a multi-level cache). (If one wishes to be more theoretical and abstract, the software concept could be viewed as a caching level drawn from reality by market research and high-level development and source code is a caching of the programmer effort translating the software concept.)
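
To make the object-file analogy concrete, here is a minimal sketch of a translation cache keyed by content rather than by file name. It is written in Python for brevity and is not how any particular build system works; `ArtifactCache` and `fake_compile` are hypothetical names, and the "compiler" is a stand-in that only formats a string.

```python
import hashlib


class ArtifactCache:
    """Toy cache mapping (source text, flags) to a translated artifact."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(source, flags):
        # Hash the inputs that determine the output; a NUL byte separates
        # fields so that different flag lists cannot collide by accident.
        h = hashlib.sha256()
        h.update(source.encode("utf-8"))
        for flag in flags:
            h.update(b"\0")
            h.update(flag.encode("utf-8"))
        return h.hexdigest()

    def get(self, source, flags=()):
        key = self._key(source, flags)
        if key in self._store:
            self.hits += 1            # cost of a hit: one hash plus a lookup
        else:
            self.misses += 1          # cost of a miss: hash plus full translation
            self._store[key] = self._compile(source, flags)
        return self._store[key]


def fake_compile(source, flags):
    # Hypothetical stand-in for an expensive translation step.
    return f"obj({len(source)} bytes; {' '.join(flags)})"


cache = ArtifactCache(fake_compile)
first = cache.get("int main(void) { return 0; }", ("-O2",))
again = cache.get("int main(void) { return 0; }", ("-O2",))
other = cache.get("int main(void) { return 0; }", ("-O3",))
```

The second request is a hit and the third is a miss, because the flags are part of the key. The sketch never evicts anything and ignores coherence with other inputs such as headers and compiler version, which is exactly where the management costs mentioned above would show up.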

Mutability and persistence of information are also considerations. Some information is considered highly mutable and is only cached locally in the processor; branch predictors and cache replacement information are common examples. Yet profile information is considered useful for software-managed optimization, implying either that earlier optimization lacked this information and was therefore generic, or that the information has limited temporal (e.g., processing one dataset) or spatial (e.g., user) persistence. Another mutability that has been explored is microarchitecture-specific and machine-specific optimization; a machine executable can be optimized for one microarchitecture and a re-optimization for the actual microarchitecture in use might provide other benefits. One could imagine different optimization goals also generating different end-formats; the relative value of different resources (time-to-solution [worst, average, good-enough fraction, variability, etc.], energy, power, memory bandwidth, etc.) can vary among users and over time.
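
One simple way to picture goal-dependent end-formats is a weighted choice among pre-built variants. The sketch below is purely illustrative: the `pick_variant` function, the variant names, and all of the numbers are made up, and a real system would presumably measure these costs rather than declare them.

```python
def pick_variant(variants, weights):
    """Return the name of the variant with the lowest weighted cost.

    `variants` maps a name to its per-resource costs; `weights` says how
    much the current user cares about each resource (missing means zero).
    """
    def cost(metrics):
        return sum(weights.get(resource, 0.0) * amount
                   for resource, amount in metrics.items())

    return min(variants, key=lambda name: cost(variants[name]))


# Hypothetical costs for two translations of the same program.
variants = {
    "fast":   {"time_s": 1.0, "energy_j": 9.0},
    "frugal": {"time_s": 2.5, "energy_j": 4.0},
}

on_mains = pick_variant(variants, {"time_s": 1.0, "energy_j": 0.1})
on_battery = pick_variant(variants, {"time_s": 0.1, "energy_j": 1.0})
```

With these made-up figures the time-weighted user gets "fast" and the energy-weighted user gets "frugal". The same user could move between the two as circumstances change, which is the sense in which the preferred end-format is itself mutable.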

Just as bug-reporting is a common up-stack information transmission, one could imagine performance and usage patterns being useful more broadly than a local system. As with bug-reporting, privacy issues exist. As with misfeature-reporting (e.g., observing user interface activity that hints at misunderstanding or confusion), logging and sending performance information could be more expensive than useful — and the utility likely correlates with software maturity.