By: NoSpammer (no.delete@this.spam.com), June 23, 2020 3:04 am
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on June 22, 2020 6:14 pm wrote:
> They would have to, as there would be some applications or at least parts of them (any sort
> of generated or self-modifying code, Linus' example of applications that fiddle with the
> signal stack, and other such niche cases) where static translation isn't feasible.
>
> Though I wonder if a combination of both in the same application is even possible. Maybe it is all or nothing,
> either the static translator is able to do the job but when it can't it goes full JIT for that application.
>
> Maybe they've found a way to slide neatly between the two by maintaining enough "x86 state"
> in the translated binary to let the JIT cut in where necessary, but that sounds like it might
> be MUCH harder than static translation alone, which is already much harder than JIT. There
> are a lot of devils hidden in the details that will be interesting to find out more about.
There used to be tools like Resourcer, and a couple of others whose names I can't remember, that did impressive EXE-to-ASM conversion, correctly guessing many jump tables and branch targets. Still, here and there they got it wrong, and that is before even considering code involving C++ virtual function tables, libraries that modify their own code, or all the function pointers passed around in C-style structures, which is common in libraries even now. So merely recovering all the code entry points is something you cannot take for granted.
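A minimal sketch of the kind of recursive-descent code discovery such disassemblers perform, on a hypothetical toy instruction set. The opcodes, the discover function and the sample program are made up for illustration; this is not how Resourcer or any particular tool works. It shows the problem in miniature: the code at address 10 is reachable only through an indirect jump whose target is a run-time register value, so a purely static pass never finds it.

```python
from collections import deque

# Hypothetical toy ISA (one instruction per address), for illustration only:
#   ("set", reg, val)  ("add", reg, val)  ("jmp", target)
#   ("jz", reg, target)  ("jmpr", reg)  ("halt",)

def discover(code, entry):
    """Recursive-descent code discovery from a single entry point.

    Follows fall-through and direct branch targets. Indirect jumps
    ("jmpr") are only recorded, because their targets depend on
    register values that exist at run time.
    """
    reached, indirect_sites = set(), set()
    work = deque([entry])
    while work:
        addr = work.popleft()
        if addr in reached or addr not in code:
            continue
        reached.add(addr)
        kind = code[addr][0]
        if kind == "jmp":
            work.append(code[addr][1])
        elif kind == "jz":
            work.append(code[addr][2])   # branch taken
            work.append(addr + 1)        # fall through
        elif kind == "jmpr":
            indirect_sites.add(addr)     # target unknown statically
        elif kind != "halt":
            work.append(addr + 1)        # straight-line instruction
    return reached, indirect_sites


prog = {
    0: ("set", "r0", 10),   # r0 = code pointer, e.g. loaded from a vtable
    1: ("jmpr", "r0"),
    10: ("add", "r1", 1),   # only reachable through the indirect jump
    11: ("halt",),
}
print(discover(prog, 0))   # ({0, 1}, {1})
```

Addresses 10 and 11 never show up in the result: the analyzer can only report that address 1 jumps somewhere it cannot see.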
Then, on the first pass, you don't really know whether some code will be modified in a way that is opaque to your static analyzer. For indirect jumps you cannot be sure where they will go, or whether you already have a translation for that path. So for the very first pass you may want to run an interpreter (and/or some other instrumentation) at quite a few points. Even once the first pass is correct, it is undecidable what you might encounter in the future, so keep the state around, keep some instrumentation around, and keep the JIT around. You will need a huge bag of tricks to cover all the bases.
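A toy sketch of that hybrid approach, on a hypothetical instruction set with one instruction per address, where "jmpr" is an indirect jump and "poke" writes a new instruction into code memory. All names here (HybridRuntime, poke, the sample program) are invented for illustration. Blocks found by a static pass are translated ahead of time; any other target, such as the destination of an indirect jump, is translated on demand (standing in for a JIT); and a write into already-translated code invalidates that translation. The guest register state lives outside the translated blocks so either path can pick up where the other left off.

```python
# Toy hybrid executor (hypothetical, for illustration only): blocks found by a
# static pass are translated up front; any other target is translated on
# demand (a stand-in for a JIT). Writes into code memory ("poke") invalidate
# overlapping cached translations.

class HybridRuntime:
    def __init__(self, code, static_blocks):
        self.code = dict(code)   # address -> instruction (one per address)
        self.regs = {}           # guest state, kept outside translated blocks
        self.cache = {}          # block start -> (start, end, ops)
        self.stats = {"static": 0, "jit": 0, "invalidated": 0}
        for start in static_blocks:  # ahead-of-time translation
            self.cache[start] = self._translate(start)
            self.stats["static"] += 1

    def _translate(self, start):
        # A block is straight-line set/add ops plus one terminating instruction.
        ops, addr = [], start
        while True:
            op = self.code[addr]
            ops.append((addr, op))
            if op[0] not in ("set", "add"):
                return (start, addr, tuple(ops))
            addr += 1

    def _invalidate(self, addr):
        # Drop every cached block whose address range covers the written slot.
        stale = [s for s, (_, end, _) in self.cache.items() if s <= addr <= end]
        for s in stale:
            del self.cache[s]
            self.stats["invalidated"] += 1

    def _run_block(self, block):
        # Returns the next guest pc, or None on halt.
        for addr, op in block[2]:
            kind = op[0]
            if kind == "set":
                self.regs[op[1]] = op[2]
            elif kind == "add":
                self.regs[op[1]] = self.regs.get(op[1], 0) + op[2]
            elif kind == "jmp":
                return op[1]
            elif kind == "jz":
                return op[2] if self.regs.get(op[1], 0) == 0 else addr + 1
            elif kind == "jmpr":
                return self.regs[op[1]]      # target known only at run time
            elif kind == "poke":
                self.code[op[1]] = op[2]     # self-modifying write
                self._invalidate(op[1])
                return addr + 1
            elif kind == "halt":
                return None
            else:
                raise ValueError(f"unknown instruction {op!r}")

    def run(self, pc, max_steps=10_000):
        for _ in range(max_steps):
            block = self.cache.get(pc)
            if block is None:                # not found statically: JIT path
                block = self._translate(pc)
                self.cache[pc] = block
                self.stats["jit"] += 1
            pc = self._run_block(block)
            if pc is None:
                return self.regs
        raise RuntimeError("step limit exceeded")


prog = {
    0: ("set", "r0", 10),
    1: ("jmpr", "r0"),                     # indirect jump
    10: ("add", "r1", 1),
    11: ("jz", "r2", 13),                  # first visit: r2 == 0, go to 13
    12: ("halt",),
    13: ("set", "r2", 1),
    14: ("poke", 10, ("add", "r1", 100)),  # rewrite code inside block 10
    15: ("jmp", 10),
}
rt = HybridRuntime(prog, static_blocks=[0])
print(rt.run(0))   # {'r0': 10, 'r1': 101, 'r2': 1}
print(rt.stats)    # {'static': 1, 'jit': 5, 'invalidated': 1}
```

Only block 0 comes from the (assumed) static pass; everything behind the indirect jump goes through the on-demand path, and the block at address 10 is translated twice because the poke invalidated the first translation. A real translator would emit native code and would more likely detect writes through page protection than through an explicit instruction; the sketch only shows the control structure.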