Argonne
Notes
- Discussion
- "OpenMP and automatic parallelization in GCC" describes transformation in gcc
- Similar transformations take place in ROSE.
ToDo
- Read the "Effective src-to-src..." paper.
Think about ...
- Think about leaving out the XOMP-to-GOMP Wrapper.
- This could be interesting in case we use it in dcc in a way not yet known. But how should this be done in the middle of compilation? Accessing GOMP directly must happen at the point where we only have access to GIMPLE code. And even if we manage a GIMPLE-to-GIMPLE transformation, the problem remains that gcc at this time offers no way to process GIMPLE.
- If I only use GOMP functions, I could link my code against the libgomp from gcc and it would work. This would create a dependency on gcc.
- Go deeper into the thread story: should one generate pthread code? What about reapplying automatic differentiation to pthread code? Issues such as a _create() becoming a _join() will come up.
- If one used a more abstract runtime library, one could decide how to implement the parallel regions. Should they be based on pthreads, or should they use an OpenMP runtime library such as GOMP, Omni, XOMP, etc.? Or should there be more than one way to split the work? For example, it would be possible (at least I believe so for now) to compile the library with GOMP one time and with an MPI solution the next, where processes are created instead of threads. The differences in memory model and communication will cause some problems, but at a more abstract level it could work. It would be interesting if this works, since it opens up more to think about. For example, for a cloud computing solution you could begin a parallel region and the runtime library would be responsible for splitting the work, sending data into the cloud, and getting results back.
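A minimal sketch of the abstract-runtime idea, in Python rather than C for brevity. All names here (`SerialBackend`, `ThreadBackend`, `parallel_region`, `split_range`) are hypothetical and do not come from GOMP, XOMP, or any other existing runtime. The point is only that the code of the parallel region stays the same while the backend that splits and executes the work is exchanged; an MPI or cloud backend would presumably need extra handling for data transfer that is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) chunks."""
    parts = max(1, min(parts, n)) if n > 0 else 1
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


class SerialBackend:
    """Hypothetical backend: runs all chunks one after another."""

    def run(self, tasks):
        return [task() for task in tasks]


class ThreadBackend:
    """Hypothetical backend: runs the chunks on a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, tasks):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            return [f.result() for f in futures]


def parallel_region(backend, body, n, parts=4):
    """Run body(start, stop) on chunks of range(n); results are in chunk order."""
    tasks = [lambda s=s, e=e: body(s, e) for s, e in split_range(n, parts)]
    return backend.run(tasks)


def partial_sum_of_squares(start, stop):
    """Example loop body: one chunk of a sum reduction."""
    return sum(i * i for i in range(start, stop))


# Same region, two different backends.
serial_total = sum(parallel_region(SerialBackend(), partial_sum_of_squares, 100))
thread_total = sum(parallel_region(ThreadBackend(workers=4), partial_sum_of_squares, 100))
```

The region only talks to `backend.run`, so the decision between threads, processes, or something remote is made in one place.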
Discussion
- xaifBooster should not get the omp pragmas themselves; it should get some kind of assertions and dependences that can be derived from the omp pragmas.
- From the assertions derived from the pragmas, we could show that these assertions are also valid for the automatically generated adjoint code.
- pragmas -> assertions/dependencies for the forward run -> transform/generate adjoint by transforming the assertions back to omp pragmas that are generated into the reverse run
- 8/4/2010: discussion about further work:
- Feed the information about the parallel loop into XAIF. This includes the implementation in ADIC of how to bring the ROSE SgNode information into XAIF.
- Then look at the implementation inside XaifBooster where the AD transformation takes place. We do not have any information about input variables for loops. We have this information for subroutines. As a shortcut, we assume that we only have a subroutine with a parallel loop in it. We can then use this information to decide whether there must be a restore during the reverse run in adjoint mode. During the forward sweep we only run the loop in parallel, without any taping of values during the execution of the iterations. During the reverse run we run each iteration forward and then backward, without any parallelism. This could save a lot of memory.
- Once we have the XaifBooster transformation, we must be able to bring this representation back to C code. Krishna is working on that. Depending on the progress he makes, the XAIF representation could be the end of my task.
- Keep in mind how we could use the information about the parallel loop for a program analysis. The points mentioned before have nothing to do with any program analysis, but further work should take this into account.
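The forward/reverse scheme from the 8/4/2010 discussion can be sketched as follows. This is a hand-written toy, not output of XaifBooster or ADIC: the loop body `y[i] = sin(x[i]) * x[i]` and all function names are assumptions chosen for illustration. The forward sweep runs the iterations in parallel and stores nothing; the reverse sweep walks the iterations serially, re-runs each one forward with a small local tape, and then propagates the adjoint backward. Since the inputs `x` are never overwritten in this toy, no restore is needed.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def iteration_forward(x_i, tape=None):
    """One loop iteration: y_i = sin(x_i) * x_i. Tapes sin(x_i) only if asked."""
    s = math.sin(x_i)
    if tape is not None:
        tape.append(s)
    return s * x_i


def forward_sweep(x, workers=4):
    """Run all iterations in parallel without taping anything."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(iteration_forward, x))


def reverse_sweep(x, y_bar):
    """Serial reverse run: per iteration, recompute forward with a tape, then go backward."""
    x_bar = [0.0] * len(x)
    for i in reversed(range(len(x))):
        tape = []
        iteration_forward(x[i], tape)    # forward part of iteration i, now taped
        s = tape.pop()                   # intermediate value sin(x[i])
        s_bar = y_bar[i] * x[i]          # adjoint of y = s * x with respect to s
        x_bar[i] += y_bar[i] * s         # adjoint of y = s * x with respect to x
        x_bar[i] += s_bar * math.cos(x[i])  # adjoint of s = sin(x)
    return x_bar


x = [0.0, 0.5, 1.0, 2.0]
y = forward_sweep(x)
x_bar = reverse_sweep(x, [1.0] * len(x))
```

The tape never holds more than one iteration's intermediates at a time, which is where the memory saving mentioned above would come from; the price is one extra forward evaluation per iteration and a serial reverse run.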