Towards a shared AI safety portfolio prioritization framework

Summary

Intro and need for a framework

While becoming more familiar with the field of AI safety research, I noticed a recurring theme: whether in practical (i.e. technical) research, strategic or policy research, or work focused on the near versus the far future, I could not find a framework that would easily sort any undertaken research into an order highlighting the attached value levers. In my view, such a framework should highlight different failure modes and scenarios according to their impact and, combined with our ability to affect the risks, guide the community’s work. In addition, it should improve transparency within the safety research community on ongoing efforts and enable improved coordination. It may also be used as a guideline for finding double-cruxes and uncovering disagreements on research prioritization more quickly.

The framework

The framework follows the importance, neglectedness, and solvability approach, adapted by integrating neglectedness and solvability into a single “marginal impact” factor. Instead of “importance”, a priority score is used, which incorporates urgency and counterfactual development.

This framework is scope insensitive in the sense that it can be used to evaluate complete problem fields, individual problems, and individual solutions alike, though only on a shared level of abstraction, i.e. it would not make sense to plot technical solutions alongside high-level problem fields. I would propose starting by mapping problem fields, then moving down to individual problems, and then to individual solutions. The framework relates only to AI safety actors. Actors focused on AI performance are represented only insofar as their concern for safety is instrumental to their performance goal, i.e. it underlies the projected ability to control risk in 2a).

I believe three dimensions drive the importance of a failure mode. To get a “priority score” for the scale of the problem, multiply 1, 2, and 3.

Secondly, I propose applying a “Marginal impact factor”, consisting of

Combined, these factors give both individual actors (researchers, strategists, developers, policy-makers, etc.) and the AI safety community (coordinators, meta-orgs) a way to prioritize their work. “Marginal impact score” within the community decision mechanism refers to the accumulated capability of all AI safety actors.
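To make the scoring mechanics concrete, here is a minimal sketch in Python. Since the underlying lists of importance dimensions and marginal impact components are not reproduced above, the factor names (scale, urgency, counterfactual neglectedness, marginal impact) are illustrative assumptions rather than the framework’s exact definitions; the sketch only commits to the structure: a priority score as the product of the three importance dimensions, then combined with the marginal impact factor for ranking.

```python
# Minimal sketch (not the author's implementation): priority score as the
# product of three importance dimensions, combined with a marginal impact
# factor to rank problems. Dimension names and numbers are placeholder
# assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    scale: float                   # dimension 1 (placeholder): how bad is the failure mode?
    urgency: float                 # dimension 2 (placeholder): how soon must it be addressed?
    counterfactual_neglect: float  # dimension 3 (placeholder): chance it stays unsolved without us
    marginal_impact: float         # neglectedness and solvability folded into one factor

    @property
    def priority_score(self) -> float:
        # "multiply 1, 2, and 3"
        return self.scale * self.urgency * self.counterfactual_neglect

    @property
    def overall_score(self) -> float:
        # priority combined with the marginal impact factor
        return self.priority_score * self.marginal_impact


problems = [
    Problem("value alignment", 0.9, 0.6, 0.8, 0.4),
    Problem("control failure", 0.7, 0.5, 0.6, 0.7),
]

for p in sorted(problems, key=lambda p: p.overall_score, reverse=True):
    print(f"{p.name}: priority={p.priority_score:.2f}, overall={p.overall_score:.2f}")
```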

Inclusion of scenario thinking

Note that AGI arrival scenarios underlie all of these judgements, not only in the urgency factor but also, for example, in the projected ability to control risks without any intervention from the AI safety community and in the evaluation of projected solutions. For example, an option to improve intervention ability in a value alignment failure mode may include extending the “time to react”, e.g. by boxing any AGI for prolonged periods of security testing. An AGI performance race or a hard take-off scenario (or both) would strongly reduce the value of that option.

An alternative way to include scenarios is to spell them out on n additional axes, turning this framework from a two-dimensional into a (2+n)-dimensional one. Such a choice, however, comes with technical difficulties:

If adding such an axis removes the probability of each scenario from the priority and marginal impact scores, one could nicely show how certain problems and solutions shift in combined importance as the scenario changes; the probabilities, however, would then be missing from the judgement entirely. That is obviously not ideal, as extremely low-probability scenarios could end up suggesting the overall best possible actions. The other extreme would be to keep the probability of each scenario in the scores, but that would defeat the purpose of the scenarios, as no change would be visible in the scores when moving from scenario to scenario. The remaining option I see is, as stated first, to take each scenario as given and attach a probability to it on the axis without integrating it into the scores. Only when eventually deciding on the most important actions would this probability be used to judge each best-in-class option. While that effectively just excludes the additional axes again, including them in the first place can be a good tool for visualizing core mechanisms and enabling discussion of core disagreements.
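As a rough illustration of that last option, the sketch below keeps the per-scenario scores probability-free and applies the scenario probabilities only at the final decision step, when choosing between best-in-class options. The scenario names and all numbers are invented for illustration, not taken from the framework.

```python
# Sketch of the "probabilities only at the decision step" option. All names
# and numbers below are illustrative assumptions.
scenario_probs = {"slow takeoff": 0.6, "hard takeoff": 0.3, "performance race": 0.1}

# Score of each candidate option conditional on a scenario (probability-free),
# i.e. how it would rank if that scenario were simply taken as given.
option_scores = {
    "extend time to react": {"slow takeoff": 0.8, "hard takeoff": 0.1, "performance race": 0.2},
    "improve transparency": {"slow takeoff": 0.5, "hard takeoff": 0.5, "performance race": 0.4},
}

# Only when deciding between best-in-class options do the probabilities re-enter.
expected = {
    option: sum(scenario_probs[s] * score for s, score in per_scenario.items())
    for option, per_scenario in option_scores.items()
}

best = max(expected, key=expected.get)
print(expected)
print("Chosen option:", best)
```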

The right dimensions for including scenarios would, as far as I can see, be those that maximise the distance between different problems and solutions. For example, if you believe future systems will still be reinforcement-learning-based, you will, unsurprisingly, weigh research into technical improvements of RL models far more heavily than if you think they will almost certainly be very different.

Practical implications

This framework has implications both for individual researchers and for actors aiming to coordinate AI safety efforts. I am aware that these insights are hardly news, but they might be valuable as a write-up.

Individual level

Firstly, individuals working on AI safety must exchange views with their peers, both to discover their comparative (research) advantage and to coordinate on optimal research strategies. The primary aim here is to move problems down the priority axis, either by gaining clarity on which problems actually need to be solved (1), by building solutions or enabling counterfactual development (2), or by increasing the time and resources available to solve the problems (3). (3) might, however, be a task that only the community can undertake. This framework may help to map disagreements.

Secondly, they should aim to move problems from the top-left to the top-right sector by building relevant skills as prioritized by the community (of which they may well also be a part). This could also mean doing foundational research to find angles of attack for possible solutions, which would move problems to the right.

Community level

Community decision mechanism

Actors trying to assume a coordinating role within the AI safety scene have two top priorities:

1. Enable cooperation: Establish a common understanding in the community about priority scores; support researchers in coordinating and cooperating in their work (think BERI)

2. Safety field management: Prevent important problems from being ignored, e.g. by identifying right scenario dimensions; Identify capital (human and financial) gaps and aim to close them; Influence the time available for closing research gaps (think FHI, CSER)

Generally, coordinators should try to move as many problems as possible away from the top-left sector. They can do so by building more accurate priority scores (meaning some problems will become less relevant and others more) or by actively influencing problems’ priority. Actively influencing them would include increasing counterfactual development, e.g. by having commercial researchers work on them, or improving the impact of the community (moving problems to the right). Again, the scope insensitivity of the framework allows solutions to “top-left” problems to be plotted along the same axes in order to identify the best actions (top-right) for them.

Problems of this framework

Two main problems exist with this framework. For one, finding shared levels of abstraction might be very complicated or unfeasible. As the framework relies on a shared level of abstraction to be workable, this could be a large problem. I am currently unable to come up with descriptors for how to cluster certain problems on shared levels, so I would be very happy for any ideas here.

Secondly, differing scenarios might require contrasting solutions. If those scenarios are equally probable, unintuitive portfolios will arise. This could only be addressed by including the additional “scenario” dimensions described above.

Last words

I hope this write-up creates some clarity on how to think about clustering and prioritizing AI safety research. Either way, I enjoyed writing this and hope to post some more thoughts in the near future.

IMPORTANT: Any FEEDBACK is highly appreciated, especially since this is my first post! Thanks so much for reading!

Notes regarding priority score:

“Projected ability” is a counterfactual: if the AI safety community did not become active, how high do you think the chances are that a performance-oriented developer would solve the problem because the solution is instrumental to their goals?
Necessarily, any current assessment of failure controls entails a “probability of a failure-control failure”: this control failure in itself constitutes a next-level problem that can also be mapped onto the matrix.

Notes regarding marginal impact score:

1a) might include methods like:

1. Increase control over AI:

a. AI directly (value alignment through reward hacking prevention, stop-button problems, etc.)

b. Goal setting agent

2. Increase time to react before or after action is taken

3. Increase transparency of reasoning/ action

4. Reduce probability of control failure (malfunction)

5. Only for interventions: Increase corrigibility, i.e. the ability to recover from damage after a “bad” action
