Use heuristics to show why solving is taking a long time #17429

Closed
DartBot opened this issue Mar 12, 2014 · 20 comments

@DartBot

DartBot commented Mar 12, 2014

This issue was originally filed by @tomaskulich


What steps will reproduce the problem?

Run pub get on the attached pubspec.yaml.

Depending on which machine I try it on, it gives me:

I believe these symptoms are connected: if I comment out half of my pubspec, pub get runs correctly, and after this partial pub get, the full pub get runs fine too. However, since my pubspec.yaml is not so big, I would expect pub to behave more correctly.

I'm using Dart VM version 1.2.0 (the bug is not closely tied to this version) on ArchLinux.

Brief googling turned up several similar bugs. I'm not sure whether this duplicates one that was already reported; I'm sorry if it does.


Attachments:
pubspec.yaml (625 Bytes)
pubtrace.zip (0 Bytes)

@DartBot
Author

DartBot commented Mar 12, 2014

This comment was originally written by @tomaskulich


Oops, the attached file is empty; I'll upload it once more.


Attachment:
pubtrace.zip (179.34 KB)

@DartBot
Author

DartBot commented Mar 12, 2014

This comment was originally written by @tomaskulich


Update: I just read http://news.dartlang.org/2013/04/pubs-new-constraint-solver.html and I'm afraid I understand what is going on.

Some naive coder produced a package I depend on (of otherwise fine quality), but he put overly restrictive version constraints on some of his dependencies. Now it's hard to find versions that fit all these dependencies, and instead of shouting about this problem all over the place, pub starts to

wait for it

BACKTRACKING THROUGH A WHOLE UNIVERSE OF ALL POSSIBLE COMBINATIONS OF PACKAGE VERSIONS in a desperate attempt to heroically fulfill THE constraints.

Dear Dart creators, most of you probably have CS degrees, so you should understand HOW VAST THIS UNIVERSE IS!

If you really want to help the users, try something like this:
Do some random-dependency-picking (or maybe even some sort of simulated annealing) for a FIXED amount of time and FIXED memory usage (does pub memory leak?); collect heuristic information about possible problems (too restrictive dependencies between packages); and then, eventually, REPORT A PROBLEM!

It's of no use if pub keeps heroically eating crazy amounts of my time and my RAM.

@munificent
Member

This is an unrelated issue. Can you repro this reliably? If so, can you give me some details on your configuration?

> • or, the memory is exhausted. (see zipped output from pub --verbose get)

This probably happens in verbose mode, but not in non-verbose.

> However, since my pubspec.yaml is not so big, I would expect pub to behave more correctly.

The size of your pubspec means very little in the grand scheme of things. It may have only one dependency, but if that dependency in turn has a ton, then you're going to slurp in the world. What matters is the size of your transitive dependency graph.
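A minimal sketch of that distinction, using a made-up dependency graph (every package name except path is hypothetical, and this is not pub's code): a breadth-first walk collects every package the solver would have to pick a version for, even though the pubspec lists only one direct dependency.

```python
from collections import deque


def transitive_dependencies(graph, root):
    """Return every package reachable from `root` through dependency edges.

    `graph` maps a package name to the names of its direct dependencies.
    """
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # shared dependencies (like path) are only counted once
        seen.add(package)
        queue.extend(graph.get(package, []))
    seen.discard(root)  # ignore a cycle back to the root itself
    return seen


# Hypothetical graph: one direct dependency, but five packages to solve for.
graph = {
    "my_app": ["big_framework"],
    "big_framework": ["http", "path", "logging"],
    "http": ["path", "async"],
    "logging": ["path"],
}
print(sorted(transitive_dependencies(graph, "my_app")))
# ['async', 'big_framework', 'http', 'logging', 'path']
```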

> BACKTRACKING THROUGH A WHOLE UNIVERSE OF ALL POSSIBLE COMBINATIONS OF PACKAGE VERSIONS in a desperate attempt to heroically fulfill THE constraints.

On my machine, it does that successfully and finds a solution in 1m30s. Still slower than I'd like, but some pathological package graphs are bound to be slow.

> most of you probably have CS degrees, so you should understand HOW VAST THIS UNIVERSE IS!

I actually don't have a CS degree, but I am aware that constraint solving is NP-complete. In fact, you can generate a set of pub packages that model any 3SAT problem.
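The 3SAT remark can be made concrete with the textbook reduction: one package per variable with two versions (false/true), and one package per clause whose versions each depend on one literal's variable being at the matching version. A root package depends on every clause package, so a consistent version selection exists exactly when the formula is satisfiable. The data model (integer versions, sets of allowed versions) is a hypothetical simplification rather than pub's representation, and the brute-force solver only illustrates how fast the search space grows.

```python
from itertools import product


def encode_3sat(clauses):
    """Turn a CNF formula into a toy package universe.

    Each clause is a list of nonzero ints in DIMACS style (3 means x3, -3 means
    not x3). A dependency maps a package name to the set of versions it allows.
    """
    packages = {}
    for var in {abs(lit) for clause in clauses for lit in clause}:
        packages[f"x{var}"] = {1: {}, 2: {}}  # version 1 = false, version 2 = true
    root_deps = {}
    for i, clause in enumerate(clauses):
        # Version j+1 of clause package i "chooses" literal j to satisfy the clause.
        packages[f"clause{i}"] = {
            j + 1: {f"x{abs(lit)}": {2 if lit > 0 else 1}}
            for j, lit in enumerate(clause)
        }
        root_deps[f"clause{i}"] = set(packages[f"clause{i}"])  # any version will do
    packages["root"] = {1: root_deps}
    return packages


def brute_force_solve(packages):
    """Try every version combination; exponential by construction."""
    names = sorted(packages)
    for versions in product(*(sorted(packages[name]) for name in names)):
        chosen = dict(zip(names, versions))
        if all(chosen[dep] in allowed
               for name, version in chosen.items()
               for dep, allowed in packages[name][version].items()):
            return chosen
    return None


satisfiable = [[1, 2, 3], [-1, -2, 3], [-3, 1, 2]]
solution = brute_force_solve(encode_3sat(satisfiable))
assignment = {v: solution[f"x{v}"] == 2 for v in (1, 2, 3)}
print(assignment)
print(brute_force_solve(encode_3sat([[1, 1, 1], [-1, -1, -1]])))  # None: x1 can't be both
```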

> Do some random-dependency-picking (or maybe even some sort of simulated annealing)

We can't do that because it would then lead to non-deterministic results. Running pub upgrade shouldn't be like pulling the lever on a slot machine!

> for a FIXED amount of time

If a user wants to limit the time pub spends, they can always just Ctrl-C if it takes too long. If we add an artificial limit, all that means is that we'll fail for some users who would otherwise be patient enough to wait.

We discussed this, but adding a limit seems to be strictly worse than not having one.

> FIXED memory usage (does pub memory leak?);

It doesn't have any leaks that we're aware of. But note that if you run it in verbose mode, it ends up using quite a bit of memory for the log transcript. There are some optimizations we could do there.

> collect heuristic information about possible problems (too restrictive dependencies between packages); and then, eventually, REPORT A PROBLEM!

This is a great idea. I'll leave this bug open for that (and update the description).

In your case, the problem is that clean_backend depends on an old version of path. Path is a widely shared dependency, so constraining that pulls down the entire package graph.

In general, to have a healthier package ecosystem where constraint solving works well, the low level packages that everyone uses need to be pretty stable. This is part of the reason we pushed path to 1.0.0: when it churns, it causes everyone pain.

In this case, I can see that the most common disjoint constraint the solver had to backtrack on came from path, so a simple heuristic is to keep a count of how often we hit a disjoint constraint for each package. We could then either report that to the user or possibly take it into account when backtracking and try to route around it.
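The counting heuristic can be sketched with a toy chronological backtracking solver. Everything here is an assumption for illustration: the data model (integer versions, sets of allowed versions), the package names old_lib and web_lib, and the conflict definition, which is a simplified stand-in for pub's notion of a disjoint constraint rather than its real solver logic.

```python
from collections import Counter


def solve_with_conflict_counts(packages, root="root"):
    """Chronological backtracking that tallies conflicts per package.

    `packages` maps name -> {version: {dependency: set_of_allowed_versions}}.
    Returns (solution_or_None, Counter of conflicts per package).
    """
    conflicts = Counter()

    def backtrack(selected, pending):
        if not pending:
            return dict(selected)
        (name, allowed), rest = pending[0], pending[1:]
        if name in selected:
            if selected[name] in allowed:
                return backtrack(selected, rest)
            conflicts[name] += 1  # the version picked earlier is excluded here
            return None
        candidates = allowed & set(packages[name])
        if not candidates:
            conflicts[name] += 1  # no available version satisfies the constraint
        for version in sorted(candidates, reverse=True):  # newest first
            selected[name] = version
            result = backtrack(selected, rest + list(packages[name][version].items()))
            if result is not None:
                return result
            del selected[name]  # undo the choice and try an older version
        return None

    root_version = max(packages[root])
    start = list(packages[root][root_version].items())
    return backtrack({root: root_version}, start), conflicts


packages = {
    "root": {1: {"old_lib": {1}, "web_lib": {1, 2}}},
    "old_lib": {1: {"path": {0}}},  # pinned to an old path
    "web_lib": {1: {"path": {0, 1}}, 2: {"path": {1}}},
    "path": {0: {}, 1: {}},
}
solution, conflicts = solve_with_conflict_counts(packages)
print(solution)
print(conflicts.most_common(1))
```

Packages at the top of this tally are candidates for a message along the lines of "constraints on path are forcing a lot of backtracking", or for special treatment when choosing what to backtrack on.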


Added Triaged label.
Changed the title to: "Use heuristics to show why solving is taking a long time".

@DartBot
Author

DartBot commented Mar 12, 2014

This comment was originally written by @tomaskulich


REPRODUCIBILITY:

The pubspec.yaml I originally posted works for me now (it took a scary 10 minutes or so). This is probably because I played with some of the packages and accidentally made the task easier.

I'm posting another version of pubspec.yaml that now consistently fails (on my architecture). It is a real-world pubspec.yaml, not one crafted to exploit pub's weaknesses; still, it fails.

However, even if you can find the solution in 1.5 minutes, this should scare you, since the time complexity of the problem is exponential.

The "Connection closed before full header was received" error is much more common than memory exhaustion. However, memory exhaustion also happens without verbose mode. Since backtracking should have fairly flat memory consumption, the idea of a memory leak came to mind. However, I cannot reproduce it now.

TECHNICAL PARAMETERS:

My machine is a Lenovo T410 with 8GB RAM, running Archlinux with the i3 WM and Dart VM 1.2.0. However, this happens to everybody; it really is not an issue specific to my machine.

WHAT SHOULD DEFAULT BEHAVIOR OF PUB BE:

If you hit Ctrl-C, you learn nothing about why dependency fetching is so slow! Therefore, I believe pub should behave as follows:

  1. A default timeout should exist and should be less than 10 minutes.
  2. In the case of a timeout, the user needs to see an informative message.
  3. The timeout should be configurable.
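A minimal sketch of the proposed behavior under a toy data model (not pub's): solving stops after a configurable number of seconds, and the error names the packages whose constraints failed most often. The `SolveTimeout` and `solve_with_timeout` names, the brute-force search, and the 300-second default (picked to sit under the suggested 10-minute ceiling) are all hypothetical choices for illustration.

```python
import itertools
import time
from collections import Counter


class SolveTimeout(Exception):
    """Raised when version solving exceeds its time budget."""


def solve_with_timeout(packages, timeout_seconds=300.0, clock=time.monotonic):
    """Brute-force version solving with a configurable deadline.

    `packages` maps name -> {version: {dependency: set_of_allowed_versions}}.
    `clock` is injectable so the timeout can be tested deterministically.
    """
    deadline = clock() + timeout_seconds
    names = sorted(packages)
    conflicts = Counter()  # dependency name -> how often a constraint on it failed
    for versions in itertools.product(*(sorted(packages[n], reverse=True) for n in names)):
        if clock() > deadline:
            worst = ", ".join(f"{pkg} ({count} conflicts)"
                              for pkg, count in conflicts.most_common(3))
            raise SolveTimeout(
                f"Version solving gave up after {timeout_seconds:g} s. "
                f"Most frequently conflicting packages: {worst or 'none recorded'}."
            )
        chosen = dict(zip(names, versions))
        failed = [dep for name, version in chosen.items()
                  for dep, allowed in packages[name][version].items()
                  if chosen[dep] not in allowed]
        if not failed:
            return chosen
        conflicts.update(failed)
    return None  # search space exhausted: no solution exists


# Hypothetical unsolvable graph: a needs c 1, b needs c 2.
packages = {
    "root": {1: {"a": {1, 2}, "b": {1, 2}}},
    "a": {1: {"c": {1}}, 2: {"c": {1}}},
    "b": {1: {"c": {2}}, 2: {"c": {2}}},
    "c": {1: {}, 2: {}},
}
try:
    # A fake clock that advances one "second" per call forces an early timeout.
    solve_with_timeout(packages, timeout_seconds=2, clock=itertools.count().__next__)
except SolveTimeout as error:
    print(error)
```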

WHY THIS BOTHERS ME SO MUCH:

The last few days were hell for us: our whole company is dealing with pub! No one is able to develop properly, no one is able to deploy! Mac, Archlinux, Debian, Ubuntu, all running on decent hardware, and no one was able to even fetch dependencies without painfully messing with the pubspec!

Besides my company's productivity, there is another reason why this bothers me: I want Dart to become the future of web development, and this is a very painful obstacle to that goal.

MY IDEA FOR IMPROVEMENT, RANDOMNESS AND EVERYTHING:

It shouldn't be taken so literally. You can implement exactly the ideas I wrote in a deterministic way: just use the pseudo-randomness contained in the pubspec itself. The better you do it, the fewer WTF moments your users get. (The main WTF, getting different results from two pub upgrades run one after another, vanishes instantly. Some other WTFs can be eliminated as well. You cannot eliminate all WTFs, but you can make it good enough.) Several other improvements exist.

However, the main message is: please help us. We are stuck, the workarounds are complicated, and pub, Dart's cool feature, is making our work miserable.
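The pseudo-randomness idea can be sketched as: hash the pubspec text, seed a PRNG with it, and use that generator to order candidate versions. Two runs on the same pubspec produce the same order, but any textual change, including merely reordering dependencies, changes the seed. The helper names and the choice of SHA-256 are assumptions for illustration, not anything pub does.

```python
import hashlib
import random


def pubspec_seed(pubspec_text):
    """Derive a deterministic 64-bit seed from the pubspec's exact text."""
    digest = hashlib.sha256(pubspec_text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def seeded_order(pubspec_text, candidates):
    """Shuffle candidate versions the same way every time for a given pubspec."""
    rng = random.Random(pubspec_seed(pubspec_text))
    order = list(candidates)
    rng.shuffle(order)
    return order


spec = "dependencies:\n  path: any\n  http: any\n"
reordered = "dependencies:\n  http: any\n  path: any\n"
versions = ["1.0.0", "1.1.0", "1.2.0", "2.0.0"]
assert seeded_order(spec, versions) == seeded_order(spec, versions)  # repeatable
print(pubspec_seed(spec) != pubspec_seed(reordered))  # reordering changes the seed
```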


Attachment:
pubspec.yaml (578 Bytes)

@DartBot
Copy link
Author

DartBot commented Mar 13, 2014

This comment was originally written by @tomaskulich


Did you succeed in reproducing the bug? If this is a problem for you, I can grant you access to one of my Amazon AWS machines; it fails very consistently there.

@lrhn
Member

lrhn commented Mar 14, 2014

Added Area-Pub label.

@nex3
Member

nex3 commented Mar 14, 2014

> This is an unrelated issue. Can you repro this reliably? If so, can you give me some details on your configuration?

This is actually probably not unrelated; we've seen connection errors before on pathological graphs. It's often caused by making too many requests to the server too quickly.

> WHAT SHOULD DEFAULT BEHAVIOR OF PUB BE

I like Bob's suggestion of printing this information automatically if version solving is taking a long time. That said, even if we're pretty confident something pathological is going on, it's non-trivial to figure out the specifics. If it were easy, we'd use that to just do version resolution quickly.

It's probably possible to come up with some way of saying "it's likely that this resolution is taking a long time due to constraints on package X", but doing that well will take a considerable amount of work that we're unlikely to be able to spare at the moment. If you want to take a stab at it, please feel free.

> WHY THIS BOTHERS ME SO MUCH

I think you may be overstating the severity of the issue here. As far as we've seen, these pathological problems only come up when there are substantial version clashes in a dependency graph. It should be fixable without too much trouble by submitting a few patches to upgrade third-party packages, or even forking them yourself.

> MY IDEA FOR IMPROVEMENT, RANDOMNESS AND EVERYTHING

Seeding the randomness from the pubspec doesn't help much. It still trains users that if their version resolution is failing, they can re-order their dependencies (or whatever) and try again. This is not something we want to encourage our users to do.

@nex3
Member

nex3 commented Apr 30, 2014

Issue #18543 has been merged into this issue.


cc @keertip.

@munificent
Member

Removed Type-Defect, Priority-Unassigned labels.
Added Type-Enhancement, Priority-Medium labels.

@nex3
Member

nex3 commented Oct 14, 2014

Issue #21259 has been merged into this issue.

@nex3
Member

nex3 commented Oct 14, 2014

Issue #21325 has been merged into this issue.


cc @munificent.
cc @alan-knight.
cc @ricowind.
cc @whesse.
cc @kasperl.

@alan-knight
Contributor

I have a theory that the connection issue is that the solver goes away into exponential land without doing any network activity long enough for the network connection to time out. I do see that the progress indicator display stutters when solving a complicated problem like this, so if it stutters long enough....
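Alan's theory is the classic event-loop starvation problem. Dart runs each isolate on a single-threaded event loop; the toy model below uses Python's asyncio as an analogous stand-in (an assumption for illustration, not a model of pub's actual code). A CPU-bound loop that never awaits starves every other task, here a hypothetical heartbeat standing in for connection keep-alives or progress output, while yielding every so often lets those tasks run.

```python
import asyncio
from typing import List, Optional


async def heartbeat(ticks: List[int]) -> None:
    """Stand-in for work that needs the event loop: keep-alives, progress output."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def solve(steps: int, yield_every: Optional[int]) -> None:
    """CPU-bound stand-in for the solver; optionally yields to the event loop."""
    total = 0
    for i in range(steps):
        total += i * i  # pretend work
        if yield_every and i % yield_every == 0:
            await asyncio.sleep(0)  # let other tasks (e.g. the heartbeat) run


async def heartbeats_during_solve(yield_every: Optional[int]) -> int:
    ticks: List[int] = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat start
    before = len(ticks)
    await solve(100_000, yield_every)
    during = len(ticks) - before
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return during


print(asyncio.run(heartbeats_during_solve(None)))   # 0: the heartbeat never ran
print(asyncio.run(heartbeats_during_solve(1_000)))  # > 0: it ran at each yield
```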

@kasperl

kasperl commented Oct 15, 2014

In my book, this is a pretty serious usability issue reported by multiple people. Issue #21259 and issue #21325 contain reproductions.


Added this to the 1.8 milestone.
Removed Type-Enhancement, Priority-Medium labels.
Added Type-Defect, Priority-High labels.

@nex3
Member

nex3 commented Oct 17, 2014

Issue #21340 has been merged into this issue.

@nex3
Member

nex3 commented Oct 18, 2014

Issue #21340 has been merged into this issue.

@nex3
Member

nex3 commented Nov 7, 2014

Issue #21522 has been merged into this issue.

@ricowind
Contributor

This is marked priority high and milestone 1.8, could we get an owner?

@munificent
Member

I know this is an annoyance to users when it occurs, but we don't have immediate plans to try to solve it in pub itself. The simplest fix is, and has been, to address this at the ecosystem level: make sure packages have correct version ranges, and try not to rev popular packages with breaking changes too frequently.


Set owner to @munificent.
Removed this from the 1.8 milestone.
Removed Priority-High label.
Added Priority-Medium label.

@nex3
Member

nex3 commented Jan 29, 2015

Issue #22189 has been merged into this issue.

@DartBot
Author

DartBot commented Jun 5, 2015

This issue has been moved to dart-lang/pub#912.

@DartBot DartBot closed this as completed Jun 5, 2015
@kevmoo kevmoo removed the triaged label Mar 1, 2016