Discussion:
[ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
Felix Fietkau
2016-07-06 13:23:49 UTC
Permalink
Thanks for unconfusing me a couple weeks ago, and cluing me into how
the limit on ->pending_frames is not really relevant for the data
packets that go through the new intermediate queues.
But I'm not sure if this would allow us to remove the limit on
pending_frames because even though normal data packets would not
normally build up that many packets, there are other packet types
which bypass the intermediate queues and are transmitted directly
(also in most cases bypassing the ath9k internal queues in the way
ath9k worked before we patched it to use the mac80211 intermediate
queues).
Yes, but, well, since they're not queued they are not going to overflow
anything. The aggregation building logic stops at two queued aggregates,
so the default limit of 123 packets is never going to be hit when the
queue is moved into the mac80211 layer. So keeping the knobs around only
helps people who purposefully want to cripple their ability to do
aggregation; and it won't be doing what it promises (limiting qlen),
since that is now moved out of the driver. So IMO, removing the knobs is
the right thing to do. I have already updated my patch to do so, which
I'll send as a v2 once the other bits are resolved.
I agree.
Channel context based queue handling can be dealt with by
stopping/starting relevant queues on channel context changes.
But I don't see how to handle the case here where packets get passed
to the driver with ath_tx_start() and wind up in the scenario above.
Well, presumably the upper layers won't try to transmit anything through
the old TX path if the start/stop logic is implemented properly. The
chanctx code already seems to call the ieee80211_{start,stop}_queue()
functions when changing context, so not sure what else is needed. Guess
I'll go see if I can provoke an actual triggering of the bug, unless
Felix elaborates on what he means before I get around to it (poke,
Felix? :)).
Then I guess the logic in ath_tx_start was a leftover from a time before
some queue related rework happened to the chanctx code.
In that case you can simply remove the chanctx related software queueing
stuff from ath_tx_start.

- Felix
Kalle Valo
2016-07-08 14:26:54 UTC
Permalink
This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
The old code path in ath_tx_start that would queue packets has been
removed completely, as has the qlen limit tunables (since there's no
longer a queue in the driver to limit).
Based on Tim's original patch set, but reworked quite thoroughly.
Nice work. Because this is such a significant change, and to maximise testing
time, I'm planning to queue this for 4.9 (so I would apply this to ath-next in
3-4 weeks after the merge window closes). But anyone who wants to test this can
use master-pending branch from my ath.git tree (uses wireless-testing as the
baseline). Sounds good?

Testing and review feedback very welcome!
--
Sent by pwcli
https://patchwork.kernel.org/patch/9216993/
Felix Fietkau
2016-07-08 16:10:11 UTC
Permalink
Post by Kalle Valo
This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
The old code path in ath_tx_start that would queue packets has been
removed completely, as has the qlen limit tunables (since there's no
longer a queue in the driver to limit).
Based on Tim's original patch set, but reworked quite thoroughly.
Nice work.
Thanks :)
Post by Kalle Valo
Because this is such a significant change, and to maximise testing
time, I'm planning to queue this for 4.9 (so I would apply this to
ath-next in 3-4 weeks after the merge window closes). But anyone who
wants to test this can use master-pending branch from my ath.git tree
(uses wireless-testing as the baseline). Sounds good?
Sounds good to me. I'm planning on backporting this and Michael's
mac80211 FQ-CoDel patches to 4.4 and post them for inclusion in LEDE.
Hopefully that will get it some more testing as well.
I've pushed a backport of this into my LEDE staging tree:
https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary

I don't have time for testing it myself at the moment, but I'll try to
get some people to do so.

- Felix
Felix Fietkau
2016-07-08 16:31:11 UTC
Permalink
Post by Felix Fietkau
Post by Kalle Valo
This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
The old code path in ath_tx_start that would queue packets has been
removed completely, as has the qlen limit tunables (since there's no
longer a queue in the driver to limit).
Based on Tim's original patch set, but reworked quite thoroughly.
Nice work.
Thanks :)
Post by Kalle Valo
Because this is such a significant change, and to maximise testing
time, I'm planning to queue this for 4.9 (so I would apply this to
ath-next in 3-4 weeks after the merge window closes). But anyone who
wants to test this can use master-pending branch from my ath.git tree
(uses wireless-testing as the baseline). Sounds good?
Sounds good to me. I'm planning on backporting this and Michael's
mac80211 FQ-CoDel patches to 4.4 and post them for inclusion in LEDE.
Hopefully that will get it some more testing as well.
https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary
Awesome! What about the FQ-CoDel mac80211 patches themselves? I have a
tree where I've separated out the needed patches and rebased them on
mainline 4.4.9. Can I post that somewhere (or just email you the series)
and get you to include those as well? Or do I just dump the patch files
into the LEDE patches dir and send that as a patch to LEDE? (I see your
patch also refreshed subsequent patches; is there a script to do that
automatically?)
You don't need to do anything here. LEDE does not use mac80211 and
drivers from the kernel tree, it's built using backports.
It's currently using a backports snapshot that I built myself from
wireless-testing 2016-06-20, which already includes FQ-Codel.

- Felix
Kalle Valo
2016-08-22 15:43:54 UTC
Permalink
This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
The old code path in ath_tx_start that would queue packets has been
removed completely, as has the qlen limit tunables (since there's no
longer a queue in the driver to limit).
Based on Tim's original patch set, but reworked quite thoroughly.
---
- Correctly notify mac80211 when there are packets in the retry queue
on powersave start/stop.
- Get rid of ath_tx_aggr_resume().
- Some readability changes and additional WARN_ON/BUG_ON in
appropriate places.
This is great work but due to the regressions I'm not sure if this will
be ready for 4.9. To get more testing time I wonder if we should wait
for 4.10? IMHO applying this in the end of the cycle is too risky and we
should try to maximise the time linux-next by applying this just after
-rc1 is released.

Thoughts?
--
Kalle Valo
Kalle Valo
2016-08-22 17:02:16 UTC
Permalink
Post by Kalle Valo
This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
The old code path in ath_tx_start that would queue packets has been
removed completely, as has the qlen limit tunables (since there's no
longer a queue in the driver to limit).
Based on Tim's original patch set, but reworked quite thoroughly.
---
- Correctly notify mac80211 when there are packets in the retry queue
on powersave start/stop.
- Get rid of ath_tx_aggr_resume().
- Some readability changes and additional WARN_ON/BUG_ON in
appropriate places.
This is great work but due to the regressions I'm not sure if this
will be ready for 4.9. To get more testing time I wonder if we should
wait for 4.10? IMHO applying this in the end of the cycle is too risky
and we should try to maximise the time linux-next by applying this
just after -rc1 is released.
Thoughts?
Well, now that we understand what is causing the throughput regressions,
fixing them should be fairly straight forward (yeah, famous last words,
but still...). I already have a patch for the fast path and will go poke
at the slow path next. It'll probably require another workaround or two,
so I guess it won't be the architecturally clean ideal solution; but it
would make it possible to have something that works for 4.9 and then
iterate for a cleaner design for 4.10.
But if we try to rush this to 4.9 it won't be in linux-next for long. We
are now in -rc3 and let's say that the patches are ready to apply in two
weeks. That would leave us only two weeks of -next time before the merge
window, which I think is not enough for a controversial patch like this
one. There might be other bugs lurking which haven't been found yet.
--
Kalle Valo
Kalle Valo
2016-08-23 06:59:42 UTC
Permalink
Post by Kalle Valo
Post by Kalle Valo
This is great work but due to the regressions I'm not sure if this
will be ready for 4.9. To get more testing time I wonder if we should
wait for 4.10? IMHO applying this in the end of the cycle is too risky
and we should try to maximise the time linux-next by applying this
just after -rc1 is released.
Thoughts?
Well, now that we understand what is causing the throughput regressions,
fixing them should be fairly straight forward (yeah, famous last words,
but still...). I already have a patch for the fast path and will go poke
at the slow path next. It'll probably require another workaround or two,
so I guess it won't be the architecturally clean ideal solution; but it
would make it possible to have something that works for 4.9 and then
iterate for a cleaner design for 4.10.
But if we try to rush this to 4.9 it won't be in linux-next for long. We
are now in -rc3 and let's say that the patches are ready to apply in two
weeks. That would leave us only two weeks of -next time before the merge
window, which I think is not enough for a controversial patch like this
one. There might be other bugs lurking which haven't been found yet.
What, other hidden bugs? Unpossible! :)
Yeah, right ;)
Would it be possible to merge the partial solution (which is ready now,
basically) and fix the slow path in a separate patch later?
What do you mean with partial solution? You mean ath9k users would
suffer from regressions until they are fixed? We can't do that.
(Just spit-balling here; I'm still fairly new to this process. But I am
concerned that we'll hit a catch-22 where we can't get wider testing
before it's "ready" and we can't prove that it's "ready" until we've had
wider testing...)
I understand your point, but I don't want to rush this to 4.9 and then
start getting lots of bug reports and eventually forced to revert it. If
we just found a new serious regression the chances are that there are
more lurking somewhere and this patch is just not ready yet.
--
Kalle Valo
Kalle Valo
2016-10-05 15:50:52 UTC
Permalink
Post by Kalle Valo
I understand your point, but I don't want to rush this to 4.9 and then
start getting lots of bug reports and eventually forced to revert it. If
we just found a new serious regression the chances are that there are
more lurking somewhere and this patch is just not ready yet.
So, the changes to mac80211 that fixes the known regressions of this
patch have gone in.
I guess you mean this commit:

bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue

(Just making sure that I have the same commit in my tree when I apply this)
Any chance of seeing this merged during the current merge window? :)
I sent last new feature ("-next") patches for 4.9 last week, sorry. So
this has to wait for 4.10.

And I assume I need to take v5:

https://patchwork.kernel.org/patch/9311037/
--
Kalle Valo
Kalle Valo
2016-10-05 17:54:27 UTC
Permalink
Post by Kalle Valo
Post by Kalle Valo
I understand your point, but I don't want to rush this to 4.9 and then
start getting lots of bug reports and eventually forced to revert it. If
we just found a new serious regression the chances are that there are
more lurking somewhere and this patch is just not ready yet.
So, the changes to mac80211 that fixes the known regressions of this
patch have gone in.
bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue
(Just making sure that I have the same commit in my tree when I apply this)
Yup, that's the one :)
Post by Kalle Valo
Any chance of seeing this merged during the current merge window? :)
I sent last new feature ("-next") patches for 4.9 last week, sorry. So
this has to wait for 4.10.
Ah, right, I think I got my merge windows confused. You already said you
wouldn't take it for 4.9. So I guess what I'm asking is for you to put
it into the appropriate -next tree so it can get some wider exposure
ahead of the *next* merge window...
Yeah, we have plenty of time for 4.10 :) So my plan is to apply this
after I open wireless-drivers-next in 2-3 weeks or so. That would mean
that the patch would hit Linus' tree when 4.10-rc1 is released
(estimated to happen on 2017-01-01). The timing is actually perfect as
now we get maximal testing time on -next.
Post by Kalle Valo
https://patchwork.kernel.org/patch/9311037/
Yes. Haven't noticed anything that changed since that might conflict
with it, but let me know if I missed something and you want a refreshed
version.
Thanks, I'll let you know if there are any problems.
--
Kalle Valo
Loading...