
Hi, I have the following dataframes: A, B, C and X. A, B and C are independent, each containing about 10 million rows on average. X is common and is joined with each of A, B and C.
Steps:
Join A and X, do processing
Join B and X, do processing
Join C and X, do processing
As part of processing, I am doing calculations on columns.

Will multithreading be beneficial here?

6 answers

26 views

No, you probably want multiprocessing.

That probably depends on the processing you need to do, since computers are fast. I'd recommend implementing the whole thing without any parallelization first and seeing how long it takes to process maybe 10k rows. There's a good chance everything is quick enough that you don't need anything else. Even if the process takes a few hours, if you don't have to run it regularly you'll probably save time by just letting it run and doing something else instead of optimizing your code. And if it does end up slow, you'll probably need the parts you've written anyway.

If you do end up needing parallelization, as someone else pointed out, I think you'd need multi-processing, not multi-threading. Python has the Global Interpreter Lock (GIL), which in short means that only one thread can execute Python code in a given process (with one interpreter) at a time. Multiprocessing runs multiple interpreters, which can actually spread the work across multiple CPU cores in parallel.

There's a lot of nuance here. For example, the Python wiki says that some kinds of work that don't run in Python directly (like disk access or calculations in NumPy) are not affected by the GIL, and so can benefit from multi-threading. Also, your storage or even the network might turn out to be the bottleneck that takes up the most time, in which case parallelization won't do much.

TL;DR: Do a simple, non-parallelized implementation first. If it's too slow, you'll probably benefit more from multi-processing than multi-threading, but both can be worth a try.
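The "measure first" advice above might look something like the sketch below. It assumes the dataframes are pandas DataFrames, which the question does not actually state. The `key`, `value` and `weight` columns, the `process` function and the generated data are all hypothetical placeholders for the real join key and calculations.

```python
import time

import numpy as np
import pandas as pd


def process(joined: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the column calculations in the question."""
    out = joined.copy()
    out["result"] = out["value"] * out["weight"]
    return out


def time_on_sample(df: pd.DataFrame, x: pd.DataFrame, sample_rows: int = 10_000) -> float:
    """Join the first `sample_rows` rows of `df` with `x`, process them,
    and return the elapsed time in seconds."""
    sample = df.head(sample_rows)
    start = time.perf_counter()
    process(sample.merge(x, on="key", how="inner"))
    return time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up data; the real A and X come from wherever the data is loaded.
    x = pd.DataFrame({"key": np.arange(1_000), "weight": rng.random(1_000)})
    a = pd.DataFrame(
        {"key": rng.integers(0, 1_000, 100_000), "value": rng.random(100_000)}
    )

    elapsed = time_on_sample(a, x)
    # Rough linear extrapolation from 10k to 10 million rows (factor 1,000);
    # real scaling may differ, so treat this as an order-of-magnitude guess.
    print(f"10k rows: {elapsed:.3f}s, estimate for 10M rows: {elapsed * 1_000:.0f}s")
```

If the extrapolated time is acceptable, there may be no need for threads or processes at all.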

Akshay-Dalvi (question author)
Fayaz Khan
No, you probably want multiprocessing.

Can you please share a tutorial for this?

Akshay-Dalvi (question author)
Akshay Dalvi
Can you please share a tutorial for this?

https://docs.python.org/3/library/multiprocessing.html
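As a rough illustration of what the linked documentation covers, here is a minimal sketch that runs the three independent joins in separate processes with `multiprocessing.Pool`. It assumes pandas DataFrames (not confirmed in the question); the `key`, `value` and `weight` columns and the `join_and_process` and `run_parallel` helpers are hypothetical names made up for this example.

```python
from multiprocessing import Pool

import pandas as pd


def join_and_process(df: pd.DataFrame, x: pd.DataFrame) -> pd.DataFrame:
    """Join one dataframe with X and do a placeholder column calculation."""
    joined = df.merge(x, on="key", how="inner")
    joined["result"] = joined["value"] * joined["weight"]
    return joined


def run_parallel(frames: dict, x: pd.DataFrame, workers: int = 3) -> dict:
    """Run join_and_process for every frame in its own worker process.

    Each frame and X are pickled and sent to the worker, and the result is
    pickled back, so for very large frames this copying can cost more than
    the parallelism saves.
    """
    with Pool(processes=workers) as pool:
        results = pool.starmap(join_and_process, [(df, x) for df in frames.values()])
    return dict(zip(frames.keys(), results))


if __name__ == "__main__":
    # Tiny made-up data standing in for A, B, C and X.
    x = pd.DataFrame({"key": [1, 2, 3], "weight": [0.5, 1.0, 2.0]})
    frames = {
        name: pd.DataFrame({"key": [1, 2, 3, 3], "value": [10.0, 20.0, 30.0, 40.0]})
        for name in ("A", "B", "C")
    }
    for name, df in run_parallel(frames, x).items():
        print(name, len(df))
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by spawning a fresh interpreter, the main module is re-imported in each worker, and unguarded code would try to create the pool again.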

Akshay-Dalvi (question author)
