Hi, I have the following dataframes: A, B, C and X. A, B and C are independent, each containing about 10 million rows on average. X is common and is used in joins with A, B and C.
Steps:
Join A and X, do processing
Join B and X, do processing
Join C and X, do processing
As part of the processing, I am doing calculations on columns.

Will multithreading be beneficial here?

6 answers

29 views

No, you probably want multiprocessing.

That depends on the processing you need to do, since computers are fast. I'd recommend you implement the whole thing without any parallelization first, and just see how long it takes to process maybe 10k rows. There's a good chance everything is quick enough that you don't need anything else. Even if the full run takes a few hours, if you don't have to do it regularly you're probably going to save time by just letting it run and doing something else instead of optimizing your code. And should it end up too slow, you'll need the parts you've written anyway.

If you do end up needing parallelization, as someone else pointed out, I think you'd need multiprocessing, not multithreading. Python has the Global Interpreter Lock (GIL), which in short means that only one thread can execute Python code at a time within one process (with one interpreter). Multiprocessing runs multiple interpreters, which can actually spread the work across multiple CPU cores in parallel.

There's a lot of nuance here. For example, the Python wiki says that some kinds of work that are not run in Python directly (like disk access or calculations in NumPy) are not affected by the GIL, and so can benefit from multithreading. Also, your storage or even the network might end up being the bottleneck that takes the most time, in which case parallelization won't do much.

TLDR: Do a simple, non-parallelized implementation first. If it's too slow, you'll probably benefit more from multiprocessing than multithreading, but both can be worth a try.
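A rough sketch of the "time a small sample first" advice, in case it helps. Plain Python lists and dicts stand in for the dataframes here so the snippet runs without extra libraries; the column names `key`, `value` and `factor`, the helper names, and the linear scaling to 10 million rows are all assumptions for illustration, not something taken from the question.

```python
import time


def join_and_process(rows, lookup):
    """Inner-join rows to lookup on 'key', then compute a derived column."""
    out = []
    for row in rows:
        match = lookup.get(row["key"])
        if match is None:
            continue  # inner join: drop rows with no match in X
        out.append({**row, "total": row["value"] * match["factor"]})
    return out


def estimate_full_runtime(rows, lookup, full_size):
    """Time the sample run, then scale linearly to full_size rows (a rough guess)."""
    start = time.perf_counter()
    result = join_and_process(rows, lookup)
    elapsed = time.perf_counter() - start
    return result, elapsed * full_size / len(rows)


# Hypothetical stand-ins: X as a dict keyed by the join column, A as a 10k-row sample.
X = {k: {"factor": k % 7 + 1} for k in range(1_000)}
sample_A = [{"key": i % 1_200, "value": i} for i in range(10_000)]

result, projected = estimate_full_runtime(sample_A, X, full_size=10_000_000)
print(f"{len(result)} joined rows; projected time for 10M rows: {projected:.1f} s")
```

If the projected time is acceptable, there is nothing more to do. Real joins rarely scale perfectly linearly, so the projection is only a first estimate.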

Akshay-Dalvi (question author)
Replying to Fayaz Khan: "No, you probably want multiprocessing."

Can you please share a tutorial on this?

Akshay-Dalvi (question author)
Replying to Akshay Dalvi: "Can you please share a tutorial on this?"

https://docs.python.org/3/library/multiprocessing.html
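As a minimal sketch of what that module looks like in use for the three independent joins: `multiprocessing.Pool` and `Pool.starmap` are from the standard library, while `join_and_process`, `make_rows`, the column names and the tiny generated tables are hypothetical stand-ins for the real dataframes and processing.

```python
import multiprocessing


def join_and_process(rows, lookup):
    """Join rows to lookup on 'key' and return the sum of a derived column."""
    total = 0
    for row in rows:
        match = lookup.get(row["key"])
        if match is not None:  # inner join: skip rows with no match in X
            total += row["value"] * match["factor"]
    return total


def make_rows(n, offset):
    """Build a small hypothetical table with 'key' and 'value' columns."""
    return [{"key": (i + offset) % 100, "value": i} for i in range(n)]


if __name__ == "__main__":
    # X is the shared lookup table; A, B, C are independent of each other.
    X = {k: {"factor": 2} for k in range(100)}
    tables = {"A": make_rows(1000, 0), "B": make_rows(1000, 1), "C": make_rows(1000, 2)}

    # One worker process per table; each job gets its own pickled copy of X.
    with multiprocessing.Pool(processes=3) as pool:
        totals = pool.starmap(join_and_process, [(rows, X) for rows in tables.values()])

    for name, total in zip(tables, totals):
        print(name, total)
```

Arguments and results are pickled when they cross process boundaries, which can be costly for tables with millions of rows. One way around that is to have each worker load its own table from disk and return only a small result.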

