1. Extract the queries from the log, count each one in a hashmap, then take the 100 most frequent.
2. What is the limitation of the hashmap?
A hashmap is an in-memory structure, so if there are too many distinct queries it may not fit in memory.
3. What would you do instead?
Add more CPUs by distributing the work across multiple computers.
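4. What if the file is still too large?
Divide the file into chunks by query, e.g. one chunk file holds only the queries starting with "A". Every occurrence of a query then lands in the same chunk, so each chunk can be counted on its own, and the overall top 100 is the top 100 among the per-chunk top-100 lists.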
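The answers above can be sketched in Python. This is a minimal sketch, assuming a hypothetical tab-separated log with the query text in the last field; `extract_query` is a placeholder for real log parsing, and all function names are illustrative. `top_k_in_memory` is the single-machine hashmap approach from answer 1, and `top_k_partitioned` splits the queries into chunk files by their first character, as in answer 4, before counting each chunk separately.

```python
import heapq
import os
import tempfile
from collections import Counter


def extract_query(line):
    # Hypothetical log format: one tab-separated record per line with the
    # query text in the last field. Adjust this to the real log layout.
    return line.rstrip("\n").split("\t")[-1]


def top_k(counts, k):
    # Pick the k most frequent (query, count) pairs with a bounded heap,
    # which avoids sorting every distinct query.
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


def top_k_in_memory(lines, k=100):
    # Answer 1: count each query in a hashmap, then take the top k.
    counts = Counter(extract_query(line) for line in lines)
    return top_k(counts, k)


def top_k_partitioned(lines, k=100):
    # Answer 4: write each query to a chunk file chosen by its first
    # character, so all copies of a query land in the same chunk. Each
    # chunk is then small enough to count with its own hashmap.
    with tempfile.TemporaryDirectory() as tmp:
        files = {}
        try:
            for line in lines:
                query = extract_query(line)
                # Name the chunk by the first character's code point so
                # characters like "/" are safe in file names.
                name = str(ord(query[0])) if query else "empty"
                if name not in files:
                    path = os.path.join(tmp, name)
                    files[name] = open(path, "w", encoding="utf-8", newline="\n")
                files[name].write(query + "\n")
        finally:
            for f in files.values():
                f.close()

        candidates = []
        for name in files:
            path = os.path.join(tmp, name)
            with open(path, encoding="utf-8", newline="\n") as f:
                counts = Counter(line.rstrip("\n") for line in f)
            # A query never spans two chunks, so the global top k is
            # contained in the union of the per-chunk top-k lists.
            candidates.extend(top_k(counts, k))
        return heapq.nlargest(k, candidates, key=lambda item: item[1])
```

Partitioning by first letter can produce very uneven chunks when many queries share a starting letter; hashing the query to choose a chunk is a common alternative that spreads the load more evenly. The same chunks could also be handed to different machines, which is one way to realize answer 3.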