Thursday, December 23, 2010

Running Win 7 on an EEE PC

I wondered whether an EEE PC could run Windows 7 smoothly, so after upgrading the HDD and the RAM I tried it out. Well, it can.

The first thing I had to disable was that damn Windows indexing service (called Windows Search under Win 7), via services.msc, to make the computer run faster. As the computer runs from an SSD, the next thing to disable was the scheduled defrag tasks.
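If you prefer the command line over services.msc and Task Scheduler, both changes can be scripted. This is a minimal Python sketch around the built-in `sc` and `schtasks` tools. It assumes the Windows Search service is named `WSearch` and that the defrag task lives at `\Microsoft\Windows\Defrag\ScheduledDefrag` (the Windows 7 defaults as far as I know, so check them on your machine). It only prints the commands unless you pass `--apply`, and applying them needs an elevated (administrator) prompt.

```python
import subprocess
import sys

# Assumed Windows 7 default names; verify them before applying.
SEARCH_SERVICE = "WSearch"
DEFRAG_TASK = r"\Microsoft\Windows\Defrag\ScheduledDefrag"


def build_commands():
    """Return the command lines that disable Windows Search and scheduled defrag."""
    return [
        # Stop the indexing service right now...
        ["sc", "stop", SEARCH_SERVICE],
        # ...and keep it from starting at boot ("start=" and its value are separate args).
        ["sc", "config", SEARCH_SERVICE, "start=", "disabled"],
        # Disable the scheduled defrag task, which is pointless on an SSD.
        ["schtasks", "/Change", "/TN", DEFRAG_TASK, "/Disable"],
    ]


def run(dry_run=True):
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=False: "sc stop" fails harmlessly if the service is already stopped.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run(dry_run="--apply" not in sys.argv)
```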

The Windows Experience Index rating is:
  • CPU - 2.3
  • RAM - 4.5
  • Video card - 3
  • Drive - 6.8

With a fresh Windows install:
  • CPU load while idle: 1%
  • Memory utilization (immediately after Windows startup): 450 MB
  • Boot time: 10 sec

I also disconnected the battery while using the laptop on AC, to keep it from aging unnecessarily.



Upgrading an EEE PC 1000 to 2 GB RAM, a 250 GB SSD and Win 7

I just upgraded my EEE PC 1000HE to 2 GB of RAM (800 MHz). It also received a 250 GB A-DATA SSD. The final touch was a brand-new, shiny 32-bit Windows 7. Here is how to install Win 7 without a DVD-ROM drive.

How to install Windows 7 - Step by step graphical guide

Place the DVD or flash stick in your computer and restart it. Make sure your BIOS boot settings are set to boot from the DVD-ROM drive or the flash stick. Once you boot from it, the Windows 7 Setup screen appears.



At this point Win 7 Setup is running. It will guide you through the rest of the process.
























Done


Install Windows 7 without a DVD (easy way)

1. Download and install the "Windows 7 USB/DVD Download Tool" from Microsoft's web site. The program is only 2 MB.
Hint: instead of the above-mentioned tool, you can also use this one.

2. Get the Windows 7 ISO file.

3. Start the "Windows 7 USB/DVD Download Tool". It will ask you where the ISO file is. Point it to the file and let it write the image to your USB stick. The stick needs to be at least 2.5 GB, and all files on it will be deleted. Copying the ISO file to the stick takes about 3-5 minutes. When it is done, put the stick in the computer where Win 7 needs to be installed.

4. Boot from your USB stick and start installing Win 7.
Hint: to boot from a USB flash stick, you usually need to restart the computer and press Escape, Delete, Home or F2 (F2 works on the Asus EEE PC) immediately after the restart. If that doesn't work, search online for how to change the boot order on your computer's brand.

5. Install Windows 7 
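Since step 3 wipes the stick and needs at least 2.5 GB, a quick sanity check before starting the tool can save a failed run. This is a minimal sketch: the 2.5 GB floor comes from step 3, while the ISO path and drive letter in the usage line are hypothetical.

```python
import os
import shutil

MIN_STICK_BYTES = int(2.5 * 1024**3)  # 2.5 GB minimum, per step 3


def stick_is_suitable(iso_bytes, stick_total_bytes):
    """True if the stick meets the minimum size and can hold the whole ISO.

    The tool erases the stick first, so total capacity (not free space) matters.
    """
    return stick_total_bytes >= MIN_STICK_BYTES and stick_total_bytes >= iso_bytes


def check(iso_path, stick_root):
    """Compare a real ISO file against the capacity of the drive at stick_root."""
    iso_bytes = os.path.getsize(iso_path)
    total = shutil.disk_usage(stick_root).total
    return stick_is_suitable(iso_bytes, total)
```

Usage (hypothetical paths): `check(r"C:\Downloads\win7.iso", "E:\\")`.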

-------------------------


There are other ways to write the Win 7 ISO to a stick without the "Windows 7 USB/DVD Download Tool", but they are more complicated:
http://www.intowindows.com/how-to-install-windows-7vista-from-usb-drive-detailed-100-working-guide/
http://www.blogsdna.com/2016/how-to-install-windows-7-from-usb-drive-without-windows-7-iso-dvd.htm


Sunday, December 19, 2010

Memory management under Windows OS

The 4GB limit

On 32-bit Windows, every process has a 4 GB virtual address space. In the default Windows configuration, 2 GB of this virtual address space is designated for the private use of each process, and the other 2 GB is shared between all processes and the operating system. Typically, applications such as Notepad, Microsoft Office Word and Adobe Acrobat Reader use only a fraction of the 2 GB of private address space.

The only way to increase the size of a process's virtual address space beyond 4 GB is to use 64-bit hardware with a 64-bit version of the operating system and an application built for the 64-bit instruction set.
The bottom line is that, no matter how much physical RAM is in the computer, the amount of memory available in the private part of a process's virtual address space in 32-bit Windows implementations is limited to:

  • 2 GB - without the /3GB switch - this is the normal, default maximum private virtual address space
or
  • 3 GB - with the /3GB switch
or
  • any physical RAM not used by the OS and other applications, by designing the application to use the AWE (Address Windowing Extensions) API.
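The 4 GB figure is just pointer arithmetic: a 32-bit pointer can address 2^32 bytes, and the boot configuration decides how that space is split between the process-private (user) part and the shared (kernel) part. A small worked example; the function name is mine, for illustration only.

```python
GB = 1024**3


def address_space_split(pointer_bits=32, user_gb=2):
    """Return (total, user, kernel) virtual address space in bytes."""
    total = 2**pointer_bits          # 2^32 bytes = 4 GB for 32-bit pointers
    user = user_gb * GB              # private part of each process
    return total, user, total - user  # the rest is the shared/kernel part


# Default configuration: 2 GB user / 2 GB shared.
print([x // GB for x in address_space_split()])
# With the /3GB switch: 3 GB user / 1 GB shared.
print([x // GB for x in address_space_split(user_gb=3)])
```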



How to enable the 3GB support in an application:

“No APIs are required to support application memory tuning. However, it would be ineffective to automatically provide every application with a 3-GB address space. Executables that can use the 3-GB address space are required to have the bit IMAGE_FILE_LARGE_ADDRESS_AWARE set in their image header. If you are the developer of the executable, you can specify a linker flag (/LARGEADDRESSAWARE).
To set this bit, you must use Microsoft Visual Studio Version 6.0 or later and the Editbin.exe utility, which has the ability to modify the image header (/LARGEADDRESSAWARE) flag. For more information on setting this flag, see the Microsoft Visual Studio documentation.”
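To verify that the linker flag or Editbin actually set the bit, you can read it straight from the PE header. This is my own sketch, not a Microsoft tool: it follows the published PE/COFF layout, where IMAGE_FILE_LARGE_ADDRESS_AWARE is bit 0x0020 of the COFF header's Characteristics field. Setting the bit is still best left to `editbin /LARGEADDRESSAWARE yourapp.exe` or the `/LARGEADDRESSAWARE` linker flag.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image header has IMAGE_FILE_LARGE_ADDRESS_AWARE set."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the signature; Characteristics is
    # its last field, 18 bytes in.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def check_file(path):
    """Read an .exe/.dll from disk and report whether it is large-address aware."""
    with open(path, "rb") as f:
        return is_large_address_aware(f.read())
```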


Memory, Committed Bytes:

This is a measure of the demand for virtual memory. It shows how many bytes have been allocated by processes and to which the operating system has committed a RAM page frame or a page slot in the pagefile (or both).


Process, Working Set, _Total:

The amount of virtual memory in "active" use. It shows how much RAM is required so that the actively used virtual memory for all processes is in RAM.


Paging File, %pagefile in use:

How much of the pagefile is actually being used. This is the counter you should use to determine whether the pagefile is an appropriate size. If this counter reaches 100, the pagefile is completely full and operations stop working. Set the pagefile large enough so that no more than 50 to 75 percent of it is used. If a large part of the pagefile is in use, having more than one pagefile on different physical disks may improve performance. 
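The 50 to 75 percent rule turns into simple arithmetic: take the peak amount of pagefile in use and divide it by 0.75 and by 0.5 to get a size range. A minimal sketch; the function name and the sample figure are mine.

```python
def pagefile_size_range(peak_used_mb, max_fraction=0.75, min_fraction=0.5):
    """Return (smallest, largest) pagefile size in MB so that the observed
    peak usage sits between 50% and 75% of the pagefile."""
    return peak_used_mb / max_fraction, peak_used_mb / min_fraction


# A peak of 1500 MB in use suggests a pagefile of roughly 2000 to 3000 MB.
print(pagefile_size_range(1500))
```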


Memory, Pages Output/Sec:

This shows how many virtual memory pages were written to the pagefile to free RAM page frames for other purposes each second. This is the best counter to monitor if you suspect that paging is your performance bottleneck. Even if the Committed Bytes value is greater than the installed RAM, a Pages Output/sec value that is low or zero most of the time indicates that there is not a significant performance problem that is caused by not enough RAM.
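To watch Pages Output/sec over time without clicking through Performance Monitor, the built-in `typeperf` tool can sample the counter and print CSV. This is a sketch under assumptions: the sampling uses `typeperf` (Windows only), and the "low" threshold (5 pages/sec) and the "most of the time" share (90%) are arbitrary values of mine, not Microsoft guidance.

```python
import csv
import io
import subprocess

COUNTER = r"\Memory\Pages Output/sec"


def sample_pages_output(seconds=60):
    """Run typeperf (Windows only) once per second and return its raw CSV output."""
    cmd = ["typeperf", COUNTER, "-si", "1", "-sc", str(seconds)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_typeperf(csv_text):
    """Extract the numeric samples from typeperf CSV output."""
    values = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or row[0].startswith("(PDH-CSV"):
            continue  # header, blank line or one-column status message
        try:
            values.append(float(row[1]))
        except ValueError:
            continue  # empty sample or non-numeric status text
    return values


def paging_looks_fine(values, low=5.0, share=0.9):
    """True if at least `share` of the samples are at or below `low` pages/sec."""
    if not values:
        return False
    quiet = sum(1 for v in values if v <= low)
    return quiet / len(values) >= share
```

On a Windows box, `paging_looks_fine(parse_typeperf(sample_pages_output()))` gives a rough yes/no for the last minute.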

Wednesday, December 15, 2010

Rent A Coder renamed to VWorker (but still as untrustworthy as before)

Even though lots of people complain about Rent A Coder (RentACoder.com), today I wanted to try it to get some tasks done easily and quickly. So I posted a very high price for a task that can be done in a few hours, in order to attract many bidders (also called workers). I also asked for a minimum 25% guarantee in order to get only high-quality bidders.
The job was very easy, so I got many bids with an estimated time of 1 to 5 hours. People were complaining about the workers in poor Arabic countries and India, so I hired a guy called “RZ Software and Design” to do the job. Fancy name… right? Later he proved to be just another kid somewhere in the Bangladesh jungle. I also chose "RZ Software and Design" because he had a good reputation, or seemed to have one at that time. He guaranteed to finish the project by the next day. I also picked him because I wanted to give the worker a realistic deadline so the job would be done properly (together with the motivation coming from the high budget offered).

So the day passed and nothing was posted online. No work, no messages, nothing. The worker was completely silent. Finally I decided to contact him over YM. To my great surprise, he said that he had some kind of 3-day, middle-of-the-week religious holiday and intended to do the job in 4-5 days. He had no intention of letting me know that he was going to be that late! Instead of giving me the final project, he gave me the finger.

Now I understand why people complain about RAC. I got screwed on my very first project. The main fault here lies with RentACoder. The rating assigned to each worker is so unrealistic. And there is nothing you can do when one of its coders screws you this badly. RAC allows them to withdraw without receiving a bad grade/review. So the workers receive points when they do a good job but lose nothing when they cancel a job. Obviously this is why their ratings are always so high and their behavior so bad. The 20% guarantee I asked for was also useless in my case.
I hope there is another way to find trusted workers on the Rent A Coder web site. Otherwise, I will have to agree with what others are saying about RAC: untrustworthy.

Rent A Coder probably renamed itself to VWorker.com to wash off some of the bad reputation it had generated, but obviously changing the name is not enough.



Monday, November 1, 2010

"Tpopupmenu not found" error in Delphi while loading a DPK file.

I recently got a nasty “tpopupmenu not found” error while trying to load an old DPK file. I finally figured out that the error was in the DOF file associated with the project. I just deleted the “c:\program files\borland\delphi7\Bin\dclact70.bpl=Borland ActionBar Components” line from the [Excluded Packages] section and it worked.
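A .dof file is a plain INI-style text file, so if you have many old projects the same manual edit can be scripted. This is a sketch; the function names are mine, and reading the file as latin-1 is an assumption about how Delphi 7 saved it. It keeps a .bak copy and preserves the original line endings.

```python
def remove_excluded_package(dof_text, bpl_name):
    """Drop lines for `bpl_name` from the [Excluded Packages] section of a .dof file.

    Only the file name before '=' is compared (case-insensitively), so the
    install path in the line does not matter.
    """
    out = []
    section = None
    for line in dof_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1].strip().lower()
        elif section == "excluded packages" and "=" in stripped:
            path = stripped.split("=", 1)[0]
            if path.replace("/", "\\").rsplit("\\", 1)[-1].lower() == bpl_name.lower():
                continue  # skip the offending package line
        out.append(line)
    return "".join(out)


def fix_dof_file(path, bpl_name="dclact70.bpl"):
    """Rewrite a .dof file in place, keeping a .bak copy of the original."""
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path + ".bak", "w", encoding="latin-1", newline="") as f:
        f.write(text)
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(remove_excluded_package(text, bpl_name))
```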

Sunday, October 31, 2010

How to add high-resolution icons to a Delphi 7 application?

This seems to be a tough question for the Delphi community. However, after some trials I recently figured out that it is not that difficult. So, here is my 1, 2, 3 solution:


1. Prepare the images you want to use as high-res icons. Save each image in a separate “.ICO” file. Use RGB+Alpha color depth. You can use Axialis IconWorkshop Pro 6.5 for this. It is a bit stubborn, but it does the job.

2. Open your application's RES file in Melander’s ResourceEditor. Go to ‘Icon->MAINICON’. You will see the default Delphi 7 icon there. Right-click and choose ‘Load variant…’. Navigate to your icon files and load them one by one. Save the RES file.

3. Open Delphi and compile the project. Done.


Easy, right?
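Before loading the variants in step 2, it is worth checking that each .ICO file really contains what you expect (for example a 256x256 image at 32 bits, i.e. RGB+Alpha). The ICO directory layout is simple, so a few lines of Python can list it. This is a sketch; note that some editors may leave the bit-count field at 0 for PNG-compressed 256x256 entries, in which case it reports 0 rather than 32.

```python
import struct


def list_icon_variants(ico: bytes):
    """Return (width, height, bits_per_pixel) for each image stored in an .ICO file."""
    reserved, kind, count = struct.unpack_from("<HHH", ico, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an icon file")
    variants = []
    for i in range(count):
        # Each 16-byte ICONDIRENTRY follows the 6-byte header.
        w, h, _colors, _res, _planes, bpp, _size, _offset = struct.unpack_from(
            "<BBBBHHII", ico, 6 + 16 * i)
        # A stored width/height of 0 means 256 pixels.
        variants.append((w or 256, h or 256, bpp))
    return variants


def list_icon_file(path):
    with open(path, "rb") as f:
        return list_icon_variants(f.read())
```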

-----------------------------------

Resource Hacker 3.5.2 is also a decent resource file viewer. It shows a lot of errors, though, when it encounters some ‘home made’ RES files like the one described above.






Related article: How difficult is to write a solid software protection (licensing) scheme?


 

Wednesday, October 27, 2010

Microsoft is spying on you

Since Windows Vista and Windows 7, Microsoft has embraced the "big brother" attitude even more strongly.

There is a lot of secret software installed by Microsoft on your computer through its magic "Windows Update".


Here is some evidence:

Windows Update updating without permission!
http://cubicspot.blogspot.com/2007/08/windows-update-updating-without.html

True Spyware In Microsoft Windows 7
http://www.techarp.com/showarticle.aspx?artno=670&pgno=1
http://www.techarp.com/showarticle.aspx?artno=670


Microsoft Windows 7 Service Pack 1 Roadmap Rev. 1.3 (more spyware news)
http://www.techarp.com/showarticle.aspx?artno=662&pgno=0


Yet Another Anti-Piracy Update For Windows 7? (Yes more spyware)
http://www.techarp.com/showarticle.aspx?artno=698&pgno=0


Microsoft Silently Rolls Out Anti-Hack Update For Windows 7
http://www.techarp.com/showarticle.aspx?artno=666&pgno=0


The "Black Hole" update - DO NOT INSTALL KB976902
http://cubicspot.blogspot.com/2010/10/kb976902-black-hole-update.html

Wednesday, October 20, 2010

Plimus taking extra charges for transfers from EU

I am really pissed off because Plimus's bank applies extra fees to my customers' payments, but they will not admit it. I have several cases in which the customer demonstrated (with a receipt from the bank) that he sent the full amount, but only part of that amount reached me.

Just a few days ago, I received another email:

I have been trying to buy 2 licenses ($998). Since our institute in EU requires an invoice (a quote is not sufficient) for payment, I have contacted and tried to make the order through Plimus using purchase order. The problem arises in the transfer fees. Our institute will only pay the amount stipulated on the invoice and only pay this sum via a bank transfer. The bank we use waive transfer fees but those charged by the receiving bank are unknown (as they are not on the invoice) and thus the amount received by your company is less than the value of your product.
The Plimus support team cannot/will not make an invoice which include transfer fees and therefore the purchase is blocked. As you are the company using and authorizing Plimus's work, could you please try to think of a solution so I can make the purchase (for instance adding extra charges to an invoice)? In short, I only require an invoice with a sum that once transferred will give me the final keys (licenses) to your program. Needless to say that if no solution can be found then I will have to annul the purchase.

UPDATE: 
I have found even more information about the nasty things Plimus is doing, including extra currency conversion fees: http://discuss.joelonsoftware.com/default.asp?biz.5.591223.21
http://discuss.joelonsoftware.com/default.asp?biz.5.589508.16
http://discuss.joelonsoftware.com/default.asp?biz.5.623987.12


This means that Plimus is not charging the 15% it claims on its web site, but much more!

Note:
Those who are not familiar with Plimus should be aware of other, more or less obvious, fees that Plimus applies: (up to) $30 for a wire transfer and $3 for manual processing of PO orders. The customer pays them, not the vendor, but they still affect the vendor because they basically increase the price by up to $33. There is also a tax for Californian customers.
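To see how much a wire fee changes the picture, here is the arithmetic for the $998 order above. It is a hypothetical model, not Plimus's actual accounting: the 15% is the rate quoted on their site, the $30 is the "up to" wire fee, and I assume the commission is taken on the full price and the whole wire fee then comes out of what reaches the vendor.

```python
def effective_fee_rate(price, commission=0.15, wire_fee=30.0):
    """Share of the list price lost to commission plus the receiving bank's wire fee.

    Hypothetical model: commission on the full price, then the wire fee is
    deducted from what is left.
    """
    payout = price * (1 - commission) - wire_fee
    return 1 - payout / price


print(round(effective_fee_rate(998) * 100, 1))  # about 18.0% instead of 15%
print(round(effective_fee_rate(99) * 100, 1))   # about 45.3% on a $99 order
```

The fixed fee hurts small orders the most: on a $99 product the same $30 pushes the effective cut to roughly 45%.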

--------------

Related: http://thesunstroke.blogspot.com/2011/01/software-resellers-how-much-they-really.html

Friday, September 10, 2010

Switch from IDE to AHCI

It looks like it is possible to switch from IDE to AHCI after Windows has been installed in IDE mode.
All you have to do is change the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\msahci\Start value from 3 to 0.
The details are here: http://ocztechnologyforum.com/forum/showthread.php?69682-Change-from-IDE-to-AHCI-after-Installation

The performance increase due to AHCI is minor (a few percent), while the problems created by this newer technology are gigantic compared with IDE, especially if you use Windows XP!
Think twice before investing the time to activate AHCI.
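If you do want it, the registry change can be made from a script instead of regedit. A minimal Python sketch, assuming Windows 7, where the AHCI driver service is `msahci` (I would not assume that name on other Windows versions). As I understand the procedure, you make the change in Windows first, reboot, and only then switch the BIOS to AHCI. Writing to HKLM needs administrator rights.

```python
MSAHCI_KEY = r"SYSTEM\CurrentControlSet\Services\msahci"


def reg_command(start_value=0):
    """The equivalent reg.exe command line (run it from an elevated prompt)."""
    return ["reg", "add", "HKLM\\" + MSAHCI_KEY, "/v", "Start",
            "/t", "REG_DWORD", "/d", str(start_value), "/f"]


def set_msahci_start(start_value=0):
    """Set msahci\\Start directly via winreg (Windows only, needs admin rights)."""
    import winreg  # imported here so the module still loads on other systems
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MSAHCI_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, start_value)


if __name__ == "__main__":
    print(" ".join(reg_command()))
```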

How to install Win XP on a computer that already has Win 7?

1. Note: if you already have an empty partition for XP, use it and jump to step 2.
Otherwise, shrink the existing partition (use Computer Management -> Disk Management) to make room for a second partition. Windows XP needs at least 4 GB. In the space you just freed, create the new partition and format it as NTFS.

2. Install Win XP. Your computer will now boot automatically into Win XP and you cannot access Win 7. Don’t worry.

3. Insert the Windows 7 installation disc and restart your PC. Boot from the disc. On the first menu, select "System Recovery", and then select "Startup Repair". Reboot again and log on to Windows 7. Windows 7 will automatically recognize the Win XP installation/partition. Done.

You now have both operating systems.
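If you prefer the command line to Disk Management, step 1 can also be done with diskpart. This sketch only writes the script; it assumes the Windows 7 volume is C:, that it sits on disk 0, and that 4096 MB (the 4 GB minimum above) is enough for XP. Check those assumptions with `list volume` and `list disk` before running `diskpart /s shrink.txt` from an elevated prompt, because a wrong disk number can destroy data.

```python
def diskpart_script(volume="C", disk=0, shrink_mb=4096, label="WinXP"):
    """Build a diskpart script that frees `shrink_mb` MB from `volume` and
    formats the freed space as a new NTFS partition."""
    lines = [
        f"select volume {volume}",
        # Shrink the Windows 7 volume by the requested amount (in MB).
        f"shrink desired={shrink_mb}",
        # Explicitly focus the disk before creating the new partition.
        f"select disk {disk}",
        # Create a primary partition in the unallocated space just freed.
        "create partition primary",
        f'format fs=ntfs label="{label}" quick',
        "assign",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("shrink.txt", "w") as f:
        f.write(diskpart_script())
```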

Paragon Partition Manager v11 - Free key

Paragon decided to give away a free (lite) version of Partition Manager.

Here is the activation key:

PRODUCT KEY:   PSG-161-FRE-PL-9735770
SERIAL NUMBER: 06747-5B5D1-BBFB5-A23B6


____________________

Recommended download:

 

Scareware and other unnecessary software programs

Scareware

Scareware comprises several classes of scam software with malicious payloads, or of limited or no benefit, that are sold to consumers via certain unethical marketing practices. The selling approach uses social engineering to cause shock, anxiety, or the perception of a threat, generally directed at an unsuspecting user.
Some forms of spyware and adware also use scareware tactics.

Scareware and Fake Antivirus

A tactic frequently used by criminals involves convincing users that a virus has infected their computer, then suggesting that they download (and pay for) fake antivirus software to remove it.
Usually the virus is entirely fictional and the software is non-functional or malware itself.
According to the Anti-Phishing Working Group (APWG), the number of scareware packages in circulation rose from 2,850 to 9,287 in the second half of 2008. In the first half of 2009, the APWG identified a 583% increase in scareware programs.
The "scareware" label can also apply to any application or virus (not necessarily sold as above) which pranks users with intent to cause anxiety or panic.


Software that is not good for you

There are also many "companies" selling software that is supposed to fix your registry and magically boost your computer's speed. This is usually garbage software that pretends to find and fix errors on your computer but actually does nothing (good) for you.

Examples:
www.optimize-your-pc.com
www.fix-pc-errors.com
http://registrywizard.com/faq.php
http://www.solvepcerrors.com

We all know DotNet sucks

We all know DotNet sucks. But how bad it sucks?
James S. Gibbons put this to the test, and the results are impressive: DotNet is 'coding horror' come to life.

I cannot imagine anyone really choosing to use a managed, JIT-compiled platform like DotNet.

Friday, August 27, 2010

Plimus versus Avangate

I got tired of Plimus and its nonexistent support, so I decided to look for something better. Here are the results of my short survey (the topics below are the ones most important to me):




  

Avangate

What happens in the case of chargebacks?

If a customer makes a chargeback, Avangate offers the necessary assistance to resolve the incident with the issuing bank free of charge. Avangate will return the collected commission in all cases except when the chargeback is due to a customer complaint about the purchased product.




What happens if a client returns the products or requests a refund?

In the cases stipulated by law, the client may return the purchased products and request a refund. In these situations, Avangate will reverse the transaction, returning the money to the customer and retaining only the commission charged on the initial payment transaction. All refunds are made only after prior approval from the software author.




How do you deal with VAT?

Avangate acts as a reseller for your products. According to EU regulations, independent of seller location, all customers located within the EU are required to pay VAT unless they have a valid VAT ID, in which case they are exempt from VAT. The typical VAT rate is 19%, but this may vary from 15% to 25%, depending on the buyer location and Avangate location. All customers outside the EU are exempt from VAT.


Do you have additional taxes for customers?

No



Do you accept payments in a foreign currency?

Yes. 36 currencies



Pricing                   

Standard Pack: 4.9% + 1.95 EUR/2.5 USD, or 8% (minimum charge is 1.95 EUR or 2.5 USD)
Business Pack: secret/not disclosed (it must be bad since they keep it secret)
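Which Standard Pack option is cheaper depends on the price of your product. A quick sketch in USD, assuming both fees are charged per transaction (the pricing line does not say so explicitly); the function names are mine.

```python
def fee_percent_plus_fixed(price, rate=0.049, fixed=2.5):
    """Standard Pack option 1: 4.9% + 2.5 USD."""
    return price * rate + fixed


def fee_flat_with_minimum(price, rate=0.08, minimum=2.5):
    """Standard Pack option 2: 8%, but never less than 2.5 USD."""
    return max(price * rate, minimum)


def break_even_price(rate1=0.049, fixed=2.5, rate2=0.08):
    """Price at which both options cost the same (the 8% minimum is not active there)."""
    return fixed / (rate2 - rate1)


for price in (20, 50, 100):
    print(price, round(fee_percent_plus_fixed(price), 2), round(fee_flat_with_minimum(price), 2))
print(round(break_even_price(), 2))
```

Under these assumptions the 8% option is cheaper below roughly 80.65 USD, and 4.9% + 2.5 USD wins above it.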







Plimus

How do you deal with VAT?

As of July 1st 2003 we collect VAT from EU consumers who do not have a valid VAT ID for electronically delivered products. We collect this VAT and pay it to the authorities; you do not need to do anything special for this to happen.



Do you have additional taxes for customers?

Yes. We have sales taxes for California customers.



What happens in the case of chargebacks?

Plimus deletes the details of the order. The vendor cannot see the name of the customer and has no clue why the chargeback appeared or why the bank accepted it.

Plimus fines the vendor $25 for each chargeback if the chargeback ratio is higher than 0.1%!



What happens if a client returns the products or requests a refund?

Plimus lets the vendor deal with the customer. If the vendor does not resolve the issue within 2 weeks, Plimus takes over and refunds the order "in order to prevent an imminent chargeback".



Do you accept payments in a foreign currency?

Yes. Over 70 currency types.


Wire transfer fee!

The bank on Plimus' side will charge a wire transfer fee (up to $30) even if the customer is in the EU, where the wire transfer fee is 0. The fee is deducted from the amount the customer sent for the product he purchased. This means that when the money arrives, it will be about $30 short.


_______________

It all started here:

http://foliovision.com/2010/01/21/paypal-google-checkout-digital-river
http://successfulsoftware.net/2009/10/12/a-survey-of-ecommerce-providers-for-software-vendors/
http://www.huffingtonpost.com/aaron-greenspan/why-i-sued-google-and-won_b_172403.html





Friday, August 20, 2010

Search engine for Delphi developers

Delphi Developer Search Engine

This page provides a Google search engine specifically focused on Delphi-related content. The engine includes over 60 different Delphi-related websites and may provide more accurate results than a standard Google search.

Click here

How fast is your compiler?

Is your compiler like this: 



or like this:


?

Thursday, August 12, 2010

Putting the computer in energy-saving mode is evil?

On the Toshiba support forum, a kid named Bob claims that two Toshiba tech guys told him it is dangerous to keep your computer in standby/hibernate mode. So he would like his computer to start faster, but without using standby/hibernate mode:
> The holy grail for me would be to use sleep mode 100% but I'm worried that would be too much extra wear and tear on the machine (energy consumption is not a factor). Opinions please.

So, here is my answer to that kid:
Bob, it is time for you to learn some stuff about computers. Go buy a good book instead of listening to Toshiba tech support.
First of all, if you really care about your computer, you should care about ENERGY CONSUMPTION! Cut the energy consumption in half and you will extend the life of your computer by a few years! How? Simple. Keep your processor at 100% utilization and it will start to overheat so badly that your mainboard will turn brownish. I should probably post some pictures of my old Toshiba laptop (though this is true for all laptops) to show how the heat affected the mainboard. The next to fail are the fans, because they have to run at maximum speed to remove the heat from your laptop. Also, ignore power saving and keep your hard drive always on and, if possible, working (transferring data): the heat literally kills the hard drive! The tubes that illuminate the screen also have a limited life. Keep the laptop on overnight and I guarantee that you will get a "nice" milky/dark image in only one year.
If you have a powerful graphics card, just don't use the utility provided and keep it at maximum power all the time (even if you don't play games): it will suck A LOT of power, which of course will generate even more heat.
All these power-hungry devices will drain a lot of power from your power adapter, which will also start to overheat, and when you are mobile they will empty your battery in 50 minutes instead of 2-3 hours. The more discharge/charge cycles you put on your battery, the sooner it will die.

So, all this was the recipe for killing your laptop by not caring about power consumption. Now, if you want your computer to live long (and prosper), all you have to do is choose a power-saving profile in Control Panel. Just a few clicks and you save hundreds of dollars on energy and hundreds more on hardware (you won't have to buy a new laptop next year).

Toshiba, the Energy Star program and Microsoft are struggling to make computers consume less energy for a reason. If you think you know better than them, you should write a book on how bad it is to have an eco-friendly computer.

If two Toshiba guys said that conserving power (by keeping the computer in standby/hibernate) is bad (which I doubt), it was just a mistake. Other people here are saying the opposite. Actually, my Toshiba laptop has a green button that puts it into an eco-friendly mode where the hibernate and standby modes are active. In hibernate mode the computer is 100% switched off. In standby mode only the RAM receives power. However, the RAM does not suffer from wear and tear, and the power consumption is so small that it barely counts.

Wednesday, August 11, 2010

Toshiba Qosmio – Fire hazard

One month ago, my OS froze and I had to force my laptop off by holding the power button for 4 seconds. The laptop cut the power, but after a few seconds it rebooted and started to load the BIOS, then Windows. So I pressed the power button (and kept it pressed for a few seconds) again and again. It only rebooted. In the end I let the laptop fully load Windows and then turned it off, this time using the 'Turn off computer' button in Windows.

I was not really bothered by this, so I ignored it. However, a few days ago, on a plane, my OS froze again during landing, so again I pushed the power button and put the laptop in my backpack. Some 30 minutes later I heard noise coming from my backpack. The laptop was running and it was hot as hell. The entire case, including the screen (the lid), was terribly hot and the fans were screaming while trying to cool the laptop, which was obviously impossible in the closed compartment of the backpack.

Today I managed to reproduce the issue. It seems to be a bug in the BIOS (power management), and a dangerous one. The only way to turn off the laptop is to press the Pause key immediately after you press the power button. This stops the laptop from loading the OS. Then disconnect the power cord (if connected), flip the laptop over and remove the battery.

Lesson learned:
  • After turning off the laptop, stay next to it and check that it is really off.
  • Don't leave the laptop in your backpack overnight.
  • Don't leave the laptop connected to AC overnight (if something goes wrong, on battery alone it cannot run for more than about 2 hours).
  • Make sure your house insurance covers fire hazards.

This should not be taken lightly. I had a Toshiba laptop before (a Satellite), and I remember that my model was also a fire hazard, something to do with the power adapter connector. I think they recalled some power adapters.

Tuesday, July 27, 2010

Make your Firefox faster, safer and more stable with a few mouse clicks

When did you last look at the 'Plugins' section of your Firefox browser?
Did you know that you have lots of garbage there? Many companies, such as Microsoft, Yahoo and Google, stealthily install plugins that you don't want, without your permission. Some of them are a GREAT danger to your security. See these 6 articles about a Microsoft plugin:
Java also installs an EXTREMELY dangerous plugin, so dangerous that Firefox decided to disable it automatically! See this:
Even if you have the upgraded version of that plugin, better disable it, especially if you are not a Java developer!


Let's start

Click 'Tools -> Add-ons' and see how much crapware is there. Did you ever install all that crap? No? Then it is time to clean it up.
I have done some research to see what is needed and what is safe to disable:

Disable:
  • Yahoo Application State Plugin - it may crash the browser if you have Adblock Plus installed
  • Microsoft DRM Plugin - both of them
  • Windows Presentation Foundation plugin
  • Google Talk Plugin - if you don't use Google Talk in your browser
  • Google Talk Video Accelerator Plugin
  • Google Update Plugin
  • Java Deployment Toolkit plugin - extremely high risk!
Some plugins add huge overhead to your browser. After disabling them, my Firefox loads in only 2 seconds!

The ONLY useful plugins are:
  • Mozilla Default Plug-in - this is safe
  • Shockwave Flash plug-in - this may be unsafe; it also has a nasty history of bugs. Unfortunately, we need that dreadful Flash plugin to see SOME websites.
  • Windows Media Player - needed for some websites with video content; very unsafe
  • Java Platform - rarely needed, for some websites; unsafe







Monday, July 26, 2010

DNA sample compression test

A few months ago I had to run a test to see whether DNA sequence files (FASTA files) compress better than simple text files. DNA files contain only 4 characters (A, C, G, T), so you would expect them to compress really well compared with text files. However, DNA code is pretty random (there are some exceptions where the code follows patterns or has repetitions, but these regions are rare).
So, here are the results.


File                                         Uncompressed   ZIP      RAR
FASTA FILE - no cumments (3.0 KB).fasta      3.0 KB         715 B    614 B
TEXT FILE - random text (3.0 KB).txt         3.0 KB         1544 B   1275 B


What I noticed later is that if you pack multiple samples together (even if they are from relatively different bacteria), the compression ratio can be better.
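A useful yardstick for these numbers: with only four letters, each base carries at most 2 bits, so about 3,000 bases fit in roughly 750 bytes with plain 2-bit packing and no clever compression at all. The sketch below is my own illustration, not how the test above was run. It computes that bound and a zlib size; zlib implements DEFLATE, the usual ZIP method, so its result should land close to (but not exactly on) what a ZIP tool reports. Strip the line breaks from the FASTA text before passing it in. In this particular test file the second half of the sequence repeats the first half, which probably explains why ZIP got below the 2-bit figure.

```python
import math
import zlib


def two_bit_size(sequence: str) -> int:
    """Bytes needed if each base (A, C, G, T) is stored in 2 bits."""
    return math.ceil(len(sequence) / 4)


def pack_2bit(sequence: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray(two_bit_size(sequence))
    for i, base in enumerate(sequence):
        out[i // 4] |= code[base] << (2 * (i % 4))
    return bytes(out)


def compare(sequence: str):
    """Return (raw size, 2-bit packed size, zlib level-9 size) in bytes."""
    raw = sequence.encode("ascii")
    return len(raw), len(pack_2bit(sequence)), len(zlib.compress(raw, 9))
```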




Test files used in this experiment:

a) FASTA FILE - no cumments (3.0 KB).fasta

    TGGCGGCGTGCTTAACACATGCAAGTCGAACGAGAAATTCCCTGCTTGCAGGGAAGAGTAAAGTGGCGCA
    CGGGTGAGTAACGCGTGGGTAACCTACCTTTGAATTCGGAATAGCCCGTCGAAAGGTGGATTAATACCGG
    ATACGGTTTAAGGATCTTCGGATTTTTAAATTAAAGGTGACCTCTTCATGAAAGTTGCCGTTCATAGATG
    GGCCCGCGTACCATTAGCTTGTTGGTGGGGTAATGGCCTACCAAGGCGACGATGGTTAGCTGGTCTGAGA
    GGATGATCAGCCACACTGGAACTGGAACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTGC
    GCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGAGCGAAGAAGGCCTTCGGGTCGTAAAGCTCTGTC
    AAGTGGGAAAAAAATCTTTTGATGAATAGTTAAAAGACTTGATGGTACCACTGGAGGAAGCACCGGCTAA
    CTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTTCGGAATCACTGGGCGTAAAGAGCGT
    GTAGGCGGTTTGACAAGTCAGATGTGAAAGCCCCCGGGCTCAACCCGGGAAGTGCATTTGAAACTGTCTC
    ACTAGAGTATGGGAGAGGAGATTGGAATTCCTGGTGTAGAGGTGAAATTCGTAGATATCAGGAGGAACAC
    CCGTGGCGAAGGCGATTCTCTGGACCAATACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACAGGATT
    AGATACCCTGGTAGTCCACGCCGTAAACGATGAGAACTAGGTGTAGTGGGTATTGACCCCTGCTGTGCCG
    AAGTTAACGCATTAAGTTCTCCGCCCTGGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGG
    GGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCTGGGTTTGACA
    TCCTTTGACCGTCTGTGAAAGCAGATTTTTCCGGCTTTGCCGGAACAGAGTGACAGGTGCTGCATGGCTG
    TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCAGCAACGAGCGTAACCCTTGTCTTTAGTTGCCAT
    TATTAAGTTAGGCACTCTAAAGAGACTGCCTCGGTTAACGGGGAGGAAGGTGGGGATGACGTCAAGTCCC
    TCATGGCCTTTATATCCAGGGCTACACACGTGCTACAATGGGCTGTACAAAGGGTTGCTATCCCGCGAGG
    GGGCGCTAATCCCAAAAAGCAGTTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGGTGGAAT
    CGCTAGTAATCGTGGATCAGCATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACAC
    CACGAAAGTCGACTGTACCAGAAGTTGCTGGGCTAACCTTTTCGGAGGAGGCAGGTACCTAAGGTACGGC
    TGGCGGCGTGCTTAACACATGCAAGTCGAACGAGAAATTCCCTGCTTGCAGGGAAGAGTAAAGTGGCGCA
    CGGGTGAGTAACGCGTGGGTAACCTACCTTTGAATTCGGAATAGCCCGTCGAAAGGTGGATTAATACCGG
    ATACGGTTTAAGGATCTTCGGATTTTTAAATTAAAGGTGACCTCTTCATGAAAGTTGCCGTTCATAGATG
    GGCCCGCGTACCATTAGCTTGTTGGTGGGGTAATGGCCTACCAAGGCGACGATGGTTAGCTGGTCTGAGA
    GGATGATCAGCCACACTGGAACTGGAACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTGC
    GCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGAGCGAAGAAGGCCTTCGGGTCGTAAAGCTCTGTC
    AAGTGGGAAAAAAATCTTTTGATGAATAGTTAAAAGACTTGATGGTACCACTGGAGGAAGCACCGGCTAA
    CTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTTCGGAATCACTGGGCGTAAAGAGCGT
    GTAGGCGGTTTGACAAGTCAGATGTGAAAGCCCCCGGGCTCAACCCGGGAAGTGCATTTGAAACTGTCTC
    ACTAGAGTATGGGAGAGGAGATTGGAATTCCTGGTGTAGAGGTGAAATTCGTAGATATCAGGAGGAACAC
    CCGTGGCGAAGGCGATTCTCTGGACCAATACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACAGGATT
    AGATACCCTGGTAGTCCACGCCGTAAACGATGAGAACTAGGTGTAGTGGGTATTGACCCCTGCTGTGCCG
    AAGTTAACGCATTAAGTTCTCCGCCCTGGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGG
    GGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCTGGGTTTGACA
    TCCTTTGACCGTCTGTGAAAGCAGATTTTTCCGGCTTTGCCGGAACAGAGTGACAGGTGCTGCATGGCTG
    TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCAGCAACGAGCGTAACCCTTGTCTTTAGTTGCCAT
    TATTAAGTTAGGCACTCTAAAGAGACTGCCTCGGTTAACGGGGAGGAAGGTGGGGATGACGTCAAGTCCC
    TCATGGCCTTTATATCCAGGGCTACACACGTGCTACAATGGGCTGTACAAAGGGTTGCTATCCCGCGAGG
    GGGCGCTAATCCCAAAAAGCAGTTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGGTGGAAT
    CGCTAGTAATCGTGGATCAGCATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACAC
    CACGAAAGTCGACTGTACCAGAAGTTGCTGGGCTAACCTTTTCGGAGGAGGCAGGTACCTAAGGTACGGC
    CGGTAATTGGGGTGAAGTCGTAACAAGGTATCATTCAGTGATACTCGG


    ----------------------------------------------------------------------------------------------------------------

    Test files used in this experiment:
    b) TEXT FILE - random text (3.0 KB).txt

    Cluster (computing)
    A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.[1]

    Cluster categorizations

    High-availability (HA) clusters
    High-availability clusters (also known as Failover Clusters) are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant nodes, which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure.
    There are many commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux OSs.

    Load-balancing clusters
    Load-balancing is when multiple computers are linked together to share computational workload or function as a single virtual computer. Logically, from the user side, they are multiple machines, but function as a single virtual machine. Requests initiated from the user are managed by, and distributed among, all the standalone computers to form a cluster. This results in balanced computational work among different machines, improving the performance of the cluster system.

    Compute clusters
    Often clusters are used for primarily computational purposes, rather than handling IO-oriented operations such as web service or databases. For instance, a cluster might support computational simulations of weather or vehicle crashes. The primary distinction within compute clusters is how tightly-coupled the individual nodes are. For instance, a single compute job may require frequent communication among nodes - this implies that the cluster shares a dedicated network, is densely located, and probably has homogenous nodes. This cluster design is usually referred to as Beowulf Cluster. The other extreme is where a compute job uses one or few nodes, and needs little or no inter-node communication. This latter category is sometimes called "Grid" computing. Tightly-coupled compute clusters are designed for work that might traditionally have been called "supercomputing". Middleware such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) permits compute clustering programs to be portable to a wide variety of clusters.

    Grid computing
    Grids are usually computer clusters, but more focused on throughput like a computing utility rather than running fewer, tightly-coupled jobs. Often, grids will incorporate heterogeneous collections of computers, possibly distributed xxxxx

    How we spend Moore's dividend



    During the last 28 years, clock speed increased 586-fold. The Intel Pentium processor, introduced in 1995, achieved a SPECint95 benchmark score of 2.9, while the Intel Core 2 Duo achieved a SPECint2000 benchmark score of 3108.0: a 375-fold increase in performance in 11 years.
    Safety has a measurable cost: for example, array bounds and null pointer checks impose a time overhead of approximately 4.5% in the Singularity OS.
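To compare the two growth figures quoted above, it helps to convert each overall factor into an equivalent compound yearly factor. The short Python calculation below is my own arithmetic on the quoted numbers, not part of the original measurements.

```python
def annual_growth(total_factor: float, years: float) -> float:
    """Return the per-year factor g such that g ** years == total_factor."""
    return total_factor ** (1.0 / years)

# Figures quoted in the text.
clock = annual_growth(586, 28)  # clock speed over 28 years
spec = annual_growth(375, 11)   # SPECint performance over 11 years

print(f"clock speed: {clock:.3f}x per year")
print(f"SPECint:     {spec:.3f}x per year")
```

With these inputs, 586x over 28 years works out to roughly 26% per year, while 375x over 11 years is roughly 71% per year.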

    The first is the move to high-level programming languages that provide more safety, greater ease of use, and a higher level of abstraction. Managed languages (such as VB/C# and Java) raised the level of programming further by introducing garbage collection, richer class libraries (such as .NET and the Java Class Library), just-in-time compilation, and runtime reflection. All these features provide powerful abstractions for developing software, but they also consume memory and processor resources in nonobvious ways.

    The second is that high-level languages hide details of a machine beneath a more abstract programming model. This leaves developers less aware of performance considerations and less able to understand and correct problems.
    I ran a simple experiment comparing the cost of implementing the archetypal Hello World program in C and in C#. The results make it clear that the higher level of abstraction affects performance.



    Today many developers don’t care about performance and rely on the hardware, and on the performance of the other software layers, to make up for it.


    Abundant machine resources have allowed developers to become lazy and complacent about performance, to dismiss optimization, and to be less aware of the resources their code consumes. By contrast, 30 years ago Bill Gates famously changed the prompt in Altair BASIC from “READY” to “OK” to save 5 bytes of memory.
     






    Parallel/distributed databases (raw notes)

    Some raw notes about distributed databases.



    If you want to start with parallel databases, you need a good knowledge of traditional (non-distributed) databases. Here are some books I personally recommend:

    Books about regular DB


    http://proquest.safaribooksonline.com/0321290933
    http://proquest.safaribooksonline.com
    http://proquest.safaribooksonline.com/078972569X
    http://proquest.safaribooksonline.com/9781593271909 (For true beginners. You may not want to start working with parallel DB systems if this is the book you are reading now :) )

    ==================================================

    Comparison between several parallel DB systems


    Hive (HBase)
    Developer: Facebook/Apache
    Source code available: Yes
    Free: Yes
    Link :


    PostgreSQL to HBase Replication
    Developer: ?
    Source code available: ?
    Free:


    HadoopDB
    Developer: Daniel Abadi - Yale
    Source code available: Yes
    Free: Yes
    Quote: "Is just an academic prototype"


    Yahoo! Data
    Developer: ?
    Source code available: ?
    Free: Yes


    BigTable
    Developer: Google
    Source code available: ?
    Free: ?
    Link : see BigTable paper 2006 http://labs.google.com/papers/bigtable.html


    ----------------------------------

    More about HadoopDB
    1. A hybrid of DBMS and MapReduce technologies targeting analytical query workloads
    2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
    3. An attempt to fill the gap in the market for a free and open source parallel DBMS
    4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems (see longer blog post).
    5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads

    Source: http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-shorter.html

    • HadoopDB is primarily focused on high scalability and the availability required at scale. Daniel Abadi questions the ability of current MPP databases to truly scale past 100 nodes, whereas Hadoop has real deployments on 3000+ nodes.
    • HadoopDB, like many MPP analytical database platforms, uses shared-nothing relational databases as processing units; in HadoopDB's case these are PostgreSQL databases. Unlike other MPP databases, HadoopDB uses Hadoop as the distribution mechanism.
    • Daniel doesn’t dispute the paper he co-authored with DeWitt and Stonebraker, which claims that MapReduce underperforms compared to current MPP DBMSs. HadoopDB, however, is focused on massive scale: hundreds or thousands of nodes. Currently the largest MPP database we know of has 96 nodes.
    • Early benchmarking shows HadoopDB outperforms Hadoop but is slower than current MPP databases under normal circumstances.  However when simulating node failure mid query HadoopDB outperformed current MPP databases significantly.
    • The higher the scalability the higher the possibility of node failure mid query.  Very large Hadoop deployments may experience at least 1 node failure per query (job).
    • HadoopDB is usable today, but it should not be considered an “out of the box” solution. HadoopDB is the outcome of a database research initiative, not a commercial venture. Anyone planning to use HadoopDB will need the appropriate systems and development skills to deploy it effectively.
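The point about node failures can be made concrete with a deliberately simple model. Assume, purely for illustration, that each node fails independently during a query with the same probability p; then the chance that at least one of n nodes fails is 1 - (1 - p)^n. The value of p below is hypothetical, not a measured failure rate.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent nodes fails during
    one query, given a per-node failure probability p (simplified model)."""
    return 1.0 - (1.0 - p) ** n

p = 0.001  # hypothetical per-node failure probability during one long query
for n in (10, 100, 1000, 3000):
    print(f"{n:5d} nodes -> {prob_any_failure(p, n):.3f}")
```

With this made-up p, the chance of losing at least one node grows from about 1% on 10 nodes to about 95% on 3000 nodes, which is why surviving mid-query failures matters more as clusters grow.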



    Hadoop DB - How it works

    Database Connector
    The Database Connector is the interface between independent database systems residing on nodes in the cluster and TaskTrackers.

    Catalog
    The catalog maintains metainformation about the databases. This includes the following: (i) connection parameters such as database location, driver class and credentials, (ii) metadata such as data sets contained in the cluster, replica locations, and data partitioning properties.
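A minimal Python sketch of the two kinds of information the catalog keeps. The class names, field names, node names and connection URL are hypothetical and only illustrate the structure described above; they are not HadoopDB's actual catalog format. The driver class org.postgresql.Driver is the standard PostgreSQL JDBC driver, chosen because HadoopDB's node databases are PostgreSQL.

```python
from dataclasses import dataclass, field

@dataclass
class ConnectionInfo:
    """(i) Connection parameters for one node-local database (hypothetical fields)."""
    url: str       # database location
    driver: str    # driver class
    user: str      # credentials
    password: str

@dataclass
class Catalog:
    # (i) node name -> connection parameters for that node's database
    connections: dict[str, ConnectionInfo] = field(default_factory=dict)
    # (ii) table -> chunk id -> nodes holding a replica of that chunk
    replicas: dict[str, dict[int, list[str]]] = field(default_factory=dict)
    # (ii) table -> partition key column
    partition_keys: dict[str, str] = field(default_factory=dict)

    def nodes_for_chunk(self, table: str, chunk: int) -> list[str]:
        """Replica locations of one chunk (empty list if unknown)."""
        return self.replicas.get(table, {}).get(chunk, [])

    def connection(self, node: str) -> ConnectionInfo:
        return self.connections[node]

# Usage with made-up values:
catalog = Catalog()
catalog.connections["node1"] = ConnectionInfo(
    "jdbc:postgresql://node1:5432/hadoopdb", "org.postgresql.Driver", "hadoop", "secret")
catalog.replicas["orders"] = {0: ["node1", "node2"]}
catalog.partition_keys["orders"] = "customer_id"
print(catalog.nodes_for_chunk("orders", 0))
```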

    Data Loader
    The Data Loader is responsible for (i) globally repartitioning data on a given partition key upon loading, (ii) breaking apart single node data into multiple smaller partitions or chunks and (iii) finally bulk-loading the single-node databases with the chunks.
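The three loader steps can be sketched in Python as below. This is an illustration under stated assumptions, not HadoopDB's loader: rows are tuples, the partition key is given by column index, zlib.crc32 picks the node, a different hash (zlib.adler32) picks the chunk within a node, and "bulk-loading" is simulated by filling in-memory lists. All function names are hypothetical.

```python
import zlib
from collections import defaultdict

def node_for(key, num_nodes):
    """Step (i): choose the node for a partition-key value.
    crc32 is used instead of hash() because it is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def chunk_for(key, num_chunks):
    """Step (ii): choose a chunk inside a node, using a different hash
    function so chunk placement is independent of node placement."""
    return zlib.adler32(str(key).encode("utf-8")) % num_chunks

def load(rows, key_index, num_nodes, chunks_per_node):
    # (i) globally repartition the rows on the partition key
    per_node = defaultdict(list)
    for row in rows:
        per_node[node_for(row[key_index], num_nodes)].append(row)

    # (ii) split each node's rows into chunks and (iii) "bulk-load" them;
    # here a chunk is just a list, while a real loader might run
    # PostgreSQL's COPY into one database per chunk.
    databases = {}
    for node, node_rows in per_node.items():
        chunks = [[] for _ in range(chunks_per_node)]
        for row in node_rows:
            chunks[chunk_for(row[key_index], chunks_per_node)].append(row)
        databases[node] = chunks
    return databases

# Usage: 100 rows partitioned on column 1 across 4 nodes, 2 chunks per node.
rows = [(i, f"customer{i % 7}") for i in range(100)]
dbs = load(rows, key_index=1, num_nodes=4, chunks_per_node=2)
for node, chunks in sorted(dbs.items()):
    print(node, [len(c) for c in chunks])
```

Because every row with the same key value lands on the same node (and in the same chunk), later queries that group or join on that key can be answered node by node.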

    SQL to MapReduce to SQL (SMS) Planner
    HadoopDB provides a parallel database front-end to data analysts enabling them to process SQL queries. The SMS planner extends Hive. Hive transforms HiveQL, a variant of SQL, into MapReduce jobs that connect to tables stored as files in HDFS.
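To give a feel for what turning a query into MapReduce jobs involves, here is a toy Python version of how an aggregation such as SELECT dept, SUM(salary) FROM emp GROUP BY dept can run as a map phase, a shuffle that groups values by key, and a reduce phase. The table, its columns and the functions are hypothetical; this is a conceptual sketch, not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical table emp(name, dept, salary)
emp = [("alice", "sales", 100), ("bob", "it", 200), ("carol", "sales", 50)]

def map_phase(row):
    name, dept, salary = row
    yield dept, salary            # emit (GROUP BY key, value to aggregate)

def shuffle(pairs):
    groups = defaultdict(list)    # the framework groups all values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)       # SUM(salary) for one dept

pairs = [kv for row in emp for kv in map_phase(row)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)
```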

    Since each table is stored as a separate file in HDFS, Hive assumes no collocation of tables on nodes. Therefore, operations that involve multiple tables usually require most of the processing to occur in the Reduce phase of a MapReduce job. This assumption does not completely hold in HadoopDB as some tables are collocated and if partitioned on the same attribute, the join operation can be pushed entirely into the database layer.
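The collocation rule above can be expressed as a simple check. The sketch below is my own simplification, not the SMS planner's code, and the Partitioning record is hypothetical: if both tables are hash-partitioned on their join columns with the same number of partitions and the same hash function, matching rows are guaranteed to live on the same node, so the join can be pushed into the node databases; otherwise the rows must meet in the Reduce phase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partitioning:
    column: str          # partition key column
    num_partitions: int
    hash_fn: str         # name of the hash function used when loading

def join_location(left: Partitioning, right: Partitioning,
                  left_col: str, right_col: str) -> str:
    """Decide where an equi-join left_col = right_col can be evaluated.
    Assumes partition i of every table is stored on the same node."""
    collocated = (
        left.column == left_col
        and right.column == right_col
        and left.num_partitions == right.num_partitions
        and left.hash_fn == right.hash_fn
    )
    return "database" if collocated else "reduce"

orders = Partitioning("customer_id", 16, "crc32")
orders_by_date = Partitioning("order_date", 16, "crc32")
customers = Partitioning("id", 16, "crc32")

print(join_location(orders, customers, "customer_id", "id"))          # both on join keys
print(join_location(orders_by_date, customers, "customer_id", "id"))  # orders not on join key
```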

    Quote: "Hadoop simply scales better than any currently available parallel DBMS product."



    Final words.
    So you need to use a parallel database? Here were the choices I had for my project:

    1. Purchase a parallel DB such as Greenplum or Vertica
    Price: $250K. 
    http://www.dbms2.com/2008/02/07/vertica-update-2
    Thoughts: Everything about this solution is nice except the price.

    2. Reduce the amount of data that the DB system must process. To do this, keep the existing DB (MySQL): write the results of the BLAST MapReduce jobs to disk and then use a script to upload them to the DB. This way we won't flood the DB with too much data.
    Thoughts:  Cheap, some programming required. Not a definitive solution.

    3. Use the DB engine to perform the SQL searches then throw away the data from DB.
    Thoughts:  Cheap, smart, some programming required. Not a definitive solution.

    4. Use the DB provided by Hadoop -> HBase/Hive. It is slower but more computers can be used to improve speed.
    Thoughts: Cheap (actually free). Unstable (Hadoop is early beta). Difficult to install and maintain. 
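Option 2 (write the results to disk, then upload them with a script) might look roughly like the sketch below. Assumptions: the MapReduce output is tab-separated text with three columns (query id, subject id, score), which is a made-up layout, and Python's built-in sqlite3 stands in for MySQL so the example is self-contained; with MySQL you would use its own client library or LOAD DATA INFILE instead. The table name hits is hypothetical.

```python
import csv
import io
import sqlite3

def upload(lines, conn, batch_size=1000):
    """Insert tab-separated result lines into the hits table in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (query TEXT, subject TEXT, score REAL)"
    )
    batch, total = [], 0
    for query, subject, score in csv.reader(lines, delimiter="\t"):
        batch.append((query, subject, float(score)))
        if len(batch) == batch_size:          # flush a full batch
            conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", batch)
            total += len(batch)
            batch.clear()
    if batch:                                 # flush the remainder
        conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total

# Usage: in practice `lines` would be an open result file from the job output.
sample = io.StringIO("q1\ts1\t98.5\nq1\ts2\t75.0\nq2\ts3\t60.2\n")
conn = sqlite3.connect(":memory:")
print(upload(sample, conn, batch_size=2))
```

Batching with executemany keeps the number of round trips low, which is the point of uploading from disk instead of streaming every result into the DB as it is produced.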


    Parallel programming (raw notes)

    Here are some raw notes on parallel programming, together with the books I used recently for a project and recommend. Interestingly, the definitions of "parallel computing", "cluster" and "cloud computing" are extremely loose: each book defines these terms in a very different way.




    Definitions

    Parallel computing = program parts running simultaneously on multiple processors in the same computer.
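A minimal illustration of this definition using Python's standard concurrent.futures module: one computation is split into parts that separate worker processes run simultaneously on the cores of the same machine. The workload (a sum of squares) and the function names are arbitrary examples.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """One part of the program: sum of squares over the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def make_slices(n, workers):
    """Split [0, n) into at most `workers` contiguous slices."""
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]

def parallel_sum_of_squares(n, workers=4):
    # Each slice is handed to a separate worker process, so on a
    # multi-core machine the parts really do run at the same time.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, make_slices(n, workers)))
```

Call parallel_sum_of_squares(1_000_000) from inside an `if __name__ == "__main__":` block; the guard matters because, depending on the platform, worker processes may re-import the main module. Processes are used rather than threads because in CPython threads running pure Python code do not execute bytecode in parallel.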


    Distributed computing = a form of parallel computing but in multiple computers. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running "group" tasks, whereas clustered computers are usually much more tightly coupled. Distributed computing also often consists of machines which are widely separated geographically.


    Grid computing = uses the resources of many separate computers, loosely connected (needs little or no inter-node communication), by a network usually the Internet. Grid computing is optimized for workloads which consist of many independent jobs or packets of work, which do not have to share data between the jobs during the computation process.
    CPU scavenging creates a “grid” from the unused resources in a network of participants.


    Computer cluster = a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks


    Parallel Virtual Machine = The Parallel Virtual Machine (PVM) is a software tool for parallel networking of computers. It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor.


    Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another. It is used in computer clusters and supercomputers. MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.
    Documentation: Chapter 1. Introduction to Parallel Programming - http://www.redbooks.ibm.com/redbooks/pdfs/sg245380.pdf


    Amdahl's law

    The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours on a single processor core, and a particular 1-hour portion cannot be parallelized while the remaining 19 hours (95%) can, then no matter how many processors we devote to the parallelized execution, the execution time cannot drop below that critical 1 hour. Hence the speedup is limited to at most 20x.
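The example corresponds to the usual form of Amdahl's law, S(N) = 1 / ((1 - P) + P / N), where P is the fraction of the work that can be parallelized and N is the number of processors. A small Python helper applied to the 20-hour example (P = 19/20 = 0.95):

```python
def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the run time parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

p = 19 / 20  # 19 of the 20 hours can be parallelized
for n in (1, 2, 8, 64, 1024):
    s = amdahl_speedup(p, n)
    print(f"{n:5d} processors: speedup {s:5.2f}x, run time {20 / s:5.2f} h")
```

As N grows, P / N vanishes and S(N) approaches 1 / (1 - P) = 20, so the run time approaches the 1 hour of sequential work.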



    Beowulf (computer cluster)

    Beowulf is a multi-computer architecture which can be used for parallel computations. It is a system which usually consists of one server node, and one or more client nodes connected together via Ethernet or some other network. It is a system built using commodity hardware components, like any PC capable of running a Unix-like operating system, with standard Ethernet adapters, and switches. It does not contain any custom hardware components and is trivially reproducible. Beowulf also uses commodity software like the Linux or Solaris operating system, Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The server node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and gateway to the outside world. Large Beowulf machines might have more than one server node, and possibly other nodes dedicated to particular tasks, for example consoles or monitoring stations. In most cases client nodes in a Beowulf system are dumb, the dumber the better. Nodes are configured and controlled by the server node, and do only what they are told to do. In a disk-less client configuration, client nodes don't even know their IP address or name until the server tells them what it is.
    The typical setup of a Beowulf cluster


    One of the main differences between Beowulf and a Cluster of Workstations (COW) is the fact that Beowulf behaves more like a single machine rather than many workstations. In most cases client nodes do not have keyboards or monitors, and are accessed only via remote login or possibly serial terminal. Beowulf nodes can be thought of as a CPU + memory package which can be plugged in to the cluster, just like a CPU or memory module can be plugged into a motherboard. Beowulf is not a special software package, new network topology or the latest kernel hack. Beowulf is a technology of clustering computers to form a parallel, virtual supercomputer. Although there are many software packages such as kernel modifications, PVM and MPI libraries, and configuration tools which make the Beowulf architecture faster, easier to configure, and much more usable, one can build a Beowulf class machine using standard Linux distribution without any additional software. If you have two networked computers which share at least the /home file system via Network File System (protocol), and trust each other to execute remote shells (rsh), then it could be argued that you have a simple, two node Beowulf machine.





    Parallel programming books

    Introduction to Parallel Computing, 2003
     Introduction to Parallel Computing, Second Edition 
    Very good


    Principles of Concurrent and Distributed Programming, 2006
    Principles of Concurrent and Distributed Programming, Second Edition 
    http://proquest.safaribooksonline.com/9780321312839




    Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, and Storage Solutions 

    Price: 27 pounds
    2004